You can use BN254 first; it is already included as a precompile.
example:
Hi everyone, I saw @ralexstokes' tweet on the hash-to-curve matter a few weeks ago and started an implementation here: GitHub - ethyla/bls12-381-hash-to-curve
Hash-to-field is already working nicely and passes all tests from the RFC. It currently costs 30,000-45,000 gas, so it could be called “efficient”.
Two issues remain. I can't test the hash-to-curve implementation, as Foundry doesn't implement the precompiles yet, and I found no simple way to set up a local node with them activated to test using Hardhat.
Also, I can't clear the cofactor yet for the G2 implementation, as the scalar I need to multiply by is way too big for the precompile. I started looking into the Budroni-Pintore optimization, but I fear it will be very expensive gas-wise.
EIP-2537's field-to-curve mappings are defined to clear the cofactor themselves.
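For concreteness, a quick back-of-the-envelope check on why the cofactor-clearing scalar doesn't fit in the precompile's scalar input (the `h_eff` constant below is quoted from RFC 9380's BLS12-381 G2 hash-to-curve suite; please double-check it against the RFC before relying on it):

```python
# EIP-2537 scalar multiplication takes a 32-byte (256-bit) scalar.
SCALAR_LIMIT_BITS = 256

# BLS12-381 subgroup order r: fits comfortably in the scalar input.
R = 0x73eda753299d7d483339d80809a1d80553bda402fffe5bfeffffffff00000001

# Effective G2 cofactor h_eff as listed in RFC 9380 (hash-to-curve);
# reproduced here for illustration only, verify against the RFC.
H_EFF_G2 = 0xbc69f08f2ee75b3584c6a0ea91b352888e2a8e9145ad7689986ff031508ffe1329c2f178731db956d82bf015d1212b02ec0ec69d7477c1ae954cbc06689f6a359894c0adebbf6b4e8020005aaa95551

print(R.bit_length())         # 255: usable with the precompile
print(H_EFF_G2.bit_length())  # well above 256: cannot be passed as one scalar
```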
Hi all, there was some request above for exposing the BLS12-381 pairing product, and after some discussion with core devs and others, I’m currently leaning towards leaving them out.
Please take a look here with my summary/rationale and please post here or there if I’m missing something: Update EIP-2537: rename PAIRING to PAIRING_CHECK; introduce PAIRING_PRODUCT precomiple by zhenfeizhang · Pull Request #8309 · ethereum/EIPs · GitHub
I'm currently adding EIP-2537 support to Constantine and at the same time providing metering for the worst-case scenario using the optimized constant-time routines and detailed operation counts.
See benchmarks: EIP-2537 - BLS12-381 precompiles for the EVM by mratsim · Pull Request #368 · mratsim/constantine · GitHub
and detailed operation counts constantine/metering/eip2537.md at eip2537 · mratsim/constantine · GitHub (note: the metering slows down execution by ~5x)
However, while subgroup checks are mandatory for pairings, they are not for:
- For addition, this is not a problem for Jacobian coordinates. However, if using projective coordinates, in particular the complete formulas from “Complete addition formulas for prime order elliptic curves”, there might be an assumption of being in the prime-order subgroup. This needs test vectors with a point not in the subgroup.
- For scalar multiplication, endomorphism acceleration is widely used in BLS12-381 libraries, for a minimum of ~30% speedup. It will give a wrong result if the point is not in the correct subgroup. The spec needs to spell out that implementations should not use endomorphism acceleration. (Or that endomorphism acceleration is mandatory, and a test should check that all implementations give the same wrong result.) This needs test vectors with a point not in the subgroup.
- For multi-scalar multiplication, we have the same issue as scalar multiplication. This needs test vectors with points not in the subgroup.
To be clear (because a 33% reduction one way is a 50% increase the other way): it's 50us vs 75us per operation on my machine.
As an aside, the EIP should pick either the additive notation or the multiplicative notation but not mix both, i.e. replace the name “multiexponentiation” with “multi-scalar multiplication” (MSM).
I can open 2 PRs for what I think is the sensible choice:
Tagging @asanso re test vectors.
You saved me a lot of wasted time, thanks.
I picked it up again this week and now have a working implementation. It passes the RFC tests against a modified geth.
Though some more cleanup and optimizations are needed.
Thanks a lot @mratsim for the great analysis.
W.r.t. the subgroup check, I totally agree with you about the GLV optimization.
Now my take would be to enforce subgroup check for all the operations (see PR 8322).
The analysis done by @MariusVanDerWijden shows, though, that this would not be possible for BLS12_G1ADD and BLS12_G2ADD (too expensive) but is feasible for the other operations.
Now it might be a bit “ugly” to force subgroup checks for all but BLS12_G1ADD and BLS12_G2ADD, but IMHO it's the best way to go. Thoughts?
I’ll split my reply into an analysis and factual part and then my own opinion piece, then open up on something related to EIP-7667 on gas cost.
I see 3 different paths:
Now, for a more informed decision, we’re okay with the pricing of regular scalar-multiplication as double-and-add, but is endomorphism acceleration important for MSM?
When optimizing for G1, on my machines beyond 2^10 points (1024), endomorphisms were slower and so are deactivated: constantine/constantine/math/elliptic/ec_multi_scalar_mul.nim at 976c8bb215a3f0b21ce3d05f894eb506072a6285 · mratsim/constantine · GitHub
One extra remark: on G1, endomorphisms cut the scalar size from 255 bits to 128 bits. On G2, they cut the scalar size from 255 bits to 64 bits; further benchmarks confirm that, similarly, there is a threshold beyond which endomorphism acceleration doesn't really help.
With endomorphism:
1 If not properly tested, the “no change” approach is likely to lead to a state split if people start using points on the curve but not in the subgroup. If properly tested, the only difficulty is if some libraries do not allow selecting the scalar multiplication algorithm and use endomorphisms by default.
3 is similar to 1 in the happy path. In the unhappy path, will users respect a “Before calling the scalar-mul precompile, input MUST be subgroup-checked or come from clear_cofactor or mapping_to_curve”? If not, libraries can't use the endomorphism-accelerated path and we might as well use 1.
2 sounds like the reasonable choice.
Source code: constantine/constantine/math/elliptic/ec_multi_scalar_mul.nim at 976c8bb215a3f0b21ce3d05f894eb506072a6285 · mratsim/constantine · GitHub
Now for pricing the subgroup checks, I'll use Scott, https://eprint.iacr.org/2021/1130.pdf, but there is also a later note from El Housni on subgroup checks:
Better discussed here probably: EIP-7667: Raise gas costs of hash functions - #6 by mratsim
but here is the gist: some operations, like hash functions, are very expensive in zkEVMs and zkVMs, and the Verge roadmap endgame is having a snarkified EVM (see: Annotated Ethereum Roadmap and https://twitter.com/VitalikButerin/status/1741190491578810445)
So EIP-7667 proposes to raise gas cost of hash functions. But should BN254 and BLS12-381 also reflect that?
I think the decision on whether or not to enforce subgroup checks should be based on the usage modes, namely: “is there any use case that doesn't eventually want to have everything in the main subgroup?”. So far it looks like if we clear cofactors in “map to curve” implementations (both fp → G1 and fp_2 → G2), then all the cases would be happy to work with points in the subgroup.
So, at this moment I would be 99% for the inclusion of subgroup checks in all relevant precompiles (so everything except additions and map-to-curve). Also, it'll eventually save someone from a mistake or misuse.
The only legitimate use-case I have for no subgroup check is if several consecutive operations are needed and the subgroup check cost can only be paid once.
For example, verifying KZG commitments would incur that: constantine/constantine/commitments/kzg_polynomial_commitments.nim at 976c8bb215a3f0b21ce3d05f894eb506072a6285 · mratsim/constantine · GitHub.
But this is only a matter of gas pricing.
The non-legitimate use-case would be malicious:
Some benches from Rollcall discussion
original: eip2537_constantine_bench.csv · GitHub
Reproduction:

```sh
git clone https://github.com/mratsim/constantine
cd constantine
git checkout eip2537
CC=clang nimble bench_eip2537_subgroup_checks_impact
```
One other thing that came out of chatting with @dankrad is this: `POINT_EVALUATION_PRECOMPILE_GAS` in EIP-4844 is about 50000, and the point evaluation precompile is about 2 pairings. But the equivalent of 2 pairings in EIP-2537 costs 151000. Isn't that really inconsistent?
Thoughts @mratsim @shamatar @ralexstokes ?
In my noisy bench (browser, discord, telegram running), I can confirm your observations.
Here are the detailed operation costs for BLS12-381 primitives:
And this is for EIP-4844:
Pairing is 415us, verify_kzg_proof is 779us.
verify_kzg_proof is indeed a 2-way multi-pairing: constantine/constantine/ethereum_eip4844_kzg.nim at 976c8bb215a3f0b21ce3d05f894eb506072a6285 · mratsim/constantine · GitHub
2*415/779 = 1.065x
hence I concur that having a ratio of 3 between both is way too big of a difference.
However, if we look closely at multi-pairing in Constantine, a dedicated multi-pairing function grows much more slowly: about 120us per extra pair, i.e. ~30% of a full pairing.
Note: every 8 pairs, there is a slight cost bump as my multi-pairing batch size is 8
That said, in my analysis the cost bump is less than 0.1% per pairing
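In other words, a simple cost model for the multi-pairing (a sketch using the 415us / 120us measurements above; `multipairing_cost_us` is just an illustrative name, not Constantine code):

```python
FULL_PAIRING_US = 415.0   # measured single pairing, from the benchmarks above
EXTRA_PAIR_US = 120.0     # measured marginal cost per extra pair

def multipairing_cost_us(k: int) -> float:
    # One full pairing, then each extra pair only adds a (batched) Miller
    # loop, amortizing the final exponentiation across the whole batch.
    return FULL_PAIRING_US + (k - 1) * EXTRA_PAIR_US

marginal_ratio = EXTRA_PAIR_US / FULL_PAIRING_US  # ~0.29, i.e. the ~30% above
```

This model predicts ~535us for a 2-way multi-pairing, close to the 531us measured above.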
I can do a repricing proposal based on 30 Mgas/s on my PC.
If wanted I can also compare to BN254 precompile.
Here is a repricing, using the numbers above.
Disclaimers:
Operation + Algorithm | Curve | Throughput (ops/s) | Gas cost (target 30 Mgas/s) | Latency (ns/op) | CPU cycle count (approx) |
---|---|---|---|---|---|
EC Add G1 | EC_Prj[Fp[BLS12_381]] | 2732240.437 | 10.98 | 366 | 1206 |
EC Add vartime G1 | EC_Prj[Fp[BLS12_381]] | 4975124.378 | 6.03 | 201 | 664 |
EC Double G1 | EC_Prj[Fp[BLS12_381]] | 4219409.283 | 7.11 | 237 | 781 |
EC Add G1 | EC_Jac[Fp[BLS12_381]] | 2188183.807 | 13.71 | 457 | 1507 |
EC Add vartime G1 | EC_Jac[Fp[BLS12_381]] | 5780346.821 | 5.19 | 173 | 572 |
EC Double G1 | EC_Jac[Fp[BLS12_381]] | 4566210.046 | 6.57 | 219 | 723 |
EC Add G2 | EC_Prj[Fp2[BLS12_381]] | 943396.226 | 31.80 | 1060 | 3494 |
EC Add vartime G2 | EC_Prj[Fp2[BLS12_381]] | 1869158.879 | 16.05 | 535 | 1763 |
EC Double G2 | EC_Prj[Fp2[BLS12_381]] | 1564945.227 | 19.17 | 639 | 2107 |
EC Add G2 | EC_Jac[Fp2[BLS12_381]] | 810372.771 | 37.02 | 1234 | 4066 |
EC Add vartime G2 | EC_Jac[Fp2[BLS12_381]] | 2207505.519 | 13.59 | 453 | 1493 |
EC Double G2 | EC_Jac[Fp2[BLS12_381]] | 2000000 | 15.00 | 500 | 1648 |
EC ScalarMul 255-bit G1 | EC_Prj[Fp[BLS12_381]] | 16451.427 | 1 823.55 | 60785 | 200212 |
EC ScalarMul vartime 255-bit G1 | EC_Prj[Fp[BLS12_381]] | 19455.253 | 1 542.00 | 51400 | 169301 |
EC ScalarMul 255-bit G1 | EC_Jac[Fp[BLS12_381]] | 16266.511 | 1 844.28 | 61476 | 202489 |
EC ScalarMul vartime 255-bit G1 | EC_Jac[Fp[BLS12_381]] | 20193.861 | 1 485.60 | 49520 | 163106 |
EC ScalarMul 255-bit G2 | EC_Prj[Fp2[BLS12_381]] | 8268.768 | 3 628.11 | 120937 | 398339 |
EC ScalarMul vartime 255-bit G2 | EC_Prj[Fp2[BLS12_381]] | 9304.836 | 3 224.13 | 107471 | 353983 |
EC ScalarMul 255-bit G2 | EC_Jac[Fp2[BLS12_381]] | 8717.332 | 3 441.42 | 114714 | 377842 |
EC ScalarMul vartime 255-bit G2 | EC_Jac[Fp2[BLS12_381]] | 10393.498 | 2 886.42 | 96214 | 316905 |
Miller Loop BLS12 | BLS12_381 | 5241.695 | 5 723.34 | 190778 | 628376 |
Final Exponentiation BLS12 | BLS12_381 | 4379.089 | 6 850.74 | 228358 | 752155 |
Pairing BLS12 | BLS12_381 | 2408.223 | 12 457.32 | 415244 | 1367710 |
Hash to G1 (SSWU - Draft #14) | BLS12_381 | 23078.698 | 1 299.90 | 43330 | 142721 |
Hash to G2 (SSWU - Draft #14) | BLS12_381 | 7504.015 | 3 997.86 | 133262 | 438933 |
Subgroup check | EC_Jac[Fp[BLS12_381]] | 30690.851 | 977.49 | 32583 | 107323 |
Subgroup check | EC_Jac[Fp2[BLS12_381]] | 26205.451 | 1 144.80 | 38160 | 125690 |
Miller Loop BLS12 | BLS12_381 | 5222.423 | 5 744.46 | 191482 | 630693 |
Final Exponentiation BLS12 | BLS12_381 | 4385.138 | 6 841.29 | 228043 | 751118 |
Pairing BLS12 | BLS12_381 | 2408.269 | 12 457.08 | 415236 | 1367683 |
Pairing BLS12 batched: 1 | BLS12_381 | 2408.014 | 12 458.40 | 415280 | 1367824 |
Pairing BLS12 non-batched: 2 | BLS12_381 | 1200.394 | 24 991.79 | 833060 | 2743882 |
Pairing BLS12 batched: 2 | BLS12_381 | 1882.654 | 15 934.95 | 531165 | 1749524 |
Pairing BLS12 non-batched: 3 | BLS12_381 | 801.33 | 37 437.76 | 1247925 | 4110348 |
Pairing BLS12 batched: 3 | BLS12_381 | 1530.618 | 19 599.93 | 653331 | 2151906 |
Pairing BLS12 batched: 4 | BLS12_381 | 1301.712 | 23 046.57 | 768219 | 2530312 |
Pairing BLS12 batched: 5 | BLS12_381 | 1124.613 | 26 675.84 | 889195 | 2928783 |
Pairing BLS12 batched: 6 | BLS12_381 | 992.763 | 30 218.69 | 1007290 | 3317752 |
Pairing BLS12 batched: 7 | BLS12_381 | 885.784 | 33 868.30 | 1128943 | 3718451 |
Pairing BLS12 batched: 8 | BLS12_381 | 804.54 | 37 288.39 | 1242946 | 4093950 |
verify_kzg_proof (EIP-4844) | BLS12_381 serial | 1284.248 | 23 359.97 | 778666 | 2564722 |
Now, the gas prices I suggest, assuming we want to allow mobile CPUs from 10 years ago to be competitive.
Old benchmark of Constantine on i5-5257U (mobile Broadwell 2.7GHz without ADCX, ADOX instructions for bigint): Multipairing by mratsim · Pull Request #165 · mratsim/constantine · GitHub
The CPU is about 2.5x slower (single-threaded) than the 7840U with Constantine. We assume that all modern BLS12-381 backends are within 20% of Constantine, so we multiply the ideal gas cost on the 7840U by 3 and round up generously. Also, we use the constant-time Jacobian coordinates as reference, as every library but Constantine, Halo2 and Icicle uses Jacobian coordinates.
For pairing cost, the formula used is:
<Miller Loop> * k * 0.8 + <Final exponentiation>
, the 0.8 is due to acceleration tricks when batching multiple miller loops.
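As a sketch, here is that formula with the measured ideal gas costs from the table above and the x3 slowdown factor (the table values are then rounded up generously, so they won't match exactly):

```python
# Ideal gas costs at 30 Mgas/s on the 7840U, from the benchmark table above.
MILLER_LOOP_GAS = 5723.34
FINAL_EXP_GAS = 6850.74
SLOWDOWN = 3  # older CPU ~2.5x slower, framework within 20% of Constantine => x3

def multipairing_gas(k: int) -> float:
    # <Miller Loop> * k * 0.8 + <Final exponentiation>, scaled by the
    # slowdown factor; 0.8 accounts for batched-Miller-loop savings.
    return (MILLER_LOOP_GAS * k * 0.8 + FINAL_EXP_GAS) * SLOWDOWN
```

For k=2 this gives ~48000 gas before rounding, versus the 49300 suggested in the table below.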
Operation + Algorithm | Curve | Throughput (ops/s) | Gas cost (target 30 Mgas/s) | Current gas | Suggested gas (CPU 2015, 2.5x slower, framework 20% slower, => x3) | Subgroup checks cost | Total gas cost |
---|---|---|---|---|---|---|---|
EC Add G1 | Prj G1 | 2732240.437 | 10.98 | ||||
EC Add vartime G1 | Prj G1 | 4975124.378 | 6.03 | ||||
EC Double G1 | Prj G1 | 4219409.283 | 7.11 | ||||
EC Add G1 | Jac G1 | 2188183.807 | 13.71 | 500 | 40 | 40 | |
EC Add vartime G1 | Jac G1 | 5780346.821 | 5.19 | ||||
EC Double G1 | Jac G1 | 4566210.046 | 6.57 | ||||
EC Add G2 | Prj G2 | 943396.226 | 31.80 | ||||
EC Add vartime G2 | Prj G2 | 1869158.879 | 16.05 | ||||
EC Double G2 | Prj G2 | 1564945.227 | 19.17 | ||||
EC Add G2 | Jac G2 | 810372.771 | 37.02 | 800 | 110 | 110 | |
EC Add vartime G2 | Jac G2 | 2207505.519 | 13.59 | ||||
EC Double G2 | Jac G2 | 2000000 | 15.00 | ||||
Subgroup check | Jac G1 | 30690.851 | 977.49 | 3000 | 3000 | ||
Subgroup check | Jac G2 | 26205.451 | 1 144.80 | 3500 | 3500 | ||
EC ScalarMul 255-bit G1 | Prj G1 | 16451.427 | 1 823.55 | ||||
EC ScalarMul vartime 255-bit G1 | Prj G1 | 19455.253 | 1 542.00 | ||||
EC ScalarMul 255-bit G1 | Jac G1 | 16266.511 | 1 844.28 | 12000 | 5500 | 3000 | 8500
EC ScalarMul vartime 255-bit G1 | Jac G1 | 20193.861 | 1 485.60 | ||||
EC ScalarMul 255-bit G2 | Prj G2 | 8268.768 | 3 628.11 | ||||
EC ScalarMul vartime 255-bit G2 | Prj G2 | 9304.836 | 3 224.13 | ||||
EC ScalarMul 255-bit G2 | Jac G2 | 8717.332 | 3 441.42 | 45000 | 10300 | 3500 | 13800
EC ScalarMul vartime 255-bit G2 | Jac G2 | 10393.498 | 2 886.42 | ||||
Miller Loop BLS12 | 5241.695 | 5 723.34 |||||
Final Exponentiation BLS12 | 4379.089 | 6 850.74 |||||
Pairing BLS12 | 2408.223 | 12 457.32 |||||
Hash to G1 (SSWU - Draft #14) | 23078.698 | 1 299.90 |||||
Hash to G2 (SSWU - Draft #14) | 7504.015 | 3 997.86 |||||
Miller Loop BLS12 | 5222.423 | 5 744.46 | 18000 | ||||
Final Exponentiation BLS12 | 4385.138 | 6 841.29 | 20500 | ||||
Pairing BLS12 | 2408.269 | 12 457.08 | 108000 | 38500 | 6500 | 45000 | |
Pairing BLS12 batched: 1 | 2408.014 | 12 458.40 | |||||
Pairing BLS12 non-batched: 2 | 1200.394 | 24 991.79 | |||||
Pairing BLS12 batched: 2 | 1882.654 | 15 934.95 | 151000 | 49300 | 13000 | 62300 | |
Pairing BLS12 non-batched: 3 | 801.33 | 37 437.76 | |||||
Pairing BLS12 batched: 3 | 1530.618 | 19 599.93 | 194000 | 63700 | 19500 | 83200 | |
Pairing BLS12 batched: 4 | 1301.712 | 23 046.57 | 237000 | 78100 | 26000 | 104100 | |
Pairing BLS12 batched: 5 | 1124.613 | 26 675.84 | 280000 | 92500 | 32500 | 125000 | |
Pairing BLS12 batched: 6 | 992.763 | 30 218.69 | 323000 | 106900 | 39000 | 145900 | |
Pairing BLS12 batched: 7 | 885.784 | 33 868.30 | 366000 | 121300 | 45500 | 166800 | |
Pairing BLS12 batched: 8 | 804.54 | 37 288.39 | 409000 | 135700 | 52000 | 187700 | |
verify_kzg_proof (EIP-4844) | 1284.248 | 23 359.97 | 50000 | 70000 | (priced in) | 70000 |
Agree with the premise of this, we should bring the pricing of these in-line with current KZG proof costs.
I notice that, unlike EIP-196, EIP-2537 does not allow truncated input / input virtually padded with zeros.
Encoding

Field elements and scalars are encoded as 32 byte big-endian numbers. Curve points are encoded as two field elements (x, y), where the point at infinity is encoded as (0, 0). Tuples of objects are encoded as their concatenation.

For both precompiled contracts, if the input is shorter than expected, it is assumed to be virtually padded with zeros at the end (i.e. compatible with the semantics of the CALLDATALOAD opcode). If the input is longer than expected, surplus bytes at the end are ignored.

The length of the returned data is always as specified (i.e. it is not “unpadded”).
Is this intentional? I wonder if we can make a retroactive EIP to enshrine the same for EIP-196.
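To illustrate the difference, here is a sketch of the two input-handling behaviors (hypothetical helpers, not actual client code):

```python
def normalize_input(data: bytes, expected_len: int) -> bytes:
    # EIP-196 style: a shorter input is virtually padded with zeros at the
    # end (CALLDATALOAD-compatible), surplus bytes at the end are ignored.
    return data[:expected_len] + b"\x00" * max(0, expected_len - len(data))

def check_input_strict(data: bytes, expected_len: int) -> bytes:
    # EIP-2537 style: any length mismatch is a precompile error.
    if len(data) != expected_len:
        raise ValueError("invalid input length")
    return data
```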
I finished fully implementing EIP-2537 in Constantine in EIP-2537 - BLS12-381 precompiles for the EVM by mratsim · Pull Request #368 · mratsim/constantine · GitHub and can now provide accurate benchmarks that take into account (de)serialization and other miscellaneous costs.
First, on an AMD Ryzen 7840U (2023, low-power / ultraportable version of AMD's flagship line, TDP 28W down to 15W, 3.3GHz, turbo 5.1GHz) with assembly and Clang.
Second, on an i5-5257U (MacBook Pro 2015, low-power dual-core Broadwell, TDP 15W down to 7.5W, 2.3GHz, turbo 2.9GHz) without assembly (it doesn't have the ADOX/ADCX instructions anyway), but we do use the add-with-carry intrinsics. This is a 9-year-old x86 CPU.
Finally, on a Raspberry Pi 4 (2019, TDP 10W, 1.5GHz) without assembly and without any add-with-carry intrinsics. No add-with-carry makes bigint operations up to 3x more costly, as each add-with-carry has to be replaced by a plain addition, a comparison/carry check, and adding the carry.
See also compiler codegen Compiler Explorer and Tracking compiler inefficiencies · Issue #357 · mratsim/constantine · GitHub
This remark is confirmed by end-to-end tests.
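For illustration, here is a sketch of limb-by-limb bigint addition when no add-with-carry instruction (or intrinsic) is available:

```python
MASK = (1 << 64) - 1  # 64-bit limbs

def add_bigint(a, b):
    # Bigint addition without an add-with-carry instruction: each limb costs
    # an addition, a comparison to detect overflow, and adding the carry
    # (with a second overflow check), instead of a single ADC.
    out, carry = [], 0
    for x, y in zip(a, b):
        s = (x + y) & MASK
        c = 1 if s < x else 0        # did x + y overflow the limb?
        s = (s + carry) & MASK
        c |= 1 if s < carry else 0   # did adding the old carry overflow?
        out.append(s)
        carry = c
    return out, carry
```

Compared to a single ADC per limb, this is roughly the 3x bigint slowdown mentioned above.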
Besides MSMs, all operations can achieve 60 Mgas/s on a 9-year-old, very low-powered CPU (15W!) without assembly, and mapping to G2 even reaches 230 Mgas/s. This strongly suggests that those operations are overcosted.
I think that a 9-year-old, 15W CPU is significantly outdated, and achieving 20 Mgas/s without assembly on it should be fine.
Hence I suggest the following price changes:
Primitive | Current price | Suggested price | Ryzen 7840U 28W 3.3GHz turbo 5.1GHz - clang+ASM - current | Ryzen 7840U 28W 3.3GHz turbo 5.1GHz - clang+ASM - repriced | Intel i5-5257U 15W 2.3GHz turbo 2.8GHz - clang+noASM+intrinsics - current | Intel i5-5257U 15W 2.3GHz turbo 2.8GHz - clang+noASM+intrinsics - repriced |
---|---|---|---|---|---|---|
BLS12_G1ADD | 500 | 125 | 185.60 MGas/s | 46.4 | 79.24 MGas/s | 19.81 |
BLS12_G1MUL | 12000 | 3800 | 144.80 MGas/s | 45.9 | 52.94 MGas/s | 16.8 |
BLS12_G2ADD | 800 | 170 | 218.28 MGas/s | 46.325 | 86.30 MGas/s | 18.33 |
BLS12_G2MUL | 45000 | 6100 | 346.23 MGas/s | 46.9 | 113.23 MGas/s | 15.35 |
BLS12_MAP_FP_TO_G1 | 5500 | 1600 | 161.09 MGas/s | 46.86 | 59.91 MGas/s | 17.43 |
BLS12_MAP_FP2_TO_G2 | 75000 | 5000 | 702.17 MGas/s | 46.8 | 231.85 MGas/s | 15.46 |
The rationale for those changes is targeting a CPU 2~3 years older than my own for 30 Mgas/s. CPUs progress on average by 10~15%/year, so targeting a CPU at 66% of mine's performance is reasonable (which puts my own machine at around 45 Mgas/s). That said, that CPU is still a low-power CPU, so instead of targeting 45 Mgas/s on it we can target 40 Mgas/s or even 35 Mgas/s.
In any case it is clear that G2 functions are severely overcosted, especially mapping to G2.
For pairings, the evolution of cost for multipairing is stable but the price is 5x the actual cost.
Hence I suggest moving from `43000*k + 65000` to `8600*k + 13000`.
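For reference, the two pricing formulas side by side (illustrative sketch; note the current formula gives exactly the 151000 for 2 pairings mentioned earlier):

```python
def pairing_gas_current(k: int) -> int:
    # Current EIP-2537 pairing pricing.
    return 43000 * k + 65000

def pairing_gas_suggested(k: int) -> int:
    # Suggested pricing: both coefficients divided by 5.
    return 8600 * k + 13000
```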
The discounts are too steep and need adjustment.
Primitive | Current cost | Suggested cost | Ryzen 7840U 28W 3.3GHz turbo 5.1GHz - clang+ASM - current | Ryzen 7840U 28W 3.3GHz turbo 5.1GHz - clang+ASM - repriced | Intel i5-5257U 15W 2.3GHz turbo 2.8GHz - clang+noASM+intrinsics - current | Intel i5-5257U 15W 2.3GHz turbo 2.8GHz - clang+noASM+intrinsics - repriced |
---|---|---|---|---|---|---|
G1MSM 2 | 2 * 12000 * 888 / 1000 = 21312 | 2 * 3800 * 999 / 1000 = 7592 | 121.74 Mgas/s | 43.37 Mgas/s | 39.46 Mgas/s | 14.06 Mgas/s |
G1MSM 4 | 4 * 12000 * 641 / 1000 = 30768 | 4 * 3800 * 919 / 1000 = 13969 | 102.15 Mgas/s | 46.38 Mgas/s | 31.77 Mgas/s | 14.42 Mgas/s |
G1MSM 8 | 8 * 12000 * 453 / 1000 = 43488 | 8 * 3800 * 813 / 1000 = 24715 | 81.67 Mgas/s | 46.41 Mgas/s | 28.10 Mgas/s | 15.97 Mgas/s |
G1MSM 16 | 16 * 12000 * 334 / 1000 = 64128 | 16 * 3800 * 728 / 1000 = 44262 | 67.23 Mgas/s | 46.40 Mgas/s | 22.78 Mgas/s | 15.72 Mgas/s |
G1MSM 32 | 32 * 12000 * 268 / 1000 = 102912 | 32 * 3800 * 668 / 1000 = 81229 | 59.01 Mgas/s | 46.58 Mgas/s | 20.42 Mgas/s | 16.12 Mgas/s |
G1MSM 64 | 64 * 12000 * 221 / 1000 = 169728 | 64 * 3800 * 630 / 1000 = 153216 | 51.40 Mgas/s | 46.40 Mgas/s | 18.18 Mgas/s | 16.41 Mgas/s |
G1MSM 128 | 128 * 12000 * 174 / 1000 = 267264 | 128 * 3800 * 602 / 1000 = 292813 | 42.37 Mgas/s | 46.42 Mgas/s | 15.22 Mgas/s | 16.65 Mgas/s |
G2MSM 2 | 2 * 45000 * 888 / 1000 = 79920 | 2 * 6100 * 999 / 1000 = 12188 | 274.23 Mgas/s | 41.82 Mgas/s | 85.47 Mgas/s | 13.03 Mgas/s |
G2MSM 4 | 4 * 45000 * 641 / 1000 = 115380 | 4 * 6100 * 919 / 1000 = 22424 | 168.31 Mgas/s | 32.71 Mgas/s | 69.02 Mgas/s | 13.41 Mgas/s |
G2MSM 8 | 8 * 45000 * 453 / 1000 = 163080 | 8 * 6100 * 813 / 1000 = 39674 | 177.98 Mgas/s | 42.77 Mgas/s | 55.77 Mgas/s | 13.57 Mgas/s |
G2MSM 16 | 16 * 45000 * 334 / 1000 = 240480 | 16 * 6100 * 728 / 1000 = 71053 | 150.39 Mgas/s | 44.43 Mgas/s | 48.60 Mgas/s | 14.36 Mgas/s |
G2MSM 32 | 32 * 45000 * 268 / 1000 = 385920 | 32 * 6100 * 668 / 1000 = 130394 | 136.80 Mgas/s | 46.22 Mgas/s | 44.21 Mgas/s | 14.94 Mgas/s |
G2MSM 64 | 64 * 45000 * 221 / 1000 = 636480 | 64 * 6100 * 630 / 1000 = 245952 | 119.28 Mgas/s | 46.09 Mgas/s | 39.68 Mgas/s | 15.33 Mgas/s |
G2MSM 128 | 128 * 45000 * 174 / 1000 = 1002240 | 128 * 6100 * 602 / 1000 = 470042 | 101.59 Mgas/s | 47.64 Mgas/s | 33.94 Mgas/s | 15.92 Mgas/s |
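The MSM rows above all follow the same pattern; here is a sketch (the discount tables are the per-mille values from the table; integer division may round slightly differently than the rounded table entries):

```python
# Per-mille MSM discounts, as used in the pricing rows above.
CURRENT_DISCOUNT = {2: 888, 4: 641, 8: 453, 16: 334, 32: 268, 64: 221, 128: 174}
SUGGESTED_DISCOUNT = {2: 999, 4: 919, 8: 813, 16: 728, 32: 668, 64: 630, 128: 602}

def msm_gas(k, base_mul_gas, discount):
    # gas = k * <scalar-mul base price> * discount(k) / 1000
    return k * base_mul_gas * discount[k] // 1000
```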
EIP-4844 introduced a point evaluation precompile that is mostly a double pairing at 50000 gas: EIP-4844: Shard Blob Transactions
On my machine that is 1283.318 operations/s, so 64.17 Mgas/s.
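The conversion from benchmark throughput to gas throughput is simply:

```python
def mgas_per_s(ops_per_s: float, gas_per_op: int) -> float:
    # Gas throughput in Mgas/s = (ops/s) * (gas/op) / 1e6
    return ops_per_s * gas_per_op / 1e6

# e.g. 1283.318 verify_kzg_proof ops/s at 50000 gas each ~= 64.17 Mgas/s
```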
I can introduce benchmarks later, but these seem mispriced by a factor of over 10x for multiplication and about 10x for pairings.