EIP-2537 (BLS12 precompile) discussion thread

chfast · October 11, 2024, 9:40am

Can you point to the source code of this?

rodiazet · October 11, 2024, 9:44am

I’ve just verified once again and blst which we used for the benchmarks uses fast group check. (not naive multiplication by group order)
G1: blst/src/map_to_g1.c at master · supranational/blst · GitHub
G2: blst/src/map_to_g2.c at master · supranational/blst · GitHub

Paper:

Can you provide please code you used for the benchmarks?

chfast · October 11, 2024, 1:58pm

There is a flaw in the spec in the context of the BLST library.
In the inputs to multiplication (both MUL and MSM) the spec allows scalars bigger than the main subgroup order q, e.g. 2^256-1. In the BLST implementation this case is handled by slower non-GLV multiplication implementation “added for formal completeness”. In other words, users can bypass GLV multiplication by sending nonsensical inputs.

        if (check_mod_256(val.s, BLS12_381_r))  /* z^4 is the formal limit */
            POINTonE1_mult_glv(out, a, val.s);
        else    /* should never be the case, added for formal completeness */
            POINTonE1_mult_w5(out, a, scalar, nbits);

github.com

supranational/blst/blob/v0.3.13/src/e1.c#L524-L527


      
          if (check_mod_256(val.s, BLS12_381_r))  /* z^4 is the formal limit */
              POINTonE1_mult_glv(out, a, val.s);
          else    /* should never be the case, added for formal completeness */
              POINTonE1_mult_w5(out, a, scalar, nbits);

Found by @rodiazet.

jwasinger · October 11, 2024, 2:16pm

The behavior in gnark is to reduce a given scalar by the curve base field modulus when deserializing it: gnark-crypto/ecc/bls12-381/fr/element.go at master · Consensys/gnark-crypto · GitHub .

I assume implementations should do the same here to avoid that case.

Edit: i have to add this to allow ethmagis to let me repost deleted text…

shamatar · October 11, 2024, 4:51pm

The main motivation for subgroup checks during the discussion was to allow no-brainer use of GLV optimization for multiplication implementation. GLV is very beneficial for e.g. secp256k1 that doesn’t need subgroup checks, but may be inefficient (I’ll give the example at the end why) for curves that require such check and where inputs are not trusted. After extra consideration below I may actually agree that subgroup checks are useless for computational purposes and should become a part of the application that uses such precompiles rather than part of the precompiles themself, of course with the exception of the pairing precompile, where subgroup checks are mandatory as otherwise pairing operation is undefined. But we may open a can of worms for one of the cases of BLS signature aggregation if implementors are not careful enough, and I do not have an upfront answer about some security features there.

Before continuing, for original table generation I used for the EIP was generated via quite convoluted procedure of doing multiparameter fitting for EIP1962 for all BLS12 curve family, and then BLS12-381 parameters were plugged in there. It was done quite a few years ago and indeed didn’t have subgroup checks included into such benchmarks as they were originally not in the scope of the precompile.

Here are the benchmarks I quickly did using eip1962 code

test bench::eip_2537::bench_eip_2537_fast_subgroup_g1                                   ... bench:      47,731.77 ns/iter (+/- 2,404.57)
test bench::eip_2537::bench_eip_2537_multiexp_2                                         ... bench:     272,435.40 ns/iter (+/- 4,215.22)
test bench::eip_2537::bench_eip_2537_multiexp_3                                         ... bench:     328,689.55 ns/iter (+/- 3,044.60)
test bench::eip_2537::bench_eip_2537_multiexp_4                                         ... bench:     373,639.55 ns/iter (+/- 9,547.45)
test bench::eip_2537::bench_eip_2537_multiexp_5                                         ... bench:     424,831.20 ns/iter (+/- 7,974.20)
test bench::eip_2537::bench_eip_2537_multiexp_6                                         ... bench:     459,172.40 ns/iter (+/- 9,935.16)
test bench::eip_2537::bench_eip_2537_multiexp_7                                         ... bench:     494,158.30 ns/iter (+/- 24,647.01)
test bench::eip_2537::bench_eip_2537_multiexp_8                                         ... bench:     529,041.70 ns/iter (+/- 22,740.11)
test bench::eip_2537::bench_eip_2537_part_of_subgroup_check                             ... bench:      40,565.00 ns/iter (+/- 674.59)
test bench::eip_2537::bench_eip_2537_raw_g1_multiplication                              ... bench:     139,856.95 ns/iter (+/- 2,869.93)

And a short explanation of tradeoffs and computational details under the hood for BLS12-381 curve.

First of all, subgroup checks can not be batched, and indeed our MSM gas formulas should have a linear term in them if subgroup checks are mandatory (in the same sense as linear slope after 128 points in the input). Technically how fast subgroup check works: one applies few endomorphisms (technically endomorphism does one field multiplication with X coordinate of the point) few times, subtracts them (EC point subtraction), and then subtracts from such result a product of the original point and scalar [0x0000000055555555, 0x396c8c005555e156] (LE words). This is fixed scalar and it allows optimizations of operations via addition chains, but in general let’s consider it as multiplication by 128 bit (half-width) scalar. You can see how long it takes in the benchmark that does naive double-and-add multiplication by such scalar. So subgroup check speed is dominated by speed of such operation, unless there came even faster check that I’m not aware of. From the benchmarks you can see that fast subgroup check time is almost equal to the multiplication time by such special scalar (bench_eip_2537_part_of_subgroup_check) and that it’s about two times faster that multiplication by u256::MAX (bench_eip_2537_raw_g1_multiplication).

Here I’ll give the example why for computational purposes subgroup check is actually not beneficial by itself indeed. In the degenerate case of multiexp of 1 point (and MSM via Peppinger algorithm will be discounted for large number of points) the best multiplication strategy if we know that point is in the main subgroup (we did subgroup check!) is to apply GLV and then degrade a task of point * 256 bit scalar multiplication to MSM of two (point * 128 bit scalar) pairs. For the single point GLV is usually 30-35% faster than naive multiplication, so we can say that subgroup check is roughly equal to 128 bit multiplication and single point MSM is ~180 bit multiplication, so total cost of single multiplication is roughly as it would be 310 bits scalar by point multiplication. So indeed for BLS12-381 combined it’s still more expensive than single multiplication by 256 bit scalar. For large size MSM if we assume roughly the same amortization scaling for MSM it may be always true that MSM cost without subgroup checks 256 * amortization_fn(number of points) is smaller than cost for the case with subgroup checks 128 * number_of_points + 128 * amortization_fn(number_of_points * 2) (because we can do GLV for MSM too).

But let’s get back to the case of BLS signature verification. There are usually two cases there

every public key is verified upfront with proof of knowledge of exponent (effectively - providing a valid signature for such public key during setup phase, called “proof of possession” in IETF standard). In such case sane implementation will and MUST check that public key is in G1 subgroup, and signature’s aggregated public key is just pk1 + pk2 + ... without randomization. I believe that it’s the mode people will use most
public keys are not verified, and so signature aggregation must be done via randomized linear combination pk1 + r * pk2 + r^2 * pk3 + ... where randomness r would come via commitment to the set of public keys. Here please pay attention - even though resulting linear combination may fall into G1, it doesn’t mean that all public keys are in G1, and probability of that is high due to BLS12-381 curve subgroups’ structure. Application designer must(!) take extra considerations here whether it’s acceptable or not. I couldn’t fine an example why for any sane cause it should be acceptable, but if we do check for subgroups for MSM, we can brick contracts that are poorly written. Other side - if we ignore sugroups, then security of such contracts is questionable. Maybe it’s just wise to not try to solve it at the precompile design layer

shamatar · October 11, 2024, 4:56pm

Blockquote
There is a flaw in the spec in the context of the BLST library.
In the inputs to multiplication (both MUL and MSM ) the spec allows scalars bigger than the main subgroup order q , e.g. 2^256-1 . In the BLST implementation this case is handled by slower non-GLV multiplication implementation “added for formal completeness”. In other words, users can bypass GLV multiplication by sending nonsensical inputs

It’s just another can of worms that can happen - if we do not require subgroup checks, then we may get into the situation when too high level API is called by final implementation and such API (as in the example of blst above) may have assumptions about point being on curve or not. I will also add that it’s much cheaper to reduce the scalar to the proper range than do non-optimized routine there, and caller will have a hard time figuring this. Maybe we should just say that even though ABI allows arbitrary 256 bit scalars, implementations are advised and even may be required to reduce the scalar to canonical range unless they control the internals

mratsim · October 14, 2024, 12:02pm

They actually can, see: constantine/constantine/named/constants/bls12_381_subgroups.nim at 2c3de8738d031394dfb78e3fc839cf5d12e42b92 · mratsim/constantine · GitHub

func isInSubgroup*(P: EC_ShortW[Fp[BLS12_381], G1]): SecretBool =
  ## Returns true if P is in 𝔾1 subgroup, i.e. P is a point of order r.
  ## A point may be on a curve but not on the prime order r subgroup.
  ## Not checking subgroup exposes a protocol to small subgroup attacks.
  ##
  ## Warning ⚠: Assumes that P is on curve
  # Implementation: Scott, https://eprint.iacr.org/2021/1130.pdf
  #   A note on group membership tests for 𝔾1, 𝔾2 and 𝔾T
  #   on BLS pairing-friendly curves
  #   P is in the 𝔾1 subgroup iff ϕ(P) == [-u²](P)
  var t0{.noInit.}, t1{.noInit.}: typeof(P)

  # [-u²]P
  t0.pow_bls12_381_x(P)
  t1.pow_bls12_381_minus_x(t0)

  # ϕ(P)
  t0.x.prod(P.x, BLS12_381.getCubicRootOfUnity_mod_p())
  t0.y = P.y
  t0.z = P.z

  return t0 == t1

func isInSubgroup*(P: EC_ShortW[Fp2[BLS12_381], G2]): SecretBool =
  ## Returns true if P is in 𝔾2 subgroup, i.e. P is a point of order r.
  ## A point may be on a curve but not on the prime order r subgroup.
  ## Not checking subgroup exposes a protocol to small subgroup attacks.
  ##
  ## Warning ⚠: Assumes that P is on curve
  # Implementation: Scott, https://eprint.iacr.org/2021/1130.pdf
  #   A note on group membership tests for 𝔾1, 𝔾2 and 𝔾T
  #   on BLS pairing-friendly curves
  #   P is in the 𝔾1 subgroup iff ψ(P) == [u](P)
  var t0{.noInit.}, t1{.noInit.}: typeof(P)
  t0.pow_bls12_381_x(P) # [u]P
  t1.frobenius_psi(P)   # ψ(P)

  return t0 == t1

A subgroup check is a scalar multiplication by a ~64 or ~128 bit number. Meaning we can use MSM on the subgroup checks to batch many at once and get sub-linear behavior.

Then there is a question of an attacker exploiting batched subgroup checks by forging inputs that cancel each others out when summed. Ideally we scale both side by a random factor local to the implementation that a malicious party cannot now but it’s yet an extra cost. (see Security of BLS batch verification - Cryptography - Ethereum Research )

shamatar · October 14, 2024, 12:59pm

You kind of self-answered why it’s not batchable. Second, keep in mind that non-main BLS12-381 subgroups have small size. So if we assume P to be a generator of the small subgroup (~13 bits or so, I do not remember), and Q - of the main subgroup, then if I submit a point in the form for P + Q and your random challenge is 0 mod small_subgroup_size, then result of (P + Q) * random_scalar = Q * random_scalar, and your test would pass, but point is not in the main subgroup. As subgroup size is small, it’s easy to craft such point.

sylvain · January 30, 2025, 7:43pm

The current version of the EIP mentions :

[…] the existing BN254 precompile that only provides 80 bits of security.

Where this security level comes from? Is there a new optimization of previous attacks? As far as I know the security level was estimated a bit under 100 bits (see draft-irtf-cfrg-pairing-friendly-curves-11)

bbjubjub · August 26, 2025, 6:33pm

Now that this EIP is live on Mainnet, I was able to verify a BLS signature onchain last week! An interesting takeaway that I didn’t completely elaborate on in the blogpost is that due to gas pricing on L1, if you wanted to post signatures frequently, you would be incentivized not to use point compression, which saves 50% bandwidth, because the cost of processing compressed points onchain more than makes up for it. Now I could see how this makes sense in the spirit of EIP-7623 because we don’t want to push the bandwidth trade-off too far, but I’m not completely convinced. Thoughts?