PQ on EVM: Stop Mixing Native, ZK and Protocol Enforcement

Clarifying PQ Verification on EVM: We’re Mixing Enforcement Models

Reading recent comments (@paulangusbark, @rdubois-crypto, @SirSpudlington), I think we’re unintentionally conflating fundamentally different enforcement lanes under one “PQ on EVM” umbrella.

Until we separate these lanes explicitly, gas comparisons become structurally misleading.

This isn’t about Falcon vs Dilithium vs ML-DSA.
It’s about architecture.


The Core Problem

Right now in the thread, we’re mixing:

  • Falcon precompile discussions

  • Dilithium (EIP-8051) conversations

  • ML-DSA Solidity POCs

  • ZK-from-PQ constructions

  • AA-layer validation flows

These are not the same enforcement model.
Yet they’re often benchmarked or discussed as if they were.


Three Distinct Enforcement Lanes

:one: L1 Native Solidity Verification

(Measured Upper Bound — Not a Deployment Claim)

This lane answers one question:
What is the ceiling if Ethereum provides zero protocol support?

Example (measured):

  • ML-DSA-65 verify() POC in pure Solidity
    68,901,612 gas

  • Inner primitive (PreA compute_w_fromPacked_A_ntt)
    1,499,354 gas

This is obviously not mainnet-viable.
But now we have a quantified upper bound.

That number turns abstract arguments about “precompile savings” into measurable deltas.

If Falcon512 verifies in ~7–15M gas natively — that’s useful data too.

But this lane must be explicitly tagged:

enforcement_lane = L1_native_upper_bound

Otherwise people interpret these numbers as deployment proposals instead of architectural stress tests.


:two: L1 Realistic Enforcement Today

ZK-from-PQ

On Ethereum mainnet today, enforceable PQ typically means:

  1. PQ signature verified off-chain

  2. ZK proof generated

  3. Ethereum verifies the proof

In this case, the benchmark is:

proof_verification + calldata

This is not PQ signature verification.
It is proof verification.

This lane must be tagged:

enforcement_lane = L1_ZK_from_PQ

Otherwise we compare native-PQ gas to ZK-proof gas — which are entirely different primitives.
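The benchmark for this lane can be written as a small cost model. This is a minimal sketch, assuming the EIP-2028 calldata schedule (4 gas per zero byte, 16 gas per non-zero byte) and ignoring the base transaction gas and any later calldata floor pricing. `proof_verification_gas` is a hypothetical input you would measure for your own verifier; it is not a figure from this thread.

```python
ZERO_BYTE_GAS = 4      # EIP-2028: gas per zero calldata byte
NONZERO_BYTE_GAS = 16  # EIP-2028: gas per non-zero calldata byte


def calldata_gas(data: bytes) -> int:
    """Gas charged for carrying `data` as transaction calldata."""
    zero_bytes = data.count(0)
    nonzero_bytes = len(data) - zero_bytes
    return zero_bytes * ZERO_BYTE_GAS + nonzero_bytes * NONZERO_BYTE_GAS


def zk_from_pq_gas(proof_verification_gas: int, proof_and_inputs: bytes) -> int:
    """Lane cost for L1_ZK_from_PQ: proof verification plus calldata.

    `proof_verification_gas` is an assumed, separately measured number.
    """
    return proof_verification_gas + calldata_gas(proof_and_inputs)
```

Nothing in this model depends on which PQ scheme was verified off-chain, which is why its output is not comparable to a native signature-verification figure.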


:three: L2 / Protocol-Native PQ

If PQ verification is integrated as:

  • a precompile

  • a system contract

  • a protocol primitive

then gas becomes meaningful in a deployment sense.

This lane should be tagged:

enforcement_lane = protocol_native

This is where real architectural optimization happens.

But without knowing the upper bound (lane 1), we don’t know what a precompile actually saves.
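The saving is simply the delta against the lane-1 ceiling. A minimal sketch follows; the function name is hypothetical, and the precompile price in the usage line is a placeholder chosen only to make the call runnable, not a proposed or measured cost.

```python
def precompile_savings(native_upper_bound_gas: int, precompile_gas: int) -> tuple[int, float]:
    """Return (absolute gas saved, fraction saved) relative to the native ceiling."""
    if native_upper_bound_gas <= 0:
        raise ValueError("native upper bound must be positive")
    if precompile_gas < 0:
        raise ValueError("precompile gas cannot be negative")
    saved = native_upper_bound_gas - precompile_gas
    return saved, saved / native_upper_bound_gas


ML_DSA_65_NATIVE_GAS = 68_901_612   # measured pure-Solidity POC figure quoted in lane 1
PLACEHOLDER_PRECOMPILE_GAS = 1_000_000  # ASSUMPTION: illustrative only, not a real price

saved_gas, saved_fraction = precompile_savings(ML_DSA_65_NATIVE_GAS, PLACEHOLDER_PRECOMPILE_GAS)
```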


Why This Separation Matters

Without explicit enforcement tagging, we risk:

  • Comparing native-PQ gas to ZK-proof gas

  • Confusing stress-tests with deployment targets

  • Optimizing for the wrong architectural layer

  • Mixing AA-surface validation with protocol-level primitives

  • Turning “gas numbers” into misleading signals

The disagreement in this thread isn’t about algorithms.
It’s about which enforcement lane we’re implicitly assuming.


Proposal: Minimal Structural Metadata

If we want comparable PQ benchmarking across Falcon, Dilithium, ML-DSA, hybrids, etc., every benchmark row should include:

  • surface (ERC-1271 / validateUserOp / protocol)

  • wiring_lane (FIPS-SHAKE / Keccak / hybrid)

  • enforcement_lane (L1_native_upper_bound / L1_ZK_from_PQ / protocol_native)

  • optionally key_storage_assumption (software_resident / TPM/HSM compatible)

Only then are comparisons structurally honest.
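One way such a row could look in a shared harness is sketched below. The field names follow the list above; the class names, the `algorithm` and `gas` fields, and the validation rules are assumptions for illustration, not an agreed schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class EnforcementLane(Enum):
    L1_NATIVE_UPPER_BOUND = "L1_native_upper_bound"
    L1_ZK_FROM_PQ = "L1_ZK_from_PQ"
    PROTOCOL_NATIVE = "protocol_native"


@dataclass
class BenchmarkRow:
    algorithm: str                      # e.g. "ML-DSA-65"
    gas: int                            # measured gas for this row
    surface: str                        # ERC-1271 / validateUserOp / protocol
    wiring_lane: str                    # FIPS-SHAKE / Keccak / hybrid
    enforcement_lane: Union[EnforcementLane, str]
    key_storage_assumption: Optional[str] = None  # optional, as in the proposal

    def __post_init__(self) -> None:
        # Accept the plain-text tag; unknown lanes raise ValueError.
        self.enforcement_lane = EnforcementLane(self.enforcement_lane)
        if self.gas <= 0:
            raise ValueError("gas must be a positive measurement")
```

A row with a missing or misspelled lane then fails at construction time instead of silently ending up in a comparison table.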

Without this metadata, we’re benchmarking architectures, not algorithms — but labeling them as algorithm comparisons.


Clarification

To be explicit:

  • Native L1 PQ verification is not viable today.
    That’s precisely why measuring it as an upper bound is useful.

  • Measurement ≠ Deployment.

The goal isn’t to push ML-DSA on L1.
The goal is to make enforcement assumptions explicit.


Open to Alignment

If there’s interest, I’m happy to align on a minimal shared benchmark harness with explicit lane tagging.

The ecosystem doesn’t need competing benchmark threads.
It needs a structurally comparable one.

Well, you are right: the models shouldn’t be mixed. However, I would like to correct some of the claims made here.

  • Gas cost is far lower: we have Falcon512 verification at 4M gas and Dilithium44 at 11M gas, and we will soon push verification for the larger parameter sets (Falcon1024 and Dilithium65) under the 16M limit.

It is doable, and it is done on-chain NOW. We uploaded a PQ hybrid account, fully functional, with an Aave contract as a demonstration here:
visionary-nougat-217eaa.netlify.app

On L1 the cost is too high for day-to-day transactions, but totally OK for protecting high-value accounts.
Here is an overview of the demonstration given during pqts; it is live on Sepolia and Arbitrum Sepolia. On Arbitrum the cost would be acceptable.


As for the cost: at today’s gas price of 0.05 gwei (Sunday 8/02, as I post), a Dilithium tx on L1 costs $1.25 and a Falcon tx $0.42 (admittedly, we shouldn’t expect the gas price to stay at this level).

The gas price won’t stay this low, but gas costs should keep falling as L1 scales successfully, as it has over the last months.
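For anyone who wants to redo the fee arithmetic at a different gas price, here is a minimal sketch. The gas figures (4M and 11M) and the 0.05 gwei price come from this thread; the ETH/USD rate is not stated anywhere in it, so `ETH_USD_ASSUMED` is a placeholder to replace with the rate you care about.

```python
GWEI_PER_ETH = 1_000_000_000  # 1 ETH = 10^9 gwei


def tx_cost_usd(gas_used: int, gas_price_gwei: float, eth_usd: float) -> float:
    """USD cost of a transaction: gas * price in gwei, converted to ETH, then to USD."""
    cost_eth = gas_used * gas_price_gwei / GWEI_PER_ETH
    return cost_eth * eth_usd


ETH_USD_ASSUMED = 2_000.0  # ASSUMPTION: placeholder rate, not taken from the thread

falcon_usd = tx_cost_usd(4_000_000, 0.05, ETH_USD_ASSUMED)      # Falcon512 verification gas
dilithium_usd = tx_cost_usd(11_000_000, 0.05, ETH_USD_ASSUMED)  # Dilithium44 verification gas
```

This covers verification gas only. The dollar figures quoted in the post depend on the ETH price at the time and on the full transaction gas, so the outputs here will not match them exactly.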

So we have a first practical solution. Now let’s decrease its cost with precompiles.

Thanks — this is a useful clarification, and I agree with your practical point:
it is already deployable today for high-value accounts, and precompiles are the right direction for cost reduction.

My core point is about benchmark semantics, not denying feasibility.

To avoid confusion, we should keep these lanes explicitly separated when quoting numbers:

  1. native Solidity verification,
  2. ETH-optimized Solidity variants,
  3. protocol precompile verification.

All three are valid, but they represent different enforcement/trust surfaces, so direct gas/cost comparisons can be misleading unless lane-labeled.

If we publish each result with a minimal schema
(algo+params, surface/lane, gas, fee assumptions, chain context, trust boundary),
we get apples-to-apples comparability and a cleaner path to standardization.
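As a rough illustration of what lane labeling buys, the sketch below ranks results only against others with the same lane and chain context. The dictionary keys (`algo`, `lane`, `chain`, `gas`) are hypothetical names for a subset of the schema fields, not an agreed format.

```python
from collections import defaultdict


def rank_within_lane(rows: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Group rows by (lane, chain) and sort each group by gas, cheapest first.

    Rows from different lanes or chains never share a ranking.
    """
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for row in rows:
        groups[(row["lane"], row["chain"])].append(row)
    return {key: sorted(group, key=lambda r: r["gas"]) for key, group in groups.items()}
```

Grouping on the key rather than filtering afterwards means an unlabeled row fails with a `KeyError` instead of drifting into the wrong comparison.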

So: +1 on “practical now”, +1 on “precompiles next” — and +1 on strict lane labeling so the ecosystem compares the same thing.