EIP-7830: Contract size limit increase for EOF

,

Discussion topic for EIP-7830.

Update Log

External Reviews

None as of 2024-11-29.

Outstanding Issues

None as of 2024-11-29.

1 Like

Prior relevant discussion topics:

1 Like

Spreadsheet calculating costs to deploy.

summary:
24KiB today - ~ 5.4MGas
64KiB under EIP-7623 - ~ 15.8M gas
Largest contract in 30MB - ~ 121.9 KiB
Largest contract in 36MB - ~ 146.9 KiB

JUMPDEST-analysis is required for legacy contracts, and many of the algorithms performing it are not linear and/or have unknown unknowns

I have seen this claim that JUMPDEST analysis is non-linear but can someone explain why it would be?

The code below appears to implement this and it is linear.

From a strict algorithm analysis perspective it is O(n), but there are in essence multiple linear aspects with different “constants” that the interaction varies so much based on the content it appears non-linear because we are magnifying one of the linear cases to a level not seen in typical organic use.

Typical contracts are full of non-jumpdest/non-push targets. The best attack contracts were exclusively jumpdest, exclusively push, or a deep mix of both jumpdest and push of various sizes. There was not a single algorithm that didn’t exhibit a large computational cost jump between all jumps and all jumpdests (algos boil down to you either mark safe or unsafe jumps, and the cost was in the marking). So a good attack vector was all jumpdest/push1 where the immediate is jumpdest. This particular algorithm cited is weak against the all jumpdest case, but excels in the all push(jumpdest) case.

Since we now charge a nominal gas per-32-bytes for CREATE[2] now this scenario puts the worst attack scenario for each client in line with the GPS (gas per second) of other operations, and is no longer a better denial-of-service vector than “SHA3 lots of data,” which has been one point to anchor GPS against. Without that cost executing CREATEs in a tight loop with a good attack contract significantly degrades GPS to an unacceptable level.

1 Like

Thanks, I think I understand the situation now. I think a summary could be useful for others:

  • A JUMP instruction can only be executed if the target is a JUMPDEST. To know if that is the case requires analyzing all of the code up to the target. This is necessary to recognize PUSH instructions and skip over their immediates, which could otherwise be confused for JUMPDESTs.
  • This is called JUMPDEST analysis.
  • Originally this analysis was not directly reflected in gas metering. Specifically, a CREATE instruction could trigger the analysis all of its init code for a gas cost independent of init code size. The init code has to be generated dynamically and repeatedly reused for the worst effect.
  • This was fixed in EIP-3860 in Shanghai (2023).
  • Asymptotically the cost of this analysis is linear in the code size. However, in concrete terms there can be a significant difference between the worst, best, and average cases.
  • Resource pricing and limits have to consider the worst case, as this is what an attack would abuse. As a result, the average case is unfairly penalized with higher costs and/or lower limits.
  • EOF improves the situation further by making it impossible to generate code dynamically, allowing analysis to be done once at deploy time and never at run time, eliminating the above worst case attack vector, and enabling an increase of code size limit.
1 Like

I haven’t caught up on all the discussion but something about this seems really off to me. We’re talking about the costs incurred by execution clients, right? And the concern is that a linear pass through all contract code (to compute a pc to instruction map) before message calling a contract – a map whose computation can be cached by the way – is too expensive? This is surprising to me, so I want to check it before getting more into the details of performance trade-offs strictly below the regime of “a linear pass is ok”.

1 Like

It’s also about when the pass occurs. For EOF it occurs at contract creation, and at no other time, and all metadata needed for execution is encoded in the container.

For legacy the jumpdests are metadata derived from the code, a list of which jumpdests are valid and which are not. Clients need do generate this meatata prior to execution. Naïve implementations will check as part of the CALL, most clients cache once calculated. Clients could persist this cache but I am not aware of any that do. Initcode is particularly tricky because there is a very high likelyhood it will never be executed again.

So while the cost is linear with size the attack modes are impactful. Clients would need to commit to disk caching jumpdest analysis to remain on the safe side.

So clients can easily handle non-EOF here by doing that caching as you said. I think this removes the argument here that EOF is needed for increasing the code size limit.

My impression so far although I’m not entirely up to date is that almost every pro-EOF argument I see has some similarly weak justification. I am concerned about having to implement EOF (I’m maintaining a formal EVM spec) when there isn’t actually a good reason for it.

One thing to note is that clients caching the jumptables is something the protocol does not require, whereas EOF validation is something the protocol requires.

Which EVM spec are you maintaining? You will need to add a new section about container validation, but the remainder of the changes are mostly tweaks on existing opcodes to support the design goals of things such as removing dynamic jumps, making stronger guarantees about the operand stack, removing code introspection and removing gas introspection.

The attacks use CREATE, the bytecode can be generated on the fly, there is no caching that can be done, at best heuristics. The chain needs to protect against worst case scenario so the parameters (size limit) are tuned for that. I don’t see how caching mitigates that attack.

As far as I can tell, pricing JUMPDEST analysis on CREATE more accurately is the only real mitigation.

Verifereum, a formal EVM spec in higher-order logic.

Since we can’t actually get rid of legacy code, the spec (and all the clients for that matter) needs to implement all the things EOF is removing anyway, and then additionally implement all the EOF checks etc. It just doesn’t improve things as far as I can tell the way it would if we were actually making breaking changes. But a blockchain is not a good place to make breaking changes - better to build better tooling around the existing EVM in my opinion.

OK this might be true (I haven’t independently analysed whether this attack is a serious threat, but I am happy to assume it is for now), in which case I would support repricing as the way to address it.

I’m not sure I understand this. It’s still just a single pass to compute the locations of all the jumpdests in a contract, no?

Are these results actually recorded anywhere? The algorithm looks single pass. Just iterate over all the bytes in the bytecode, skip PUSH opcodes and mark jumpdests.

If the question is how big of an array you need to store all the jumpdests, this is just a matter of pricing. And it seems that it’s already paid for by the initcode cost of 2 gas per 32 bytes anyways. (If the argument is that this cost is too low, maybe with some execution resource usage numbers, we could raise it to 3 gas per 32 bytes or something).

1 Like

If there was an algorithm where the cost was independent of the code bieng analyzed, does this change anything?

I’ve performed benchmarks of JUMPDEST analysis. The results are fairly similar for both PUSH1-heavy and JUMPDEST-heavy bytecodes. The benchmarks show a strong linear correlation between analysis time and bytecode size.

The max cost is ~46.5ns per word (on a 5-year old laptop), which is arguably a bit high (and like I said, could be addressed by the gas schedule), but I didn’t spend any time optimizing the code, I just wanted to demonstrate linearity.

1 Like

btw there is a nice png in there too for those who don’t have the time to click into the repo:

1 Like

Draft EIP to simply remove the jumpdest analysis

Where’s the discussion top in “remove jumpdest analysis?” Several of my Ipsilon teammates think it would be dangerous and reckless, but I don’t want to derail this EIP’s discussion.

2 Likes