EIP-3690: EOF - JUMPDEST Table

This is the discussion topic for

I think it is important that EVM code directly defines EVM execution without the need for external tables. (Consider the difficulty in modifying the Yellow Paper to express this proposal.) Especially tables that the human coder will need fancy tools to encode.

Tables that provide auxiliary information (e.g. useful for validation or optimization,) or that are created at init-time make more sense to me.

Also, it seems the effect of making JUMPDEST undefined is to make almost all existing code undefined. Why can’t we just leave JUMPDEST be?

It would seem to be more useful to introduce the static jumps and jumptables that @gumb0 is working on and the static version of EIP-2315 that I am working on. Dynamic jumps can then be deprecated going forward, jumpdests will be unnecessary, and init-time jumpdest analysis and runtime checking would only be needed for legacy code.

Further, think (correct me if I am wrong) that Solidity almost always uses jumps statically - that is, the value on the top of the stack is constant. And I was telling @holiman recently that,

“I think it actually isn’t that hard to determine which jumps have a constant argument and avoid checking their destination at runtime. Just maintain a stack during validation, marking stack slots created by PUSH or PC as const, carrying the const-ness through swaps and dups, and unmarking when top of stack becomes the result of a non-const operator (arithmetic, calls, gas, etc.).”

If so, then most existing code can be shown at init-time to have only valid jumps, and jumpdest tables can created at init-time for the few exceptions. If desired, programs with invalid jumps can be deprecated so that no jumpdest tables or runtime checking is needed.

If my above comment wasn’t confused and confusing enough, I have more confusion.

Currently existing contracts require no validation of correctness, but every time they are executed, a list must be built containing all the valid jump-destinations. This is an overhead which can be avoided, albeit the effect of the overhead depends on the client implementation.

As I indicated, we won’t need jumpdest tables at all going forward if we deprecate JUMPs and replace them with static jumps and subroutines, and/or do a static analysis to validate jump destinations at init time. (And of course this also saves the cost of checking each jump at runtime.)

Failing that, if we take out JUMPDEST instructions then the only necessarily invalid jump destinations are inside of immediate data - so a (typically smaller?) table of invalid locations is what is needed.

And there may be reasons to leave JUMPDEST instructions in - they do help mark programmer’s intent, make it easier to delimit basic blocks, and serve as a place to charge for gas and stack checking once we can do that again.

Regardless, the cost of validating jumps or of creating a table of invalid or valid jumps is (I think) proportional to the size of the code. So why can’t we just do it at init-time and charge for gas according to the size of the code?

Also, why is the “jumpdests load” column empty for all but the worst case? It seems the table still needs to be decoded, parsed, and validated. And possibly rewritten into whatever form the client prefers to use at runtime.

Finally, this change puts an implicit bound on “initcode analysis” which is now limited to jumpdests section loading of max size of 0xffff. The legacy code remains vulnerable.

It seems that going forward we can explicitly limit the size of the initcode. Does this not take care of the vulnerability? Or am I still not understanding the vulnerability?

Well I think this goal would be limiting too much what we can achieve with EOF. Even the simplest separation of code and data sections (proposed in EIP-3540) already requires for EVM to access something (data section) outside of the executed code.

No, it doesn’t affect any existing code, the new rules are only for EOF-formatted code.

The motivations for making JUMPDEST undefined (as opposed to leaving it noop) are: cleaning up opcode list, eliminating confusion of having both a JUMPDEST section and an opcode, not allowing users to deploy obviously stupid (or at least useless) things, generally having less unneeded flexibility to avoid future backwards compatibility problems.

I think overall this is a valid critique: if we were sure that getting rid of dynamic jumps alltogether was around the corner, then this proposal would be unneded complication. But if we expect that dynamic jumps are going to exist for a while, this proposes an optimizing improvement to how they work.

All this is still possible with this proposal without having JUMPDEST bytes inside code section.

How is this different from what we do now?

The motivation for this proposal is getting rid of repeating the same analysis for each execution.

Yes, this paragraph refers to EIP-3540 encoding code section size as uint64, and therefore implicitly setting the code size limit (and taking care of vulnerability). This doesn’t affect legacy initcode, which still can have unlimited size.

OK - I’m starting to understand.

Simply separating code into sections isn’t the issue - though I’m not sure how the data section will be accessed by the code – via CODECOPY?

The issue is that putting the jumpdests into an encoded table separate from the code is a much less clear way to present the same information as leaving them as operators within the code - less clear for the reader, and more difficult to write. And the only reason we *have * to have jump destinations is to check dynamic jump locations at runtime.

So I do think deprecating or constraining the existing JUMP opcodes is the way to go, and should be the focus.

As for not changing the meaning of JUMPDEST - I’d like to put off the “two interpreters” problem for as long as we can. That is, as much as possible, existing code should not require the interpreter to execute the same opcodes in different ways, and should continue to be valid according to the new rules.

The motivation for this proposal is getting rid of repeating the same analysis for each execution.

Yes. It seems that going forward whatever analysis we do – jumpdests, gas, stack – can be done at creation time rather than at execution time. If I understand things, the initcode is the exception, and I’m suggesting (as has @holiman) that we put an explicit limit on initcode size to bound the vulnerability.

Note: there is another way to solve the need for a jump table, which is to change the PUSH data to LEB128. With the high bit always set – since we know how many bytes we need – so that JUMPDEST will never be encoded. Then the JUMP logic can just check that the destination is a JUMPDEST rather than use a table.

This of course breaks the “two interpreters rule”, but not much at the code level (more complex decoding for pushing, one more test before jumping), and not at all at the assembler level.

We assume the same worst case figure (9.36) for all cases.

1 Like

I think the best way to deal with jumdest taebles is not to have them at all.

I think the JUMPDEST instruction itself serves to help delineate basic blocks, and provides a place for consolidated gas checking.

To pull together my scattered thoughts - I strongly object to this EIP, and to this general approach. It makes code too hard to write, and too hard to read, and does not fit the simple state machine model of EVM execution. We have proposals on the table that make jumpdest tables unnecessary, and I believe those proposals or work in that direction is the way to go.

1 Like

I think jumpdest tables could be useful for wasm-style function tables. If we want to tame dynamic jumps we can associate each such jump with a table of valid destinations. That would make safety analysis tractable.

I’ve added an EOF container section with a jumptable to EIP-3779 which gives valid jump destinations for every dynamic jump. This extends the range of valid programs without sacrificing tractable validation.