EIP-4750: EOF Functions

This is the discussion topic for

We have considered alternative formats for representing section types in the EOF container, but decided not to overload the EIP text with them. Here are some ideas:

0. Version proposed in the EIP: one type section with types for all code sections

format, magic, version, type_section_header, (code_section_header)+, [data_section_header], 0, type_section_contents, (code_section_contents)+, [data_section_contents]
type_section_header := 3, number_of_code_sections * 2 # section kind and size
type_section_contents := 0, 0, code_section_1_inputs, code_section_1_outputs, code_section_2_inputs, code_section_2_outputs, ..., code_section_n_inputs, code_section_n_outputs
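To make option 0 concrete, here is a minimal Python sketch of encoding and decoding type_section_contents. It assumes one byte each for inputs and outputs (which is what the size number_of_code_sections * 2 implies) and that the first code section's entry must be 0, 0. The function names are hypothetical, not from the EIP.

```python
def encode_type_section(types):
    """Encode option 0's type_section_contents from (inputs, outputs) pairs.

    types[0] belongs to the first code section and must be (0, 0).
    """
    if not types or types[0] != (0, 0):
        raise ValueError("first code section must have 0 inputs and 0 outputs")
    contents = bytearray()
    for inputs, outputs in types:
        # Assumption: each count fits in a single byte (size = n * 2).
        if not (0 <= inputs <= 255 and 0 <= outputs <= 255):
            raise ValueError("inputs/outputs must fit in one byte")
        contents += bytes([inputs, outputs])
    return bytes(contents)


def decode_type_section(contents, number_of_code_sections):
    """Split type_section_contents back into (inputs, outputs) pairs."""
    if len(contents) != number_of_code_sections * 2:
        raise ValueError("type section size must be number_of_code_sections * 2")
    return [(contents[i], contents[i + 1]) for i in range(0, len(contents), 2)]


encoded = encode_type_section([(0, 0), (2, 1), (1, 0)])
print(encoded.hex())                    # 000002010100
print(decode_type_section(encoded, 3))  # [(0, 0), (2, 1), (1, 0)]
```

Looking up the type of section i is then a fixed-offset read at 2 * i into the contents.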

1. Version with multiple type sections, one type section per code section:

format, magic, version, (type_section_header)+, (code_section_header)+, [data_section_header], 0, (type_section_contents)+, (code_section_contents)+, [data_section_contents]
type_section_header := 3, 2 # section kind and size
type_section_0_contents := 0, 0
type_section_n_contents := code_section_n_inputs, code_section_n_outputs

This is less compact than the proposed version, since it spends a section kind byte on each of the n type sections.
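A quick back-of-the-envelope comparison of the bytes spent on types in options 0 and 1. This is a sketch assuming the EIP-3540 header layout of a 1-byte section kind followed by a 2-byte size; under that assumption each extra type section header costs 3 bytes, not just the kind byte.

```python
KIND_BYTES = 1  # one byte for the section kind
SIZE_BYTES = 2  # assumption: 2-byte big-endian size field, as in EIP-3540


def type_bytes_option0(n):
    """Option 0: a single type section header plus 2 bytes per code section."""
    return (KIND_BYTES + SIZE_BYTES) + 2 * n


def type_bytes_option1(n):
    """Option 1: one type section header per code section plus 2 bytes each."""
    return n * (KIND_BYTES + SIZE_BYTES) + 2 * n


for n in (1, 4, 16):
    extra = type_bytes_option1(n) - type_bytes_option0(n)
    print(n, type_bytes_option0(n), type_bytes_option1(n), extra)
# 1 5 5 0
# 4 11 20 9
# 16 35 80 45
```

The overhead grows as 3 * (n - 1) bytes under these assumptions.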

2. Version where all types are encoded inline in type section headers, removing the need for type section contents:

format, magic, version, (type_section_header)+, (code_section_header)+, [data_section_header], 0, (code_section_contents)+, [data_section_contents]
type_section_n_header := 3, code_section_n_inputs, code_section_n_outputs

This would be the most compact option and would require fewer reads to get each section's type, but it violates the section definition from EIP-3540 (which requires each section header to contain only the section size), so we decided against it for consistency.
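The trade-off of option 2 shows up in a header parser: type headers carry inputs and outputs directly and have no size field, so the parser needs a special case per kind. A minimal sketch, assuming the prefix bytes 0xEF, 0x00, 0x01 for format, magic and version, kinds 1 = code, 2 = data, 3 = type, 0 = terminator, and 2-byte big-endian sizes as in EIP-3540. The function name is hypothetical, and truncated input simply raises IndexError.

```python
import struct

KIND_TERMINATOR, KIND_CODE, KIND_DATA, KIND_TYPE = 0, 1, 2, 3
PREFIX = bytes([0xEF, 0x00, 0x01])  # assumption: format, magic, version


def parse_option2_header(container):
    """Parse headers where each type header is `3, inputs, outputs` (no size)."""
    if not container.startswith(PREFIX):
        raise ValueError("missing EOF prefix")
    pos = len(PREFIX)
    types, code_sizes, data_size = [], [], 0
    while True:
        kind = container[pos]
        pos += 1
        if kind == KIND_TERMINATOR:
            break
        if kind == KIND_TYPE:
            # Types are read straight from the header: no contents to fetch later.
            types.append((container[pos], container[pos + 1]))
            pos += 2
        elif kind in (KIND_CODE, KIND_DATA):
            (size,) = struct.unpack(">H", container[pos:pos + 2])
            pos += 2
            if kind == KIND_CODE:
                code_sizes.append(size)
            else:
                data_size = size
        else:
            raise ValueError(f"unknown section kind {kind}")
    # pos is the offset where the first code section's contents begin.
    return types, code_sizes, data_size, pos


header = PREFIX + bytes([3, 0, 0, 3, 2, 1, 1, 0, 1, 1, 0, 2, 0])
print(parse_option2_header(header))  # ([(0, 0), (2, 1)], [1, 2], 0, 16)
```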

3. Version where, instead of designated type sections, we encode the number of inputs and outputs as the first two bytes of each code section.

In this case it makes sense to introduce a new kind of code section for this, a “typed code section”, while the first code section would remain a regular untyped one:

format, magic, version, code_section_header, (typed_code_section_header)+, [data_section_header], 0, code_section_contents, (typed_code_section_contents)+, [data_section_contents]
typed_code_section_header := 3, size
typed_code_section_contents := inputs, outputs, <executable_bytes>

Downsides of this: having more than one kind of code section might be confusing, and having non-code bytes inside code sections means we have to be careful not to treat them as executable (i.e. code bounds are [section_start+2, section_end]), so PC=0 corresponds to offset section_start+2.
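The bookkeeping that option 3 forces on implementations can be sketched in a few lines: split off the two type bytes, and shift every PC by 2 when mapping it into a typed section. This is a hypothetical helper pair, assuming one byte each for inputs and outputs; the code bytes in the example are arbitrary placeholders, not real opcodes.

```python
def split_typed_code_section(contents: bytes):
    """Split typed_code_section_contents into (inputs, outputs, executable_bytes)."""
    if len(contents) < 2:
        raise ValueError("typed code section must start with inputs and outputs bytes")
    return contents[0], contents[1], contents[2:]


def pc_to_offset(pc: int, section_start: int, typed: bool) -> int:
    """Map a PC to a container offset; typed sections skip their 2 type bytes."""
    return section_start + pc + (2 if typed else 0)


contents = bytes([2, 1, 0xAA, 0xBB])  # inputs=2, outputs=1, two placeholder code bytes
inputs, outputs, code = split_typed_code_section(contents)
print(inputs, outputs, code.hex())                   # 2 1 aabb
print(pc_to_offset(0, section_start=100, typed=True))   # 102
print(pc_to_offset(0, section_start=100, typed=False))  # 100
```

The untyped first section and typed sections need different offset rules, which is exactly the kind of special case the downsides above refer to.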


Overall, we don’t feel very strongly about picking one version over the others. If anyone has good arguments for an alternative format, please let us know.


Adding TAILJUMPF or TAILCALLF (with the obvious specs, i.e. it consumes the current stack as arguments, and the called function has to return the same number of values as the current one) may be worth considering as an eventual extension to this.


What are the advantages of this approach over EIP-2315? It would seem both to be less efficient and, by moving each function into its own section, to get in the way of further optimization.

The meta question is: what do we want to do with additional code sections? To me they seem most useful as a way of linking in library code as modules with defined interfaces.

Leaving the meta-question aside…

My biggest concern is that we wind up with new exceptional halting conditions (and new machine state and code to enforce them) when I’m trying to get rid of them. However, I’m pretty sure they can instead be enforced at validation time, along the lines of EIP-3779.

My second biggest concern is that you can’t do tail call optimization. But that’s the price we pay for some useful structure. That’s part of why I’ve come to like having, in Intel’s terminology, both subroutines and procedures. These are well-defined procedures.

3. Version where instead of designated type sections we encode inputs and outputs number as two first bytes of each code section.

I’d prefer something like this. It could generalize nicely to a more flexible section header.

having non-code bytes inside code sections would mean we have to be careful to not consider them executable

If the first byte is a new opcode, the rest can be encoded as the immediate data of that opcode.

I agree it might be less efficient compared to 2315, because the base pointer is additionally saved in the return stack; this is the price to pay for stronger runtime correctness guarantees. That is, the 4750 approach guarantees that the callee cannot read the caller’s stack, while 2315 allows this.
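To illustrate why saving a base pointer in the return stack is what keeps the callee out of the caller's stack, here is a toy Python model of CALLF/RETF. It is a simplified sketch, not the EIP's normative semantics: the exact halting conditions (inputs checked against the current frame's base, RETF requiring exactly `outputs` items above the base) and all names are assumptions for illustration.

```python
from dataclasses import dataclass, field


class ExceptionalHalt(Exception):
    pass


@dataclass
class Frame:
    return_section: int
    return_pc: int
    stack_base: int  # height below which the callee may not read


@dataclass
class Machine:
    types: list  # (inputs, outputs) per code section
    stack: list = field(default_factory=list)
    return_stack: list = field(default_factory=list)
    section: int = 0
    pc: int = 0

    def base(self):
        return self.return_stack[-1].stack_base if self.return_stack else 0

    def pop(self):
        # Runtime underflow check against the frame's base, not just against 0.
        if len(self.stack) <= self.base():
            raise ExceptionalHalt("callee tried to read the caller's stack")
        return self.stack.pop()

    def callf(self, target):
        inputs, _ = self.types[target]
        if len(self.stack) - self.base() < inputs:
            raise ExceptionalHalt("not enough inputs for CALLF")
        # The extra return stack item: where the callee's stack starts.
        self.return_stack.append(Frame(self.section, self.pc, len(self.stack) - inputs))
        self.section, self.pc = target, 0

    def retf(self):
        if not self.return_stack:
            raise ExceptionalHalt("RETF with empty return stack")
        _, outputs = self.types[self.section]
        frame = self.return_stack.pop()
        if len(self.stack) - frame.stack_base != outputs:
            raise ExceptionalHalt("wrong number of outputs on RETF")
        self.section, self.pc = frame.return_section, frame.return_pc


m = Machine(types=[(0, 0), (1, 1)])
m.stack = [10, 20]
m.callf(1)                     # callee sees only 20; its base is height 1
m.stack.append(m.pop() + 1)    # consume the input, push one output
m.retf()
print(m.stack, m.section)      # [10, 21] 0
```

A second `m.pop()` inside the callee would hit the base check and halt, which is the guarantee 2315 does not give; with 3779/5450-style validation these runtime checks could be moved to deploy time.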

And yes, in the future we should be able to get rid of these runtime underflow checks by using 3779-style validation. Then the inefficiency goes away too.

Tail call optimization should be possible with a special new opcode like TAILCALLF, as @ekpyron noted above.

Overall, the 2315 approach is less restricted and, I guess, allows more funky optimizations.

And 4750 is stricter, with more runtime checks, which allows for simpler reasoning about bytecode and its structure, fewer edge cases in protocol rules, and possibly easier auditing of compilers’ code.

I can also see both approaches co-existing (less restricted “subroutines” inside restricted “procedures”), if compiler authors find this complexity worthwhile.

I like this idea more than just bytes with a special meaning inside the code section (but it wastes precious opcode space).

I’ve roughly sketched out an extension to this proposal (EOF - Modules - HackMD) that allows for multiple entry points to each code section, mainly by having one type section for each code section. I’ve called these procedures (per Procedures for the EVM - HackMD) to distinguish them from the Simple Subroutines for the EVM - HackMD they are built on, and from the EIP-4750 functions defined here.

I’ve made a PR.

I closed this in favor of EIP-5450: EOF - Stack Validation. Thanks!

I still prefer that EOF code sections represent Modules containing multiple procedures rather than being a single Function. This allows for low-level optimizations within a module, but no control flow between modules except via defined interfaces. In my opinion modules provide a more useful level of packaging.

Multiple entry points can also be added in a future upgrade, so they are not at all a showstopper for me. Let’s just keep in mind that they do allow for inter-procedural optimizations, which single-entry code sections impede. Modules could also support linking libraries of separately compiled code sections into programs, which is a traditional purpose of object file formats. I’ve closed this PR.

And 4750 is more strict, with more runtime checks, which allows for simpler reasoning about bytecode and its structure, fewer edge cases in protocol rules, possibly easier to audit compilers’ code.

From my point of view, leaving checks until runtime makes reasoning more difficult (you don’t know for sure that a program won’t halt in those ways), but with EIP-5450: EOF - Stack Validation the constraints can mostly be checked at validation time. So I think this proposal should be made to require 5450, and almost all of the places that call for an exceptional halt should be changed to use “MUST”.

Hey, I notice that this EIP doesn’t include any requirement that, when using JUMPF, the function being jumped to has the same number of outputs as the current function. That seems like it could have some pretty odd results. I suspect there should be such a requirement.

It is validated at deploy time; see the Code Validation section of the spec:

  1. Code section is invalid in case an immediate argument of any JUMPF is such that type[callee_section_index].outputs != type[caller_section_index].outputs, i.e. it is allowed to only jump to functions with the same output type.
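The quoted rule amounts to a simple deploy-time check over the JUMPF immediates. A minimal Python sketch, assuming the immediates have already been extracted per code section (bytecode decoding is left out on purpose, so no opcode values are assumed). The function name is hypothetical, and the range check on the target index is an extra assumption, not part of the quoted rule.

```python
def validate_jumpf_targets(types, jumpf_targets):
    """types[i] = (inputs, outputs); jumpf_targets[i] = JUMPF immediates in section i."""
    for caller, targets in enumerate(jumpf_targets):
        for callee in targets:
            if callee >= len(types):
                raise ValueError(f"section {caller}: JUMPF to nonexistent section {callee}")
            # type[callee].outputs must equal type[caller].outputs
            if types[callee][1] != types[caller][1]:
                raise ValueError(
                    f"section {caller}: JUMPF to section {callee} changes outputs "
                    f"({types[caller][1]} -> {types[callee][1]})"
                )


types = [(0, 0), (1, 1), (2, 1)]
validate_jumpf_targets(types, [[], [2], []])  # OK: both sections return 1 value
```

Because the check only compares type entries, it runs once at deploy time and needs no runtime state.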

Oh, I see, I missed that, thanks!

Deprecating JUMPDEST analysis

For my understanding, does this refer to deprecating the JUMPDEST opcode itself, or just to a change in how Ethereum client implementations do JUMPDEST analysis?

The JUMPDEST analysis is what is deprecated, replaced with code and stack validation.

JUMPDEST becomes a NOOP inside EOF code (zero stack impact and no external changes on invocation).
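For contrast, this is roughly what the legacy JUMPDEST analysis does: a linear pass that records 0x5B bytes as valid jump destinations while skipping the immediate data of PUSH1..PUSH32 (0x60..0x7F), so a 0x5B inside push data is not a valid target. A minimal Python sketch; the function name is hypothetical.

```python
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def valid_jumpdests(code: bytes) -> set:
    """Offsets of JUMPDEST bytes that are real instructions, not PUSH data."""
    dests = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            dests.add(pc)
        elif PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1  # skip the 1..32 bytes of immediate data
        pc += 1
    return dests


# PUSH1 0x5B, JUMPDEST: the 0x5B at offset 1 is data, only offset 2 counts.
print(valid_jumpdests(bytes([0x60, 0x5B, 0x5B])))  # {2}
```

Legacy clients have to run this pass (or cache its result) for the code being executed; with EOF code and stack validation done at deploy time, this analysis is no longer needed.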

Roger that, makes sense to me. Thank you for clarifying!