EIP-4750: EOF Functions

This is the discussion topic for


We have considered alternative formats for representing section types in the EOF container, but decided not to overload the EIP text with them. Here are some ideas:

0. Version proposed in the EIP: one type section with types for all code sections

format, magic, version, type_section_header, (code_section_header)+, [data_section_header], 0, type_section_contents, (code_section_contents)+, [data_section_contents]
type_section_header := 3, number_of_code_sections * 2 # section kind and size
type_section_contents := 0, 0, code_section_1_inputs, code_section_1_outputs, code_section_2_inputs, code_section_2_outputs, ..., code_section_n_inputs, code_section_n_outputs
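A minimal Python sketch of option 0. It assumes one byte each for inputs and outputs and a 2-byte big-endian size field (as in EIP-3540 section headers); the function names are made up for illustration. It shows that the type of code section i sits at the fixed offset 2*i in the type section contents.

```python
from typing import List, Tuple

TYPE_SECTION_KIND = 3  # section kind used for type sections in the grammar above


def encode_type_section(types: List[Tuple[int, int]]) -> Tuple[bytes, bytes]:
    """Build (header, contents) for option 0: one type section covering all code sections.

    types[i] is (inputs, outputs) for code section i; the first entry must be (0, 0).
    """
    if not types or types[0] != (0, 0):
        raise ValueError("the first code section must have type (0, 0)")
    contents = bytearray()
    for inputs, outputs in types:
        if not (0 <= inputs <= 0xFF and 0 <= outputs <= 0xFF):
            raise ValueError("inputs and outputs must each fit in one byte")
        contents += bytes([inputs, outputs])
    # Assumption: the size field is 2 bytes, big-endian, as for EIP-3540 section headers.
    header = bytes([TYPE_SECTION_KIND]) + len(contents).to_bytes(2, "big")
    return header, bytes(contents)


def section_type(contents: bytes, index: int) -> Tuple[int, int]:
    """The type of code section `index` lives at a fixed offset: 2 * index."""
    return contents[2 * index], contents[2 * index + 1]
```

For example, `encode_type_section([(0, 0), (2, 1), (1, 0)])` yields the header `03 00 06` followed by six bytes of contents.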

1. Version with multiple type sections, one type section per code section:

format, magic, version, (type_section_header)+, (code_section_header)+, [data_section_header], 0, (type_section_contents)+, (code_section_contents)+, [data_section_contents]
type_section_header := 3, 2 # section kind and size
type_section_0_contents := 0, 0
type_section_n_contents := code_section_n_inputs, code_section_n_outputs

This is less compact than the proposed version, requiring n extra bytes just for the section kind of each type section header.
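To make the size difference concrete, here is a rough byte count of the type information under options 0 and 1. It assumes every section header is a 1-byte kind plus a 2-byte size, which is an assumption about the final encoding; under it, option 1 costs 3(n - 1) extra bytes, i.e. a kind byte and a size field for every additional header.

```python
HEADER_BYTES = 1 + 2  # assumption: 1-byte section kind + 2-byte size, as in EIP-3540


def type_bytes_option0(n: int) -> int:
    """One type section header plus 2 bytes of (inputs, outputs) per code section."""
    return HEADER_BYTES + 2 * n


def type_bytes_option1(n: int) -> int:
    """One type section header per code section, each followed by 2 bytes of contents."""
    return n * (HEADER_BYTES + 2)


for n in (1, 4, 16):
    print(n, type_bytes_option0(n), type_bytes_option1(n))
# prints:
# 1 5 5
# 4 11 20
# 16 35 80
```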

2. Version where all types are encoded inline in type section headers, removing the need for type section contents:

format, magic, version, (type_section_header)+, (code_section_header)+, [data_section_header], 0, (code_section_contents)+, [data_section_contents]
type_section_n_header := 3, code_section_n_inputs, code_section_n_outputs

This would be the most compact option and would require fewer reads to get each section's type, but it violates the section definition from EIP-3540 (which requires each section header to contain only the section size after the kind), so we decided against it for consistency.
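A sketch of how a parser might read option-2 headers, assuming EIP-3540's kind values (1 for code, 2 for data, 0 as terminator) and 3 for type headers as in the grammar above. Each type header carries (inputs, outputs) where other headers carry a size, which is exactly the inconsistency with EIP-3540; this parser only stays simple because both payloads happen to be 2 bytes. The function name is hypothetical.

```python
from typing import List, Tuple

# Code/data/terminator kinds as in EIP-3540; 3 is the type kind used in this thread.
KIND_TERMINATOR, KIND_CODE, KIND_DATA, KIND_TYPE = 0, 1, 2, 3


def parse_headers_inline_types(buf: bytes, pos: int = 0):
    """Parse option-2 headers starting at `pos` (i.e. after magic and version).

    Returns (types, code_sizes, data_size, end_pos).
    """
    types: List[Tuple[int, int]] = []
    code_sizes: List[int] = []
    data_size = 0
    while True:
        kind = buf[pos]
        pos += 1
        if kind == KIND_TERMINATOR:
            break
        if kind == KIND_TYPE:
            # The type is read directly from the header; no jump to section contents.
            types.append((buf[pos], buf[pos + 1]))
        elif kind == KIND_CODE:
            code_sizes.append(int.from_bytes(buf[pos:pos + 2], "big"))
        elif kind == KIND_DATA:
            data_size = int.from_bytes(buf[pos:pos + 2], "big")
        else:
            raise ValueError(f"unknown section kind {kind}")
        pos += 2  # every header payload is 2 bytes in this layout
    if len(types) != len(code_sizes):
        raise ValueError("need exactly one type header per code section")
    return types, code_sizes, data_size, pos
```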

3. Version where, instead of designated type sections, we encode the number of inputs and outputs as the first two bytes of each code section.

In this case it makes sense to introduce a new kind of code section, a “typed code section”, while the first code section remains a regular untyped one:

format, magic, version, code_section_header, (typed_code_section_header)+, [data_section_header], 0, code_section_contents, (typed_code_section_contents)+, [data_section_contents]
typed_code_section_header := 3, size
typed_code_section_contents := inputs, outputs, <executable_bytes>

Downsides of this: having more than one kind of code section might be confusing, and having non-code bytes inside code sections means we have to be careful not to consider them executable (i.e. code bounds are [section_start+2, section_end], and PC=0 corresponds to offset section_start+2).
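For option 3, splitting off the type prefix and mapping PC to a section offset could look like the toy model below. The class and function names are hypothetical; the 2-byte prefix and the PC=0 offset follow the description above.

```python
from dataclasses import dataclass

TYPE_PREFIX_LEN = 2  # inputs byte + outputs byte at the start of a typed code section


@dataclass
class TypedCodeSection:
    inputs: int
    outputs: int
    code: bytes  # executable bytes only; PC=0 is the first byte here


def split_typed_code_section(section: bytes) -> TypedCodeSection:
    if len(section) < TYPE_PREFIX_LEN:
        raise ValueError("typed code section is too short to hold its type")
    return TypedCodeSection(section[0], section[1], section[TYPE_PREFIX_LEN:])


def pc_to_section_offset(pc: int) -> int:
    """PC counts from the first executable byte, which sits after the type prefix."""
    return pc + TYPE_PREFIX_LEN
```

Every consumer (interpreter, validator, disassembler) has to apply this offset consistently, which is the bookkeeping burden mentioned above.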

Overall, we don’t feel very strongly about picking one version over the others. If anyone has good arguments for an alternative format, please let us know.


Adding TAILJUMPF or TAILCALLF (with the obvious semantics, i.e. it consumes the current stack as arguments, and the called function has to return the same number of values as the current one) may be worth considering as an eventual extension to this.
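A rough model of the difference between a plain call and the suggested TAILCALLF, under the semantics described above. `callf`, `tailcallf` and the return-stack representation are hypothetical, not taken from the EIP. The point is that TAILCALLF leaves the return stack untouched, so the callee's eventual RETF returns straight to the original caller.

```python
from typing import List, Tuple

Type = Tuple[int, int]         # (inputs, outputs) of a code section
ReturnEntry = Tuple[int, int]  # (caller section, return pc)


def callf(return_stack: List[ReturnEntry], current: int, return_pc: int, target: int) -> int:
    """Plain call: push a return entry, then continue in `target`."""
    return_stack.append((current, return_pc))
    return target


def tailcallf(types: List[Type], current: int, target: int, frame_height: int) -> int:
    """Hypothetical TAILCALLF: continue in `target` while reusing the current frame.

    frame_height is the number of stack items above the current frame's base.
    """
    inputs, outputs = types[target]
    if frame_height != inputs:
        raise ValueError("TAILCALLF consumes the whole current frame as arguments")
    if outputs != types[current][1]:
        raise ValueError("callee must return as many values as the current function")
    # No push: the callee's RETF will pop the entry our own caller pushed.
    return target
```

A chain of tail calls therefore runs in constant return-stack depth, which is what makes recursion of this shape practical.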


What are the advantages of this approach over EIP-2315? It would seem to be both less efficient and, by moving each function into its own section, an obstacle to further optimization.

The meta question is: what do we want to do with additional code sections? To me they seem most useful as a way of linking in library code as modules with defined interfaces.

Leaving the meta-question aside…

My biggest concern is that we wind up with new exceptional halting conditions (and new machine state and code to enforce them) when I’m trying to get rid of them. However, I’m pretty sure they can be enforced at validation time instead along the lines of EIP-3779.

My second biggest concern is that you can’t do tail call optimization. But that’s the price we pay for some useful structure. That’s part of why I’ve come to like having, in Intel’s terminology, both subroutines and procedures. These are well-defined procedures.

3. Version where instead of designated type sections we encode inputs and outputs number as two first bytes of each code section.

I’d prefer something like this. It could generalize nicely to a more flexible section header.

having non-code bytes inside code sections would mean we have to be careful to not consider them executable

If the first byte is a new opcode, the rest can be encoded as the immediate data of that opcode.
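A sketch of that idea: if a typed section starts with a dedicated prologue opcode whose two immediates are the inputs and outputs, the usual immediate-skipping analysis (the same one that skips PUSH data) never decodes those bytes as instructions. The opcode value 0x0C below is a placeholder chosen for illustration, not an assigned or proposed opcode, and the function names are hypothetical.

```python
PROLOGUE_OPCODE = 0x0C  # placeholder value for illustration only


def immediate_size(opcode: int) -> int:
    if 0x60 <= opcode <= 0x7F:     # PUSH1..PUSH32 carry 1..32 immediate bytes
        return opcode - 0x5F
    if opcode == PROLOGUE_OPCODE:  # prologue carries (inputs, outputs)
        return 2
    return 0


def instruction_starts(code: bytes) -> list:
    """Offsets of real instructions; immediate bytes are skipped, never decoded."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        pc += 1 + immediate_size(code[pc])
    return starts


def read_prologue(code: bytes):
    """Return (inputs, outputs) if the code starts with the prologue, else None."""
    if len(code) >= 3 and code[0] == PROLOGUE_OPCODE:
        return code[1], code[2]
    return None
```

In `bytes([PROLOGUE_OPCODE, 2, 1, 0x60, PROLOGUE_OPCODE, 0x00])` the instructions start at offsets 0, 3 and 5: the type bytes and the PUSH1 argument are both treated as immediates.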

I agree it might be less efficient compared to 2315, because the stack base pointer is additionally saved in the return stack; this is the price to pay for stronger runtime correctness guarantees. In particular, the 4750 approach guarantees that a callee cannot read the caller’s stack, while 2315 allows this.
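A toy model of that extra bookkeeping, assuming each return-stack entry stores the callee's section, the return PC and a stack base pointer. Every pop is checked against the current frame's base, which is what stops a callee from reading its caller's stack. The class and method names are illustrative only.

```python
from typing import List, Optional, Tuple

Frame = Tuple[int, Optional[int], int]  # (section, return_pc, stack_base)


class FunctionStack:
    """Toy operand stack with a base pointer per call frame (assumed layout)."""

    def __init__(self, types: List[Tuple[int, int]]):
        self.types = types                          # (inputs, outputs) per code section
        self.items: List[int] = []
        self.frames: List[Frame] = [(0, None, 0)]  # section 0 runs in the root frame

    @property
    def base(self) -> int:
        return self.frames[-1][2]

    def push(self, value: int) -> None:
        self.items.append(value)

    def pop(self) -> int:
        # This comparison is what prevents a callee from reading its caller's stack.
        if len(self.items) <= self.base:
            raise RuntimeError("stack underflow: callee cannot read the caller's stack")
        return self.items.pop()

    def callf(self, target: int, return_pc: int) -> None:
        inputs, _ = self.types[target]
        if len(self.items) - self.base < inputs:
            raise RuntimeError("stack underflow: not enough arguments")
        # The callee's frame starts at its first argument.
        self.frames.append((target, return_pc, len(self.items) - inputs))

    def retf(self) -> Optional[int]:
        section, return_pc, base = self.frames[-1]
        _, outputs = self.types[section]
        if len(self.items) - base != outputs:
            raise RuntimeError("function must leave exactly `outputs` items")
        self.frames.pop()
        return return_pc
```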

And yes, in the future we should be able to get rid of these runtime underflow checks by using 3779-style validation. Then the inefficiency goes away, too.
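A minimal sketch of what validation-time checking could look like for straight-line code: track the stack height starting from the function's declared inputs, and reject any instruction that would pop below the frame and any RETF with the wrong number of outputs. This ignores jumps entirely (real EIP-3779-style validation also has to follow control flow), and the opcode table is a small illustrative subset. Once every function passes such a check, the runtime base-pointer comparison becomes unnecessary.

```python
from typing import Dict, List, Tuple

# (items required, items pushed) for a tiny illustrative subset of opcodes
STACK_EFFECTS: Dict[str, Tuple[int, int]] = {
    "PUSH1": (0, 1),
    "ADD": (2, 1),
    "MUL": (2, 1),
    "DUP1": (1, 2),
    "POP": (1, 0),
    "RETF": (0, 0),
}


def validate_straight_line(code: List[str], inputs: int, outputs: int) -> int:
    """Return the maximum stack height; raise if the code could underflow its frame."""
    height = inputs
    max_height = height
    for i, op in enumerate(code):
        pops, pushes = STACK_EFFECTS[op]
        if height < pops:
            raise ValueError(f"instruction {i} ({op}) would underflow the frame")
        if op == "RETF":
            if height != outputs:
                raise ValueError(f"RETF at {i} leaves {height} items, expected {outputs}")
            return max_height
        height += pushes - pops
        max_height = max(max_height, height)
    raise ValueError("function must end with RETF")
```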

Tail call optimization should be possible with a special new opcode like TAILCALLF, as @ekpyron noted above.

Overall, the 2315 approach is less restricted and, I guess, allows more funky optimizations.

And 4750 is stricter, with more runtime checks, which allows for simpler reasoning about bytecode and its structure, fewer edge cases in protocol rules, and possibly makes compilers’ code easier to audit.

I can also see both approaches possibly co-existing (less restricted “subroutines” inside restricted “procedures”), if compiler authors find this complexity worthwhile.

I like this idea more than just having bytes with a special meaning inside the code section (but it wastes precious opcode space).

I’ve roughly sketched out an extension to this proposal (EOF - Modules, on HackMD) that allows multiple entry points to each code section, mainly by having one type section for each code section. Following Procedures for the EVM (HackMD), I’ve called these procedures, to distinguish them from the Simple Subroutines for the EVM (HackMD) they are built on, and from the EIP-4750 functions defined here.

I’ve made a PR.

I’ve started pulling in relevant parts of EIP-615 to ensure that stack underflow and alignment are validated:

The only other change I’d beg for is longer names to distinguish them from all of the other CALL opcodes. CALLFN, CALLFUN, CALLFUNC … ?

@gumb0 @chfast @axic