EVM Object Format (EOF)

It gives us the possibility of relaxing the strict ordering of the header sections in the future. If in the type 2 (code) header we used the value from the type 1 section, there would always be a requirement that the type header must precede the code header.

Published a lengthier discussion starter about a large changeset called “EOFv2”:

Adding a comment here as an acknowledgement that we’re following along with this EIP at Art Blocks as it relates to how we now do on-chain storage of artists’ generative art scripts using our BytecodeStorage library.

We have this issue on our end tracking this and have filed issues on the two common “SSTORE2” library implementations that we are aware of flagging this as well:

No response to this comment is expected, just flagging here for visibility.

Hi everyone,

I would like to submit an EOF-related proposal. (Previously posted on the R&D Discord, but now here upon recommendation.) I have read the EIPs and have hopefully not missed anything.

EOF1 contracts can only DELEGATECALL EOF1 contracts

Motivation:
Currently contracts can self-destruct in three different ways (directly through SELFDESTRUCT, indirectly through CALLCODE, and indirectly through DELEGATECALL). EIP-3670 disables the first two possibilities; however, the third remains. Allowing EOF1 contracts to DELEGATECALL only other EOF1 contracts allows the following strong statement: an EOF1 contract can never be destructed.

Specification:
When an EOF1 contract performs a DELEGATECALL the target contract has to be EOF1. If it is not EOF1 (e.g. it is EOF0 or EOF2), the DELEGATECALL exceptionally halts. Hence, (among other things) all the gas passed along is consumed and 0 is pushed onto the stack. DELEGATECALL to an empty code also fails.
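To make the rule concrete, here is a minimal sketch of the check in Python. The function names are hypothetical, and it assumes the EIP-3540 layout in which a container starts with the magic bytes 0xEF00 followed by a one-byte version. It only models the decision itself, not gas accounting or the halting behaviour.

```python
EOF_MAGIC = b"\xef\x00"  # assumed container prefix, per EIP-3540


def eof_version(code: bytes) -> int:
    """Return the EOF version of `code`, or 0 for legacy (EOF0) or empty code."""
    if len(code) >= 3 and code[:2] == EOF_MAGIC:
        return code[2]
    return 0


def delegatecall_allowed(caller_code: bytes, target_code: bytes) -> bool:
    """Proposed rule: an EOF1 caller may only DELEGATECALL an EOF1 target.

    Callers that are not EOF1 are left unrestricted in this sketch, because
    the proposal only constrains EOF1 contracts.
    """
    if eof_version(caller_code) != 1:
        return True
    return eof_version(target_code) == 1
```

With this check, an EOF1 caller is rejected for legacy, empty, and EOF2 targets alike, which is the behaviour described above.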

Security Implications:
Attacks based on SELFDESTRUCT simply disappear for EOF1 contracts. These include:

Backwards Compatibility:
No backwards compatibility is broken as EOF is newly introduced. In theory EOF1 contracts could use EOF0 libraries using DELEGATECALL, but that seems relatively far-fetched.

Complexity:
The check is relatively simple. Hence, no changes to the gas cost of DELEGATECALL would be needed and the implementation overhead should not be prohibitive.

Please let me know if I should provide more clarifications, expand, write a PR or move this elsewhere.

I think the biggest takeaway from this is that if we don’t restrict DELEGATECALL, it becomes an escape hatch for using any feature we banned in EOFv1. SELFDESTRUCT and CALLCODE are the only ones it really impacts right now, but it establishes a pattern.

PC, JUMP, and JUMPI are not escapable because their scope is only on the EVM code.

Note that if the calling contract doesn’t matter then a regular CALL could be used to access the features at extra cost. If we banned ECRECOVER from EOFv1 all we would be doing is just increasing the cost of that precompile to include a cold account load for the host contract. Currently none of the precompiles depend on the caller so this is actually the current state for all precompiles. We couldn’t ban a precompile in EOF and expect contracts not to find a way to use it.

So I’m personally in favor of this.

During the Edelweiss interop there were numerous discussions around EOF. We explored the idea of how to properly reduce introspection, and discussed potential roadmaps. This document tries to give a glimpse into that process:

Would it make sense to have the code section sizes in the type section rather than in the header?

At the time of writing, the header is dynamically sized based on the number of code sections, which complicates header validation. It also splits up function metadata: we need to check the header to get the function’s size, then the type section to get the function’s inputs, outputs, and max stack height, then finally the code section to get the function’s instructions.

If the code size (u16) is stored in the type section, then we could have all of the function metadata in the same place.


Current container:

container  := header, body
header := magic, version, kind_type, type_size, kind_code, num_code_sections, code_size+, kind_data, data_size, terminator
body := type_section, code_section+, data_section
type_section := (inputs, outputs, max_stack_height)+

Proposed container:

container  := header, body
header := magic, version, kind_type, type_size, kind_code, num_code_sections, kind_data, data_size, terminator
body := type_section, code_section+, data_section
type_section := (inputs, outputs, max_stack_height, size)+

With the proposed schema, the header will always be 13 bytes, simplifying header parsing and allowing functions to be validated as the type section is parsed without having to refer back to the header.
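As a rough illustration of the proposed layout, here is a sketch of a parser in Python. The field widths are assumptions: one byte for the version, each kind and the terminator, two bytes for each size or count, and type entries of u8 inputs, u8 outputs, u16 max_stack_height, u16 size. The function and class names are made up for this example, and the kind values are deliberately not checked.

```python
import struct
from typing import NamedTuple

# Fixed header: magic(2) version(1) kind_type(1) type_size(2) kind_code(1)
# num_code_sections(2) kind_data(1) data_size(2) terminator(1) = 13 bytes.
HEADER = struct.Struct(">2sBBHBHBHB")
# Proposed type entry: inputs, outputs, max_stack_height, size (assumed widths).
TYPE_ENTRY = struct.Struct(">BBHH")


class Header(NamedTuple):
    magic: bytes
    version: int
    kind_type: int
    type_size: int
    kind_code: int
    num_code_sections: int
    kind_data: int
    data_size: int
    terminator: int


def parse_header(container: bytes) -> Header:
    """Parse the fixed-size header and run a few basic sanity checks."""
    if len(container) < HEADER.size:
        raise ValueError("container shorter than fixed header")
    header = Header(*HEADER.unpack_from(container, 0))
    if header.magic != b"\xef\x00" or header.terminator != 0:
        raise ValueError("bad magic or terminator")
    if header.type_size != header.num_code_sections * TYPE_ENTRY.size:
        raise ValueError("type section size does not match section count")
    return header


def code_section_bounds(container: bytes) -> list:
    """Return (start, end) offsets of each code section in one forward pass."""
    header = parse_header(container)
    offset = HEADER.size + header.type_size  # code follows the type section
    bounds = []
    for i in range(header.num_code_sections):
        entry_pos = HEADER.size + i * TYPE_ENTRY.size
        _inputs, _outputs, _max_stack, size = TYPE_ENTRY.unpack_from(container, entry_pos)
        bounds.append((offset, offset + size))
        offset += size
    return bounds
```

Note that finding the code section offsets here requires reading the type section body, not just the header, so this sketch also shows the trade-off of moving the sizes.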

It seems like this wasn’t posted here, but since December we have had a “Unified EOF Specification” (Unified EOF specification - HackMD) describing the changes of EIP-3540/3670/4200/4750/5450/6206.

After the Edelweiss Interop discussions we have posted a rollout discussion document.

This week the above two have been merged into a single specification: the “Mega EOF Endgame Specification”.

It explains all the changes needed to achieve banning of code and gas introspection/observability.

For anyone interested in joining the discussions, there are bi-weekly calls called “EOF Implementers Call”; the next one is EOF Implementers Call #11 · Issue #748 · ethereum/pm · GitHub

I think this was discussed during the header format discussions in December, but I can’t remember the reasons, perhaps @gumb0 or @matt can?

It would bring certain benefits for sure, the reasons we decided against something like this were, I think:

  • Desire to keep the section header definitions general (and future-proof) enough, avoiding very special treatment of some sections. So currently we have generally just two kinds of sections: single-instance sections, defined by one size in the header, and multiple-instance sections, defined by an array of sizes.
  • It seems like a useful property to be able to find the start and end of any section after parsing only the header, without the need to parse any of the section bodies. The information about the structure of the container is encapsulated in the header.

Note also that with the new creation instructions proposal we extend the format with another array of sections - container sections - and it is similarly defined as number + array of sizes in the header.

Is the type section strictly meant to contain subroutine metadata, or can it be extended to allow for arbitrary multi-instance sections? If the latter, then adding container sections can follow the same pattern.

While finding the section start/end from just the header seems useful, it seems negligible to parse the header and subsequent type section for this particular use case, and a dynamic header seems to be more of a challenge for single pass parsers than a benefit. It can be done either way, but the current way seems more complex for implementors.

Howdy!

I was looking into "Mega EOF Endgame" Specification - HackMD + EIP-3540: EOF - EVM Object Format v1 and it was a bit unclear to me which EIP the following items from the megathread are targeted for:

  • If the target account of EXTCODECOPY is an EOF contract, then it will copy 0 bytes.
  • If the target account of EXTCODEHASH is an EOF contract, then it will return 0x9dbf3648db8210552e9c4f75c6a1c3057c0ca432043bd648be15fe7be05646f5 (the hash of EF00, as if that would be the code).
  • If the target account of EXTCODESIZE is an EOF contract, then it will return 2.

Is the EIP that this is targeted for still TBD?

IIUC this is not the behavior effective as of EIP-3540 itself, based on this section from the EIP:

  • EXTCODECOPY/EXTCODESIZE/EXTCODEHASH with the EOF target contract - works as with legacy target contract
    • EXTCODESIZE returns the size of entire target container
    • EXTCODEHASH returns the hash of entire target container
    • EXTCODECOPY can copy from target’s code section
    • EXTCODECOPY can copy from target’s data section
    • EXTCODECOPY can copy from target’s EOF header
    • EXTCODECOPY can copy entire target container
    • Results don’t differ when executed inside legacy or EOF contract

However, I’m not sure if there are other EIPs targeted for the same hard fork that would have this impact or if that is a later stage.

Not sure if this is better suited to ask here or in the Core Devs Discord, so opted to post here and cross-link in the Discord – my apologies if I missed it in my search here and there is a better place to post this.

Correct, these changes are not EIPified yet.

They are targeted for the same fork, so this part of EIP-3540 can be viewed as outdated.

Fantastic – thank you for clarifying!

Including here for broader visibility our plans for upgrading/migrating our contracts for on-chain art storage to support the EOF v1 hardfork plans.

Primarily sharing this here for visibility, perhaps for other on-chain art teams who may stumble upon this EIP discussion thread, but if any folks have feedback as to how we may be misunderstanding the EOF hardfork path here, please do reach out here in Discourse, via Twitter (I’m @purphat), or in the ETH R&D Discord (I’m purplehat.eth#7327)

Hi all, is the EOF abbreviation really a good fit?

EOF = end of file in IT for decades.

Apologies for jumping into the discussion with such a minor point, but new devs might have a lot of issues when searching for tutorials, articles, SO, etc…

PS: not advocating “FOE” :wink:

Hi all,

I was just looking at the dependencies for EIP3540, and it seems that EIP3540 requires EIP5450 which requires EIP4200 and EIP4750. That means EOFv1 couldn’t be launched without also including: (1) static branching opcodes (RJUMP, etc); (2) subroutine calls (CALLF, RETF). The reason for this is that the stack validation algorithm relies on static control flow. This is necessary because of the strict requirement that unreachable code is not permitted.

It seems to me that you could cut things differently to reduce the dependencies whilst still achieving the overall goal. Here’s an alternative:

Launch EIP3540 (EOFv1) and EIP5450 (Stack Code Validation) together on their own (i.e. without EIP4200 & EIP4750). In a subsequent fork, launch EIP4200 and EIP4750 (and perhaps e.g. EIP663) as non-breaking changes to EOFv1.

By “non-breaking changes” I mean that no EOF version change is required. We are just updating EOFv1 after the fact to make the release more incremental.

The current code validation rules (EIP5450) are a problem though. We need a code validation algorithm that does not rely on static branching. Instead, it needs to ensure arbitrary (junk) bytes are not permitted within a code section (otherwise e.g. EIP4200 is a breaking change). That’s easy enough, and doesn’t need reachability (I believe this is what the JVM does, for example). Something like this:

Every byte within a code section is either a valid opcode for an instruction or part of the immediate operands of an instruction.

This prevents arbitrary bytes from existing within a code section. Yes, it means instructions may be unreachable. But it also means that EIP4200 or EIP4750 can be deployed after the fact as non-breaking changes. Furthermore, code validation requires a single linear scan (i.e. does not require a worklist algorithm as per EIP5450).
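For what it’s worth, the single linear scan could look roughly like the sketch below. The opcode table here is a hypothetical, heavily truncated subset (a handful of legacy opcodes plus PUSH1..PUSH32) used only for illustration; a real validator would list every opcode permitted in the given EOF version.

```python
# Hypothetical opcode table: opcode byte -> number of immediate bytes.
# Subset only: STOP, ADD, POP, JUMP, JUMPI, JUMPDEST.
IMMEDIATES = {0x00: 0, 0x01: 0, 0x50: 0, 0x56: 0, 0x57: 0, 0x5B: 0}
# PUSH1 (0x60) .. PUSH32 (0x7F) carry 1..32 immediate bytes.
IMMEDIATES.update({0x60 + n: n + 1 for n in range(32)})


def validate_code(code: bytes) -> bool:
    """Check that every byte is an opcode or part of an instruction's immediate."""
    pos = 0
    while pos < len(code):
        immediate_len = IMMEDIATES.get(code[pos])
        if immediate_len is None:
            return False  # byte in opcode position is not a defined opcode
        pos += 1 + immediate_len  # skip the opcode and its immediate bytes
    # If the last instruction's immediate ran past the end, pos > len(code).
    return pos == len(code)
```

The scan never follows jumps, so unreachable instructions pass as long as they decode cleanly, which is exactly the relaxation described above.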

What compilers would target code-validated only EOF? How would that reduce total bytes?

I think the minimally viable variant is

  • EOF Container
  • Code Validation
  • Stack Validation
  • Static Jumps
  • Code Sections (with CALLFI/JUMPFI)
  • EIP-663 - EXCHANGE, EXCHANGE2, and DUPN
  • Data Opcodes
  • Easy opcode bans: CALLCODE, SELFDESTRUCT, PC

For EOF1 we would still permit CREATE[2] from memory, plus a restriction that only EOF containers can CREATE[2] EOF contracts, which I think was already in most big-EOF code.

Then we could do a follow on with EOF2 in a future hard fork, possibly wire it in early so L2s and such can use it:

  • Ban Code Introspections
    • Ban EXTCODE opcodes
    • Ban EXTCODE into EOF2 contracts
    • Add FACTORYCREATE and EXTCREATE opcodes (née CREATE3/4)
  • Ban Gas Observability
    • Ban old CALL series
    • Add LCALL, LDELEGATECALL, LSTATICCALL

This would be all the “breaking” changes (container and validation) and the MVP for compiler use in EOF1. Every compiler ask would ideally be in EOFv1 at launch.

EOF2 would just be (mostly) a different selection of opcodes with the same validation core logic. The subcontainer section would be added, but we could also activate FACTORYCREATE in EOF1 as well.

So we can do it in two steps, with only one set of validator logic between the two. Simply a data-driven switch to change the opcode table for validation.
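A minimal sketch of that data-driven switch, assuming the validator has already decoded the code into a sequence of opcodes. The opcode values are the existing legacy ones; the grouping per EOF version just mirrors the lists above and is illustrative, and the function name is hypothetical.

```python
# Existing legacy opcode values.
CALLCODE, SELFDESTRUCT, PC = 0xF2, 0xFF, 0x58
EXTCODESIZE, EXTCODECOPY, EXTCODEHASH = 0x3B, 0x3C, 0x3F
CALL, DELEGATECALL, STATICCALL = 0xF1, 0xF4, 0xFA

# Banned opcodes per EOF version (illustrative grouping only).
BANNED = {1: frozenset({CALLCODE, SELFDESTRUCT, PC})}
BANNED[2] = BANNED[1] | {
    EXTCODESIZE, EXTCODECOPY, EXTCODEHASH,  # code introspection
    CALL, DELEGATECALL, STATICCALL,         # old CALL series
}


def first_banned_opcode(version: int, opcodes):
    """Return the first opcode banned in `version`, or None if all are allowed.

    `opcodes` is an iterable of already-decoded opcodes (immediates removed),
    so the same loop serves every version and only the table changes.
    """
    banned = BANNED[version]
    for op in opcodes:
        if op in banned:
            return op
    return None
```

The validation loop is identical for both versions; only the lookup table selected by the version changes.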

What compilers would target code-validated only EOF?

Surely, it doesn’t matter whether or not they target it by default at the beginning? Presumably they would have an option for early adopters. The point of code-validated only EOF is to get EOF through the process and deployed.

Once this is done we can start rolling out non-breaking changes, such as static jumps and DUPN, SWAPN. With these instructions deployed, compilers would start to target EOF by default (as it would be in their interest). Even CALLF and RETF could be rolled out as a non-breaking change (provided that JUMP/JUMPDEST was retained). Then, later, JUMP / JUMPDEST are retired in a breaking change taking us to EOFv2 (possibly with other goodies such as removing code/gas introspection piggybacked on board).

All I am saying is that your MVP is not actually the MVP. There is another (smaller) option. The nice thing about this option is that it allows us to roll out non-breaking changes and, in doing so, incentivise compiler writers to adopt. It means they can take their time and there is less pressure. Anyway, it’s just a thought.