EIP-7834: Separate Metadata Section for EOF

Discussion topic for EIP-7834

Update Log

External Reviews

None as of 2024-12-09.

Outstanding Issues

None as of 2024-12-09.

Summary

This EIP proposes a separate section for contract metadata in the EOF. Initially discussed at EOF Implementers Call #62 · Issue #1192 · ethereum/pm · GitHub.

Solidity appends contract metadata in the runtime bytecode in legacy bytecode (see playground.sourcify.dev). The current practice already makes it difficult to locate these in nested contracts and makes source-code verification difficult.

The tentative practice in EOF for any “metadata” in the Solidity compiler is to add it in the beginning of the data_section. However, changes to the size of data_section do affect the contract’s code by shifting DATALOADN offsets. This makes it impossible to verify two identical contracts with different metadata sizes.

Solidity’s default metadata is just one current example. Other kinds of metadata might be needed by any tooling or contract developer in the future. It is good practice to have a separate section for contract metadata that is unreachable by the code, and any changes to its size do not affect the actual code.

3 Likes

Discussed this somewhat offline in the discord and on an EOF call, and wanted to point out that it increases EVM complexity without any in-protocol benefit. Metadata can be implemented using other in-EVM approaches. For example, in Vyper, metadata about the runtime code lives in the initcode. This serves two purposes: it makes it unreachable from runtime code, and also reduces runtime code size (which is both expensive and subject to the EIP-170 hard limit).

@kuzdogan pointed out that that might be inconvenient for off-chain tooling to index, so another approach would be issuing an event (like log VyperMetadata(metadata_bytes)) at the time of contract creation for indexers to consume. This is slightly more expensive than the initcode approach, but still cheaper than storing in runtime code, and also achieves the unreachability goal as logs also exist as a a kind of metadata within the EVM which is unreadable from EVM code. The point is that there are in-EVM approaches which work today for this, so the goals of this EIP can be achieved without burning a new section kind.

With the EIP-7834 section we can still use the vyper approach to metadata. The initcode container is the container that has the metadata section, and the deployed contracts do not have the metadata section. Then users can go back through chain history to look at the contract creation to get the metadata. Whether compilers attach metadata to the initcode container or the runtime container is a choice the compiler will make.

I do think there is net-positive protocol benefit to this section. Right now we don’t have any place to associate with code data that is not accessible durring execution and can be used for whatever purpose that the code producer wants. The first pass would be for compilers to store their validation information, but I also see that regulated L2s may need to attach “legends” to their contracts that regulators may want for their own purposes. If we continue to drop these into the execution-accessible data section then we make the math more complex as to where to go for extra aux-data, and it makes it possible to create aux-data that could “mask” the metadata.

In it’s current form, where it is a segment of the EOF code that is ignored at runtime and treated as opaque, it frees up a lot more design space further down the supply chain.

1 Like

In reference to the alternative approach @charles-cooper pointed out:

In my opinion, it is important to attach the metadata to the runtime bytecode. Both other approaches, putting the metadata in the initcode and in an event, require indexing the chain if you want to access the metadata. This is inconvenient for tooling and verifiers, but, much more importantly, it is also inconvenient and not always feasible for users.

Users need to have easy access to the metadata or integrity hash. If they download the source code from a service like Sourcify, they should be able to verify themselves that the source they downloaded is correct. For this, they need the metadata or integrity hash. If it is not in the runtime code, users who don’t index the chain themselves would have to depend on external services. So, attaching the metadata to the runtime code is also a matter of enabling users not to have to trust third parties.

it increases EVM complexity without any in-protocol benefit

I disagree with the above point. Having one less section might seem less complexity on the protocol, however, this basically pushes the complexity to the application layer and tooling. Any convention to store metadata in the data_section requires coordinated approaches across compilers and tooling on how to decode this section, and again, this directly affects contract’s code.

The whole point of the EOF upgrade is to bring structure to code and this EIP helps achieve that by handling a missing use case. It also provides a larger design space for anyone to do whatever they want in the metadata_section without affecting the rest of the code.

One can say that EOF makes things more “complex” too. Maybe, but for good reasons to make other parts of the Ethereum stack simpler.

an event (like log VyperMetadata(metadata_bytes))

I don’t fully understand how this would be done. Do you mean the compiler my default adds a LOG to the contract code. Is there a possibility this gets in conflict with user defined logs? And potentially someone else can emit the same log with a different integrity value?

Logs option might be worth considering for this specific use case too. My feeling is still it’s worth keeping the contract’s metadata closer to it’s actual code. And also even if we go full logs route for compiler metadata, I feel there are still use cases that we don’t know yet that would use the metadata_section and benefit from having that section that’s isolated from the code and data.

1 Like