EIP-2926: Chunk-Based Code Merkleization

Discussion topic for EIP-2926.

Code merkleization, binarification of the trie, and a gas cost bump for state-accessing opcodes are considered the main levers for decreasing block witness sizes in stateless or partial-stateless Eth1x roadmaps. Here we specify a fixed-size chunk approach to code merkleization, outline what the transition of existing contracts to this model would look like, and pose some questions to be considered.

I think it would be worth explicitly forbidding 0xffffffff as a key for a chunk. This falls into the "in practice we won't ever have this problem" category, but in theory, long enough code would overwrite the metadata entry.
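A minimal sketch of what such a guard could look like. It assumes chunk keys are 4-byte big-endian chunk indices and that chunks are 32 bytes; the names `chunk_key`, `max_code_length`, `CHUNK_SIZE` and `METADATA_KEY` are hypothetical and not taken from the EIP text.

```python
CHUNK_SIZE = 32            # assumed chunk size in bytes
METADATA_KEY = 0xFFFFFFFF  # key reserved for the metadata entry


def chunk_key(index: int) -> bytes:
    """Return the (assumed) 4-byte big-endian trie key for chunk `index`.

    Rejects any index that would collide with the metadata key.
    """
    if not 0 <= index < METADATA_KEY:
        raise ValueError(f"chunk index {index:#x} is out of range or reserved")
    return index.to_bytes(4, "big")


def max_code_length() -> int:
    """Largest code size (in bytes) whose chunks all get valid keys."""
    # Valid chunk indices are 0 .. 0xFFFFFFFE, i.e. 0xFFFFFFFF chunks in total.
    return METADATA_KEY * CHUNK_SIZE
```

Under these assumptions the limit is roughly 137 GB of code, which is why the collision is only a theoretical concern, but an explicit check keeps the rule unambiguous.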

Cool to see the EIP, just gave it a read-through. At the risk of bikeshedding, I'd like to float the idea of removing the RLP from the spec. It looks like the only places that RLP is used are:

  • RLP([METADATA_VERSION, codeHash, codeLength])
  • RLP([firstInstructionOffset, C.code])

LEB128 looks suitable for METADATA_VERSION, codeLength, and firstInstructionOffset. codeHash can just be fixed length bytes. C.code can also just be the raw bytes, allowing us to just serialize these using concatenation:

  • LEB128(METADATA_VERSION) || codeHash || LEB128(codeLength)
  • LEB128(firstInstructionOffset) || C.code
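To make the concatenation proposal concrete, here is a rough sketch using unsigned LEB128. The function names `leb128`, `encode_metadata` and `encode_chunk` are illustrative only, and the 32-byte hash length is taken from the "fixed length bytes" remark above.

```python
def leb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F      # low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte, continuation bit clear
            return bytes(out)


def encode_metadata(version: int, code_hash: bytes, code_length: int) -> bytes:
    """LEB128(METADATA_VERSION) || codeHash || LEB128(codeLength)."""
    if len(code_hash) != 32:
        raise ValueError("codeHash must be exactly 32 bytes")
    return leb128(version) + code_hash + leb128(code_length)


def encode_chunk(first_instruction_offset: int, chunk_code: bytes) -> bytes:
    """LEB128(firstInstructionOffset) || C.code"""
    return leb128(first_instruction_offset) + chunk_code
```

Because `codeHash` has a fixed length and `C.code` is the trailing field, a decoder can split each value without any length prefixes.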

If there is negative sentiment towards LEB128, codeLength could be a fixed-size (4 bytes?) big-endian integer, and METADATA_VERSION and firstInstructionOffset could each be encoded as a single byte.
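The fixed-width alternative could be sketched like this, assuming a 4-byte big-endian codeLength and a 1-byte version; the function names are hypothetical.

```python
def encode_metadata_fixed(version: int, code_hash: bytes, code_length: int) -> bytes:
    """version (1 byte) || codeHash (32 bytes) || codeLength (4 bytes, big-endian)."""
    if len(code_hash) != 32:
        raise ValueError("codeHash must be exactly 32 bytes")
    # int.to_bytes raises OverflowError if a value does not fit its width.
    return version.to_bytes(1, "big") + code_hash + code_length.to_bytes(4, "big")


def decode_metadata_fixed(data: bytes):
    """Inverse of encode_metadata_fixed; returns (version, code_hash, code_length)."""
    if len(data) != 37:
        raise ValueError("fixed-width metadata must be exactly 37 bytes")
    return data[0], data[1:33], int.from_bytes(data[33:], "big")
```

Every metadata entry is then exactly 37 bytes, so decoding is plain slicing.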

I think SSZ is a good candidate here for simplifying the spec.

REF: SSZ specification

There are a number of ways the data structure could be modeled so I’ll just start off with a suggestion:

codeRoot = ssz.hash_tree_root(merklizedCode)
merklizedCode = Container[metaData, code]
metaData = Container[version, codeHash, codeLength]
codeHash = keccak(raw_bytecode)  # is this correct?
version = uint8
codeHash = bytes[32]  # uint8[32]
codeLength = uint32
code = List[Container[uint8, bytes[32]]]

This eliminates any need for the spec to specify how each of these individual things is serialized, and it lets the spec lean on the existing SSZ merkleization rules.
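For illustration, here is one way the elements of the `code` list might be produced from raw bytecode. This is only a sketch: it assumes 32-byte zero-padded chunks, that the `uint8` field is the firstInstructionOffset, and that a value of 32 means no instruction starts in the chunk. The names `Chunk` and `chunkify` are made up for this example, and `hash_tree_root` itself is left to an SSZ library.

```python
from dataclasses import dataclass
from typing import List

CHUNK_SIZE = 32             # assumed chunk size in bytes
PUSH1, PUSH32 = 0x60, 0x7F  # EVM opcodes that carry 1..32 bytes of immediate data


@dataclass
class Chunk:
    first_instruction_offset: int  # uint8 in the proposed container
    code: bytes                    # bytes[32], zero-padded


def chunkify(bytecode: bytes) -> List[Chunk]:
    """Split bytecode into chunks, recording where the first real
    instruction (not PUSH data) begins in each chunk."""
    chunks = []
    pc = 0  # position of the next opcode byte
    for start in range(0, len(bytecode), CHUNK_SIZE):
        end = start + CHUNK_SIZE
        # pc >= start holds here; cap at CHUNK_SIZE for all-data chunks.
        offset = min(pc - start, CHUNK_SIZE)
        body = bytecode[start:end].ljust(CHUNK_SIZE, b"\x00")
        chunks.append(Chunk(offset, body))
        # Walk the instructions that begin inside this chunk.
        while pc < end and pc < len(bytecode):
            op = bytecode[pc]
            push_len = op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0
            pc += 1 + push_len
    return chunks
```

Each `Chunk` maps onto one `Container[uint8, bytes[32]]` entry, so the resulting list could be handed directly to whatever SSZ implementation computes the root.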