# Proposed Verkle tree scheme for Ethereum state
Edit 2021.06.07: edited the leaf structure to fit account headers, code, and a few storage slots together.
This document describes a proposal for how concretely the Ethereum state can be represented in a Verkle tree.
See "Verkle trie for Eth1 state" (HackMD) for notes on how Verkle tries work.
## Desiderata
- Short witness length for accounts or storage, even under "attack". This necessitates:
  - An "extension node"-like scheme, where the bulk of an address or storage key can be stored as part of a single node instead of always going 32 layers deep
  - Hashing account addresses and storage keys, so that an attacker cannot fill the storage at locations close enough to each other that the branches to them become very long without first doing a very large amount of brute-force computation
- Maximum simplicity. Particularly, it would be ideal to be able to describe the result as a single Verkle tree
- Forward-compatibility (eg. ability to add more objects into an account header in the future)
- Code for an account should be stored in one or a few subtrees, so that a witness for many code chunks can be minimally sized
- Frequently-accessed values (eg. balance, nonce) should be stored in one or a few subtrees, so that a witness for this data can be minimally sized
- Data should be by default reasonably distributed across the entire state tree, to make syncing easier
An updated proposal, which puts account data closer together to reduce the witness size per account access:
The proposed scheme can be described in two different ways:
1. We can view it as a "trie of commitments", with two additional simplifications:
   - Only leaf nodes containing extended paths, no "internal" extension nodes
   - The leaves of the top trie are only commitments (that are Kate commitments of the same type as the commitments used in the trie), and not "a hash pointing to a header which contains two values, a bytearray and a tree" as is the status quo today
2. We can view it as a single tree, where there are internal extension nodes but they can only extend up to the 31-byte boundary (tree keys MUST be 32 bytes long)
These two perspectives are completely equivalent. We will focus on (2) for the rest of the description, but notice that if you take perspective (1) you will get a design where each account is a subtree.
The Verkle tree structure used, from the first perspective, will be equivalent to the structure described here. From the second perspective, it would be equivalent to the structure described in that document, except that instead of (key, value) leaf nodes, there would be intermediary nodes that extend up to the 31-byte boundary.
We define the spec by defining "tree keys": e.g. if we say that the tree key for storage slot `Y` of account `X` is some value `f(X, Y) = Z`, that means that when we `SSTORE` to storage slot `Y` in account `X`, we would be editing the value in the tree at location `Z` (where `Z` is a 32-byte value).
Note also that when we "store `N` at position `P` in the tree", we are actually storing `hash(N) % 2**255`. This is to preserve compatibility with the current 32-byte-chunk-focused storage mechanism, and to distinguish "empty" from "zero" (which is important for state expiry proposals).
## Parameters
| Parameter | Value |
| --- | --- |
| `VERSION_LEAF_KEY` | 0 |
| `BALANCE_LEAF_KEY` | 1 |
| `NONCE_LEAF_KEY` | 2 |
| `CODE_KECCAK_LEAF_KEY` | 3 |
| `CODE_SIZE_LEAF_KEY` | 4 |
| `HEADER_STORAGE_OFFSET` | 64 |
| `CODE_OFFSET` | 128 |
| `VERKLE_NODE_WIDTH` | 256 |
| `MAIN_STORAGE_OFFSET` | 256**31 |
It’s a required invariant that `VERKLE_NODE_WIDTH > CODE_OFFSET > HEADER_STORAGE_OFFSET`, and that `HEADER_STORAGE_OFFSET` is greater than the leaf keys. Additionally, `MAIN_STORAGE_OFFSET` must be a power of `VERKLE_NODE_WIDTH`.
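These invariants can be checked mechanically. A quick sketch, using the parameter values from the table above:

```python
# Parameter values from the table above
VERSION_LEAF_KEY, BALANCE_LEAF_KEY, NONCE_LEAF_KEY = 0, 1, 2
CODE_KECCAK_LEAF_KEY, CODE_SIZE_LEAF_KEY = 3, 4
HEADER_STORAGE_OFFSET = 64
CODE_OFFSET = 128
VERKLE_NODE_WIDTH = 256
MAIN_STORAGE_OFFSET = 256**31

# Ordering invariant
assert VERKLE_NODE_WIDTH > CODE_OFFSET > HEADER_STORAGE_OFFSET
# HEADER_STORAGE_OFFSET exceeds the largest leaf key
assert HEADER_STORAGE_OFFSET > CODE_SIZE_LEAF_KEY

# "Power of VERKLE_NODE_WIDTH": dividing the width out repeatedly ends at 1
n = MAIN_STORAGE_OFFSET
while n > 1 and n % VERKLE_NODE_WIDTH == 0:
    n //= VERKLE_NODE_WIDTH
assert n == 1
```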
## Header values
The tree keys for this data are defined as follows:
```python
def get_tree_key(address: Address, tree_index: int, sub_index: int):
    # Assumes VERKLE_NODE_WIDTH = 256
    return (
        hash(address + tree_index.to_bytes(32, 'big'))[:31] +
        bytes([sub_index])
    )

def get_tree_key_for_version(address: Address):
    return get_tree_key(address, 0, VERSION_LEAF_KEY)

def get_tree_key_for_balance(address: Address):
    return get_tree_key(address, 0, BALANCE_LEAF_KEY)

def get_tree_key_for_nonce(address: Address):
    return get_tree_key(address, 0, NONCE_LEAF_KEY)

# Backwards compatibility for EXTCODEHASH
def get_tree_key_for_code_keccak(address: Address):
    return get_tree_key(address, 0, CODE_KECCAK_LEAF_KEY)

# Backwards compatibility for EXTCODESIZE
def get_tree_key_for_code_size(address: Address):
    return get_tree_key(address, 0, CODE_SIZE_LEAF_KEY)
```
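Since every header field uses tree_index 0, all five keys share the same 31-byte stem and therefore land in a single bottom-level commitment. A quick illustration, with sha256 standing in for the spec's unspecified `hash()` (an assumption for demonstration only):

```python
from hashlib import sha256

def get_tree_key(address: bytes, tree_index: int, sub_index: int) -> bytes:
    # sha256 is a stand-in for the spec's hash(); assumes VERKLE_NODE_WIDTH = 256
    return sha256(address + tree_index.to_bytes(32, 'big')).digest()[:31] + bytes([sub_index])

address = b'\x01' * 20  # arbitrary example address
keys = [get_tree_key(address, 0, leaf_key) for leaf_key in range(5)]

# All five header keys share one 31-byte stem (i.e. one commitment)...
assert len({key[:31] for key in keys}) == 1
# ...and differ only in the final byte, the sub_index
assert [key[31] for key in keys] == [0, 1, 2, 3, 4]
```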
## Code
```python
def get_tree_key_for_code_chunk(address: Address, chunk_id: int):
    return get_tree_key(
        address,
        (CODE_OFFSET + chunk_id) // VERKLE_NODE_WIDTH,
        (CODE_OFFSET + chunk_id) % VERKLE_NODE_WIDTH
    )
```
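One consequence worth noting: because `CODE_OFFSET = 128` and `VERKLE_NODE_WIDTH = 256`, the first 128 code chunks fall in tree_index 0, the same commitment as the header fields. A small sketch of the position arithmetic (the helper name is illustrative, not part of the spec):

```python
CODE_OFFSET = 128
VERKLE_NODE_WIDTH = 256

def code_chunk_position(chunk_id: int) -> tuple:
    # (tree_index, sub_index) for a given code chunk
    return ((CODE_OFFSET + chunk_id) // VERKLE_NODE_WIDTH,
            (CODE_OFFSET + chunk_id) % VERKLE_NODE_WIDTH)

# Chunks 0..127 share tree_index 0 with the account header fields
assert code_chunk_position(0) == (0, 128)
assert code_chunk_position(127) == (0, 255)
# Chunk 128 starts a fresh commitment
assert code_chunk_position(128) == (1, 0)
```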
Chunk `i` contains a 32-byte value, where bytes 1…31 are bytes `i*31 ... (i+1)*31 - 1` of the code (ie. the i'th 31-byte slice of it), and byte 0 is the number of leading bytes that are part of PUSHDATA (eg. if part of the code is `... PUSH4 99 98 | 97 96 PUSH1 128 MSTORE ...` where `|` is the position where a new chunk begins, then the encoding of the latter chunk would begin `2 97 96 PUSH1 128 MSTORE` to reflect that the first 2 bytes are PUSHDATA).
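The chunking rule above can be sketched as follows. This is an illustrative implementation, not consensus code; in particular, zero-padding the final partial chunk is an assumption:

```python
PUSH1, PUSH32 = 0x60, 0x7f  # EVM PUSH opcode range

def chunkify_code(code: bytes) -> list:
    # Pad code to a multiple of 31 bytes (zero-padding is an assumption here)
    padded = code + b'\x00' * ((31 - len(code) % 31) % 31)
    # Walk the opcodes once, marking every byte that is PUSHDATA
    is_pushdata = [False] * len(padded)
    i = 0
    while i < len(code):
        op = padded[i]
        i += 1
        if PUSH1 <= op <= PUSH32:
            n = op - PUSH1 + 1  # number of data bytes for this PUSH
            for j in range(i, min(i + n, len(padded))):
                is_pushdata[j] = True
            i += n
    # Emit 32-byte chunks: 1 metadata byte + 31 code bytes
    chunks = []
    for start in range(0, len(padded), 31):
        lead = 0
        while lead < 31 and is_pushdata[start + lead]:
            lead += 1
        chunks.append(bytes([lead]) + padded[start:start + 31])
    return chunks

# The example from the text: a chunk boundary after "PUSH4 99 98",
# so the next chunk starts with 2 bytes of PUSHDATA (97 96)
code = bytes(28) + bytes([0x63, 99, 98, 97, 96, 0x60, 128, 0x52])
chunks = chunkify_code(code)
assert chunks[0][0] == 0           # first chunk starts on an opcode
assert chunks[1][0] == 2           # second chunk: 2 leading PUSHDATA bytes
assert chunks[1][1:3] == bytes([97, 96])
```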
## Storage
```python
def get_tree_key_for_storage_slot(address: Address, storage_key: int):
    if storage_key < (CODE_OFFSET - HEADER_STORAGE_OFFSET):
        pos = HEADER_STORAGE_OFFSET + storage_key
    else:
        pos = MAIN_STORAGE_OFFSET + storage_key
    return get_tree_key(
        address,
        pos // VERKLE_NODE_WIDTH,
        pos % VERKLE_NODE_WIDTH
    )
```
Note that storage slots in the same size-`VERKLE_NODE_WIDTH` range (ie. a range of the form `x*VERKLE_NODE_WIDTH ... (x+1)*VERKLE_NODE_WIDTH - 1`) are all, with the exception of the `HEADER_STORAGE_OFFSET` special case, part of a single commitment. This is an optimization that makes witnesses more efficient when related storage slots are accessed together. If desired, this optimization can be exposed to the gas schedule, making it more gas-efficient to write contracts that store related slots together (however, Solidity already stores in this way by default).
Additionally, storage slots `0 ... (CODE_OFFSET - HEADER_STORAGE_OFFSET - 1)` are stored in the first commitment (so you can think of them as "editable extra header fields"); this is a further optimization, and it allows many simple contracts to fully execute by loading only a single commitment.
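A sketch of the resulting layout (the helper name is illustrative, not part of the spec): slots 0..63 land in tree_index 0 beside the header fields, while slot 64 and above jump to the main storage region.

```python
HEADER_STORAGE_OFFSET = 64
CODE_OFFSET = 128
VERKLE_NODE_WIDTH = 256
MAIN_STORAGE_OFFSET = 256**31

def storage_slot_position(storage_key: int) -> tuple:
    # (tree_index, sub_index) for a given storage slot
    if storage_key < (CODE_OFFSET - HEADER_STORAGE_OFFSET):
        pos = HEADER_STORAGE_OFFSET + storage_key
    else:
        pos = MAIN_STORAGE_OFFSET + storage_key
    return (pos // VERKLE_NODE_WIDTH, pos % VERKLE_NODE_WIDTH)

# Slots 0..63 share tree_index 0 with the account header fields
assert storage_slot_position(0) == (0, 64)
assert storage_slot_position(63) == (0, 127)
# Slot 64 jumps to the main storage region
assert storage_slot_position(64) == (256**30, 64)
```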