Proposed Verkle tree scheme for Ethereum state

IIUC, we would logically store the hash in the tree for the purposes of generating the verkle commitment, but we would in practice store the original value in whatever on-disk key/value store is used.

This is exactly correct.
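
A minimal sketch of that separation, with a dict standing in for the client's on-disk key/value store (names here are illustrative, not client code):

```python
from hashlib import sha256

db = {}  # tree key -> original value, standing in for the on-disk KV store

def put(key: bytes, value: bytes) -> None:
    db[key] = value

def leaf_for_commitment(key: bytes) -> bytes:
    # The verkle commitment is computed over the hash of the value,
    # while the raw value itself stays retrievable from the store.
    return sha256(db[key]).digest()
```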

So calling SELFDESTRUCT would take O(num_used_slots) to clear them all out. The cleanest resolution to that problem is to disable SELFDESTRUCT before verkles go live (which I’m in favor of doing anyway).

There’s a way to do it without that: have a “number of times self-destructed” counter in the state and mix it in with the address and storage key to compute the tree path. But that’s ugly in a different way, and yes, we should just disable SELFDESTRUCT.
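
For concreteness, a sketch of that rejected counter-mixing idea (the hash construction here is an assumption for illustration, not a formula from the proposal):

```python
from hashlib import sha256

def storage_tree_key(address: bytes, selfdestruct_count: int, s_key: bytes) -> bytes:
    # Mixing a per-account "number of times self-destructed" counter into the
    # path derivation means SELFDESTRUCT only bumps the counter: all old slots
    # become unreachable in O(1) instead of being cleared one by one.
    return sha256(address + selfdestruct_count.to_bytes(8, "big") + s_key).digest()
```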

So it seems like we either:

  • Get rid of SELFDESTRUCT, which, based on some recent discussion in ACD, looks like it may be complex due to its interplay with CREATE2 and “upgradeable” contracts.
  • Keep SELFDESTRUCT but replace the actual state clearing with an alternate mechanism that just abandons the state. When combined with the state expiration schemes, this is effectively like deleting it since it will become inaccessible and eventually expire.
  • Keep SELFDESTRUCT and determine how we can clear the state in a manner that doesn’t have O(n) complexity.

Right, those are basically the three choices. My preference is strongly toward the first, accepting that upgradeable contracts via a full-reset mechanism were a mistake.


💯

… and if DELEGATECALL isn’t appealing enough as an upgrade mechanism, let’s figure out how to improve that (rather than how to keep in-place contract morphing via SELFDESTRUCT).

I don’t quite understand how this proposal achieves this. The initial 3 bytes are taken directly from the account address. So if an account has a 1 GB storage tree, then the subtree starting with those three bytes will store that 1 GB of data, much more than the 100 GB / 2^24 ≈ 6 kB that an average subtree at that level stores.

As you know, address[:3] guarantees deduplication of the first 3 bytes across an account’s witness chunks. So I think the motivation is to optimize witness size. Just a clever trick.
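
For concreteness, a hypothetical key derivation in that spirit (not the exact formula from the proposal; the `index` argument and the choice of hash are assumptions):

```python
from hashlib import sha256

def tree_key(address: bytes, index: bytes) -> bytes:
    # The first 3 bytes come straight from the address, so every key of one
    # account shares a 3-byte prefix and the top of the witness is
    # deduplicated; the remaining 29 bytes are hashed for even spread.
    return address[:3] + sha256(address + index).digest()[:29]
```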

I was wondering: is there any reason not to do address[:2] or address[:4]? More generally, what is the motivation for the “reasonably distributed” property? (edit: how does it make syncing easier? edit2: and how does it trade off with the “short witness length” property?)

It is clever to use hash(same thing) + bytes([i]), chunk_lo, or s_key_lo to keep related values as neighbors in the tree.

A further optimization: the account version/balance/nonce/codehash/codesize commitment can also store, for example, the first ~100 code chunks and the smallest ~100 storage keys. But this optimization adds complexity, so I won’t push it.

Is there a reason for s_key_hi to be a 32-byte integer? The most significant byte is always \x00, so it seems that it could be omitted.
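
For illustration, a minimal sketch of the hi/lo split, assuming s_key_hi = s_key // 256 as the 248-bit observation above implies (the helper name is hypothetical):

```python
def split_storage_key(s_key: int) -> tuple[int, int]:
    # The low byte selects a position inside a leaf commitment; the remaining
    # high bytes (s_key_hi, at most 248 bits for a 256-bit key) select the
    # leaf itself, which is why the top byte of a 32-byte s_key_hi is \x00.
    return s_key >> 8, s_key & 0xFF
```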

@dankrad @poemm I’m not super attached to address[:3] specifically. There’s a tradeoff: address[:4] makes witnesses slightly smaller, but it makes data less well-distributed, as then it would be a 1/2**32 slice instead of a 1/2**24 slice that could have a large number of keys. Meanwhile, address[:2] slightly reduces the witness-size advantage in exchange for better distribution. It’s possible that the optimal thing to do is to just abandon the address[:x] approach entirely; it would not be that bad, because in a max-sized block the first ~1.5 bytes are saturated anyway, and it would make the even-distribution property very strong (only deliberate attacks could concentrate keys).
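
For intuition on those slice sizes, the quick arithmetic (using the same illustrative 100 GB state figure from earlier in the thread):

```python
# Expected data per prefix bucket for a hypothetical 100 GB state.
STATE_BYTES = 100 * 10**9
for prefix_bytes in (2, 3, 4):
    buckets = 2 ** (8 * prefix_bytes)
    print(prefix_bytes, STATE_BYTES / buckets)
# -> 2 bytes: ~1.5 MB per slice, 3 bytes: ~6 kB, 4 bytes: ~23 B
```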

No big reason. I guess making it a 32-byte integer is slightly more standardized, as we have plenty of 256-bit ints but no other 248-bit ints. It doesn’t make a significant difference either way.

I guess we mean account here instead of block?

So an account’s basic data (header + code) is chunked into 256 chunks of 32 bytes each, with a commitment generated per group of chunks, and storage is chunked into 65536 chunks of 32 bytes.

I.e. the basic data’s leaf is the commitment to the polynomial that takes these 256 evaluations:

Commitment(P | P(w^0) = Hash(0th 32 bytes), …, P(w^255) = Hash(255th 32 bytes)),

i.e. the commitment to some polynomial P of degree at most 255 in w.

And the storage data’s leaf is the commitment to a polynomial that takes these 65536 evaluations:

Commitment(P | P(w^0) = Hash(0th 32 bytes), …, P(w^65535) = Hash(65535th 32 bytes)),

i.e. the commitment to some polynomial P of degree at most 65535 in w?

And the rest of the commitments in the verkle tree correspond to polynomials that take at most 256 evaluations (hence degree at most 255) of their children.
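
If it helps, a minimal sketch of that chunk-and-hash step under the sizes described above (names are illustrative; the commitment itself is elided):

```python
from hashlib import sha256

def evaluation_vector(data: bytes, width: int) -> list:
    # Split `data` into 32-byte chunks and hash each one; entry i becomes the
    # claimed evaluation P(w^i) of the committed polynomial. width = 256 for
    # the basic-data leaf and 65536 for the storage leaf, so the interpolated
    # P has degree at most width - 1.
    chunks = [data[i:i + 32] for i in range(0, len(data), 32)]
    assert len(chunks) <= width
    return [sha256(c).digest() for c in chunks]
```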

Sorry it’s late in my timezone and I’m too tired to connect a few concepts I haven’t touched in a few years.

Any reason why you’ve ruled out using a trie instead? It sounds like the concern is that an attacker could build out an imbalanced tree and degrade ease of access.

What about a self-balancing trie whose rebalancing process includes a random/pseudo-random hash that would limit the predictability of the structure on rebalance?

This seems nice because you keep sub-trees, search is almost always optimal (or can be reconfigured based on assembly), and assembly of the tree itself is quick (meaning the rebalancing act isn’t too expensive).

I also like that, with deterministic locations, you could jump without having to traverse from the beginning (so even quicker).
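
A minimal sketch of how that randomized rebalancing could look, in the spirit of a treap (the salt and helper name are hypothetical, not part of any proposal):

```python
from hashlib import sha256

def rebalance_priority(key: bytes, salt: bytes) -> int:
    # Derive each key's balancing priority from a hash mixed with an
    # unpredictable salt chosen at rebalance time, so an attacker cannot
    # precompute an insertion order that yields a degenerate shape.
    return int.from_bytes(sha256(salt + key).digest()[:8], "big")
```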

Ethereum has used tries from the very beginning.
I think those two can give you an idea about the status quo and the problem of proof size:
https://eips.ethereum.org/EIPS/eip-2584

I haven’t finished reading about this new Verkle tree/trie yet, but it has the augmented-tree property where you can get the proof from only the nodes along the path, without needing to fetch all the siblings along the path (Kate commitments are associative functions).
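
To make that concrete, a rough back-of-the-envelope comparison (a hedged sketch: the depth and the 32/48-byte element sizes are assumptions, not measurements):

```python
# A hexary Merkle proof ships 15 sibling hashes per level, while a verkle
# proof ships roughly one commitment per node on the path plus one opening.
depth = 7
merkle_bytes = depth * 15 * 32      # 15 sibling hashes x 32 bytes per level
verkle_bytes = depth * 48 + 48      # one 48-byte commitment per level + proof
print(merkle_bytes, verkle_bytes)   # 3360 vs 384
```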

Anyway, I believe this is a shorter, to-the-point note about the data-structure part, if you can’t read all of the above at once (like me):
https://vitalik.ca/general/2021/06/18/verkle.html


Have you read this old thread (from 2017, I think)?
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-May/014337.html
I know it’s about Bitcoin, but maybe it can add some useful insight into the cryptographic merits of the idea.



I don’t know; it could be that I’m wrong and they’re not related methodologies?


Thanks Shymaa! I appreciate your respectful answer and the resources you condensed to bring me up to speed.

Since we are discussing Prague inclusion, I’ll point out that the associated EIP is EIP-6800.