Hey, this is great! Then if solving the problem within geth is within reach, this EIP should definitely be dropped. An unfinished fix in a client should not be the cause for temporary EVM repricing. Changing EVM rules breaks the immutability of smart contracts and the blockchain and can only be the last resort.
200-300ms for the validation, then, given you accept that block, another 200-300ms for executing the transactions of a new candidate, and then 14400-14600ms for solving that new PoW (assuming an avg. 15sec block time). That is 400-600ms of EVM work out of 15,000ms per block, so EVM calculation seems to reach 2-4% of miners' day-to-day work according to those 200-300ms numbers. Questions regarding these tests:
What is the reference hardware for this (RAM, CPU frequency/cores, storage)?
Is there general agreement on this reference hardware?
You mentioned disk I/O; does the most recent state data set fit in memory?
How big is that data set in your measurements?
Should it fit in memory? Should it not?
Are the tools to re-execute this test and get the same numbers somewhere in git for others to reproduce? If not, can you make them available and point us there? Would love to try.
At 2-4% it makes sense to look at options to make geth execution faster, but these seem to be specific to geth and shouldn't cause EVM repricing. Just regarding disk I/O there are multiple options, each of which would solve this. At least:
Data prefetching,
Optimistic parallel transaction execution, and of course
Using a hash table for O(1) access.
These would alleviate all I/O read concerns at this time and probably also cut into that 2-4%. Do you know what progress has been made on this in geth over the last half year or so?
Absence of evidence is not evidence. As an outsider I have no indication whether this assumption about the other clients is right or wrong. I think I'm just asking for proper process. There should be a minimum quorum of the other clients positively confirming or rejecting that this must be fixed in the EVM, and is not a client implementation detail, before the EVM is modified. I guess this is an Ethereum governance question.
You have done excellent work with these measurements, and I wish more client teams would follow this example and do similar benchmarking, to arrive at best practices for implementing an efficient client and to find consensus on EVM pricing/mispricing. And these measurements show an issue that needs to be addressed: an O(1) operation such as SLOAD should not regress in such a way and needs to be fixed. But IMHO this is a bug in the client and needs to be fixed there.
Let's take a step back again; I think my comment was misinterpreted. SLOAD/BALANCE and other opcodes like EXTCODEHASH are not inherently O(1), due to the fact that the Ethereum state is represented in a trie. Looking up any key in the trie inherently depends on the number of node lookups that need to be made. Thus, 1884 uses the term trie-size-dependent opcodes.
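To make that cost model concrete, here is a toy hexary trie lookup (a minimal sketch, not geth's actual Merkle Patricia Trie, which also has extension/leaf node types and hashes every node): each lookup loads one node per key nibble, so the work grows with trie depth rather than being constant.

```go
package main

import "fmt"

// node is a simplified hexary trie node; this is only an illustration
// of the lookup cost, not the real Merkle Patricia Trie.
type node struct {
	children [16]*node
	value    []byte
}

// get walks one node per nibble of the key, so its cost is
// proportional to the depth of the trie, not O(1).
func (n *node) get(nibbles []byte) ([]byte, int) {
	steps := 0
	for _, nib := range nibbles {
		if n == nil {
			return nil, steps
		}
		n = n.children[nib]
		steps++
	}
	if n == nil {
		return nil, steps
	}
	return n.value, steps
}

func main() {
	// Build a single path 0xa -> 0xb -> 0xc carrying one value.
	root := &node{}
	cur := root
	for _, nib := range []byte{0xa, 0xb, 0xc} {
		next := &node{}
		cur.children[nib] = next
		cur = next
	}
	cur.value = []byte("v")

	v, steps := root.get([]byte{0xa, 0xb, 0xc})
	fmt.Printf("value=%s after %d node loads\n", v, steps)
}
```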
Now, when I said Geth is considering it, I was referring to this PR: https://github.com/ethereum/go-ethereum/pull/20152 . That PR introduces an additional acceleration data structure, which adds an extra 11-12 GB of disk requirements for a flat database of the trie leaves. During runtime, it maintains an in-memory tree of diffs that from time to time get "flattened" towards the disk layer. This diff layer is needed to handle reorgs. This lookup structure is an optimization, and we're not certain how good we can make it or how soon we can have it done.
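A rough sketch of how such a layered lookup could work (the names diffLayer and read are mine, not the PR's API; the real implementation is considerably richer): a read consults the in-memory diff layers from newest to oldest and only falls through to the flat disk layer, which is what keeps reorgs cheap.

```go
package main

import "fmt"

// diffLayer is a simplified in-memory overlay of state changes for one
// block; reorging just means dropping layers above the fork point.
type diffLayer struct {
	parent  *diffLayer        // older layer; nil means "next stop is disk"
	storage map[string][]byte // slots changed in this layer
}

// disk stands in for the flat leaf database on disk (the ~11-12 GB).
var disk = map[string][]byte{
	"slot1": []byte("old"),
	"slot2": []byte("disk"),
}

// read walks diff layers newest-to-oldest, then hits the disk layer.
func read(top *diffLayer, key string) []byte {
	for l := top; l != nil; l = l.parent {
		if v, ok := l.storage[key]; ok {
			return v
		}
	}
	return disk[key]
}

func main() {
	base := &diffLayer{storage: map[string][]byte{}}
	tip := &diffLayer{parent: base, storage: map[string][]byte{"slot1": []byte("new")}}
	fmt.Printf("%s\n", read(tip, "slot1")) // "new", served from the diff layer
	fmt.Printf("%s\n", read(tip, "slot2")) // "disk", fell through to the flat layer
}
```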
The issue that EIP-1884 tries to handle is valid now. Or rather, if you look at the numbers, it was measured up to around block 6.5M, and we're approaching 9M now. So 1884 is sorely needed, and might even be too conservative.
Data prefetching is already done here, by executing the following block while we're executing the current one.
Interesting. Could you please share your evidence for that (imo pretty bold) claim?
That's why we discuss EIPs within the AllCoreDevs forum, and decide on what to include on publicly broadcast ACD meetings, where all client developers are present. Within that forum, consensus has been reached that this is a real problem that needs to be addressed on the platform layer.
SLOAD is also a lookup into a trie (albeit an individual per-account trie), and thus inherently not O(1).
Regarding my benchmarks, they're available here. The datapoints for three different runs are there, all executed on an m5d.2xlarge AWS instance. The benchmarks were done on a couple of separate branches, but the relevant commit that makes geth dump out stats is here.
Thanks for taking the time to go through this; I honestly feel a bit bad for taking so much of your time away from developing eth code. Also I'm a bit worried that this is now a conversation between two individuals, and I'm not sure how this can bloom into Ethereum governance consensus.
This looks like some really good work. That it adds storage and is an additional acceleration structure makes sense; in general database terms you could probably call it an index that you are adding. The additional 12 GB of data makes me curious, though: what percentage of the total state data does adding this index amount to? Is it 100%? Is one full world state (excluding history) currently 12 GB?
This sounds like the index for geth is still a couple of months out. Is this also true for the other clients?
Thanks for the link, this is great! From looking at it, I think the prefetcher cannot be very effective, as it is just running one block ahead. If it has to do real disk I/O, the prefetch will not be faster than the "real" loop after it, and the "real" loop will catch up to the prefetcher because, apart from I/O, they are both doing the same work. I think you will end up in a case where the prefetcher is usually overrun by the "real" runner and then they work in parallel, so in terms of I/O throughput the increase is at most 2x. A simple alternative could be to prefetch all the transaction destinations; in fact you could do this even for the current block. E.g. before executing the block, loop over the transactions and send the I/O scheduler a request to prefetch each transaction destination's data. Then run the block.
The key is that we must make use of the high parallelism of SSDs and keep 8-16 read commands in flight at all times if we want to use the hardware efficiently; see the sketch below.
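A minimal sketch of what I mean, with placeholder types (Tx, stateReader and prefetchDestinations are illustrations, not geth APIs): issue bounded concurrent reads for every transaction destination before the sequential execution loop starts, so the SSD always sees several outstanding requests.

```go
package main

import "sync"

// Tx is a placeholder transaction carrying only its destination.
type Tx struct{ To string }

// stateReader is a placeholder for whatever loads account/storage data
// into the cache (OS page cache or an in-memory state cache).
type stateReader interface {
	Load(addr string)
}

// prefetchDestinations issues up to `parallel` concurrent reads so the
// SSD's internal parallelism is actually used (8-16 in-flight requests).
func prefetchDestinations(txs []Tx, r stateReader, parallel int) {
	sem := make(chan struct{}, parallel)
	var wg sync.WaitGroup
	for _, tx := range txs {
		wg.Add(1)
		sem <- struct{}{} // block while `parallel` reads are in flight
		go func(addr string) {
			defer wg.Done()
			defer func() { <-sem }()
			r.Load(addr)
		}(tx.To)
	}
	wg.Wait() // or return immediately and let block execution catch up
}

// mapReader is a trivial stateReader for the example.
type mapReader map[string][]byte

func (m mapReader) Load(addr string) { _ = m[addr] }

func main() {
	txs := []Tx{{To: "0xa"}, {To: "0xb"}, {To: "0xc"}}
	prefetchDestinations(txs, mapReader{"0xa": {1}}, 16)
}
```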
An additive alternative approach would be optimistic parallel transaction execution, which is also much more complex. In this scenario you execute all transactions in the block at the same time, but against isolated states. After the transactions finish, one would compare the reads and writes each transaction has made and decide whether they are mergeable in any order. Those that were not mergeable (e.g. because two transactions modified the same balance or the same storage location) would need to be executed in a second, parallel step. This would probably be a bigger rewrite, but it would not only use I/O more efficiently, it would also allow effective use of multiple CPUs when processing transactions.
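A very simplified sketch of the conflict check (placeholder types; a real implementation would need full EVM read/write tracking and subtler merge rules, e.g. commuting balance increments): run every transaction against an isolated view, record its read/write sets, and flag for re-execution any transaction whose reads or writes overlap another transaction's writes.

```go
package main

import "fmt"

// result records what one optimistically-executed tx touched.
type result struct {
	id     int
	reads  map[string]bool
	writes map[string]bool
}

// conflicts reports whether b must be re-executed given a: it conflicts
// if it read or wrote anything a wrote. (Placeholder rule; real merge
// rules are subtler, e.g. balance increments could commute.)
func conflicts(a, b result) bool {
	for k := range a.writes {
		if b.reads[k] || b.writes[k] {
			return true
		}
	}
	return false
}

// schedule partitions results into mergeable ones and ones needing a
// second pass.
func schedule(results []result) (mergeable, rerun []result) {
	for i, r := range results {
		ok := true
		for j, other := range results {
			if i != j && conflicts(other, r) {
				ok = false
				break
			}
		}
		if ok {
			mergeable = append(mergeable, r)
		} else {
			rerun = append(rerun, r)
		}
	}
	return
}

func main() {
	rs := []result{
		{id: 1, reads: map[string]bool{"A": true}, writes: map[string]bool{"A": true}},
		{id: 2, reads: map[string]bool{"A": true}, writes: map[string]bool{"B": true}},
		{id: 3, reads: map[string]bool{"C": true}, writes: map[string]bool{"C": true}},
	}
	ok, redo := schedule(rs)
	fmt.Println(len(ok), "mergeable,", len(redo), "to re-run") // 2 mergeable, 1 to re-run
}
```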
The evidence is the hardware capacity here vs. the theoretical work we ask it to do. I have yet to understand how big all contract storage of a recent block is in GB, so here, without that knowledge, a purely disk-I/O-based estimate:
Let's assume the ideal worst case: you get 8M gas and all of it goes into SLOAD. At SLOAD = 200 gas you would get a maximum of 40,000 SLOAD calls in a block. Each SLOAD call wants to read 32 bytes of data. In total that is 40,000 x 32 bytes => ~1.2 MB of data that needs to be read from disk, worst case.
The SSD (NVMe in the case of m5d.2xlarge) on the other hand gives >100,000 IOPS, or ~400 MB/s of random reads, if we can send it read commands asynchronously. So from a hardware perspective we are able to randomly read the required 1.2 MB within ~3 milliseconds.
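Spelling that arithmetic out (the 4 KB random-read unit is my assumption; note that if every 32-byte value lands on its own page, the IOPS bound rather than the bandwidth bound applies):

```go
package main

import "fmt"

func main() {
	const (
		blockGas  = 8_000_000 // block gas limit
		sloadGas  = 200       // SLOAD price pre-1884
		wordBytes = 32        // bytes returned per SLOAD
		iops      = 100_000.0 // NVMe random-read IOPS (m5d.2xlarge class)
		pageBytes = 4096.0    // assumed device random-read unit
	)
	reads := blockGas / sloadGas                 // 40,000 SLOADs max
	total := reads * wordBytes                   // 1,280,000 bytes, ~1.2 MB
	throughput := iops * pageBytes               // ~400 MB/s
	bestMs := float64(total) / throughput * 1000 // ~3 ms if values are densely packed
	worstMs := float64(reads) / iops * 1000      // ~400 ms if each value needs its own page read
	fmt.Printf("%d reads, %d bytes, %.0f-%.0f ms\n", reads, total, bestMs, worstMs)
}
```

The two bounds differ by more than 100x, which is exactly why the fragmentation and access-pattern questions below matter.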
So we have plenty of runway in terms of raw I/O power here, but might be limited by sequential access patterns, data fragmentation or other things. It would indeed be interesting to see the actual amount of disk I/O going on during these tests, and what part of the 200-300ms you mentioned before is disk wait, and what part is in-memory tree traversal and other work.
The call notes state "General consensus that there are no objections from any of the clients". This seems to be in line with what you're saying, but it is also what I'm afraid of here: this EIP is going to break tons of contracts, just because nobody had the time or motivation to closely analyze and evaluate its impact or necessity. My question is primarily an Ethereum governance question. Is it intentional that EIPs that break smart contract immutability are passed through because it is the path of least resistance for the client teams? E.g. in this case, changing some hard-coded gas values being easier than implementing an index seems to be the reason this EIP was accepted.
Sweet! Did you sync over the network or did you have the chain data locally? Do you have the rough setup / command-line history to reproduce this?
Although this has been answered extensively, I'd like to highlight one more reason why the bold highlights above are not exactly correct.
Block headers require the recording of a state root.
This state root is calculated from the trie of account storage roots.
Every accountās storage root is itself calculated from a trie of stored data.
The backing data store can be whatever a node implementation chooses, but if the implementation is to verify (or propose!) block headers and state changes, then it must be able to compute this root.
I suppose this is what @holiman meant to convey when writing:
"Ethereum state is represented in a trie" is not geth's choice, it's a protocol-level choice (made years ago).
This is described in the yellowpaper section 4.1 in a tad more detail, although there's not much more information than the bullet points above.
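A toy illustration of that nesting (the stand-in hash is just sorted key/value hashing, not the real RLP-encoded Merkle Patricia Trie; only the dependency structure is the point): each account's storage root feeds into the account trie, whose root lands in the block header.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// toyRoot stands in for the real Merkle Patricia Trie root: it simply
// hashes sorted key/value pairs. Only the dependency structure matters.
func toyRoot(kv map[string][]byte) []byte {
	keys := make([]string, 0, len(kv))
	for k := range kv {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write(kv[k])
	}
	return h.Sum(nil)
}

func main() {
	// Each account's storage root is computed from its own storage trie...
	storageRootA := toyRoot(map[string][]byte{"slot0": []byte("x")})
	storageRootB := toyRoot(map[string][]byte{"slot0": []byte("y")})

	// ...and the header's state root is computed from the trie of
	// accounts, which embed those storage roots.
	stateRoot := toyRoot(map[string][]byte{
		"accountA": storageRootA,
		"accountB": storageRootB,
	})
	fmt.Printf("stateRoot=%x...\n", stateRoot[:8])
}
```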
You have to differentiate here between modifying and non-modifying operations. For a modifying operation such as SSTORE you indeed have to have a trie in order to update the merkle nodes correctly. But read operations like SLOAD and BALANCE do not need to update trie nodes after the operation, hence you can shortcut the trie for reads and just use a hashmap. SSTORE, though, will stay more expensive because of the need to recalculate the merkle tree.
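A minimal sketch of that shortcut (the type and method names are mine, purely illustrative): reads go through an O(1) flat map, while writes still update the trie so the merkle root stays correct.

```go
package main

import "fmt"

// trie is a placeholder for the authenticated structure that must be
// kept up to date for state-root computation.
type trie struct{ data map[string][]byte }

func (t *trie) update(key string, val []byte) {
	t.data[key] = val // real code re-hashes the affected merkle path here
}

// state pairs the trie with a flat hashmap used only for reads.
type state struct {
	t    *trie
	flat map[string][]byte // O(1) read index, maintained alongside the trie
}

// sload can shortcut the trie entirely: no merkle nodes change on a read.
func (s *state) sload(key string) []byte { return s.flat[key] }

// sstore must touch the trie so the state root can be recomputed, which
// is why SSTORE stays inherently more expensive than SLOAD.
func (s *state) sstore(key string, val []byte) {
	s.t.update(key, val)
	s.flat[key] = val
}

func main() {
	s := &state{t: &trie{data: map[string][]byte{}}, flat: map[string][]byte{}}
	s.sstore("slot0", []byte("v"))
	fmt.Printf("%s\n", s.sload("slot0"))
}
```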
I know that there have been efforts to introduce constant-time storage reads (SLOAD) in geth. I'm curious as to why these efforts failed. There was, for example, Turbogeth, with this interesting quote:
State reads to require only 1 database lookup. With the change (3), and a small addition, where for each short node, we also write out the value node separately, with the full path, we can read values from the state without expanding the trie at all. I have implemented this change in this commit, but it is a bit messy. Subsequent profiling sessions showed that state reading is now negligible and does not even appear in the report.
Long term, constant-time SLOAD and BALANCE calls should be achievable in all implementations. I would just hope for them to arrive quicker than they actually might. Once they arrive, though, we should be able to drop the EVM gas prices for SLOAD back down again.
Making the testing more repeatable would be good. Also, I'm still confused as to whether the reference for gas price adjustment should be the cold-sync times or live block processing; they are different challenges.
I have reviewed this EIP and everything looks great, but I have one note.
EIP-150 is listed as a requirement, but when reading the text I see that EIP-150 affected the analysis of EIP-1884 (DRAFT); it does not affect the prescription of EIP-1884 (DRAFT). Therefore I think it is appropriate to demote EIP-150 from a prerequisite to a reference.