EIP 1884: Repricing for trie-size-dependent opcodes

Is there a fundamental need for geth to look up data for an SLOAD operation in a Merkle Patricia trie? It is obviously necessary to maintain the trie to calculate the storage root, but why can’t there be a constant-time lookup cache layer for reading these values? According to a recent analysis by Péter (here: https://twitter.com/peter_szilagyi/status/1166633058348556288) the raw uncompressed size of storage data is 15.32 GB, potentially allowing a reduction to below 9 GB, which should enable a quite efficient cache.

If this is the case, then the increase in SLOAD time is due more to implementation decisions taken by the client, which could be fixed without a network upgrade. Given the current (relatively high, due to recent client optimizations) gas cost for SSTORE, adding a slight cost for additional caching there while gaining considerable lookup speed for SLOAD should significantly improve performance overall.

After collecting more feedback on this from many of those involved, like Péter Szilagyi, @holiman and @AlexeyAkhunov (turbo-geth), it is clear that a lot of work is currently going into getting constant-time lookups into geth, either as a side effect of a new sync protocol or in the form of a new database layout.

I know that it is pretty late in the process. But knowing that this will be fixed client-side in the foreseeable future, which might even require making SLOAD a lot cheaper again, and given the concerns from major projects and from new projects we are in contact with (projects afraid to choose Ethereum because they are less able to rely on it still working for them in the future; I know there are many arguments for and against this, but it is a fact that EIP-1884 is being used as an argument against Ethereum), why not focus on the client implementations and drop this quite contested EIP?

To clarify a bit on that (and sorry I didn’t answer earlier):
Keeping a db on the side is very nice, on paper. In reality, the problem is reorgs: with a single flat db, you are incapable of reverting a few blocks.

So what you wind up with is multiple layers of flat databases, where the bottom layer might be a couple of hundred blocks back. That one is on disk, and there are overlays in memory. To actually look up a value, you need to check the in-memory layers first, in case the value has been changed in the last N blocks. Only then do you hit disk and obtain the last stored value.
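
A minimal sketch of that layered lookup in Go (all type and function names here are hypothetical, not geth’s actual code):

```go
// Package flatdb sketches a layered flat-storage lookup. All names
// are illustrative, not geth's actual implementation.
package flatdb

// diffLayer holds the storage slots changed by one recent block.
type diffLayer struct {
	storage map[[32]byte][]byte // slot -> value written in this block
	parent  *diffLayer          // next-older layer; nil at the bottom
}

// diskLayer is the persisted flat database, lagging ~N blocks behind.
type diskLayer struct {
	get func(key [32]byte) ([]byte, bool)
}

// Lookup walks the in-memory layers newest to oldest, so a value
// changed within the last N blocks shadows the stale on-disk copy.
// A reorg is handled by simply discarding the affected diff layers.
func Lookup(top *diffLayer, disk *diskLayer, key [32]byte) ([]byte, bool) {
	for layer := top; layer != nil; layer = layer.parent {
		if val, ok := layer.storage[key]; ok {
			return val, true
		}
	}
	return disk.get(key)
}
```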

This is a promising approach, and somewhat of a necessity for the future new sync protocol which @karalabe is working on, but it is still work in progress and will probably not be a magic bullet to solve the lookup problem. It’s basically research at this point.

For consistency, can we rename BALANCE to EXTBALANCE to comply with the other EXT* opcodes and their non-EXT equivalents?

For consistency, can we rename BALANCE to EXTBALANCE to comply with the other EXT* opcodes and their non-EXT equivalents?

I’d love for that to happen… I didn’t want to do that right away in 1884, though, since it might give the impression of it being a new opcode, whereas it is in fact just a UI thing, not related to the consensus rules.

EIP-1803: Rename opcodes for clarity tries to do all this renaming, but it doesn’t seem to be tied to hard forks.

It seems this conversation has stalled, but it still seems extremely important. There is a strange, small incentive to use contracts for storage of even a single word, and an enormous benefit for 2 or more words. The gas savings become tremendous: loading four 32-byte words would cost 3,200 gas via SLOAD but only 712 gas via EXTCODECOPY, roughly 80% savings.
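
To make the arithmetic explicit, a small Go sketch using the EIP-1884 SLOAD price and the Istanbul EXTCODECOPY pricing (the constants match the numbers above; the package and function names are mine):

```go
// Package gascost compares post-EIP-1884 read costs; all names here
// are illustrative, not from any client codebase.
package gascost

const (
	sloadGas       = 800 // SLOAD after EIP-1884
	extCodeCopyGas = 700 // EXTCODECOPY base cost
	copyGasPerWord = 3   // per 32-byte word copied
)

// ReadViaSload is the cost of reading n words from contract storage.
func ReadViaSload(n int) int { return n * sloadGas }

// ReadViaExtCodeCopy is the cost of reading n words from another
// contract's code.
func ReadViaExtCodeCopy(n int) int { return extCodeCopyGas + n*copyGasPerWord }

// For n = 4: 3,200 gas via SLOAD vs. 712 via EXTCODECOPY (~78% saved).
```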

If this is storage data shared with other contracts, retrieving it will be even MORE cost-effective, since CALL and EXTCODECOPY cost basically the same, but the CALL additionally has to perform the SLOAD.

If these incentives remain out of alignment, I do expect another GasToken-like project to emerge (as @jochem-brouwer implies). These savings are even greater than GasToken’s: they save around 80% (for 4 storage variables, a reasonable number of state variables for many contracts) while GasToken is limited to 50%; they don’t require minting and holding balances for yourself or on behalf of users; and they don’t require providing an oversized gas limit with each transaction.

EIP-1884 should allow EXTBALANCE to cost the same as SELFBALANCE in the case where the parameter is the current address, but this is not mentioned in the spec.

Hello, I am an engineer who has been exploiting irregularities in the fee structure to save my customers gas. My project is TrueUSD, which has a market cap of $190m. Our tokens have wasted tons of EVM space (>30 MB so far) because of poor design decisions in the past. This looks like another.

I plan to exploit the following issues at scale if 1884 goes through as-is.

As others have pointed out, with this change it is cheaper to read data from code via EXTCODECOPY than to read it from local state with SLOAD. Reading a word from storage costs 800, while reading a word from code costs a flat 700 plus 3 per word.

It is not only cheaper to read data from code, but also to write it. After the 32,000 fixed cost of CREATE, writing a word costs 6,400 in code but 20,000 in storage.

So, under this scheme, if a contract wants to update a group of fields about a user and that group is larger than 2 words, it should use external code. If updating is sufficiently less common than reading, then all data should be externalized into code.
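
Concretely, a sketch of that break-even point under the numbers above (20,000 gas per fresh SSTORE; 32,000 fixed for CREATE plus 200 gas per byte, i.e. 6,400 per word of deployed code; names are mine):

```go
// Write-side comparison: storing n fresh words via SSTORE vs.
// deploying them as code. Ignores the deploying transaction's other
// overhead; names are illustrative.
package gascost

const (
	sstoreNewSlotGas = 20000 // SSTORE to a previously-zero slot
	createGas        = 32000 // fixed cost of CREATE
	codeDepositGas   = 6400  // 200 gas/byte * 32 bytes per word
)

// WriteViaSstore is the cost of writing n new words to storage.
func WriteViaSstore(n int) int { return n * sstoreNewSlotGas }

// WriteViaCreate is the cost of deploying n words as a data contract.
func WriteViaCreate(n int) int { return createGas + n*codeDepositGas }

// n = 2: 40,000 vs 44,800 gas -> storage still wins.
// n = 3: 60,000 vs 51,200 gas -> code wins; break-even just above 2 words.
```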

Unlike the GasToken exploit, there is no incentive to clean up contract code used in this way, and there is no easy way to assess fees on the polluters. Punishing good behavior (SLOAD) could result in an explosion of the state space much sooner than expected, and without enough time to plan intelligent mitigation.

The fix for this while keeping the proposed cost increases is to also increase EXTCODECOPY cost per word to something reasonable like 800.

A better fix would be to find a way to make the SLOAD costs constant and not logarithmic. It is not good that this increase predicts future increases as the network grows. If SLOAD is not scalable, then surely Ethereum is not either.

Could you clarify why this is? Selfdestructing a contract (a la GST2) results in a gas refund, just as clearing a word with SSTORE does.

The fix for this while keeping the proposed cost increases is to also increase EXTCODECOPY cost per word to something reasonable like 800.

This would solve the imbalance when reading via EXTCODECOPY, but the exploit is still possible: when a contract is called, its bytecode is loaded without incurring any gas cost. This can be used to store and read data using bytecode (PUSH32 + MSTORE + RETURN; see:
EVM Istanbul storage pricing. or how to hack the EVM to spend half… | by Agusx1211 | Medium)
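
For illustration, a sketch of that pattern: assembling, in Go, the runtime bytecode of a one-word data contract (the helper name is hypothetical; the opcode values are standard EVM):

```go
// Package datacontract sketches the bytecode trick described above.
package datacontract

// BuildDataContract assembles runtime bytecode that, when CALLed,
// returns the embedded 32-byte word. Loading the callee's code is not
// metered per byte, so reading the word costs little beyond the CALL.
func BuildDataContract(word [32]byte) []byte {
	code := []byte{0x7f}            // PUSH32 (pushes the stored word)
	code = append(code, word[:]...) // the 32-byte payload
	code = append(code,
		0x60, 0x00, // PUSH1 0   (memory offset)
		0x52,       // MSTORE    mem[0:32] = word
		0x60, 0x20, // PUSH1 32  (return size)
		0x60, 0x00, // PUSH1 0   (return offset)
		0xf3,       // RETURN    return mem[0:32]
	)
	return code
}
```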

A better fix would be to find a way to make the SLOAD costs constant and not logarithmic.

I agree. Determining the SLOAD execution cost by the trie size while ignoring all the other cases where the EVM has to perform exactly the same underlying operation will lead to devs exploiting the system.

Related to the repricing: how much time, at most, should a node spend computing a block? Asked differently, if it takes more than X seconds to compute a block, for which X does this cause issues for the network?

Initially, 1 gas represented around 1,000 ns on the (unspecified) machine used to determine the initial gas costs, and the block gas limit was 3.1415m (that number looks familiar…). This seems to indicate that ~3 seconds were deemed acceptable to compute a block, though this might all be coincidence.

Today, we often achieve 20m gas/second on recent hardware running new versions of geth, and 10m gas/second on the “ordinary” (SSD-powered) cloud instance out there, i.e. about 1 second to compute today’s 10m gas limit blocks.
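
The arithmetic behind those estimates, as a quick sketch (the throughput figures are the rough numbers quoted above):

```go
// Package blocktime: the arithmetic behind the throughput estimates.
package blocktime

// Seconds computes how long a block takes to execute at a given
// throughput (gas per second).
func Seconds(gasLimit, gasPerSecond float64) float64 {
	return gasLimit / gasPerSecond
}

// Seconds(10e6, 20e6)    == 0.5    // recent hardware, new geth
// Seconds(10e6, 10e6)    == 1.0    // ordinary SSD-backed cloud node
// Seconds(3.1415e6, 1e6) ~= 3.14   // original 1 gas ~ 1 us calibration
```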

Obviously, any X above or close to the target block time of 13 seconds is a certain show-stopper. But what about blocks which require 5 seconds on recent, powerful hardware today? In the end, it comes down to when the network starts to slow down, either through longer block times caused by the miners or through nodes used by applications (e.g. Infura, but of course there are many others out there) no longer being able to keep up. Does anyone have experience or suggestions regarding the “minimum supported throughput on the minimum supported hardware”?

You’re right here: if updates happen often enough, it would make sense to selfdestruct data contracts when you replace them, but in other contexts it would be worse. Adding the code to ensure that only you can selfdestruct your data contract is more expensive than prefixing it with the invalid opcode. In the case where you never update the data, there is no benefit to cleanup. In the case where reads happen sufficiently more often than updates, the differential read cost could overtake the write savings. As with GST2, the expected gasPrice should play a role in the calculation.

Ooh, I hadn’t considered that.

Besides the block time, you also have to consider block propagation time. A node will not pass along a block unless it is valid, and the network is not fully connected, so a block must be validated by many nodes before all of the miners will adopt it. Block validation time is a key component of the uncle rate.

Given the success of Ethereum, it is likely that more repricing operations will be needed in the future. I therefore wonder if it would be more productive to focus on giving a price to state (completing the work of Alexey Akhunov) and drop this EIP.

This is not part of the spec. The BALANCE gas cost will be bumped to the new amount even if it is called on the current address.

A common misconception, but a false one nonetheless. Clients only verify that sufficient PoW was done (i.e., they only validate the header) before propagating the block.

That seems unfair to existing contracts.

I have submitted a proposed adjustment to EIP 1884 that would solve the EXTCODECOPY issue and the SELFBALANCE discrepancy.