EIP 1884: Repricing for trie-size-dependent opcodes

Hey guys. I just realized something when I was thinking about using contracts in combination with CREATE2 to store large amounts of data (reading them from external contracts gets cheaper if you read a lot of data - see this article (not mine) about someone who researched this exact approach).

The problem lies in the fact that the proposed increase of SLOAD is from 200 -> 800. This is more than EXTCODECOPY (700 gas + 3 gas per word). This means that it is cheaper to read even a single full slot (32 bytes / 1 word) using EXTCODECOPY. Correct me if I am wrong, but this would cost 700 + 1 * 3 = 703 gas to read 32 bytes from an external contract. This does not seem rational to me, and if this goes through it might have the unintended side effect of people storing data in other contracts instead of using SSTORE (FYI, the SSTORE equivalent: deploy a contract whose code is the data; the actual storage of this contract would be empty), especially if the data is going to be read a lot (as opposed to written a lot, which is pretty expensive). This might produce unintended effects similar to, for example, GasToken, which “abuses” the gas refund counter. We might hence see people deploying read-only contracts if this EIP gets deployed as-is, because reading from them is cheaper (and it gets much cheaper still once you start reading, for example, 2 slots: 706 gas as opposed to 1600 gas!).
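To make the comparison concrete, here is a minimal sketch of the read costs, using only the prices quoted above (800 gas per SLOAD, 700 gas + 3 gas per word for EXTCODECOPY). It assumes one 32-byte slot counts as one word for the copy charge and ignores memory expansion and any other overhead; the function names are made up for illustration.

```python
# Gas prices as quoted in this thread (proposed EIP-1884 values).
SLOAD_GAS = 800              # per 32-byte storage slot
EXTCODECOPY_BASE_GAS = 700   # flat cost of one EXTCODECOPY
COPY_GAS_PER_WORD = 3        # per 32-byte word copied

def sload_cost(words: int) -> int:
    """Gas to read `words` slots with one SLOAD each."""
    return SLOAD_GAS * words

def extcodecopy_cost(words: int) -> int:
    """Gas to read `words` 32-byte words with a single EXTCODECOPY.

    Assumption: memory expansion and stack setup are ignored.
    """
    return EXTCODECOPY_BASE_GAS + COPY_GAS_PER_WORD * words

if __name__ == "__main__":
    for words in (1, 2, 4):
        print(words, sload_cost(words), extcodecopy_cost(words))
```

Under these assumptions the gap grows linearly with the number of words, since SLOAD pays the full price per slot while EXTCODECOPY pays the flat part only once.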

Proposed solution would be to either bump EXTCODECOPY or to lower the proposed 800 gas for SLOAD.

Strange indeed. Maybe not a desired behaviour, but loading contract bytecode is indeed much cheaper because code is not stored in the Patricia tree.

I was not aware that this was in fact cheaper so it is good that this is cleared up (note: not exactly aware about the storage location of the contract code / storage slots and the cost of looking these up - I assumed those were about the same). I do wonder if the EIP proposers are aware of this semi-weird gas pricing though, as this might bring these unintended (?) side effects (a la GasToken) at the Istanbul fork.

@holiman? I believe this is safe (although a bit awkward if it becomes a common practice and it may lead to the following EIPs having to deal with even stranger legacy contracts).

Is there a fundamental need for geth to look up data for an SLOAD operation in a Patricia Merkle tree? It’s obvious that the tree is necessary to calculate the storage tree root, but why can’t there be a constant-time lookup cache layer for reading these values? According to a recent analysis by Péter (here: https://twitter.com/peter_szilagyi/status/1166633058348556288) the raw uncompressed size of storage data is 15.32 GB, potentially allowing a reduction to below 9 GB, which should enable a quite efficient cache.

If this is the case, then the increase in SLOAD time is more due to the implementation decisions taken by the client which can be fixed without a network upgrade. Given the current (relatively high due to recent client optimizations) gas cost for SSTORE, adding a slight cost with additional caching here while gaining considerable lookup speed for SLOAD should increase the performance overall significantly.

After collecting more feedback on this from many involved like Péter Szilagyi, @holiman and @AlexeyAkhunov (turbo-geth) it is clear that actually a lot of work is currently going into getting constant-time lookup into geth, either as a side-effect of a new sync protocol or in the form of a new database layout.

I know that it is pretty late in the process. But we now know that this will be fixed client-side in the foreseeable future, which might even require making SLOAD a lot cheaper again. On top of that there are the concerns from major projects and from new projects we are in contact with, which are afraid to choose Ethereum because they feel less able to rely on it still working for them in the future. I know there are a lot of arguments for and against this, but it is a fact that EIP-1884 is being used as an argument against Ethereum. So why not focus on the client implementations and drop this quite contested EIP?

To clarify a bit on that (and sorry I didn’t answer earlier).
So keeping a db on the side is very nice, on paper. In reality, the problem is reorgs. So if you have a flat db, you are incapable of reverting a few blocks.

So what you wind up with is multiple layers of flat databases, where the bottom layer might be a couple of hundred blocks back. That one is on disk, and there are overlays in memory. To actually look up a value, you need to check the in-memory layers first, in case the value has been changed in the last N blocks. Then eventually you hit disk and obtain the last stored value.
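The layered lookup described here can be sketched roughly as follows. This is a toy illustration only, not geth's design: the class and method names are invented, plain dictionaries stand in for both the on-disk database and the per-block in-memory overlays, and deletions are not modelled.

```python
from typing import Dict, List, Optional

class LayeredState:
    """Toy flat state: one 'disk' layer plus per-block in-memory overlays."""

    def __init__(self, disk: Dict[str, int]) -> None:
        self.disk = dict(disk)                 # stand-in for the on-disk flat db
        self.layers: List[Dict[str, int]] = [] # overlays, oldest block first

    def push_block(self, changes: Dict[str, int]) -> None:
        """Record the slots written by a new block as a fresh overlay."""
        self.layers.append(dict(changes))

    def revert_block(self) -> None:
        """Reorg: drop the newest overlay; the disk layer is untouched."""
        self.layers.pop()

    def flatten_oldest(self) -> None:
        """Merge the oldest overlay into disk once it is deep enough."""
        self.disk.update(self.layers.pop(0))

    def get(self, key: str) -> Optional[int]:
        """Check overlays newest-first, then fall back to disk."""
        for layer in reversed(self.layers):
            if key in layer:
                return layer[key]
        return self.disk.get(key)

if __name__ == "__main__":
    state = LayeredState({"slot0": 1})
    state.push_block({"slot0": 2})
    print(state.get("slot0"))  # value from the newest overlay
    state.revert_block()
    print(state.get("slot0"))  # value from disk after the reorg
```

The point of the sketch is that a read is no longer a single flat lookup: in the worst case it walks every in-memory overlay before touching disk.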

This is a promising approach, and somewhat of a necessity for the future new sync protocol which @karalabe is working on, but it’s still work in progress, and will probably not be a magic bullet to solve the lookup problem. It’s basically research at this point.

For consistency, can we rename BALANCE to EXTBALANCE to comply with the other EXT* opcodes and their non-EXT equivalents?

For consistency, can we rename BALANCE to EXTBALANCE to comply with the other EXT* opcodes and their non-EXT equivalents?

I’d love for that to happen… I didn’t want to do that right away in 1884, though, since it might give the impression of being a new opcode, whereas it in fact is just a UI thing, not related to the consensus rules.

EIP-1803: Rename opcodes for clarity tries to do all this renaming, but it doesn’t seem to be tied to hard forks.

It seems this conversation has stalled, but it still seems extremely important. There is a strange, small incentive to use contracts for storage of even a single word and an enormous benefit for 2 or more words. The gas savings become tremendous: loading four 32-byte words would cost 3,200 gas via SLOAD and 712 gas via EXTCODECOPY, roughly 80% gas savings.

If this is storage data shared with other contracts, retrieving this data will be even MORE cost-effective, since their CALL and EXTCODECOPY will cost basically the same, but the CALL still has to SLOAD.

If these incentives remain out of alignment, I do expect another GasToken-like project to emerge (as @jochem-brouwer implies). These savings are even greater than GasToken's: they save around 80% (for 4 storage variables, a reasonable number of state variables for many contracts) while GasToken is limited to 50%, they don’t require holding and minting balances for yourself or on behalf of users, and they don’t require you to provide an oversized GasLimit with each transaction.

EIP1884 should allow EXTBALANCE to cost the same as SELFBALANCE in the case the parameter is the call address, but this is not mentioned in the spec.

Hello, I am an engineer who has been exploiting irregularities in the fee structure to save my customers gas. My project is TrueUSD, which has a market cap of $190m. Our tokens have wasted tons of EVM space (>30 MB so far) because of poor design decisions in the past. This looks like another.

I plan to exploit the following issues at scale if 1884 goes through as-is.

As others have pointed out, with this change it is cheaper to read data with EXTCODECOPY than to read it from local state with SLOAD. Reading from state costs 800 per word, while reading from code costs 700 plus 3 per word.

It is not only cheaper to read data from code, but also to write it. After the 32000 fixed cost of CREATE, writing a word costs 6400 in code but 20000 for state.

So, under this scheme, if a contract wants to update a group of fields about a user, and that group is larger than 2, they should use external code. If updating is sufficiently less-common than reading, then all data should be externalized into code.
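The break-even point for writes can be checked with a quick sketch using only the figures quoted above (32000 fixed for CREATE, 6400 per word of code, 20000 per word of state). It is a rough model under stated assumptions: init-code execution, transaction overhead and refunds are ignored, and the function names are invented for illustration.

```python
# Write-side prices as quoted in this thread.
CREATE_GAS = 32000          # fixed cost of CREATE
CODE_GAS_PER_WORD = 6400    # storing one 32-byte word as contract code
SSTORE_GAS_PER_WORD = 20000 # writing one fresh 32-byte word to state

def code_write_cost(words: int) -> int:
    """Gas to store `words` words as the code of a new contract."""
    return CREATE_GAS + CODE_GAS_PER_WORD * words

def state_write_cost(words: int) -> int:
    """Gas to store `words` words in fresh storage slots."""
    return SSTORE_GAS_PER_WORD * words

def break_even_words() -> int:
    """Smallest group size for which code is cheaper to write than state."""
    words = 1
    while code_write_cost(words) >= state_write_cost(words):
        words += 1
    return words

if __name__ == "__main__":
    for words in (1, 2, 3, 4):
        print(words, code_write_cost(words), state_write_cost(words))
    print("code wins from", break_even_words(), "words")
```

With these numbers, two words are still cheaper in state (40000 versus 44800), while three words are cheaper in code (51200 versus 60000), which matches the "larger than 2" threshold.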

Unlike the GasToken exploit, there is no incentive to clean up contract code used in this way, and there is no easy way to assess fees to the polluters. Punishing good behavior (SLOAD) could result in an explosion of the state space much sooner than expected, and without enough time to plan intelligent mitigation.

The fix for this while keeping the proposed cost increases is to also increase EXTCODECOPY cost per word to something reasonable like 800.

A better fix would be to find a way to make the SLOAD costs constant and not logarithmic. It is not good that this increase predicts future increases as the network grows. If SLOAD is not scalable then surely Ethereum is not either.

Could you clarify why this is? Selfdestructing a contract (a la GST2) results in a gas refund, just as clearing a word with SSTORE does.

The fix for this while keeping the proposed cost increases is to also increase EXTCODECOPY cost per word to something reasonable like 800.

This would solve the imbalance when reading using EXTCODECOPY, but the exploit is still possible. When a contract is called, its bytecode is loaded without incurring any gas costs, and this can be used to store and read data using bytecode (PUSH32 + MSTORE + RETURN, see: https://medium.com/@agusx1211/evm-istambul-storage-pricing-5befaac32403).
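As an illustration of the PUSH32 + MSTORE + RETURN pattern, the sketch below assembles runtime bytecode that returns one hard-coded 32-byte word when the contract is called. The opcode values are the standard EVM ones; the helper name is hypothetical, and the linked article may lay out its contracts differently.

```python
# Standard EVM opcode values.
PUSH1 = 0x60
PUSH32 = 0x7F
MSTORE = 0x52
RETURN = 0xF3

def data_runtime_code(word: bytes) -> bytes:
    """Runtime bytecode that returns `word` (exactly 32 bytes) on any call."""
    if len(word) != 32:
        raise ValueError("word must be exactly 32 bytes")
    return (
        bytes([PUSH32]) + word        # push the stored data
        + bytes([PUSH1, 0x00])        # memory offset 0
        + bytes([MSTORE])             # mem[0:32] = data
        + bytes([PUSH1, 0x20])        # return size: 32 bytes
        + bytes([PUSH1, 0x00])        # return offset: 0
        + bytes([RETURN])             # return mem[0:32]
    )

if __name__ == "__main__":
    code = data_runtime_code((42).to_bytes(32, "big"))
    print(len(code), code.hex())
```

A caller reads the data with a plain call instead of SLOAD or EXTCODECOPY, so repricing EXTCODECOPY alone does not touch this path.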

A better fix would be to find a way to make the SLOAD costs constant and not logarithmic.

I agree. Pricing SLOAD by the trie size while ignoring all the other cases where the EVM has to perform exactly the same operation will lead to devs exploiting the system.

Related to the repricing, how much time should a node at most spend on computing a block? Asked differently, if it takes more than X seconds to compute a block, for which X would this cause issues for the network?

Initially, 1 GAS represented around 1’000ns on an unknown machine which was used to determine the initial gas cost, and the block gas limit was at 3.1415m (that number looks familiar…). So this seems to indicate that ~3 seconds were deemed acceptable to compute a block, but of course this might all be coincidence.

Today, we are often achieving 20m GAS/second on recent hardware using new versions of geth, and 10m GAS/second on the “ordinary” (SSD-powered) cloud instance out there, so ~1 second to compute today's 10m gas limit blocks.
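The arithmetic behind these estimates is just gas limit divided by throughput. A small sketch with the figures from this post (1 GAS per 1,000 ns initially, 10m to 20m GAS/second today); the function name is made up, and real block times also depend on state access patterns that this ignores.

```python
def block_seconds(gas_limit: int, gas_per_second: float) -> float:
    """Time to execute a full block at a given sustained throughput."""
    return gas_limit / gas_per_second

# 1 GAS ~ 1,000 ns means roughly one million GAS per second.
INITIAL_GAS_PER_SECOND = 1_000_000_000 / 1_000

if __name__ == "__main__":
    # Initial network: 3.1415m gas limit at ~1m GAS/second.
    print(block_seconds(3_141_500, INITIAL_GAS_PER_SECOND))
    # Today: 10m gas limit on a cloud instance and on recent hardware.
    print(block_seconds(10_000_000, 10_000_000))
    print(block_seconds(10_000_000, 20_000_000))
```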

Obviously, any X above or close to the target block time of 13 seconds is a certain show-stopper. But what about blocks which require 5 seconds on recent powerful hardware today? In the end it is about when the network starts to either slow down due to longer block times created by the miners or when nodes used by applications (e.g. Infura, but of course there are many other ones out there) can’t keep up anymore. Does anyone have experience or suggestions regarding the “minimum supported throughput on the minimum supported hardware”?

You’re right here: if updates happen often enough it would make sense to selfdestruct data contracts when you replace them, but in other contexts it would be worse. Adding in the code to ensure that only you can self-destruct your data is more expensive than prefixing it with the invalid opcode. In the case where you never update the data, there is no benefit to cleanup. In the case where reads happen sufficiently more often than updates, the differential read cost could overtake the write savings. As with GST2, expected gasPrice should play a role in your calculation.

Ooh, I hadn’t considered that.

Besides the block time, you also have to consider the block propagation time. A node will not pass along a block unless it is valid, and the network is not fully-connected, so a block must be validated by many nodes before all of the miners will adopt it. Block validation time is a key component of uncle rate.