EIP 1884: Repricing for trie-size-dependent opcodes

To clarify a bit on that (and sorry I didn’t answer earlier).
So keeping a db on the side is very nice, on paper. In reality, the problem is reorgs: with a single flat db, you cannot revert a few blocks.

So what you wind up with is multiple layers of flat databases, where the bottom layer might be a couple of hundred blocks back. That one is on disk, and there are overlays in memory. To actually look up a value, you need to check the in-memory layers first, in case the value has been changed in the last N blocks. Eventually you hit disk and get the last stored value.
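A rough sketch of that lookup order, with purely illustrative names (this is not geth’s actual snapshot code, just the general shape of the idea):

```go
// Illustrative sketch of the layered flat-db lookup described above.
package snapshot

type layer interface {
	// get returns the value for key, falling back to older layers if needed.
	get(key [32]byte) (value [32]byte, ok bool)
}

// diffLayer is an in-memory overlay holding the writes of one recent block.
type diffLayer struct {
	writes map[[32]byte][32]byte
	parent layer // next-older layer, eventually the on-disk base
}

func (d *diffLayer) get(key [32]byte) ([32]byte, bool) {
	if v, ok := d.writes[key]; ok {
		return v, true // the value was changed within the last N blocks
	}
	return d.parent.get(key) // fall through to older layers, then disk
}

// diskLayer stands in for the flat database a couple of hundred blocks back.
type diskLayer struct {
	db map[[32]byte][32]byte // placeholder for the real key-value store
}

func (d *diskLayer) get(key [32]byte) ([32]byte, bool) {
	v, ok := d.db[key]
	return v, ok
}
```

In a layout like this, a reorg of depth n is handled by dropping the top n in-memory layers rather than rewriting the flat db.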

This is a promising approach, and somewhat of a necessity for the future new sync protocol which @karalabe is working on, but it’s still work in progress, and will probably not be a magic bullet to solve the lookup problem. It’s basically research at this point.

3 Likes

For consistency, can we rename BALANCE to EXTBALANCE to comply with the other EXT* opcodes and their non-EXT equivalents?

1 Like

For consistency, can we rename BALANCE to EXTBALANCE to comply with the other EXT* opcodes and their non-EXT equivalents?

I’d love for that to happen… I didn’t want to do that right away in 1884, though, since it might give the impression of being a new opcode, whereas in fact it is just a naming/UI change, not related to the consensus rules.

1 Like

EIP-1803: Rename opcodes for clarity tries to do all this renaming, but it doesn’t seem to be tied to hard forks.

1 Like

It seems this conversation has stalled, but it still seems extremely important. There is a strange, small incentive to use contracts for storage of even a single word, and an enormous benefit for 2 or more words. The gas savings become tremendous: loading four 32-byte words would cost 3,200 gas via SLOAD but only ~712 gas via EXTCODECOPY, roughly 80% savings.
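For reference, the arithmetic behind those numbers, assuming the proposed 800 gas SLOAD and the current EXTCODECOPY pricing of 700 plus 3 per word (figures are illustrative only):

```go
package main

import "fmt"

func main() {
	const words = 4
	sload := 800 * words     // four SLOADs at the proposed 800 gas = 3,200
	extcode := 700 + 3*words // one EXTCODECOPY: 700 base + 3 per word = 712
	fmt.Println(sload, extcode) // 3200 712 -> roughly 80% cheaper via code
}
```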

If this is storage data shared with other contracts, retrieving it will be even MORE cost-effective, since a CALL and an EXTCODECOPY cost basically the same, but the called contract still has to SLOAD.

If these incentives remain out of alignment, I do expect another GasToken-like project to emerge (as @jochem-brouwer implies). These savings are even greater than GasToken: they save around 80% (for 4 storage variables, a reasonable number of state variables for many contracts) while GasToken is limited to 50%, they don’t require holding and minting balances for yourself or on behalf of users, and they don’t require you to provide an oversized GasLimit with each transaction.

2 Likes

EIP 1884 should allow EXTBALANCE to cost the same as SELFBALANCE in the case where the argument is the currently executing contract’s own address, but this is not mentioned in the spec.
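In gas-metering terms, the exception being asked for would look something like the branch below. This is only a sketch of the suggestion; it is not in the EIP text or, to my knowledge, in any client:

```go
// Hypothetical sketch: charge BALANCE at the SELFBALANCE rate when the
// argument is the executing contract's own address. Not part of EIP 1884.
package vm

const (
	gasSelfBalance = 5   // SELFBALANCE under EIP 1884
	gasBalanceNew  = 700 // BALANCE under EIP 1884
)

func gasForBalance(target, self [20]byte) uint64 {
	if target == self {
		return gasSelfBalance
	}
	return gasBalanceNew
}
```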

Hello, I am an engineer who has been exploiting irregularities in the fee structure to save my customers gas. My project is TrueUSD, which has a market cap of $190m. Our tokens have wasted tons of EVM space (>30 MB so far) because of poor design decisions in the past. This looks like another.

I plan to exploit the following issues at scale if 1884 goes through as-is.

As others have pointed out, with this change it is cheaper to read data from EXTCODECOPY than to read it from local state with SLOAD. Reading a word from storage costs 800, while reading a word from code costs a flat 700 plus 3 per word.

It is not only cheaper to read data from code, but also to write it. After the 32,000 fixed cost of CREATE, writing a word costs 6,400 as code but 20,000 as state.

So, under this scheme, if a contract wants to update a group of fields about a user, and that group is larger than 2 words, it should use external code. If updating is sufficiently less common than reading, then all data should be externalized into code.
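A rough break-even check of those numbers (CREATE at 32,000, code deposit at 200 gas per byte, a fresh SSTORE at 20,000), ignoring the surrounding call and memory overhead:

```go
package main

import "fmt"

func main() {
	for n := 1; n <= 4; n++ {
		asCode := 32000 + 6400*n // CREATE + 200 gas per deposited byte (32 bytes/word)
		asState := 20000 * n     // one zero-to-non-zero SSTORE per word
		fmt.Printf("words=%d  code=%d  state=%d\n", n, asCode, asState)
	}
	// Storing as code becomes cheaper from the third word onward,
	// which is the "larger than 2" threshold mentioned above.
}
```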

Unlike the GasToken exploit, there is no incentive to clean up contract code used in this way, and there is no easy way to assess fees to the polluters. Punishing good behavior (SLOAD) could result in an explosion of the state space much sooner than expected, and without enough time to plan intelligent mitigation.

The fix for this while keeping the proposed cost increases is to also increase EXTCODECOPY cost per word to something reasonable like 800.

A better fix would be to find a way to make the SLOAD costs constant and not logarithmic. It is not good that this increase predicts future increases as the network grows. If SLOAD is not scalable then surely Ethereum is not either.

2 Likes

Could you clarify why this is? Selfdestructing a contract (a la GST2) results in a gas refund, just as clearing a word with SSTORE does.

1 Like

The fix for this while keeping the proposed cost increases is to also increase EXTCODECOPY cost per word to something reasonable like 800.

This would solve the imbalance when reading using EXTCODECOPY, but the exploit is still possible: when a contract is called, its bytecode is loaded without incurring any per-word gas cost, and this can be used to store and read data as bytecode (PUSH32 + MSTORE + RETURN; see:
EVM Istanbul storage pricing. or how to hack the EVM to spend half… | by Agusx1211 | Medium)
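For readers who have not seen the trick from the linked article: the runtime code of such a “data contract” is roughly the layout below, and calling it returns the stored word without any SLOAD. The helper is only an illustration of that layout, not code from the article:

```go
// Sketch of the runtime bytecode of a one-word "data contract";
// opcode values as in the Yellow Paper.
package main

import "fmt"

func dataContractRuntime(word [32]byte) []byte {
	code := []byte{0x7f}            // PUSH32 ...
	code = append(code, word[:]...) // ...the stored 32-byte word
	code = append(code,
		0x60, 0x00, // PUSH1 0
		0x52,       // MSTORE  (write the word to memory[0:32])
		0x60, 0x20, // PUSH1 32
		0x60, 0x00, // PUSH1 0
		0xf3,       // RETURN  (hand memory[0:32] back to the caller)
	)
	return code
}

func main() {
	var w [32]byte
	w[31] = 42
	fmt.Printf("%x\n", dataContractRuntime(w))
}
```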

A better fix would be to find a way to make the SLOAD costs constant and not logarithmic.

I agree. Determining the SLOAD execution cost by the trie size while ignoring all the other cases where the EVM has to perform exactly the same lookup will lead to devs exploiting the system.

4 Likes

Related to the repricing, how much time should a node at most spend on computing a block? Asked differently, if it takes more than X seconds to compute a block, for which X would this cause issues for the network?

Initially, 1 GAS represented around 1,000 ns on an unknown machine which was used to determine the initial gas costs, and the block gas limit was at 3.1415m (that number looks familiar…). So this seems to indicate that ~3 seconds were deemed acceptable to compute a block, but of course this might all be coincidence.

Today, we often achieve 20m GAS/second on recent hardware using new versions of geth, and 10m GAS/second on an “ordinary” (SSD-powered) cloud instance, so ~1 second to compute today’s 10m gas limit blocks.
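For what it’s worth, the arithmetic behind those figures (illustrative only):

```go
package main

import "fmt"

func main() {
	// Original calibration: ~1,000 ns per gas at a 3.1415m gas limit.
	fmt.Println(3.1415e6 * 1000e-9) // ~3.14 s per full block
	// Today: ~10m gas blocks at 10-20m gas/second.
	fmt.Println(10e6 / 10e6) // ~1 s on an ordinary cloud node
	fmt.Println(10e6 / 20e6) // ~0.5 s on recent hardware
}
```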

Obviously, any X above or close to the target block time of 13 seconds is a certain show-stopper. But what about blocks which require 5 seconds even on recent, powerful hardware? In the end it comes down to when the network starts to slow down due to longer block times produced by the miners, or when nodes used by applications (e.g. Infura, but of course there are many other ones out there) can’t keep up anymore. Does anyone have experience or suggestions regarding the “minimum supported throughput on the minimum supported hardware”?

You’re right here: if updates happen often enough it would make sense to selfdestruct data contracts when you replace them, but in other contexts it would be worse. Adding in the code to ensure that only you can self-destruct your data is more expensive than prefixing it with the invalid opcode. In the case where you never update the data, there is no benefit to cleanup. In the case where reads happen sufficiently more often than updates, the differential read cost could overtake the write savings. As with GST2, expected gasPrice should play a role in your calculation.

Ooh, I hadn’t considered that.

Besides the block time, you also have to consider the block propagation time. A node will not pass along a block unless it is valid, and the network is not fully-connected, so a block must be validated by many nodes before all of the miners will adopt it. Block validation time is a key component of uncle rate.

Given the success of Ethereum, it is likely that more “repricing” operations will be needed in the future.
I therefore wonder if it would be more productive to focus on giving a price to state (completing the work of Alexey Akhunov) and drop this EIP.

This is not part of the spec. The BALANCE gas cost will be bumped to the new amount even if it is called on the current address.

A common belief, but a misconception nonetheless. Clients only verify that sufficient PoW was done (i.e., they only validate the header) before propagating the block.

2 Likes

That seems unfair to existing contracts.

I have submitted a proposed adjustment to EIP 1884 that would solve the EXTCODECOPY issue and the SELFBALANCE discrepancy.

I find the argument of increasing gas prices because of how one concrete implementation performs in syncing time highly questionable. I suggest dropping this EIP completely and defining a new strategy for gas pricing / gas-pricing changes.

  1. Both SLOAD and BALANCE are key-value lookups whose complexity depends solely on the data structure chosen by the client implementing them. There is no reason why geth couldn’t adopt a backing data store, or just an index, with O(1) lookup times for these key-value pairs. Does anyone know why geth is not considering an O(1) hash table for SLOAD and BALANCE?

  2. While a full sync is an important operation, I do not see why it should be the measure for EVM pricing. IMHO normal day-to-day operation of a node should be the reference, not the syncing time. During normal day-to-day operation, EVM execution and PoW happen back to back, with PoW probably being 99% of the CPU resources burnt and the EVM 1% (sorry, this is just a guess; I have not found reference values on this). The strategy for pricing EVM operations should thus be the long-term effect on the total size of the state tree, and only to a lesser degree the CPU time associated with executing the EVM – at least as long as the PoW CPU time is so much bigger.

  3. Changes to EVM gas pricing should only be considered if there is consensus among the client implementations. If all client implementations show the same performance disparity on a certain operation, it would make sense to start the process of changing that operation’s cost – under the same hardware setup, at least the big clients (aleth, parity, geth, others?) should be checked before drawing any conclusions.

Geth is considering it, and working on it. Parity doesn’t have it either, and none of the other clients do, to my knowledge. It’s not a trivial problem to solve. If it indeed becomes solved (or at least improved) in future iterations, and it’s deemed possible to lower the limits again, that would be wonderful.

Actually, verifying the PoW on a block takes somewhere between 5-15ms (depending on the machine, of course), and verifying the full block (verifying all txs, running all executions) takes somewhere around 200-300ms. It’s mainly disk IO that’s the bound – again, depending on the machine. Sometime in the future, when we have Optane DC RAM, it might be a different case.

My benchmarks have been published for over half a year, and nobody has challenged them with drastically different measurements made on another client. I made them on geth, because I’m a geth developer. Nobody from parity/trinity/nethermind/besu at any time offered the opinion that “No, that’s just geth performing badly, we don’t see this as an issue”.

This is most definitely a cross-client concern.

1 Like