EIP-7609 reduce transient storage pricing

discussion for: EIP-7609: Decrease base cost of TLOAD/TSTORE

3 Likes

I ran some crude benchmarks using revm. The tools and methodology are below, but the tl;dr is:

  • plain dup2+mul, 10k times: ~20ns / operation
  • mload of the same address, 10k times: ~12ns / operation
  • tstore of the same address, 10k times: ~25ns / operation
  • tstore of 10k different addresses: ~40ns / operation
  • tstore + tload of 10k different addresses: ~70ns / operation
  • sha3 (of a 32-byte buffer): ~500ns / operation

script to generate evm scripts: generate benchmarks for transient storage · GitHub
revm script: feat: add evm script by charles-cooper · Pull Request #1039 · bluealloy/revm · GitHub
I want to point out that the summary numbers are estimates, since there is some jitter from stack operations, which throws the numbers off a bit (however, I do think they are more or less in the correct ranges). I added two control scripts which do some stack fiddling to mimic what the later operations do, so their timings can be subtracted from the relevant scripts.

Results (from revm/bins/revm-test, running `for file in *.evm; do cargo run --release --bin evm $file; done`):

    Finished release [optimized + debuginfo] target(s) in 0.15s
     Running `/home/charles/src-references/revm/target/release/evm /home/charles/src-references/EIPs/benchmark_control2.evm`
Run bytecode (3.0s) ...              108_523.678 ns/iter (0.992 R²)
    Finished release [optimized + debuginfo] target(s) in 0.12s
     Running `/home/charles/src-references/revm/target/release/evm /home/charles/src-references/EIPs/benchmark_control.evm`
Run bytecode (3.0s) ...              222_961.624 ns/iter (1.000 R²)
    Finished release [optimized + debuginfo] target(s) in 0.12s
     Running `/home/charles/src-references/revm/target/release/evm /home/charles/src-references/EIPs/benchmark_easy_mload.evm`
Run bytecode (3.0s) ...              126_342.356 ns/iter (1.000 R²)
    Finished release [optimized + debuginfo] target(s) in 0.12s
     Running `/home/charles/src-references/revm/target/release/evm /home/charles/src-references/EIPs/benchmark_easy_tstore.evm`
Run bytecode (3.0s) ...              370_559.161 ns/iter (1.000 R²)
    Finished release [optimized + debuginfo] target(s) in 0.13s
     Running `/home/charles/src-references/revm/target/release/evm /home/charles/src-references/EIPs/benchmark_sha3.evm`
Run bytecode (3.1s) ...            4_947_213.996 ns/iter (1.000 R²)
    Finished release [optimized + debuginfo] target(s) in 0.12s
     Running `/home/charles/src-references/revm/target/release/evm /home/charles/src-references/EIPs/benchmark_tload_pure.evm`
Run bytecode (3.0s) ...              215_596.323 ns/iter (0.997 R²)
    Finished release [optimized + debuginfo] target(s) in 0.12s
     Running `/home/charles/src-references/revm/target/release/evm /home/charles/src-references/EIPs/benchmark_tstore.evm`
Run bytecode (3.0s) ...              619_875.709 ns/iter (0.999 R²)
    Finished release [optimized + debuginfo] target(s) in 0.12s
     Running `/home/charles/src-references/revm/target/release/evm /home/charles/src-references/EIPs/benchmark_tstore_tload.evm`
Run bytecode (3.1s) ...              936_628.376 ns/iter (0.998 R²)
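
To make the per-operation summary above concrete, here is a small sketch of the subtraction described earlier. The pairing of control scripts to benchmarks is my reading of the setup (benchmark_control2 for the same-address scripts, benchmark_control for the 10k-different-address scripts), with each script performing 10,000 operations:

```python
# Rough per-op estimates derived from the ns/iter numbers above.
# Assumption (mine): benchmark_control2 (~108_524 ns) pairs with the
# same-address scripts, benchmark_control (~222_962 ns) with the
# 10k-different-address scripts; each script runs 10_000 operations.
N_OPS = 10_000

def per_op(total_ns, control_ns):
    return (total_ns - control_ns) / N_OPS

print(per_op(370_559.161, 108_523.678))  # tstore, same address  -> ~26 ns
print(per_op(619_875.709, 222_961.624))  # tstore, 10k addresses -> ~40 ns
print(per_op(936_628.376, 222_961.624))  # tstore + tload        -> ~71 ns
```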
1 Like

I created some tstore/tload state tests here: tloadstore_opcode_statetests.json · GitHub

Results:
Tload: 46ms
Tstore: 205ms
For comparison, other tests I created on the same machine:
Push0: 86ms
Mstore8: 107ms
Mstore: 125ms
Sstore: 208ms

So it looks to me like tstore is priced correctly and tload is priced a bit high, but still in line with other opcodes.

1 Like

Are state tests a “good” or rigorous way to benchmark opcodes? How should we interpret / compare these results with @charles-cooper's observations?

@Rjected I found there are a couple of slightly tricky things to get right with the benchmarks:

  • need to pop the result of TLOAD, otherwise the transaction can revert early (stack >1024 items) and time gets biased downwards
  • tload from empty locations when the transient storage map is small or empty biases time downwards, since the logic is effectively "if key not in map: return 0" and the existence check is very fast
  • repeatedly tloading from the same location when the transient storage map is small biases time downwards because lookups from small maps are in general faster than lookups from large maps

What I did to address these issues in the benchmarks was to issue a tload of an address after tstoring it.
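
For illustration, a minimal sketch of that pattern as a bytecode generator. This is hypothetical and not the linked gist; only the opcode byte values (from the EVM / EIP-1153) are real:

```python
# Hypothetical sketch (not the linked gist): emit a benchmark body that
# TSTOREs a fresh key each iteration, then immediately TLOADs it and POPs
# the result so the stack never approaches the 1024-item limit.
PUSH2, DUP1, TSTORE, TLOAD, POP = 0x61, 0x80, 0x5D, 0x5C, 0x50

def tstore_tload_body(n_keys: int) -> bytes:
    code = bytearray()
    for key in range(n_keys):
        code += bytes([PUSH2]) + key.to_bytes(2, "big")  # push key
        code += bytes([DUP1, DUP1])                      # stack: key, key, key
        code += bytes([TSTORE])                          # store key -> key
        code += bytes([TLOAD])                           # load the slot just written
        code += bytes([POP])                             # drop the result
    return bytes(code)
```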

So given those caveats, Marius's benchmarks look about right to me. The sstore benchmark seems too fast, though; maybe it's not physically writing to disk?

I am not sure the coefficient in the EIP needs to be 3, either. With a coefficient of 1 we still get DoS protection (a single map maxes out under the current gas limit at 7738 slots), but we get more breathing room for the "small maps": 92 items before becoming more expensive than the current TSTORE, rather than 30.
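
For reference, the arithmetic behind those two numbers, assuming the cost of a TSTORE that appends a new slot is roughly BASE + SLOPE * n, where n is the current size of the map (my paraphrase of the EIP's superlinear term):

```python
# Rough arithmetic behind the numbers above, assuming a new-slot TSTORE
# costs roughly BASE + SLOPE * n with n the current map size
# (my paraphrase of the EIP's formula).
BASE, CURRENT_TSTORE, GAS_LIMIT = 8, 100, 30_000_000

def crossover_size(slope):
    # map size at which a new-slot TSTORE starts costing more than today's flat 100 gas
    return (CURRENT_TSTORE - BASE) // slope

def max_slots(slope):
    # largest map one transaction can build if it spends the whole gas limit on TSTOREs
    gas = n = 0
    while gas + BASE + slope * n <= GAS_LIMIT:
        gas += BASE + slope * n
        n += 1
    return n

print(crossover_size(3), crossover_size(1))  # 30 vs 92 slots
print(max_slots(1))                          # ~7738 slots with SLOPE = 1
```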

I updated the coefficient here: Update EIP-7609: reduce SLOPE coefficient in eip-7609 by charles-cooper · Pull Request #8272 · ethereum/EIPs · GitHub

Why is writing to an existing slot 0 gas? EIP-1153 argued that it definitely should be above the MSTORE price because of its interaction with reverts:

Gas cost for TSTORE is the same as a warm SSTORE of a dirty slot (i.e. original value is not new value and is not current value, currently 100 gas), and gas cost of TLOAD is the same as a hot SLOAD (value has been read before, currently 100 gas). Gas cost cannot be on par with memory access due to transient storage’s interactions with reverts.

Never mind, it's 8 gas for an existing slot, not 0.

Coming from EIP for nonreentrant opcodes - #7 by xinbenlv

@charles-cooper do you know where the best place is to see the rationale from prior discussion on why EIP-7609 was not prioritized?

None was given to me. It was barely mentioned in Execution Layer Meeting 185 · Issue #997 · ethereum/pm · GitHub, and it was not actually discussed on the call.

Got it! Probably lack of attention. Just like EIP-3074, which waited multiple years before it got proper attention.

I was on the ACD call and heard you in the last 3 minutes. That was a good way to draw attention. Not that I have a voice in core development, but I think this EIP has merit for smart contract developers. I will bring it up in discussions with others as I see fit, @charles-cooper. Thank you for drafting and driving this proposal.

1 Like

I would support a reduction; however, the WarmStateRead price does make sense, no?

Otherwise the pattern will be SLOAD → TSTORE → TLOAD, TLOAD, TLOAD, TLOAD

Rather than SLOAD, SLOAD, SLOAD, SLOAD, SLOAD

Access lists aside (where there is a state access hit), perhaps warm storage read (SLOAD, TLOAD) is overpriced?

There is an argument to be made that warm storage read is overpriced; but also, DoS via the size of the in-memory map for warm storage is naturally prevented by the high cost of the initial cold storage load/store. From the EIP:

As a comparison point, the total amount of memory which can be allocated on a client by SSTOREs in a given transaction is 30_000_000 / 20_000 * 32, or 48KB.

Transient storage does not have the same "protection"; this is why we consider a different DoS prevention mechanism.
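
For comparison, a rough back-of-the-envelope calculation (my own arithmetic, counting 64 bytes per transient slot for key plus value, and ignoring hashmap overhead):

```python
# Back-of-the-envelope comparison (my own arithmetic): memory one
# transaction can pin via storage writes under a 30M gas limit.
GAS_LIMIT = 30_000_000

sstore_bytes = GAS_LIMIT // 20_000 * 32  # ~48 KB, the figure quoted from the EIP
tstore_bytes = GAS_LIMIT // 100 * 64     # key + value at the current flat 100 gas: ~19 MB

print(sstore_bytes, tstore_bytes)
```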

Not sure about other clients' code, but for Nethermind it is exactly the same code for warm SLOAD and TLOAD, so pricing them the same makes sense.

1 Like

Honestly I see no issue with pricing warm SLOAD and SSTORE at the same proposed base cost as TLOAD / TSTORE (currently 5 and 8, respectively)

1 Like

Honestly I see no issue with pricing warm SLOAD and SSTORE at the same proposed base cost

SSTORE is more complicated even when warm (including working out the price of SSTORE :sweat_smile:) and it is inherently a different beast, so I would probably want to leave that alone.

Reducing pricing for SLOAD would probably have to keep the same current price for an access list "warm" load, as it is incurring the cost of a cold load, and it would have the potential to open that up as a DoS vector (and we would have to consider potential block targetGas increases so as not to create a corner that needs repricing up to get out of).

Both TLOAD and SLOAD are more complex than MLOAD due to hashing vs. array access, but you have gone with a higher price.

For TSTORE, perhaps including it in the memory expansion cost at a 2x rate would be an idea, as it is storing more than MSTORE does (key + data).
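
For concreteness, one possible reading of that idea (my interpretation, not part of EIP-7609): run transient slots through the standard memory-expansion curve, counting each slot as two 32-byte words.

```python
# One possible reading of the suggestion above (not part of EIP-7609):
# charge transient slots through the standard memory-expansion curve,
# counting each TSTOREd slot as two 32-byte words (key + data).
def memory_expansion_gas(words: int) -> int:
    # standard EVM memory cost: 3 * words + words^2 / 512
    return 3 * words + words * words // 512

def transient_expansion_gas(slots: int) -> int:
    return memory_expansion_gas(2 * slots)  # the "2x rate": key + data per slot
```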

1 Like

I'm not convinced, by the way, that TLOAD is substantially more complex than MLOAD.

Yes, there is a hash performed, but that is super cheap. Meanwhile, reads and writes from memory are always doing conversions to big-endian on the way in and little-endian on the way out. Depending on how bigints are implemented on your system, TLOAD can be implemented by pointer copy; MLOAD requires allocating a new 32-byte item. It's probably about the same cost as hashing in the end (xor'ing four 64-bit numbers together, another xor for salt, and then mod by some prime number, vs. bswap64 four times and then writing 32 bytes back to memory, plus a stack item allocation). The big cost with hashtables is probing when there is a collision, but this can be reasonably dealt with by using a sufficiently low load factor. I will try to get a more "fair" comparison with MLOAD/MSTORE from nonzero memory. In the case where memory is larger and doesn't fit in a single cache line, I suspect they are substantially similar in performance to TLOAD.

I have similar thoughts about TSTORE. It's three writes in the worst case (in the revert case, it writes to the main map, the journal, and then the main map again), which is well accounted for by the extra 5 gas compared to MSTORE, and honestly I think I'm actually being overly conservative: I think each hashmap write costs about 1 (maybe 2) gas of CPU time, so it could just as well be priced at 5 base gas. But maybe we could use a little more data here on how MLOAD/MSTORE fares on larger, nonzero memory chunks.
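
To make the "three writes" point concrete, here is a minimal journaled transient-storage sketch (a generic illustration of the idea, not revm's actual code):

```python
# Minimal journaled transient storage (generic illustration, not revm's code).
class TransientStorage:
    def __init__(self):
        self.slots = {}    # (address, key) -> value
        self.journal = []  # (address, key, previous value), for reverts

    def checkpoint(self):
        return len(self.journal)

    def tstore(self, addr, key, value):
        prev = self.slots.get((addr, key), 0)
        self.journal.append((addr, key, prev))  # write 1: journal the old value
        self.slots[(addr, key)] = value         # write 2: main map

    def tload(self, addr, key):
        return self.slots.get((addr, key), 0)

    def revert(self, checkpoint):
        # write 3 (only on revert): roll the main map back from the journal
        while len(self.journal) > checkpoint:
            addr, key, prev = self.journal.pop()
            self.slots[(addr, key)] = prev
```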

How so? The revert behavior is the same. Once warm, we have already paid the access cost. The behavior is not conceptually different after that point. This is why warm SSTORE, warm SLOAD, TLOAD, and TSTORE are currently priced the same, at 100 gas. From EIP-1153:

Gas cost for TSTORE is the same as a warm SSTORE of a dirty slot

SSTORE pricing depends on multiple factors and on refund accounting, as well as on whether the value is actually changing, so it isn't a straightforward +100 gas. It has been heavily analysed and justified, so I'd consider repricing it out of scope, or something that requires much deeper analysis, a flow, and an exhaustive list of test cases with costs and prices (as per EIP-2200: Structured Definitions for Net Gas Metering and EIP-3529: Reduction in refunds).

Which isn't a valid justification for repricing SSTORE; it's saying what the price of TSTORE should be.
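
To illustrate how even warm SSTORE branches on the original/current/new values, here is a simplified sketch of the EIP-2200 warm-slot cases with EIP-2929 constants; refund accounting per EIP-3529 is omitted:

```python
# Simplified warm SSTORE pricing per EIP-2200 / EIP-2929 (refunds omitted).
WARM_STORAGE_READ = 100
SSTORE_SET = 20_000
SSTORE_RESET = 5_000 - 2_100  # 2_900 once the cold surcharge is accounted separately

def warm_sstore_gas(original, current, new):
    if current == new:            # no-op write
        return WARM_STORAGE_READ
    if current == original:       # first (clean) write to the slot this transaction
        return SSTORE_SET if original == 0 else SSTORE_RESET
    return WARM_STORAGE_READ      # dirty slot: the case quoted from EIP-1153
```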

2 Likes