EIP-2028: Transaction data gas cost reduction

Ok I see it, just approved this post by @guthlStarkware (flagged by Discourse for some reason). Perhaps it was “preliminary results” that seemed too business-like for the spam filter LOL.

1 Like

July 19 is the deadline for clients to have EIPs implemented, not for the EIP to be complete. I know this is water under the bridge and an audible was called but the intent was that the May deadline was when this data should have been available.

How much sooner than 19 July can this data be made available?

Hahaha I will know better next time :slight_smile:

Later this week we’ll share what we currently have, and will keep doing so as data emerges from the various simulations.

Disclaimer: I cannot post more than two links per post as a new user, therefore the multiple posts.

EIP-2028 Standalone Analysis

TL;DR - We present our simulation plan and explain why we focus only on network delay and history. The plan is available here

Introduction

When EIP-2028 was introduced, we discussed two kinds of test-cases:
(1) measuring network effects of increased calldata
(2) measuring local effects on a single node.

Here’s a brief update on both cases. For item (1), we collect data from Ethereum’s mainnet. We shall report our preliminary findings early next week.
To corroborate our findings, we’ve partnered with WhiteBlock to simulate the effect of reducing calldata gas cost and increased block size on important parameters such as network delays, and uncle rate. Our simulation plan is available here and is already running.

The rest of this post discusses (2), the effects on a single node. The purpose of this analysis is to understand how calldata was priced at the network launch.

Update on Local Tests (Standalone Node) and Gas Cost Comparison

The main goal of performing local tests is to understand how the decrease in calldata gas cost would impact the system, assuming the network has zero delay (the effect on network delays is investigated separately). Recall that each node must validate the header of each new block and then, process the block. We want to ensure that the implementation of this EIP does not harm the network: for example, that the mere increase in blocksize does not cause significant problems in computing block headers and increase processing time, even if we assume zero network delays.
Recall that the current cost for calldata is 68 gas for each non-zero byte and 4 gas for each zero byte.
An increase in headers validation would impact security whereas an increase in block processing, all other things being equal, would impact throughput.
A quick gas comparison is interesting here:
When a validator receives a block, it computes the Keccak hash of the whole block to verify that the header used for the PoW is correct.

For a calldata input of size N bytes, if this verification operations were done in the EVM, the gas cost would be 30 gas for the first invocation of keccak and 6 gas for each additional word (32 bytes), totaling 30+6 * N / 32 gas. Asymptotically (neglecting the first 30 gas) this gives a gas price of roughly 6 for every 32 bytes, or 0.2 gas per byte. So, under current pricing, transmission and history for nonzero bytes accounts for all 68 gas.

One could argue that Keccak might be underpriced too. This is debunked by Holiman vm analysis, based on a historical analysis of opcode runtime over Ethereum lifetime. According to this work, the Keccak opcode’s gas price is on par with its processing time.
Our local measurements on our machine go along with this analysis.

So far, we compared calldata to computation cost. One could also try to compare calldata and storage. We tried but failed to find similar analogies there. This goes along with several other reports on SLOAD and SSTORE, stating that their pricing is complex and must be understood as a design choice for the network future 1

To conclude, assuming zero network delay, the fair and consistent pricing of calldata has a lower bound at 0.2 gas per byte. Practically, this means that the new gas cost recommendation should be based on its effects on network delay and history. This is also what was done in the original system design.

This was pointed out by @AlexeyAkhunov on today’s ACD: this EIP only tries to change the txdata cost and not the cost for CALLs between contracts.

Would it make sense changing the title of this EIP to “Transaction data gas cost reduction”?

  1. Can you rename to something like `“Transaction data gas cost reduction”.
  2. Can you also run simulations for the case where the same cost for zero and non-zero bytes?

It would make sense, the new title being self-explanatory.
Where do we need to change it?

Let me discuss it with WhiteBlock.
On a side note, I’m still looking a good reason for the distinction in the first place and would be pleased to discuss it.

File a PR changing the EIP, it should be merged automatically. Please also update the title of this forum topic, if it won’t allow it for you, @jpitts can help.

I believe the reasoning for pricing zero bytes differently was that they might be compressed easier. But considering that these numbers do not closely reflect CPU times, it was rather premature optimization.

If there is a chance to simplify this with this EIP I’d go for it.

@axic @chfast good suggestion - changed it. Thank you

We published a Geth reference implementation for EIP2028: https://github.com/liorgold2/go-ethereum/pull/1.

Note that currently, in Geth, there is an artificial (i.e., non-consensus based) limitation of 32KB on the size of each transaction, and this bound needs to be increased to be consistent with this EIP. A pull request for such an increase already exists in https://github.com/ethereum/go-ethereum/pull/19618 and our new reference implementation for EIP 2028 relies on it (using a limitation that is appropriate to the new gas cost of call data). We are now working on the Parity implementation.

Notice that the gas cost currently stated in the reference implementation (16 gas per byte) may change once we finish analysing the simulation results which are still being collected.

2 Likes

Here’s the parity version of EIP 2028, for testing purposes. Notice that the transmission gas cost there (16 gas/byte) may be updated once we finish analyzing our test data (which we’ll publish next week).

Transmission Gas Cost Reduction – What Ethereum’s Mainnet Has to Say

TL/DR

Analysis of block size vs. uncle rate on Ethereum, including our high-load test conducted on Mainnet on Monday July 15 2019, supports EIP 2028 and the new gas cost of 16 gas per byte. In fact, this data suggests that gas cost can be further reduced in the future.

Introduction and the Parity update of March 2019

As we explained in the EIP 2028 description, reducing the gas cost of transmission will allow larger blocks and this might affect network security, as measured by uncle rate. To start the discussion, lets review on Etherchain the correlation between average daily block size and uncle rate.

The Parity update

In March 2019 Parity modified its block propagation method. Prior to the change, a Parity node would process a block before sending it on the network. After the change, Parity nodes joined other nodes (Geth, pantheon, Nethermind) in transmitting blocks before processing them, once the following three conditions were met: (1) Block body matches the block header (txHash, uncleHash), (2) Proof Of Work in the header is valid, and (3) the block is on the longest PoW chain. The actual processing happens concurrently to this but does not prevent block propagation. This change caused a dramatic 3x reduction in uncle rate, and unlinked the linear correlation between block size and uncle rate that was evident before the change. More details are available in Vitalik’s tweet.

The plot below shows this dramatic effect. The pre-March 2019 points are colored blue and the post-March 2019 ones are colored red (the big red dot at the bottom right corner will be discussed later).


This plot originated from https://etherchain.org/correlations.

The Big Red Dot

StarkWare is responsible for the Big Red Dot, the increased average block size on Monday July 15 2019. (All our transactions originated from this address). That day also holds the Guiness World Record for largest average block size in Ethereum’s history, as seen clearly on the next plot taken from Etherchain. We explain the experiment that caused it, and what we learned from it, next.

The Experiment - A Bunch of Zeros

Not all bytes are equal on Ethereum. In particular, a zero byte costs only 4 gas (compared with 68 gas for all non-zero bytes). This allowed us to generate bigger blocks that have a low gas cost. Ethereum uses snappy compression so we had to be a bit careful to create blocks that have many zeros (and low gas cost) but cannot be compressed significantly (no more than roughly a factor 2x compression), though their gas cost was 5x lower than that of typical (or random) data.

On Monday July 15 2019, we generated a large number of such transactions and submitted them to Ethereum’s mainnet, to measure the effect of larger blocks on uncle rate. We created roughly 2,000 blocks that varied in uncompressed size between 45-534 KB. Here’s the first block in our experiment, here’s the last one and this is the largest one. But, as the the big red dot in the plot above shows, this long period of large blocks had little noticeable effect on the daily uncle rate.

Taking 20% daily uncle rate as an upper bound on acceptable uncle rate, we believe that gas cost can be reduced even below 16 gas per byte. This is the take-away message of the Big Red Dot. But now lets take a look at the experiment in greater granularity and see what we learn from it.

Big Red Dot in finer granularity

Here is a plot of the block size from recent days. The big peak towards the right comes from the blocks of our experiment (and prior peaks are our trials a few days earlier).

To parse this data at finer granularity, we took a small time window of 10,000 blocks and broke it into buckets of 50 blocks (roughly 11 minutes). The next plot shows uncle rate as a function of uncompressed block size. Bear in mind that the average uncompressed blocksize is 23KB, compressed to an average of 15KB. We maintained a similar compression ratio (of roughly 2x) for the blocks in our test, even though we managed to generate blocks that are 22x larger than the average (max-ing at 535 KB uncompressed). The buckets of 50 consecutive blocks during our experiment were up to 7.5x bigger than average. As can be seen, there is little to no effect on uncle rate per bucket.

A block brother is a block that is not part of the main chain, but sits as the same height as another block (i.e., brothers are uncles, but attributed to individual mainnet blocks). The next plot shows brother count as a function of individual block size. Inspecting this plot, we still see no adverse effect of increased block size, even when average block size is larger than the average block size (of 25 KB) by a whopping factor of 22x.

Summary

To conclude, we see no adverse effect of increasing today the average block size by a factor of 7.5x, nor of increasing the individual blocksize by a factor of 22x. Based on this, the conservative choice of reducing transmission gas cost by a factor of roughly 4x, to 16 gas per byte, is well-founded. Our code modifications for Parity and Geth have been posted earlier.

8 Likes

Great work!

To make L2 scaling (i.e. stateless or state-minimized contracts with data stored off-chain and only a 32-byte stateroot stored in the contract) competitive with storing data on-chain using SSTORE, the cost originally suggested by @Recmo was “100 gas per word” or 3.125 gas per byte (see summary post here and proposal here).

The number is based on the size of a merkle proof assuming a depth 32 binary trie and 32-byte hashes (total bytes = trie_depth * hash_size = 1024 bytes). At 16 gas per byte the data cost of a proof would be 1024 * 16 = 16384 (brought down from 1024*68 = 69632), but 16384 is still not competitive with 5000 gas for an SSTORE.

Proofs can be batched/de-duplicated with a varying amount of reduction (depending on where the accounts are in the trie), so in some circumstances 16 gas per byte might be cheap enough to be competitive if proofs are batched.

2 Likes

To bring the last part of the discussion about the EIP, here are our history and sync analysis after the hardfork.

The effect of reduced transmission gas cost on blockchain history accumulation and full nodes

EIP 2028 reduces gas cost of data 4x (to 16 gas/ byte) and we believe this is safe in terms of its effect on network security (as argued here). A different concern might be the effect on blockchain history accumulated by full nodes and archival nodes that maintain this data. On this matter, we believe our conservative 4x gas reduction is safe, for the following reason.

Current accumulation rate

Let’s start by examining the average growth in history per block on full nodes and on archival ones. If you take the total growth seen in the month starting on May 17 2019 and ending on June 17 2019 (on Etherscan) and divide it by the number of blocks, you get that each block adds roughly 100 KB to a full node and 1 MB to an archival node. Over the same time period, the average (and median) block size was roughly 23 KB, so we see that transmission accounts for no more than 25% of history accumulated on a full node and no more than 2.5% of the history accumulated on an archival one. (Calldata requires the same storage volume on both full nodes and archival ones.)

The impact of EIP 2028 on accumulation rate

EIP 2028 reduces transmission gas cost roughly 4x, so it will likely increase block size by roughly 4x. Assuming all other storage components remain the same, according to this calculations an average block size will add roughly 175 KB to a full node, less than doubling the history accumulation rate. For archival nodes the history accumulation rate will increase by roughly 10% to roughly 1.1 MB.

The impact of EIP 2028 on sync

To best of our knowledge, the main bottleneck in syncing is due to state updates (see here for background on syncing). Therefore, even in the worst case scenario of a 2x increase in the accumulation rate of full nodes due to larger block size, we do not anticipate significant increase in sync time. In fact, since EIP 2028 will likely move stuff from storage to transmission, we conjecture that our EIP will improve (decrease) sync time.

Conclusion

Since Ethereum’s mainnet was launched in July 2015 (4 years ago), storage capacities per dollar have more than doubled. Therefore we believe that a worst case 2x increase in the storage needs of full nodes, and a 10% increase for archival nodes, is reasonable.

2 Likes

Thanks!
As we explained, we remain conservative in our approach. Since the timeline before Istanbul is quite short, we want this first modification to go through and if necessary, push further the reduction in the next one.
16 gas is already a large reduction and our data shows that it does not impact the security.

Ideally, we would not apply all of this gas reduction to the current transaction type, given that we need to incentivize the switch to the EIP-1559 transaction type.

As I understand it, this is already being discussed with

1 Like