EIP-7623: Increase Calldata Cost

Nerolation · February 14, 2024, 7:44am

By increasing the calldata cost for users that do not spend more than a certain threshold on EVM computation we can achieve the following:

Reduce maximum possible blocksize from ~1.7 MB to 0.55 MB without effects on current throughput.
Reduce block size variance
Reduce inefficiency from the big gap between avg. block size and max. possible block size.
Make room for raising gas limit and/or blob count
Differentiate between users that need calldata inside the EVM vs pure DA.

This is achieved by increasing cost for DA users (who can move towards using blobs).

PR:
https://github.com/ethereum/EIPs/pull/8218

More info:

storm · February 19, 2024, 6:54pm

TOTAL_COST_FLOOR_PER_TOKEN seems like a good heuristic for penalizing call data heavy txs and nudging them toward blobs

to red-team this a bit, are there any circumstances where allowing larger amounts of call data is important or necessary for security? examples:

large number of users might need to submit fraud proofs for optimistic rollup(s) over a short period of time. what might that realistically look like in terms of total EL call data requirements? and is there any other situation where users might need to replicate lots of blob data into regular execution call data?
what are the other unlabeled points in the lower half of the transaction_dist.png in On Increasing the Block Gas Limit? are any of those use cases both 1) important and 2) difficult to convert to blobs?
has any bridge dev team encountered difficulty in converting their architecture over to blobs? what is a reasonable timeline to expect most/all of them to blobify?

MariusVanDerWijden · February 20, 2024, 10:48pm

Draft implementation in geth: "[wip] core: implement eip-7623: increase calldata cost" (ethereum/go-ethereum, Pull Request #29040)

One issue I came across is that the EIP is kinda underspecified wrt. the gas costs for contract creation vs normal transactions. A normal transaction costs 21000 + Tokencost * Tokens + evm_gas, a contract creation costs 53000 + Tokencost * Tokens + 2 * Initcode_Wordsize + evm_gas.

I’ve interpreted the EIP as follows:

tx.gasUsed = {
    21000 + 32000 * IsContractCreation // 21000 or 53000
    +
    max (
        STANDARD_TOKEN_COST * tokens_in_calldata + IsContractCreation * (InitCodeWordGas * words(len(calldata))) + evm_gas_used,
        TOTAL_COST_FLOOR_PER_TOKEN * tokens_in_calldata
    )
}

Where IsContractCreation is 1 if the transaction is a contract creation and 0 otherwise.
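The interpretation above can be sketched in Python roughly as follows. This is only an illustration, not the geth code from the PR: the function and parameter names are made up, `STANDARD_TOKEN_COST = 4` is taken from the summary later in this thread, and the floor is passed in as a parameter because its value was still being discussed at this point.

```python
STANDARD_TOKEN_COST = 4   # gas per token (assumption: value from the later summary)
INIT_CODE_WORD_GAS = 2    # gas per 32-byte initcode word, as quoted in the post


def words(n_bytes: int) -> int:
    """Number of 32-byte words needed to hold n_bytes (rounded up)."""
    return (n_bytes + 31) // 32


def gas_used_marius(tokens: int, calldata_len: int, evm_gas_used: int,
                    is_contract_creation: bool, floor_per_token: int) -> int:
    """Hypothetical helper: 32000 stays outside the max(), initcode cost inside."""
    creation = 1 if is_contract_creation else 0
    standard = (STANDARD_TOKEN_COST * tokens
                + creation * INIT_CODE_WORD_GAS * words(calldata_len)
                + evm_gas_used)
    floor = floor_per_token * tokens
    return 21000 + 32000 * creation + max(standard, floor)
```

With 50 tokens, no execution and a floor of 12, the floor side wins (600 vs. 200), so a plain transaction pays 21600 and a contract creation pays 53600.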

Nerolation · February 21, 2024, 7:35am

The formula is applied to each transaction individually, so it doesn’t matter if multiple users have to execute transactions with a lot of calldata at the same time. They are only limited by the block gas limit. I don’t see scenarios where many regular users suddenly need to post 1MB calldata transactions.

Regarding 2, I’ll do further analysis on those outliers, but I expect that 99.9% of all transactions use enough EVM resources to “qualify” for the 16 gas calldata cost. This EIP would then basically reduce the max possible block size without impacting regular users at all. Only DA via calldata should become more expensive, so that the EVM no longer has to deal with inscriptions or rollup data.

Regarding 3. I think some of the rollups will start using blobs from day one while the majority plans to shift in the following days/weeks. I’ve only seen a few announcing that they will not yet move to blobs. In the end, rollups know that this is coming and all of them are prepared.

Nerolation · February 21, 2024, 8:11am

Yeah, you’re right.
The initcode cost of currently 2 gas per word might be negligibly low compared to the calldata costs, but it makes full sense to put it into the formula and treat it like the base cost.

One could even go one step further and adjust the formula to distinguish between CREATE and CREATE2 deployments:

tx.gasUsed = {
    21000 \
        + (32000 + InitCodeWordGas * words(calldata)) * isContractCreation \
        + isCreate2Creation * Keccak256WordGas * words(calldata)
    +
    max (
        STANDARD_TOKEN_COST * tokens_in_calldata + evm_gas_used,
        TOTAL_COST_FLOOR_PER_TOKEN * tokens_in_calldata
    )
}

I’d still keep the gas involved with contract creations outside the conditional part of the formula.

vbuterin · February 28, 2024, 8:06am

I would actually go for Marius’s interpretation!

My reasoning is that Keccak256WordGas and InitCodeWordGas are “actually” not data-related costs, but rather execution-related costs. Those costs were introduced because of issues that have to do with the expense of processing the CREATE and CREATE2 opcodes, and were added to transaction-level creates for symmetry. So they should be put in the same bucket as evm_gas_used.

I would even go so far as to put 32000 * isContractCreation into the same bucket as execution-related costs (since a contract creation by itself isn’t any heavier on calldata than a regular transaction), but I’m happy to go either way on that.

Nerolation · February 28, 2024, 3:21pm

I see! Based on that the formula would look like this:

tx.gasUsed = {
    21000 \
    +
    max (
        STANDARD_TOKEN_COST * tokens_in_calldata \
           + evm_gas_used \
           + isContractCreation * (32000 + InitCodeWordGas * words(calldata)),
        TOTAL_COST_FLOOR_PER_TOKEN * tokens_in_calldata
    )
}

I’ve skipped the CREATE2 part, so the difference to @MariusVanDerWijden’s approach is the 32k base cost inside the max().

I agree that the 32k contract creation could be put into the evm_gas_used side of the max() , contributing towards the standard token cost. Also, the CREATE opcode is different from the 21k base cost and one can argue that it must therefore be treated differently in the gasUsed formula.
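To see where the two placements of the 32k actually differ, here is a small side-by-side sketch. The function names are hypothetical, `STANDARD_TOKEN_COST = 4` comes from the later summary, and the floor is a parameter since its value changed during the discussion. The two variants only diverge for contract creations where the floor side of the max() is the binding one.

```python
STANDARD_TOKEN_COST = 4   # assumption: value from the later summary
INIT_CODE_WORD_GAS = 2    # gas per 32-byte initcode word


def words(n_bytes: int) -> int:
    """Number of 32-byte words needed to hold n_bytes (rounded up)."""
    return (n_bytes + 31) // 32


def creation_gas_32k_outside(tokens, calldata_len, evm_gas_used, floor_per_token):
    """Contract creation with the 32k charged outside the max()."""
    standard = (STANDARD_TOKEN_COST * tokens
                + INIT_CODE_WORD_GAS * words(calldata_len)
                + evm_gas_used)
    return 21000 + 32000 + max(standard, floor_per_token * tokens)


def creation_gas_32k_inside(tokens, calldata_len, evm_gas_used, floor_per_token):
    """Contract creation with the 32k counted as execution inside the max()."""
    standard = (STANDARD_TOKEN_COST * tokens
                + evm_gas_used
                + 32000 + INIT_CODE_WORD_GAS * words(calldata_len))
    return 21000 + max(standard, floor_per_token * tokens)
```

For a creation with 2500 non-zero bytes of initcode (10000 tokens), no execution and a floor of 12, the first variant charges 173000 gas and the second 141000, i.e. exactly the 32k that the second variant lets count towards the floor. Once execution is heavy enough that the standard side wins, both give the same result.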

chfast · March 15, 2024, 10:53am

How about we remove the special zero-byte cost from the formula, as it makes no sense?

qizhou · March 27, 2024, 9:46pm

Would the gas cost for EIP-2930 also need to be adjusted? The cost of 32 bytes of storage-key data priced as calldata under EIP-7623 is 32 * 68 = 2176, which exceeds the 1900 charged per storage key in EIP-2930.

Nerolation · April 2, 2024, 3:56pm

With respect to snappy compression rates it does make sense, as many consecutive zeros can be compressed very well. The formula assumes that zero bytes are less expensive than non-zero bytes.
There’s more info on that here:
https://eips.ethereum.org/EIPS/eip-2028

Nerolation · April 2, 2024, 4:01pm

Could you elaborate on that?

The access list gas is charged in addition to the base tx cost, so why would it be affected?
Or are you comparing them as a DA layer now?

Edit: We were thinking about 48 (instead of 68) as a better price anyway (because of merkle proof, post quantum crypto, etc.) and 48 happens to not be vulnerable to the scenario you describe.
It’s fixed in the current version of the EIP. Thanks!

wjmelements · April 4, 2024, 9:28pm

How will they publish to the logs without calldata?

qizhou · April 5, 2024, 6:09am

My point is that the tx access list in 2930 is charged independently of the calldata cost (see core/state_transition.go in ethereum/go-ethereum at commit 35fcf9c52b806d2a7eba0da4f65c97975200a2b2), but it takes the same block space as calldata. That means that if the tx access list is under-charged relative to calldata, an attacker could use the tx access list to create a larger block that circumvents the limit of EIP-7623.

For example, a storage key in the tx access list takes at least 32 bytes (let’s ignore the overhead of RLP encoding); thus, the gas cost per byte of the storage key is 1900 / 32 = 59.375. So if the calldata cost per byte is 68, an attacker can still create a larger block post-EIP-7623 by filling the tx with a large number of storage keys.

Reducing the gas cost of calldata per byte to 48 should alleviate the issue, as using the access list to create a large block is then less cost-efficient than using calldata.
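The arithmetic in this post can be checked with a few lines of Python. The helper names are made up for illustration, and RLP overhead is ignored as in the post.

```python
ACCESS_LIST_STORAGE_KEY_GAS = 1900  # EIP-2930 charge per storage key
STORAGE_KEY_BYTES = 32              # minimum size of a storage key, ignoring RLP


def access_list_gas_per_byte() -> float:
    """Gas paid per byte of block space when filling a tx with storage keys."""
    return ACCESS_LIST_STORAGE_KEY_GAS / STORAGE_KEY_BYTES


def access_list_undercuts_calldata(calldata_gas_per_byte: int) -> bool:
    """True if storage keys are a cheaper way to fill a block than calldata."""
    return access_list_gas_per_byte() < calldata_gas_per_byte


print(access_list_gas_per_byte())          # 59.375
print(access_list_undercuts_calldata(68))  # True: access lists are the cheaper filler
print(access_list_undercuts_calldata(48))  # False: calldata stays the cheaper filler
```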

dror · April 9, 2024, 5:42pm

The motivation for this EIP looks great.
A side effect is that it clarifies how the cost of a transaction is computed.
It could be written just as “never charge less than 48*tokens_in_calldata”, but instead, it tries also to spell out the existing gas cost

What I miss in the document are some examples of the impact on TXs that use cpu-gas: That is, what kind of non-L2 transaction might get hurt by this change.

Nerolation · April 9, 2024, 8:55pm

Thanks for pointing that out. I could have linked to some more analysis here but didn’t (yet).

There’s this “post-4844” analysis that highlights the impact of the EIP on post-4844 Ethereum:

It also includes a site where the most commonly used functions are listed together with stats on gas usage and indicating if they’d be affected by this EIP.
Find it here:
EIP-7623 - Impact

In summary, there aren’t many non-DA use cases for big-calldata transactions. Examples are large zk-proofs such as STARKs and very large Merkle proofs.
As visible in the above table, the number of transactions affected is very small, and the non-DA transactions that are affected are not impacted drastically (e.g. certain STARK transactions increase by ~30%).

The largest part of affected users are those attaching additional data to their transaction (messages). For them, the increase in gas cost is negligible as those messages are usually very small (and, to be fair, there are better ways to do messaging than using Ethereum L1 anyways).

Nerolation · April 11, 2024, 1:51pm

Quick summary/faq on EIP-7623

What?

EIP-7623 proposes to introduce a floor cost for calldata.
Transactions that use Ethereum mainly for DA will pay 12 (zero bytes) and 48 (non-zero bytes) gas per byte.

Why?

The main goal is to reduce the maximum possible block size from 3.5 MiB to 1.9 MiB (incl. blobs).
Reduce history growth (theoretically as avg. block size might remain the same).

How?

The new formula to calculate the gas used per tx would be:

tx.gasUsed = {
    21000 \
    +
    max (
        STANDARD_TOKEN_COST * tokens_in_calldata \
           + evm_gas_used \
           + isContractCreation * (32000 + InitCodeWordGas * words(calldata)),
        TOTAL_COST_FLOOR_PER_TOKEN * tokens_in_calldata
    )
}

with STANDARD_TOKEN_COST = 4,
and TOTAL_COST_FLOOR_PER_TOKEN = 12,

and tokens_in_calldata = zero_bytes_in_calldata + nonzero_bytes_in_calldata * 4 (in order to one-dimensionalize calldata).
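As a rough, non-normative sketch, the formula and constants above translate into Python like this. The function names are hypothetical, and `InitCodeWordGas = 2` is taken from earlier in the thread.

```python
STANDARD_TOKEN_COST = 4
TOTAL_COST_FLOOR_PER_TOKEN = 12
INIT_CODE_WORD_GAS = 2  # gas per 32-byte initcode word (from earlier in the thread)


def tokens_in_calldata(calldata: bytes) -> int:
    """Zero bytes count as 1 token, non-zero bytes as 4 tokens."""
    zero_bytes = calldata.count(0)
    nonzero_bytes = len(calldata) - zero_bytes
    return zero_bytes + 4 * nonzero_bytes


def words(n_bytes: int) -> int:
    """Number of 32-byte words needed to hold n_bytes (rounded up)."""
    return (n_bytes + 31) // 32


def tx_gas_used(calldata: bytes, evm_gas_used: int,
                is_contract_creation: bool = False) -> int:
    """Gas used per the summary formula: 21000 + max(standard, floor)."""
    tokens = tokens_in_calldata(calldata)
    standard = STANDARD_TOKEN_COST * tokens + evm_gas_used
    if is_contract_creation:
        standard += 32000 + INIT_CODE_WORD_GAS * words(len(calldata))
    floor = TOTAL_COST_FLOOR_PER_TOKEN * tokens
    return 21000 + max(standard, floor)
```

A transaction that does no execution pays the floor: `tx_gas_used(b"\x00", 0)` is 21012 and `tx_gas_used(b"\x01", 0)` is 21048, matching the 12 and 48 gas per byte mentioned above.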

Who would be affected?

Largest part: users putting messages into their transactions (negligible).
DA users (negligible because they have blobs).
Certain use cases such as STARK proofs, or very large merkle proofs that are heavy in calldata but don’t require much computation.
Simply speaking, every transaction that spends at least 2x as much on EVM computation as it spends on calldata will be unaffected. Around 3% of transactions would be affected. Regular users who are sending ETH, tokens, swapping, etc. are unaffected.
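The "2x" rule of thumb follows from the constants: the floor only binds while 4 * tokens + evm_gas_used is below 12 * tokens, i.e. while evm_gas_used is below 8 * tokens. A minimal sketch (hypothetical helper names, contract creations ignored for simplicity):

```python
STANDARD_TOKEN_COST = 4
TOTAL_COST_FLOOR_PER_TOKEN = 12


def break_even_evm_gas(tokens: int) -> int:
    """Execution gas at which the standard cost catches up with the floor."""
    return (TOTAL_COST_FLOOR_PER_TOKEN - STANDARD_TOKEN_COST) * tokens


def is_affected(tokens: int, evm_gas_used: int) -> bool:
    """True if the floor makes this (non-creation) tx more expensive."""
    return evm_gas_used < break_even_evm_gas(tokens)


def extra_gas(tokens: int, evm_gas_used: int) -> int:
    """Additional gas paid because of the floor (0 if unaffected)."""
    return max(0, break_even_evm_gas(tokens) - evm_gas_used)
```

For example, a plain ETH transfer carrying a 100-byte non-zero message (400 tokens, no execution) would pay 3200 gas more than today, while the same message attached to a call that uses at least 3200 gas of execution costs nothing extra.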

Why floor at 12?

This makes sure that access lists don’t become cheaper than calldata (otherwise one could again produce “big blocks” by filling access lists). (h/t @qizhou)

When?

proposed for inclusion in the Pectra hardfork

Useful links

wjmelements · April 26, 2024, 6:26pm

There is a problem with this floor gas pattern: the marginal cost of additional execution can be zero. This means it is free for a transaction with lots of calldata to package additional operations. This allows such transactions to sell this free gas on the market, which could cause all sorts of havoc. Gas should therefore never be structured this way.

shemnon · April 26, 2024, 7:00pm

Can you provide a worked example of how this might be done? I don’t think we need EVM code but a description of what the contracts would do should suffice.

wjmelements · April 26, 2024, 7:54pm

Suppose there is a protocol whose recurring transactions require lots of calldata but do little execution. Several L2s are in this category, but there could also be graffiti apps. Many of their transactions will have gasUsed based entirely on their calldata due to the max calculation. They can auction out additional CALLs or even AUTHCALLs to their periodic transaction, specifying a total gas limit according to their CALLDATASIZE. The marginal cost of these auctioned calls to the operator is only their additional calldata, which can even be zero if the auctioning mechanism is implemented on-chain.

So, a calldata operator has been forced to buy extra gas by this EIP, but they can sell it, and this can offset their gas costs.

The maximum price a reasonable buyer would pay for the auctioned gas is the base fee, but the operator should be willing to sell their gas for much less than the base fee to offset their costs, according to demand. They will sell it at any price because from their perspective, it is free gas.
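A small numeric sketch of the "free gas" described here, using the constants from the summary above (STANDARD_TOKEN_COST = 4, TOTAL_COST_FLOOR_PER_TOKEN = 12). The helper names are hypothetical and contract creations are left out.

```python
STANDARD_TOKEN_COST = 4
TOTAL_COST_FLOOR_PER_TOKEN = 12


def gas_used(tokens: int, evm_gas_used: int) -> int:
    """Non-creation tx under the floor formula."""
    return 21000 + max(STANDARD_TOKEN_COST * tokens + evm_gas_used,
                       TOTAL_COST_FLOOR_PER_TOKEN * tokens)


def marginal_cost(tokens: int, evm_before: int, evm_extra: int) -> int:
    """Extra gas charged for adding evm_extra execution to an existing tx."""
    return gas_used(tokens, evm_before + evm_extra) - gas_used(tokens, evm_before)


def free_execution_gas(tokens: int) -> int:
    """Execution gas a pure-calldata tx can add before its cost starts to rise."""
    return (TOTAL_COST_FLOOR_PER_TOKEN - STANDARD_TOKEN_COST) * tokens
```

An operator posting 25,000 non-zero bytes (100,000 tokens) with no execution has 800,000 gas of execution that costs nothing at the margin: `marginal_cost(100000, 0, 800000)` is 0, and only execution beyond that point is charged, e.g. `marginal_cost(100000, 0, 900000)` is 100000.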

wjmelements · April 26, 2024, 8:20pm

There is a parallel problem: a transaction with a lot of gasUsed but not much calldata could sell extra calldata. This additional calldata costs the gasUsed operator according to the STANDARD_TOKEN_COST schedule, so they could sell it to someone who would otherwise pay TOTAL_COST_FLOOR_PER_TOKEN.

So in summary this EIP creates a gas loophole that incentivizes heavy gasUsed operators who don’t use much calldata to pair up with heavy calldata operators who don’t have much gasUsed, combining into one transaction. It creates separate markets for both kinds of gas.
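The pairing incentive can be illustrated with the same constants (4 and 12 gas per token, from the summary above). This is a simplified sketch with hypothetical names; it ignores any overhead of the contract that would bundle the two operators and any extra calldata needed to route the calls.

```python
STANDARD_TOKEN_COST = 4
TOTAL_COST_FLOOR_PER_TOKEN = 12


def gas_used(tokens: int, evm_gas_used: int) -> int:
    """Non-creation tx under the floor formula."""
    return 21000 + max(STANDARD_TOKEN_COST * tokens + evm_gas_used,
                       TOTAL_COST_FLOOR_PER_TOKEN * tokens)


def pairing_savings(tokens_a: int, evm_a: int, tokens_b: int, evm_b: int) -> int:
    """Gas saved by merging two txs into one, versus sending them separately."""
    separate = gas_used(tokens_a, evm_a) + gas_used(tokens_b, evm_b)
    combined = gas_used(tokens_a + tokens_b, evm_a + evm_b)
    return separate - combined


# Operator A: 100,000 tokens of calldata, no execution.
# Operator B: no calldata, 800,000 gas of execution.
print(gas_used(100_000, 0))                      # 1221000
print(gas_used(0, 800_000))                      # 821000
print(pairing_savings(100_000, 0, 0, 800_000))   # 821000
```

In this example the combined transaction costs exactly what A paid alone, so B's entire cost (base fee share plus execution) is the surplus the two operators could split.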