EIP-4844: Shard Blob Transactions

Hi folks, big fan of this EIP and the direction this takes Ethereum in!

I had one question/suggestion: the proposed KZG commitment is quite useful for rollups that execute inside the EVM, but another use case for guaranteed data availability of large off-chain, short-lived blobs is enabling a new class of SNARK-based applications. In particular, blobs (in theory) can serve as hidden inputs to a SNARK circuit, and the circuit can attest that the blob hashes to the on-chain commitment and has valid properties/transformations. I can think of many applications of this; as an example, here’s an idea made possible by it:

You store the params for a trusted setup ceremony in a blob, with a guaranteed hash of them on-chain. Every time a new participant wants to join the ceremony, they use the off-chain blob to generate new params and put them in a new blob, along with a SNARK proof that their contribution correctly transitions from a blob hashing to the previous on-chain hash to one hashing to the new hash.
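For concreteness, here’s a toy Python sketch of that flow; the field, the hash, the update rule, and all names are illustrative assumptions on my part, not anything from the EIP:

from hashlib import sha256

# BLS12-381 scalar field modulus, used here only as a stand-in
P = 0x73EDA753299D7D483339D80809A1D80553BDA402FFFE5BFEFFFFFFFF00000001

def blob_hash(params: list[int]) -> bytes:
    # stand-in for the versioned hash of the blob stored on-chain
    return sha256(b"".join(x.to_bytes(32, "big") for x in params)).digest()

def contribute(prev_params: list[int], secret: int) -> list[int]:
    # powers-of-tau style update: multiply the i-th param by secret^i
    return [(p * pow(secret, i, P)) % P for i, p in enumerate(prev_params)]

prev = [1, 5, 25, 125]                   # placeholder params read from the old blob
new = contribute(prev, secret=0xC0FFEE)  # participant's secret contribution
# The participant posts `new` as a blob, plus a SNARK attesting that
# (1) the old blob hashes to the previous on-chain hash, and
# (2) new == contribute(old, secret) for some secret they know.
assert blob_hash(new) != blob_hash(prev)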

Of course, there are many other classes of applications that could benefit from a data availability model like this, which would be ideal for running inside a SNARK (thanks to their succinctness properties). However, this brings me to my suggestion: add an option to store the on-chain commitment in a SNARK-friendly form. KZG commitments require a pairing check, which is notoriously hard to implement in a SNARK with the currently most popular schemes (Groth16/PLONK), since constraints blow up with each “bigint” operation. Adding a SNARK-friendly commitment would therefore enable more practical SNARK-based applications. I would suggest the option of a Merkle tree using a SNARK-friendly accumulator function (such as Poseidon/MiMC). In the end, of course, it’s a question of whether the added interface complexity and implementation difficulty are worth it for enabling such (relatively unproven) applications, so I’d be curious to hear thoughts and considerations on that.
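As a rough illustration of the kind of accumulator I mean: a Merkle root over field elements built from a toy MiMC-style round function (the round count, constants, and padding here are placeholder choices, not a vetted instantiation):

# BLS12-381 scalar field modulus, again as a stand-in choice
P = 0x73EDA753299D7D483339D80809A1D80553BDA402FFFE5BFEFFFFFFFF00000001

def mimc_hash(left: int, right: int, rounds: int = 64) -> int:
    # toy MiMC-like compression: x <- (x + k + c_i)^3 mod P, with c_i = i
    x, k = left % P, right % P
    for i in range(rounds):
        x = pow(x + k + i, 3, P)
    return (x + k) % P

def merkle_root(leaves: list[int]) -> int:
    # assumes a power-of-two number of leaves
    layer = leaves
    while len(layer) > 1:
        layer = [mimc_hash(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

print(hex(merkle_root([1, 2, 3, 4])))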

Thanks!

P.S. This is my first time commenting here 🙂 Let me know if there’s anything I should add/detail.


KZG commitments are actually very SNARK-friendly. The trick is that you don’t take the “naive approach” of actually trying to verify the KZG opening inside the SNARK directly. Instead, you just use the KZG point directly as a public input (this allows you to directly access everything inside the KZG in systems like PLONK). If you want to make a SNARK over something other than the BLS12-381 curve, or over a different trusted setup, then there is a proof-of-equivalence protocol that allows you to make another commitment $D$ that is compatible with your SNARK protocol, and prove that $D$ and the KZG commit to the same data.
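To make the shape of that proof-of-equivalence protocol concrete, here is a heavily simplified skeleton: hashes stand in for both commitment schemes and the opening proofs are elided, so only the Fiat-Shamir structure survives:

from hashlib import sha256

BLS_MODULUS = 0x73EDA753299D7D483339D80809A1D80553BDA402FFFE5BFEFFFFFFFF00000001

def eval_poly(coeffs: list[int], z: int) -> int:
    # evaluate the committed polynomial at z over the scalar field
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * z + c) % BLS_MODULUS
    return acc

data = [3, 1, 4, 1, 5, 9]                    # the blob, as field elements
C = sha256(b"kzg" + bytes(data)).digest()    # placeholder for the KZG commitment
D = sha256(b"snark" + bytes(data)).digest()  # placeholder for the SNARK-side commitment

# Fiat-Shamir challenge derived from both commitments
z = int.from_bytes(sha256(C + D).digest(), "big") % BLS_MODULUS

# The prover opens both commitments at z (opening proofs elided here);
# the verifier accepts iff both openings verify and the values agree.
y_c = eval_poly(data, z)
y_d = eval_poly(data, z)
assert y_c == y_d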


This is a great proposal. I have a question that others may not be particularly concerned about: how does EIP-4844 (and full sharding after that) guarantee that nodes retain blob data for a specific amount of time (say, a month)? I see in the EIP that blob data is deleted after 30 days, but there is no specific scheme to guarantee this.

deleted after 30 days, but there is no specific scheme to guarantee this

Perhaps you mean the guarantee that it is stored for at least 30 days. Node operators can choose when they actually purge/delete. Vitalik’s FAQ mentions cases where some will retain data longer.

Very nice proposal! Just a few questions for clarification:

Beacon chain validation

On the consensus-layer the blobs are now referenced, but not fully encoded, in the beacon block body.

How are the blobs in beacon blocks referenced? Would beacon blocks also include the tx or some other data structure to reference them?

Following the previous question: since the CL and EL blocks are produced asynchronously, what is the expected sequence for including a tx-without-blob on the EL and referencing the blob on the CL? Further, how can we ensure that both the EL and CL do the correct work (e.g., guard against a tx-without-blob being included on the EL while no corresponding blob is referenced on the CL)?

We add an opcode DATAHASH (with byte value HASH_OPCODE_BYTE) which takes as input one stack argument index, and returns tx.message.blob_versioned_hashes[index] if index < len(tx.message.blob_versioned_hashes), and otherwise zero. The opcode has a gas cost of HASH_OPCODE_GAS.

Was a name other than DATAHASH considered? I think it is too similar to CALLDATA, and opcodes for accessing the “data” portion of an account’s code could also plausibly want such a name.

I’d propose using something akin to BLOBHASH or TXBLOBHASH, to start utilising a TX prefix. In connection with EIP-1803: Rename opcodes for clarity, we discussed prefixing opcodes according to their role (BLOCK, TX, …)
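For reference, the quoted semantics (under whatever name the opcode ends up with) amount to roughly the following; the stubbed tx object is just for illustration:

from types import SimpleNamespace

def datahash(tx, index: int) -> bytes:
    # returns tx.message.blob_versioned_hashes[index], or 32 zero bytes
    # if index is out of range, per the quoted spec text
    hashes = tx.message.blob_versioned_hashes
    return hashes[index] if index < len(hashes) else b"\x00" * 32

tx = SimpleNamespace(message=SimpleNamespace(blob_versioned_hashes=[b"\x01" * 32]))
assert datahash(tx, 0) == b"\x01" * 32
assert datahash(tx, 5) == b"\x00" * 32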

def point_evaluation_precompile(input: Bytes) -> Bytes:
    # Verify P(x) = y
    # versioned hash: first 32 bytes
    versioned_hash = input[:32]
    # Evaluation point: next 32 bytes
    x = int.from_bytes(input[32:64], 'little')
    assert x < BLS_MODULUS
    # Expected output: next 32 bytes
    y = int.from_bytes(input[64:96], 'little')
    assert y < BLS_MODULUS
    # The remaining data will always be the proof, including in future versions
    # input kzg point: next 48 bytes
    data_kzg = input[96:144]
    assert kzg_to_versioned_hash(data_kzg) == versioned_hash
    # Quotient kzg: next 48 bytes
    quotient_kzg = input[144:192]
    assert verify_kzg_proof(data_kzg, x, y, quotient_kzg)
    return Bytes([])
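For reference, a caller would lay out the 192-byte input like this (the helper name is mine, not from the EIP; note the little-endian x and y, discussed below):

def pack_precompile_input(versioned_hash: bytes, x: int, y: int,
                          data_kzg: bytes, quotient_kzg: bytes) -> bytes:
    assert len(versioned_hash) == 32 and len(data_kzg) == 48 and len(quotient_kzg) == 48
    return (versioned_hash                 # bytes 0:32
            + x.to_bytes(32, "little")     # bytes 32:64, evaluation point
            + y.to_bytes(32, "little")     # bytes 64:96, expected output
            + data_kzg                     # bytes 96:144
            + quotient_kzg)                # bytes 144:192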

The precompile uses little-endian byte order for certain inputs. Currently the execution layer exclusively uses big-endian notation, while the consensus layer (beacon chain) uses little-endian. While this proposal makes this data opaque to the EVM (it is just passed through to this precompile), it feels like trading consistency within the execution layer in favour of consistency with the consensus layer.

I do not have any proposed solution here, just interested in opinions and views around the reasoning for this.

Furthermore, I think the description of the precompile is not entirely clear. I assume the assert statements, if hit, will result in an OOG outcome (i.e. consume all gas passed to the call), while a successful run will not OOG but will return 0 bytes (i.e. returndatasize will equal 0).

Furthermore, it is unclear whether it pads inputs with zeroes, or signals failure if the input is not exactly 192 bytes.

Is that correct?

The closest precompile to this is ECPAIRING (EIP-197: Precompiled contracts for optimal ate pairing check on the elliptic curve alt_bn128), which returns U256(0) or U256(1) depending on the outcome. It is nice to avoid the need for checking return values when this can be delegated to checking the success value of CALL, but it nevertheless breaks consistency with the other precompiles. In this case, however, it likely makes sense.

My suggestion is simply to clarify the description in the EIP.

class ECDSASignature(Container):
    y_parity: boolean
    r: uint256
    s: uint256

Nice to see that the new transaction format replaces the RLP transaction encoding with SSZ and moves chain_id into an explicit field, allowing the use of a boolean for `y_parity`.

There’s potential for some optimisation here at the expense of some clarity: EIP-2098: Compact Signature Representation
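For reference, EIP-2098 folds y_parity into the unused top bit of s (valid because s is already restricted to the lower half of the curve order); a minimal sketch:

def to_compact(y_parity: bool, r: int, s: int) -> bytes:
    # r || yParityAndS, 64 bytes total
    y_parity_and_s = (int(y_parity) << 255) | s
    return r.to_bytes(32, "big") + y_parity_and_s.to_bytes(32, "big")

def from_compact(sig: bytes) -> tuple[bool, int, int]:
    r = int.from_bytes(sig[:32], "big")
    y_parity_and_s = int.from_bytes(sig[32:], "big")
    return bool(y_parity_and_s >> 255), r, y_parity_and_s & ((1 << 255) - 1)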

Was this discussed yet?

Edit: the parity field seems to have only a 1-byte overhead, so this is negligible.

intrinsic_gas = 20000  # G_transaction, previously 21000

If the intent is to change G_transaction from 21000 to 20000, that should be a separate EIP that includes a rationale for the change, just like every other gas cost change.

If the intent is to have this specific transaction type have a different intrinsic cost from other transactions then this EIP should include rationale on why these transactions have a lower base operational overhead than other types of transactions.

If the intent is to try to incentivize certain behaviors, we should not be hijacking our operational cost pricing mechanism to try to achieve that.

Went ahead and reverted that change: fix(4844): leave G_transaction at 21000 by gakonst · Pull Request #5636 · ethereum/EIPs · GitHub


Great proposal! Quick question, @protolambda: why does the signature cover the blob data, instead of only covering the commitment(s) to the blob data? It seems that for Danksharding, validators will not be receiving all of the blobs (only the commitments), yet they’ll still have to verify the signatures.

I made this for myself, just so I can anticipate what this will do to the nodes running on my machines. Sorry for any mistakes; I’ll correct them if there are any. Please let me know.

Thought I’d share:

Can we add something like the below from EIP-2718 to the beginning of the Specification section to define the || operator? It’s used in several places in the EIP and can be interpreted in a variety of ways, depending on which programming language(s) one happens to be familiar with.

Definitions

  • || is the byte/byte-array concatenation operator.

Moving the discussion from Clarify & rationalize blob transaction hash computations in EIP-4844 · Issue #6245 · ethereum/EIPs · GitHub here.

  • Enable 0-blob transactions to use the SSZ format: On devp2p eth/68, 4844 suggests that the EIP-2718 transaction type is advertised to indicate gossip and mempool behaviour. The EIP-2718 transaction type should solely be used to (1) denote the serialization format of a tx on the wire (also as part of GetBlockBodies), (2) for the purpose of signing, and (3) for deriving the transaction hash. Giving it mempool-specific meaning leads to problems and unnecessary discussions, e.g., for the 0-blob transaction case. It would be preferable to advertise the number (or presence) of blobs instead of the transaction type on eth/68. Type-5 0-blob transactions could then be processed the same as any type 0x01, 0x02, or 0xc0-0xfe transaction; transactions with blobs (of type 0x05, or any future types that also have blobs) would use the req/resp based solution with the network wrapper.

  • SSZ encoding: Per EIP-6475: SSZ Optional, SSZ Optional[Address] is preferable over Union[None, Address], as it makes the SSZ Merkle tree shape static, meaning that proof requests including the address can’t randomly fail based on the union selector. Note that for fixed-length items such as Address, Optional[Address] is equivalent to List[Address, 1] for the purposes of SSZ serialization and merkleization.

  • Constant tuning: MAX_VERSIONED_HASHES_LIST_SIZE matches the maximum number of blobs per transaction and is currently set to 16 million. This exceeds any rational design space; keep in mind that SSZ serialization caps out at 4 GB, so such transactions could not even be serialized. If devp2p eth/68 is updated to indicate the number of blobs, note that this number also becomes part of the advertisement, so making it fit into a uint8 or uint16 may be desirable.

There’s been ongoing discussion around mempool DoS concerns, so perhaps we can start adding more recommendations to the spec around mempool handling of blob txs. Right now the spec suggests increasing the data gas price by at least 10% for replacement, and the mempool already requires increasing the regular gas price by 10% for replacement. Additional constraints that prevent specific DoS scenarios without being too burdensome on clients include the following (sketched in code after the list):

  • Blob-holding txs should only be replaced by blob txs consuming at least as much data gas (i.e. the number of blobs can never decrease). This prevents mempools from being spammed with multi-blob txs that are later evicted by (much cheaper) 0- or 1-blob txs.

  • There can only be one blob-containing tx per account. This prevents someone from spamming the mempool with multiple blob-holding txs, each with sequential nonces, in a way where none of them beyond the first would successfully execute and incur fees.
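A minimal sketch of what these checks could look like in a client’s mempool (field and structure names here are illustrative, not taken from any client):

def can_replace(old_tx, new_tx) -> bool:
    # existing rule: at least a 10% bump on the regular gas price
    if new_tx.max_fee_per_gas < old_tx.max_fee_per_gas * 110 // 100:
        return False
    # suggested rule: at least a 10% bump on the data gas price
    if new_tx.max_fee_per_data_gas < old_tx.max_fee_per_data_gas * 110 // 100:
        return False
    # suggested rule: the number of blobs can never decrease
    if len(new_tx.blobs) < len(old_tx.blobs):
        return False
    return True

def can_admit(new_tx, pool) -> bool:
    # suggested rule: at most one blob-carrying tx per account
    return not any(tx.sender == new_tx.sender and tx.blobs
                   for tx in pool.pending_blob_txs)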

The suggestion from Etan in a comment above around announcing # of blobs in eth/68 instead of tx type would also help mempools better deal with blob-related DoS risk.


Made a concrete proposal for eth/68 changes here:

For SSZ Optional, opened a PR:


Re: blob transaction replacement. On today’s 4844 client devs call, Dankrad noted that requiring an increase in data gas price (whether 10% or 100%) for blob tx replacement may not be a suitable disincentive for DoS, since data gas is priced using 1559-type rules.

Ansgar suggested we might then also require that replacement txs contain exactly the same blobs as before (so they would not have to be reverified). Depending on how important it is, we could also allow replacement txs to add additional blobs, though this may be too much of an edge case to worry about.

Hey, I just noticed that EIP-4844 doesn’t actually specify the ReceiptPayload format anywhere, and it probably should. For example, EIP-1559 includes the line:

The EIP-2718 ReceiptPayload for this transaction is rlp([status, cumulative_transaction_gas_used, logs_bloom, logs])

But before we add the ReceiptPayload, I’m curious what others think of this idea I expressed on the ACD discord: removing the logs_bloom field from type 0x5 tx receipts. As discussed over there, most tx receipts include at most a single event, so dedicating a full 256 bytes to the logs_bloom in the receipt feels a bit heavy (if the number of events in a receipt is small, it’s probably more efficient to just scan the events rather than use the bloom, since you have to scan the events anyway to rule out false positives in the bloom filter).
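To make the size intuition concrete, a quick sketch with a toy RLP encoder (it handles only byte strings and lists, and the field values are illustrative):

def rlp_encode(item):
    # minimal RLP: byte strings and lists of items only
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item
        return _len_prefix(len(item), 0x80) + item
    payload = b"".join(rlp_encode(x) for x in item)
    return _len_prefix(len(payload), 0xC0) + payload

def _len_prefix(n: int, offset: int) -> bytes:
    if n < 56:
        return bytes([offset + n])
    n_bytes = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(n_bytes)]) + n_bytes

status = b"\x01"
cumulative_gas_used = (21000).to_bytes(2, "big")
logs_bloom = b"\x00" * 256
logs = []   # an empty or tiny log list is the common case

with_bloom = rlp_encode([status, cumulative_gas_used, logs_bloom, logs])
without_bloom = rlp_encode([status, cumulative_gas_used, logs])
print(len(with_bloom), len(without_bloom))  # the bloom dominates the receipt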
