One of the scalability limitations of ethereum today is the fact that full nodes need to store the entire historical chain, including blocks and receipts. Clients that do not do this (eg. Nethermind) exist, and have less than 1/4 the storage requirements of clients that do, but major clients are choosing not to implement this, in part because there are dapps that use very old historical blocks and transactions that we are not willing to break.
I propose that we establish a norm that clients should not store transactions or logs more than 1 million blocks (~4 months) behind the head, and we should work hard to create special-purpose cryptoeconomic light client infrastructure to allow clients to continue to fetch such data. This infrastructure can be included by default into major ethereum clients including full nodes and light nodes.
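As a rough illustration of how a client following this norm might route a historical query, here is a minimal Python sketch. `split_query` and `RETENTION_BLOCKS` are hypothetical names, and placing the cutoff at exactly "head minus 1 million blocks" is an assumption about where the window ends.

```python
RETENTION_BLOCKS = 1_000_000  # proposed norm: keep txs/logs at most ~1M blocks behind the head


def split_query(h1: int, h2: int, head: int):
    """Split the inclusive range [h1, h2] into (remote, local) parts.

    remote: blocks older than the retention window, to be fetched from light
    client servers; local: blocks a node following the norm still stores.
    Either part is None if it is empty."""
    oldest_local = max(0, head - RETENTION_BLOCKS)
    remote = (h1, min(h2, oldest_local - 1)) if h1 < oldest_local else None
    local = (max(h1, oldest_local), h2) if h2 >= oldest_local else None
    return remote, local
```

For example, with the head at block 15,000,000, `split_query(13_500_000, 14_200_000, 15_000_000)` returns `((13500000, 13999999), (14000000, 14200000))`: only the older part would go to the light client infrastructure.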
The setup would work as follows:
1. There would be a registry contract where anyone can register their address as a light client server by providing a deposit (eg. 32 ETH). Deposits can be withdrawn after one day. When depositing, the server must also provide a root HISTORY_ROOT that represents what they think is the Merkle root of all historical blocks.
2. Suppose that a light client wants to learn about the set of all receipts between heights H1 and H2 that contain topic T. The client would send a query (H1, H2, T) to a set of servers S_1 ... S_n that are in the registry and that have submitted HISTORY_ROOT values that the client agrees with.
3. Every S_i would return a signed message (H1, H2, T, R_1, R_2 ... R_m, P_1, P_2 ... P_m, sig), where R_1 ... R_m is the set of all (block, txindex) pairs that issue such a receipt, P_i is a Merkle proof, rooted in the server's provided HISTORY_ROOT, showing that R_i contains a valid receipt, and sig is a signature of (H1, H2, T, R_1 ... R_m) by the registered key. The signature must cover the query parameters as well as the results; otherwise a fraud proof could not show which query an omitted receipt should have answered.
4. The user would receive the answers, verify each Merkle branch, and check whether the answers match among all servers that have responded. If they match, the user would accept. If there is any mismatch, the user would publish a fraud proof into the deposit contract themselves, or rebroadcast the mismatch to a server that would do it for them.
5. The user can optionally pay a fee. Servers can prioritize responding to requests from users that have paid fees to them. This can be done on-chain or through an agreed L2 protocol (eg. an optimistic rollup may work well).
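A minimal Python sketch of the server answer and the client-side checks described above, under loudly stated assumptions: HISTORY_ROOT is modeled as a plain binary Merkle tree over receipt leaves (Ethereum itself uses Keccak-256 and Merkle Patricia tries), receipts are reduced to a hypothetical `block:txindex:topics` encoding, and an HMAC with a per-server key stands in for the server's real signature so that everything runs with the standard library. All names (`Answer`, `check_answers`, `build_tree`, etc.) are illustrative.

```python
import hashlib
import hmac
from dataclasses import dataclass


def h(data: bytes) -> bytes:
    # Stand-in hash (assumption); Ethereum uses Keccak-256 and Patricia tries.
    return hashlib.sha256(data).digest()


def encode_receipt(block: int, txindex: int, topics) -> bytes:
    # Hypothetical flat receipt encoding: "block:txindex:topic1,topic2,..."
    return f"{block}:{txindex}:{','.join(sorted(topics))}".encode()


def build_tree(leaves):
    """Binary Merkle tree. Returns (root, proofs); each proof is a list of
    (sibling_hash, sibling_is_left) pairs ordered from leaf to root."""
    level = [h(x) for x in leaves]
    proofs = [[] for _ in leaves]
    pos = list(range(len(leaves)))
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        for i, p in enumerate(pos):
            proofs[i].append((level[p ^ 1], p % 2 == 1))
            pos[i] = p // 2
        level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
    return level[0], proofs


def verify(leaf: bytes, proof, root: bytes) -> bool:
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


def message(h1, h2, topic, pairs) -> bytes:
    # Signed payload: covers the query (H1, H2, T) as well as R_1 ... R_m.
    return repr((h1, h2, topic, sorted(pairs))).encode()


def sign(key: bytes, msg: bytes) -> bytes:
    # HMAC stands in for a real public-key signature (assumption).
    return hmac.new(key, msg, hashlib.sha256).digest()


@dataclass
class Answer:
    h1: int
    h2: int
    topic: str
    pairs: list   # R_1 ... R_m as (block, txindex) tuples
    leaves: list  # the receipt each P_i proves
    proofs: list  # P_1 ... P_m, rooted in HISTORY_ROOT
    sig: bytes


def check_answers(query, answers, history_root, keys):
    """Return the agreed set of (block, txindex) pairs, or None if any answer
    fails verification or the servers disagree (the fraud-proof case)."""
    h1, h2, topic = query
    agreed = None
    for server, a in answers.items():
        if (a.h1, a.h2, a.topic) != query:
            return None
        if not hmac.compare_digest(sign(keys[server], message(h1, h2, topic, a.pairs)), a.sig):
            return None
        if not len(a.pairs) == len(a.leaves) == len(a.proofs):
            return None
        for (blk, idx), leaf, proof in zip(a.pairs, a.leaves, a.proofs):
            b, i, topics = leaf.decode().split(":", 2)
            if (int(b), int(i)) != (blk, idx) or not h1 <= blk <= h2:
                return None
            if topic not in topics.split(",") or not verify(leaf, proof, history_root):
                return None
        result = frozenset(a.pairs)
        if agreed is not None and result != agreed:
            return None
        agreed = result
    return agreed
```

If one server silently drops a matching receipt, its answer still verifies on its own, but it no longer agrees with the others, so `check_answers` returns `None`; that is the point where the client would assemble a fraud proof from the disagreeing answers.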
A fraud proof consists of (i) a signed message produced in step (3), and (ii) a Merkle proof, rooted in the HISTORY_ROOT value submitted by the server at deposit time, of a receipt that is in (H1, H2), has topic T, and was not part of the signed message.
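The check a deposit contract would run on such a fraud proof might look roughly like this, as a hypothetical Python sketch with the same stand-ins as above (a SHA-256 Merkle tree for HISTORY_ROOT, a `block:txindex:topics` leaf encoding, and HMAC in place of the registered key's signature). `is_valid_fraud_proof` is an illustrative name, not an existing contract interface; a real deposit contract would implement the equivalent logic in the EVM.

```python
import hashlib
import hmac


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for Keccak-256 (assumption)


def verify(leaf: bytes, proof, root: bytes) -> bool:
    # proof: list of (sibling_hash, sibling_is_left) pairs from leaf to root
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


def message(h1, h2, topic, pairs) -> bytes:
    return repr((h1, h2, topic, sorted(pairs))).encode()


def is_valid_fraud_proof(server_key, history_root, signed, receipt_leaf, receipt_proof) -> bool:
    """signed = (h1, h2, topic, pairs, sig), the server's answer to the query.
    receipt_leaf uses the hypothetical encoding b"block:txindex:topic1,topic2".
    Valid (i.e. the deposit should be slashed) iff the server really signed the
    answer, the receipt sits under the server's own HISTORY_ROOT, the receipt
    matches the query, and the answer omitted it."""
    h1, h2, topic, pairs, sig = signed
    # HMAC stands in for checking the registered key's signature (assumption).
    expected = hmac.new(server_key, message(h1, h2, topic, pairs), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, sig):
        return False
    if not verify(receipt_leaf, receipt_proof, history_root):
        return False
    b, i, topics = receipt_leaf.decode().split(":", 2)
    block, txindex = int(b), int(i)
    return h1 <= block <= h2 and topic in topics.split(",") and (block, txindex) not in pairs
```

Note that the contract never needs to know the full correct answer: one receipt that matches the query, sits under the server's own root, and is missing from the signed list is enough.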
Altogether, this allows clients to fetch the complete set of old receipts with one round of network messaging, completely bypassing the need to scan through bloom filters, greatly increasing the usability of such applications.
Note that there is one exceptional case: a receipt that is too big to include in a fraud proof (this would generally only happen due to malicious DoS attacks, but could also happen if a rollup saves data in receipts). This can be solved by having servers commit to (and send to clients once) a list of all (block, txindex) pairs with oversized logs, and adding a separate class of fraud proof for omitting any such transaction from that list.
For historical transactions ("fetch me the transaction with this hash"), the mechanism is similar. If there is a transaction with the given hash, then a simple Merkle branch suffices (we lean on the txhash uniqueness assumption here). If there is no transaction with the desired hash, then the servers can send a signed message saying so, and a fraud proof showing the location of the transaction (which, because of how the Patricia tree works, would only need to include the transaction hash) suffices to prove that the servers gave an incorrect answer.
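A sketch of the "not found" case, with the same kind of assumptions as the receipt sketches: the transaction set is modeled as a plain SHA-256 Merkle tree whose leaves are transaction hashes (a stand-in for the Patricia tree, where the key path itself locates the entry), and HMAC replaces the server's signature. `not_found_message` and `is_valid_fraud_proof` are hypothetical names.

```python
import hashlib
import hmac


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for Keccak-256 (assumption)


def verify(leaf: bytes, proof, root: bytes) -> bool:
    # proof: list of (sibling_hash, sibling_is_left) pairs from leaf to root
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


def not_found_message(txhash: bytes) -> bytes:
    # Hypothetical payload a server signs to claim "no transaction has this hash".
    return b"NOT_FOUND:" + txhash


def is_valid_fraud_proof(server_key: bytes, history_root: bytes,
                         txhash: bytes, sig: bytes, proof) -> bool:
    """The server signed "not found" for txhash. A branch that places txhash
    itself (no transaction body needed) under the server's HISTORY_ROOT
    shows the answer was wrong."""
    expected = hmac.new(server_key, not_found_message(txhash), hashlib.sha256).digest()
    return hmac.compare_digest(expected, sig) and verify(txhash, proof, history_root)
```

Refuting "not found" only requires showing existence, so a plain inclusion branch is enough here; a key-ordered structure would matter only if the servers themselves had to prove non-existence.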
Possible extensions
- If desired, the oversized log problem could be solved more efficiently at the protocol level by changing the receipt data structure so that instead of logs it stores logs_root = sha3(logs), and by changing the above protocol so that the server must return all transactions whose transaction-level bloom filter matches T, even if they do not contain T in their logs. This way the fraud proof would not need to include the body of the list of logs, and would always be logarithmic in size.
- The servers could just reply with the list of (block, txindex) pairs, without proofs. This would improve efficiency, but at the cost of weakening the security model: if the cryptoeconomic assumption fails, not only could receipts be omitted, but the client would also see false receipts until a fraud proof comes in.
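To illustrate why the logs_root extension keeps fraud proofs small, here is a Python sketch of the compact receipt leaf. The bloom is modeled on Ethereum's 2048-bit filter with three bits per item, but `hashlib.sha3_256` (NIST SHA3) stands in for Keccak-256, only topics are added (the real filter also includes the emitting address), and the bloom||logs_root layout, the repr-based serialization of logs, and all function names are assumptions.

```python
import hashlib

BLOOM_BITS = 2048


def _hash(data: bytes) -> bytes:
    # hashlib's sha3_256 is NIST SHA3, not Ethereum's Keccak-256; a stand-in only.
    return hashlib.sha3_256(data).digest()


def bloom_add(bloom: int, item: bytes) -> int:
    # Set 3 bits per item, each taken from 11 bits of the item's hash.
    d = _hash(item)
    for i in (0, 2, 4):
        bloom |= 1 << (((d[i] << 8) | d[i + 1]) % BLOOM_BITS)
    return bloom


def bloom_may_contain(bloom: int, item: bytes) -> bool:
    return bloom_add(bloom, item) == bloom


def compact_receipt(logs: list) -> bytes:
    """Hypothetical receipt layout: bloom || logs_root. The leaf a fraud proof
    must carry is 256 + 32 bytes, however large the logs are."""
    bloom = 0
    for log in logs:
        for topic in log["topics"]:
            bloom = bloom_add(bloom, topic)
    logs_root = _hash(repr(logs).encode())  # logs_root = sha3(logs), over an assumed serialization
    return bloom.to_bytes(BLOOM_BITS // 8, "big") + logs_root


def must_be_returned(receipt: bytes, topic: bytes) -> bool:
    # Servers must return every bloom match, false positives included, so a
    # fraud proof only has to show the bloom match, never the logs themselves.
    bloom = int.from_bytes(receipt[:BLOOM_BITS // 8], "big")
    return bloom_may_contain(bloom, topic)
```

A receipt with 100 kB of log data and one with none both produce a 288-byte leaf, so the fraud proof is that leaf plus a Merkle branch.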