EIP-4444: Bound Historical Data in Execution Clients

Hi! I co-lead ResDev for Protocol Labs. We’re more than happy to help make a provably immutable history of ETH available forever on Filecoin & IPFS, and it would not require any protocol changes. Please don’t hesitate to let me know if anyone would like help doing this!

Thanks!


PEEPanEIP-4444: Bound Historical Data in Execution Clients with @ralexstokes


My two cents: I disagree with this EIP, both in general and in its specifics.

It doesn’t add any real advantage to clients; I can already prune transactions if I’d like to.

It removes a feature without offering any solution; everything is “out of scope” or “just use another p2p network for that”.

We can already use centralized servers or CANs (IPFS, Swarm, etc.) to store data, even Ethereum-related data (e.g., TrueBlocks indexes), but making that mandatory makes no sense to me.

Ethereum is a p2p network; using the network to share data and reach consensus is what it should be used for. The whole Ethereum ecosystem should be self-consistent without relying on other p2p networks.

Furthermore, storage is a commodity by now, and it will be even more so in the future. Four hundred GB (and counting) of old data is really nothing to be afraid of: I can buy an HDD that stores 4 TB of data for $30, and it will last for another 20 years of historical transactions. Storage technology grows and improves faster than Ethereum’s transaction history.
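To put rough numbers on that (the current size and yearly growth below are my own assumptions, not measured figures from any client):

```python
# Back-of-the-envelope check on how long a cheap 4 TB drive holds history.
# The current size and yearly growth figures are assumptions for illustration.
current_history_gb = 400    # assumed size of historical blocks/receipts today
yearly_growth_gb = 150      # assumed growth of history per year
disk_capacity_gb = 4000     # a ~$30 4 TB HDD

years_until_full = (disk_capacity_gb - current_history_gb) / yearly_growth_gb
print(f"At {yearly_growth_gb} GB/year, a 4 TB disk holds roughly "
      f"{years_until_full:.0f} more years of history.")
```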

In my opinion, the right path is continuing to improve light clients and spreading more nodes across constrained devices and hardware.

The wrong path is forcibly turning all full nodes into light clients.


Hi, you are saying this, but TheGraph just censored Tornado Cash.
The underlying problem is that things like databases shared over IPFS or Google BigQuery wouldn’t allow us to run a real-time service, since it would take a day to download updates. We also need very old data in order to let a user withdraw funds deposited several years ago.

This isn’t much of a problem currently, as it just means we will need our own home-hosted (not cloud) node with parity_tracing, along with something like OpenEthereum’s FatDB for getting a smart contract’s storage range at past blocks. Such a thing is possible because full ancient data is broadcast and updated in real time, but it shows why it is important for nodes/RPC to be able to serve the full chain history over the p2p network.

If third-party services are made a requirement, then Ethereum will be decentralized at the currency level, like Bitcoin, while dapps like casinos or yield farming will be fully permissioned: they will have to register with authorities and pay an army of lawyers in order to be allowed to run. That means DeFi won’t be much different from fintechs in the traditional banking system that rent their computing hardware.

I think keeping history on the p2p network is definitely worth the reduced transactions-per-second throughput, unless we decide to behave like SWIFT, MasterCard, or Visa in order to run as fast as they do.

No, their point is that a large database reduces transactions-per-second speed. But I think this is far-fetched: in my country Visa and Mastercard are required to record data on all transactions from the past 10 years for law enforcement, and this doesn’t prevent them from running.

Would this be in real time for each block, like the current p2p network, or would there be daily update pushes?


I propose something that is done by the cloud industry, which bills money to keep data: deposit ether on smart contracts, and at each block a very tiny fee is deducted. When the contract’s balance drops to 0, its code and storage are destroyed (SELFDESTRUCT) and its relevant transactions are deleted from history.

That way, what is needed is kept while what is forgotten is destroyed. This also makes it more efficient than the proposal, since data can be destroyed before 1 year has passed.
Please also note that destroying what is unused is also how human memory works, and things always fit within the size of a human skull.
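A rough sketch of what this decay would mean in practice (the deposit, per-block fee, and block time below are illustrative assumptions, not proposed values, and the on-chain enforcement is not modelled):

```python
# Illustrative sketch of the per-block "storage rent" decay described above.
# The deposit, fee, and block time are assumptions chosen for the example.
SECONDS_PER_BLOCK = 12
FEE_PER_BLOCK_WEI = 10**11        # hypothetical decay fee charged every block
deposit_wei = 10**18              # 1 ETH deposited on the contract

blocks_until_empty = deposit_wei // FEE_PER_BLOCK_WEI
years = blocks_until_empty * SECONDS_PER_BLOCK / (365 * 24 * 3600)
print(f"A 1 ETH deposit at this fee lasts {blocks_until_empty:,} blocks "
      f"(~{years:.1f} years) before the contract is destroyed and its "
      f"history becomes eligible for deletion.")
```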

ytrezq: “Deposit ether on smart contracts: at each block, a very tiny fee is deducted”

Or maybe, stake the ether and use the proceeds to pay for the data storage in perpetuity.

Would this imply setting the decay fee in relation to the staking return? (How does it relate to the magical 32 ETH stake number?)
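To make the question concrete, here is a sketch under purely illustrative assumptions (yield, ETH price, storage price, and replication factor are all made up for the example):

```python
# Illustrative sketch: funding history storage from staking proceeds rather
# than a decay fee. All figures below are assumptions, not proposed values.
stake_eth = 32                    # the familiar validator stake size
staking_yield = 0.04              # assumed ~4% annual return
eth_price_usd = 2000              # assumed ETH price
storage_usd_per_tb_year = 5       # assumed cost to store 1 TB for a year
replication = 20                  # assumed number of funded copies

proceeds_usd = stake_eth * staking_yield * eth_price_usd
tb_funded = proceeds_usd / (storage_usd_per_tb_year * replication)
print(f"32 ETH at {staking_yield:.0%} yields ~${proceeds_usd:,.0f}/year, "
      f"enough to pay for ~{tb_funded:.0f} TB of history at "
      f"{replication}x replication.")
```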

Ditto. I think it’s important to address this explicitly as part of any proposal, beyond “some other tool will solve this”, for largely the same reasons advanced for requiring execution clients to drop data after a given time.

There is a way to preserve historical state data: it can be based on economic incentives and, at the same time, on altruism (!), should that happen to be the popular solution.

So pruning data according to this EIP does not necessarily mean that it is lost, only that the process of verifying it moves elsewhere.

If the data continues to grow, the reward per unit of data becomes smaller given ETH’s limited supply, and historical data is at a disadvantage.

Keeping data alive for an undefined period of time is a promise. You need specific off-chain systems to keep it alive, and those systems need to work in an altruistic and incentivized fashion.

The reward per unit of data being smaller in ETH terms may not imply that the historical data is at risk (e.g., under-replicated): if more ETH is staked to pay for storage, then the price goes higher. Further, storage costs also decrease steadily over time.

The assumption that ETH’s price rises mainly because of storage costs is risky. Consensus and data storage are two (very) different tasks, markets, and economies, and keeping them uncorrelated contributes to the long-term good of the project; unless the motivation is to grab the potential profits of the storage market because the consensus market is not profitable enough.

Also, “storage costs keep decreasing over time” is one of my favourite assumptions. Moore’s Law always worked, until it didn’t.


I’d like to see this land in early 2025, which means a solid design needs to be in place by early-mid 2024. Is that even remotely feasible?

Solo stakers on 2TB disks have until maybe late 2025 - optimistically sometime 2026 - before they run out of space, and early-mid 2025 before they get nervous. It’d be amazing if they weren’t forced to upgrade to 4TB.


4TB SSDs start at $280.

https://www.newegg.com/p/pl?N=100011693%20600545605&Order=1


My suggestion is to change the wording:

Clients SHOULD NOT serve headers <...> -> Clients MAY NOT serve headers <...>

Rationale:

The MAY formulation is less assertive about node behaviour while not reducing the effect of this EIP.

The ability to use uniform access interfaces such as eth_getLogs and eth_call to read historical, latest, and global index data has proven great for developer and user experience, allowing us to build permissionless databases of records. This is great functionality which we should not lose in an attempt to optimise.
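As a concrete illustration of that uniform interface (the node URL and contract address below are placeholders, and this assumes the node still serves the requested range):

```python
# Minimal JSON-RPC example of the uniform historical access described above.
# After EIP-4444 a node would no longer be guaranteed to answer for pruned ranges.
import json
import urllib.request

NODE_URL = "http://localhost:8545"  # placeholder local node endpoint

def rpc(method, params):
    payload = json.dumps({"jsonrpc": "2.0", "id": 1,
                          "method": method, "params": params}).encode()
    req = urllib.request.Request(NODE_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]

# Logs from a very old block range are requested exactly like recent ones.
old_logs = rpc("eth_getLogs", [{
    "fromBlock": hex(1_000_000),
    "toBlock": hex(1_001_000),
    "address": "0x0000000000000000000000000000000000000000",  # placeholder
}])
print(f"{len(old_logs)} logs found in the historical range")
```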

While nodes MAY drop the historical data, there are reasons to assume that not everyone would be willing to do so, because (i) older data has intrinsic value, (ii) other nodes pruning data are effectively reducing the supply of it, (iii) data availability among nodes can ultimately be solved by a tightly and correctly architected financial incentive model (some discussion in that regard on the research forum: Economics for P2P (RPC) data marketplace - Economics - Ethereum Research), (iv) users who want a full or extended historic experience can negotiate with nodes for availability beyond the minimum agreed on, and (v) the epoch timeframe is a subjective matter and ideally should be as short as it can be to maintain consensus and keep consensus-participation storage requirements low, while keeping data availability as an optional feature.


This is just a symptom of a larger problem: computing parallelism. Ethereum is a distributed computer, so it needs the architecture of a distributed computer, not a Von Neumann architecture simply replicated in parallel. If you give a parallel task to a bunch of computers designed to run serial processes (even if they have some communication between them), you are going to have serious issues with tasks that share state. If you want to fly, you have to design a plane, not a bicycle with wings. Right now you might fix the issue by deleting some not-so-used data from the database (a temporary fix), but the core problem (i.e., the EVM is not natively parallel) will keep popping up elsewhere, and eventually Ethereum will scale only in proportion to the capacity of current hardware.

To design a truly distributed computer (which won’t have issues with scaling), you need to use a computing model designed for parallelism natively. There is a page on Wikipedia called “Model of computation”, and if you scroll down there is a section called “Concurrent models”. What the Ethereum team has to do is rewrite the whole EVM using one of the models listed in that section. If somebody asked me, I would implement a Kahn process network; it is a very simple and very reliable model.
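For readers unfamiliar with the model, here is a toy sketch of a Kahn process network in Python: independent processes connected only by FIFO channels with blocking reads, which is what makes the result deterministic regardless of scheduling. It is only meant to show the idea, not a design for the EVM.

```python
# Toy Kahn process network: processes communicate only via FIFO channels with
# blocking reads, so the output is deterministic regardless of thread scheduling.
import queue
import threading

def producer(out_ch):
    for i in range(5):
        out_ch.put(i)
    out_ch.put(None)                 # end-of-stream marker

def doubler(in_ch, out_ch):
    while (item := in_ch.get()) is not None:
        out_ch.put(item * 2)
    out_ch.put(None)

def consumer(in_ch, results):
    while (item := in_ch.get()) is not None:
        results.append(item)

a, b = queue.Queue(), queue.Queue()  # unbounded FIFO channels
results = []
threads = [
    threading.Thread(target=producer, args=(a,)),
    threading.Thread(target=doubler, args=(a, b)),
    threading.Thread(target=consumer, args=(b, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)                       # always [0, 2, 4, 6, 8]
```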

Obviously this is a huge (architectural) change, but it is the only right solution. Otherwise, you are going to keep patching and patching forever and never achieve scaling. The data will keep growing like popcorn, and you will have to keep buying SSDs. But if you implement a truly parallel computing architecture, every node will hold a portion of the data, and it could scale to hundreds of thousands of transactions per second effortlessly.

I think your argument is a lot of hand-waving for removing an invariant that has existed in Ethereum for roughly a decade. We app developers depend on these interfaces existing, and I don’t want them to be gone.

You joined this forum a day ago. I don’t think you have Ethereum’s best interests in mind, and your argument is strange and convoluted at best for such a narrow change to the protocol.