EIP-4444: Bound Historical Data in Execution Clients

Hi! I co-lead ResDev for Protocol Labs. We’re more than happy to help make a provably immutable history of ETH available forever on Filecoin & IPFS, and it would not require any protocol changes. Please don’t hesitate to let me know if anyone would like help doing this!

Thanks!


PEEPanEIP-4444: Bound Historical Data in Execution Clients with @ralexstokes


My two cents: I disagree with this EIP, both in general and in its specifics.

It doesn’t add any real advantage to clients; I can already prune transactions if I’d like to.

It removes a feature without offering any solution; everything is “out of scope” or “just use another p2p network for that”.

We can already use centralized servers or CANs (IPFS, Swarm, etc.) to store data, even Ethereum-related data (e.g., TrueBlocks indexes), but making that mandatory makes no sense to me.

Ethereum is a p2p network; using the network to share data and reach consensus is what it should be used for. The whole Ethereum ecosystem should be self-consistent without relying on other p2p networks.

Furthermore, storage is a commodity by now, and it will be even more so in the future. Four hundred GB (and counting) of old data is really nothing to be afraid of: I can buy an HDD that stores 4 TB of data for $30, and it will last for another 20 years of historical transactions. Storage technology grows and improves faster than Ethereum’s transaction history.
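To put rough numbers on that (the current size and yearly growth below are my own assumptions, not measured figures from any client):

```python
# Back-of-the-envelope check on how long a cheap 4 TB drive holds history.
# The current size and yearly growth figures are assumptions for illustration.
current_history_gb = 400    # assumed size of historical blocks/receipts today
yearly_growth_gb = 150      # assumed growth of history per year
disk_capacity_gb = 4000     # a ~$30 4 TB HDD

years_until_full = (disk_capacity_gb - current_history_gb) / yearly_growth_gb
print(f"At {yearly_growth_gb} GB/year, a 4 TB disk holds roughly "
      f"{years_until_full:.0f} more years of history.")
```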

In my opinion, the right path is continuing to improve light clients and spreading more nodes across constrained devices and hardware.

The wrong path is forcibly turning all full nodes into light clients.


Hi, you are saying this, but TheGraph just censored Tornado Cash.
The underlying problem is that things like databases shared over IPFS or Google BigQuery wouldn’t allow us to run a real-time service, since it would take a day to download updates. We also need very old data in order to let a user withdraw funds deposited several years ago.

This isn’t much of a problem currently, as it just means we will need our own home-hosted (not cloud) node with parity_tracing, along with something like OpenEthereum’s FatDB for getting a smart contract’s storage range at past blocks. Such a thing is possible because full ancient data is broadcast and updated in real time, but it shows why it is important for nodes/RPC to be able to serve the full chain history over the p2p network.

If third-party services are made a requirement, then Ethereum will be decentralized at the currency level, like Bitcoin, while dapps like casinos or yield farming will be fully permissioned: they will have to register with authorities and pay an army of lawyers in order to be allowed to run. That means DeFi won’t be much different from fintechs in the traditional banking system that rent their computing hardware.

I think keeping history on the p2p network is definitely worth the reduced transactions-per-second throughput, unless we decide to behave like SWIFT, MasterCard, or Visa in order to run as fast as they do.

No, their point is that a large database reduces transactions-per-second speed. But I think this is far-fetched: in my country Visa and Mastercard are required to record data on all transactions from the past 10 years for law enforcement, and this doesn’t prevent them from running.

Would this be in real time for each block, like the current p2p network, or would there be daily update pushes?


I propose something that is done by the cloud industry, which bills money to keep data: deposit ether on smart contracts, and at each block a very tiny fee is deducted. When the contract’s balance drops to 0, its code and storage are destroyed (SELFDESTRUCT) and its relevant transactions are deleted from history.

That way, what is needed is kept while what is forgotten is destroyed. This also makes it more efficient than the proposal, since data can be destroyed before 1 year has passed.
Please also note that destroying what is unused is also how human memory works, and things always fit within the size of a human skull.
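A rough sketch of what this decay would mean in practice (the deposit, per-block fee, and block time below are illustrative assumptions, not proposed values, and the on-chain enforcement is not modelled):

```python
# Illustrative sketch of the per-block "storage rent" decay described above.
# The deposit, fee, and block time are assumptions chosen for the example.
SECONDS_PER_BLOCK = 12
FEE_PER_BLOCK_WEI = 10**11        # hypothetical decay fee charged every block
deposit_wei = 10**18              # 1 ETH deposited on the contract

blocks_until_empty = deposit_wei // FEE_PER_BLOCK_WEI
years = blocks_until_empty * SECONDS_PER_BLOCK / (365 * 24 * 3600)
print(f"A 1 ETH deposit at this fee lasts {blocks_until_empty:,} blocks "
      f"(~{years:.1f} years) before the contract is destroyed and its "
      f"history becomes eligible for deletion.")
```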

ytrezq: “Deposit ether on smart contracts: at each block, a very tiny fee is deducted”

Or maybe, stake the ether and use the proceeds to pay for the data storage in perpetuity.

Would this imply setting the decay fee in relation to the staking return? (How does it relate to the magical 32 ETH stake number?)
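To make the question concrete, here is a sketch under purely illustrative assumptions (yield, ETH price, storage price, and replication factor are all made up for the example):

```python
# Illustrative sketch: funding history storage from staking proceeds rather
# than a decay fee. All figures below are assumptions, not proposed values.
stake_eth = 32                    # the familiar validator stake size
staking_yield = 0.04              # assumed ~4% annual return
eth_price_usd = 2000              # assumed ETH price
storage_usd_per_tb_year = 5       # assumed cost to store 1 TB for a year
replication = 20                  # assumed number of funded copies

proceeds_usd = stake_eth * staking_yield * eth_price_usd
tb_funded = proceeds_usd / (storage_usd_per_tb_year * replication)
print(f"32 ETH at {staking_yield:.0%} yields ~${proceeds_usd:,.0f}/year, "
      f"enough to pay for ~{tb_funded:.0f} TB of history at "
      f"{replication}x replication.")
```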

Ditto. I think it’s important to address this explicitly as part of any proposal, beyond “some other tool will solve this”, for largely the same reasons advanced for requiring execution clients to drop data after a given time.

There is a way to preserve historical state data: it can be based on economic incentives and, at the same time, on altruism (!), should that happen to be the popular solution.

So pruning data according to this EIP does not necessarily mean that it is lost, only that the process of verifying it moves elsewhere.

If the data continues to grow, the reward per unit of data becomes smaller given ETH’s limited supply, and historical data is at a disadvantage.

Keeping data alive for an undefined period of time is a promise. You need specific off-chain systems to keep it alive, and those systems need to work in an altruistic and incentivized fashion.

The reward per unit of data being smaller in ETH terms may not imply that the historical data is at risk (e.g., under-replicated): if more ETH is staked to pay for storage, then the price goes higher. Further, storage costs also decrease steadily over time.

The assumption that ETH’s price rises mainly because of storage costs is risky. Consensus and data storage are two (very) different tasks, markets, and economies, and keeping them uncorrelated contributes to the long-term good of the project; unless the motivation is to grab the potential profits of the storage market because the consensus market is not profitable enough.

Also, “storage costs keep decreasing over time” is one of my favourite assumptions. Moore’s Law always worked, until it didn’t.


I’d like to see this land in early 2025, which means a solid design needs to be in place by early-mid 2024. Is that even remotely feasible?

Solo stakers on 2TB disks have until maybe late 2025 - optimistically sometime 2026 - before they run out of space, and early-mid 2025 before they get nervous. It’d be amazing if they weren’t forced to upgrade to 4TB.


4TB SSDs start at $280.

https://www.newegg.com/p/pl?N=100011693%20600545605&Order=1


My suggestion is to change the wording:

Clients SHOULD NOT serve headers <...> -> Clients MAY NOT serve headers <...>

Rationale:

The MAY formulation is less assertive about node behaviour while not reducing the effect of this EIP.

The ability to use uniform access interfaces such as eth_getLogs and eth_call to read historical, latest, and global index data has proven great for developer and user experience, allowing us to build permissionless databases of records. This is great functionality which we should not lose in an attempt to optimise.
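As a concrete illustration of that uniform interface (the node URL and contract address below are placeholders, and this assumes the node still serves the requested range):

```python
# Minimal JSON-RPC example of the uniform historical access described above.
# After EIP-4444 a node would no longer be guaranteed to answer for pruned ranges.
import json
import urllib.request

NODE_URL = "http://localhost:8545"  # placeholder local node endpoint

def rpc(method, params):
    payload = json.dumps({"jsonrpc": "2.0", "id": 1,
                          "method": method, "params": params}).encode()
    req = urllib.request.Request(NODE_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]

# Logs from a very old block range are requested exactly like recent ones.
old_logs = rpc("eth_getLogs", [{
    "fromBlock": hex(1_000_000),
    "toBlock": hex(1_001_000),
    "address": "0x0000000000000000000000000000000000000000",  # placeholder
}])
print(f"{len(old_logs)} logs found in the historical range")
```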

While nodes MAY drop the historical data, there are reasons to assume that not everyone would be willing to do so, because (i) older data has intrinsic value, (ii) other nodes pruning data are effectively reducing the supply of it, (iii) data availability among nodes can ultimately be solved by a tightly and correctly architected financial incentive model (some discussion in that regard on the research forum: Economics for P2P (RPC) data marketplace - Economics - Ethereum Research), (iv) users who want a full or extended historic experience can negotiate with nodes for availability beyond the minimum agreed on, and (v) the epoch timeframe is a subjective matter and ideally should be as short as it can be to maintain consensus and keep consensus-participation storage requirements low, while keeping data availability as an optional feature.


This is just a symptom of a larger problem: computing parallelism. Ethereum is a distributed computer, so it needs the architecture of a distributed computer, not a Von Neumann architecture simply replicated in parallel. If you give a parallel task to a bunch of computers designed to run serial processes (even if they have some communication between them), you are going to have serious issues with tasks that share state. If you want to fly, you have to design a plane, not a bicycle with wings. Right now you might fix the issue by deleting some not-so-used data from the database (a temporary fix), but the core problem (i.e., the EVM is not natively parallel) will keep popping up elsewhere, and eventually Ethereum will scale only in proportion to the capacity of current hardware.

To design a truly distributed computer (which won’t have issues with scaling), you need to use a computing model designed for parallelism natively. There is a page on Wikipedia called “Model of computation”, and if you scroll down there is a section called “Concurrent models”. What the Ethereum team has to do is rewrite the whole EVM using one of the models listed in that section. If somebody asked me, I would implement a Kahn process network; it is a very simple and very reliable model.
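For readers unfamiliar with the model, here is a toy sketch of a Kahn process network in Python: independent processes connected only by FIFO channels with blocking reads, which is what makes the result deterministic regardless of scheduling. It is only meant to show the idea, not a design for the EVM.

```python
# Toy Kahn process network: processes communicate only via FIFO channels with
# blocking reads, so the output is deterministic regardless of thread scheduling.
import queue
import threading

def producer(out_ch):
    for i in range(5):
        out_ch.put(i)
    out_ch.put(None)                 # end-of-stream marker

def doubler(in_ch, out_ch):
    while (item := in_ch.get()) is not None:
        out_ch.put(item * 2)
    out_ch.put(None)

def consumer(in_ch, results):
    while (item := in_ch.get()) is not None:
        results.append(item)

a, b = queue.Queue(), queue.Queue()  # unbounded FIFO channels
results = []
threads = [
    threading.Thread(target=producer, args=(a,)),
    threading.Thread(target=doubler, args=(a, b)),
    threading.Thread(target=consumer, args=(b, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)                       # always [0, 2, 4, 6, 8]
```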

Obviously this is a huge (architectural) change, but it is the only right solution. Otherwise, you are going to keep patching and patching forever and never achieve scaling. The data will keep growing like popcorn, and you will have to keep buying SSDs. But if you implement a truly parallel computing architecture, every node will hold a portion of the data, and it could scale to hundreds of thousands of transactions per second effortlessly.

I think your argument is a lot of hand-waving for removing an invariant that has existed in Ethereum for roughly a decade. We app developers depend on these interfaces existing, and I don’t want them to be gone.

You joined this forum a day ago. I don’t think you have Ethereum’s best interests in mind, and your argument is strange and convoluted at best for such a narrow change to the protocol.