EIP-4444: Bound Historical Data in Execution Clients

Additionally, if we use MUST NOT, then a peer that breaks this requirement SHOULD be disconnected and penalised.

I agree completely that “preserving this widely-used aspect of the Ethereum network” is of utmost importance. But why does it have to be a capability that the node maintains?

It seems to me that maintaining this ability in the node is exactly the problem. If the data is immutable, the entire state and the entire history of the chain can be written once to some content-addressable store (such as IPFS), and as long as someone preserves that data, anyone can get it. An Ethereum node would not even have to be involved. All one would need is the hash of where the immutable data is stored.

Fresh data can be written as an ‘addendum’, so there would have to be some sort of published manifest of the original hash and the periodic ongoing hashes. I would argue that the hash of the manifest should be part of the chain, but, short of that, the community would have to maintain it (perhaps by publishing the hash to a smart contract).
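The manifest idea can be sketched in a few lines. This is purely illustrative, not an actual IPFS integration: plain SHA-256 digests stand in for content identifiers, and `chunk_hash` / `build_manifest` are invented helper names.

```python
import hashlib
import json

def chunk_hash(chunk: bytes) -> str:
    # Stand-in for an IPFS CID: a plain SHA-256 digest of the chunk.
    return hashlib.sha256(chunk).hexdigest()

def build_manifest(chunks: list[bytes]) -> dict:
    # The manifest lists every chunk hash in order; its own hash is the
    # single value the community (or a smart contract) needs to publish.
    entries = [chunk_hash(c) for c in chunks]
    body = json.dumps({"chunks": entries}, sort_keys=True).encode()
    return {"chunks": entries,
            "manifest_hash": hashlib.sha256(body).hexdigest()}

# Appending fresh history means hashing the new chunk and publishing a
# new manifest hash; the old chunks and their hashes never change.
```

Because the construction is deterministic, anyone re-deriving the manifest from the same immutable data arrives at the same hash, which is the whole point of the argument above.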

My point is that because the data is immutable, and because we have content-addressed storage to store it in, there’s literally no need to continue to provide the ability to regenerate this data from genesis. The only outcome of regenerating from genesis would be to arrive at the same IPFS hash as you already have.

On top of that, there’s no reason the clients have to maintain this capability, and the entire purpose of this EIP is to remove that requirement. This might open a whole new area of innovation related to providing access to this historical data – which I think would allow for far more interesting dApps than we currently have (which today require a node to get at it).

Furthermore, if the historical data is chunked into manageable pieces and properly indexed by chunk (with a bloom filter in front of each chunk), each individual user could easily download and pin only the portion of the database they are interested in, thereby distributing this historical data throughout the community as a natural by-product of using it. (See TrueBlocks.)
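The chunk-plus-bloom-filter scheme might look roughly like this minimal sketch. The `Bloom` class and its parameters are illustrative, not TrueBlocks’ actual design:

```python
import hashlib

class Bloom:
    """Minimal per-chunk bloom filter: k hash probes into an m-bit array."""

    def __init__(self, m_bits: int = 4096, k: int = 3):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits // 8)

    def _probes(self, item: bytes):
        # Derive k independent probe positions from salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._probes(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def maybe_contains(self, item: bytes) -> bool:
        # False means definitely absent; True may rarely be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._probes(item))
```

A client would test its addresses against each chunk’s filter and download only the chunks that may contain a match; a false positive merely costs one unnecessary chunk download.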

I agree that people are uploading a lot of stuff to the blockchain, especially with the rise of NFTs, but also unoptimized token contracts that cause state bloat. But has anyone thought about other examples and use cases of blockchain? What if people uploaded important documents like birth certificates to the blockchain, since the mission of a blockchain is a ledger that stores information on-chain forever? Suddenly those people won’t be able to access their documents because some devs thought it was a good idea to delete blockchain state after some time… Another great example is NFTs, especially NFTs made before ERC-721, i.e., from 2017 and earlier, like CryptoPunks. Those will be gone forever.

From a developer’s perspective, I’m sure there is a lot of data that is not important and doesn’t need to be stored.

A better idea would probably be to store data on full nodes and have light nodes, or to think about different ways to make the infrastructure as efficient as possible without having to delete and lose data.

Don’t get me wrong, I’m just trying to think realistically from a non-core-dev perspective, and I’m against this EIP.

Ethereum was never designed to be a permanent data storage system. Something like FileCoin is much better suited for long term data storage, and they have incentives built into the protocol to ensure that the cost of long term storage is paid for by those seeking it.

Also, this EIP removes history but not state. State expiry is also an active area of research, but out of scope for this thread.

I have been an Ethereum watcher and dapp developer for years, and have admired all the EIPs that have come through. However, this EIP is deeply troubling. I think this would be an extremely negative EIP to implement. Here’s why:

Ethereum was touted over the years as the system for building “unstoppable” apps, and I loved it. With this EIP, those “unstoppable” apps will simply, well, stop (at least, their UXs/UIs will). It forces substantial and unavoidable adoption of some a.n.other (unknown and uncertain) protocol entirely outside the Ethereum system. Pulling the “promise” of data persistence from old apps will be disastrous for the long-term reputation of dapp development on Ethereum.

For those who say Ethereum was never meant to store data permanently, that is simply not true. It was! It’s specced that way and therefore used that way. With this EIP, it no longer will be. This is a truly fundamental change to (and destruction of) the Ethereum value proposition. Providing canonical transaction history tightly coupled with canonical transaction generation is CRITICAL to Ethereum’s value proposition. Offloading it entirely outside of Ethereum’s control gives away (and destroys) the future utility of Ether. Why? Because the whole point of canonical transaction generation is that you also have canonical transaction history.

With EIP-4444, it is possible to lose entire chunks of past transaction history. Forever. As in gone. No one knows who sent (or did) what to whom, or when.

There’s a reason why JP Morgan, HSBC, and other long-storied banks are still around. It’s because you can rely on them having, somewhere inside their big walled offices, a transaction history going back over 100 years. This builds TRUST (yes, centralized trust). But that’s why (other) people come back to them (even if blockchain people don’t). The old history may be hand-written in log books, which is hardly convenient, but it’s there.

Now imagine Ethereum were just such an organisation. You go to Ethereum in 12 years’ time and you ask (in code): what transaction happened on this account 10 years ago? The reply: oh, go to The Graph/BitTorrent/IPFS/a.n.other, we don’t keep that. You try several of these organisations, and by some stroke of bad luck they messed up on your particular EOA/contract (or their tech died), and it’s gone. Would you trust the Ethereum system, simply because it moved to cool stateless consensus and therefore decided it didn’t need to include anything as boring as past history anymore? I wouldn’t.

What this EIP fails to realise is that the value of canonical decentralised transactions in almost all real-world use cases isn’t the canonical decentralised transactions themselves, but the canonical history of those transactions. The blockchain that provides this will win. Ethereum does both today, but with this EIP it no longer will. That is what I would call the broken promise of Ethereum, if this EIP happens without a simultaneous EIP that ensures canonical transaction history at the Ethereum protocol level. Perhaps some compromise can be made on the time horizon: “forever” is not enforceable, but, say, 20 years is, and that is good enough for most real-world use cases (though even that is not good enough for academic records, for example). Remember, the well-known banks most of us use can keep (centralised) canonical records for over a century, and universities can keep records for multiple centuries.

Arweave fixes your problem. Period.

It is a.n.other protocol. Canonical transaction history is absolutely critical to Ethereum. Why entrust it to an outside party? If Arweave figures out SmartWeave properly, let’s see where Ether ends up. More power to Arweave!

Hi! I co-lead ResDev for Protocol Labs. We’re more than happy to help make a provably immutable history of ETH available forever on Filecoin and IPFS, and it would not require any protocol changes. Please don’t hesitate to let me know if anyone would like help doing this!

Thanks!

PEEPanEIP-4444: Bound Historical Data in Execution Clients with @ralexstokes

My two cents: I disagree with this EIP, both in general and in its details.

It doesn’t add any real advantage to clients; I can already prune txs if I’d like to.

It removes a feature without offering any solution; everything is “out of scope” or “just use another p2p network for that”.

We can already use centralized servers or content-addressable networks (IPFS, Swarm, etc.) to store data, even Ethereum-related data (e.g., TrueBlocks indexes), but making that mandatory makes no sense to me.

Ethereum is a p2p network; using the network to share data and reach consensus is what it should be used for. And the whole Ethereum ecosystem should be self-consistent, without relying on other p2p networks.

Furthermore, storage is a commodity by now, and it will be even more so in the future. Four hundred GB (and counting) of old data is really nothing to fear: I can buy a 4 TB HDD for $30, and it will last for another 20 years of historical txs. Storage technology grows and improves much faster than Ethereum’s transaction volume.

In my opinion, the right path is to keep improving light clients, spreading more nodes among constrained devices and hardware.

The wrong path is forcibly transforming all full nodes into light clients.

Hi, you are saying this, but The Graph just censored Tornado Cash.
The underlying problem is that things like databases shared over IPFS or Google BigQuery wouldn’t allow us to run a real-time service, as it would take a day to download updates. We also need very old data in order to let a user withdraw funds deposited several years ago.

This isn’t that much of a problem currently, as it just means we will need our own home-hosted (not cloud) node with parity_tracing, along with something like OpenEthereum’s Fatdb for getting a smart contract’s storage range at past blocks. Such a thing is possible because full ancient data is broadcast and updated in real time, which shows why it is important for nodes/RPCs to be able to broadcast the full chain history over the p2p network.

If third-party services are made required, then Ethereum will be decentralized at the currency level, like Bitcoin, while dapps like casinos or yield farming will be fully permissioned: they will have to register with authorities, paying an army of lawyers in order to be allowed to run. That means DeFi won’t be much different from fintechs in the traditional banking system that rent their computing hardware.

I think keeping history on the p2p network is definitely worth the reduced transactions-per-second output, unless we decide to behave like SWIFT, Mastercard, or Visa in order to run as fast as they do.

No, their point is that a large database reduces transactions-per-second throughput. But I think this is far-fetched: Visa and Mastercard are, in my country, required to record data on all transactions for the past 10 years for law enforcement, and this doesn’t prevent them from running.

Would this be in real time for each block, like the current p2p network, or would there be daily update pushes?

I propose something like what the cloud industry does, billing money to keep data: deposit Ether into smart contracts, and at each block a very tiny fee is removed. When the smart contract’s balance drops to 0, its code/storage is destroyed (SELFDESTRUCT) and its relevant transactions are deleted from history.

That way, what is needed is kept while what is forgotten is destroyed. This is also more efficient than the proposal, since stuff can be destroyed before 1 year.
Please also notice that destroying what is unused is also how human memory works, and things always fit within the size of a human skull.
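As a rough sketch of the decay-fee arithmetic above (the deposit, fee, and block-time numbers are hypothetical, not part of any spec):

```python
SECONDS_PER_BLOCK = 12  # post-merge slot time

def blocks_until_expiry(deposit_wei: int, fee_per_block_wei: int) -> int:
    """Blocks a contract survives under a per-block storage fee."""
    if fee_per_block_wei <= 0:
        raise ValueError("fee must be positive")
    return deposit_wei // fee_per_block_wei

# Made-up numbers: a 1 ETH deposit with a 100 gwei per-block fee
# survives 10,000,000 blocks, roughly 3.8 years at 12 s per block.
deposit = 10**18        # 1 ETH in wei
fee = 100 * 10**9       # 100 gwei per block (hypothetical rate)
blocks = blocks_until_expiry(deposit, fee)
years = blocks * SECONDS_PER_BLOCK / (365 * 24 * 3600)
```

The deposit size directly sets the retention period, so data that nobody tops up expires on a predictable schedule.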

ytrezq: “Deposit Ethers on smart contracts: at each block, a very tiny fee is removed”

Or maybe, stake the ether and use the proceeds to pay for the data storage in perpetuity.

Would this imply setting the decay fee in relation to the staking return? (How does it relate to the magical 32 Eth stake number?)
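In rough numbers, the staking-endowment alternative works out like this (the cost and yield figures below are purely illustrative assumptions):

```python
def endowment_for_perpetual_storage(annual_cost_eth: float,
                                    staking_yield: float) -> float:
    """ETH to stake so yearly staking income covers a yearly storage cost."""
    if staking_yield <= 0:
        raise ValueError("yield must be positive")
    return annual_cost_eth / staking_yield

# Illustrative figures only: covering 0.05 ETH/year of storage cost at
# a 4% staking yield needs a 1.25 ETH endowment. Unlike a per-block
# decay fee, the principal is never consumed.
```

This also answers the question above: under this model the effective “decay fee” is zero whenever the staking return covers the storage cost, and the 32 ETH validator minimum only matters if the endowment is staked directly rather than pooled.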

Ditto. I think it’s important to address this explicitly as part of any proposal, rather than saying “some other tool will solve this”, for largely the same reasons advanced for requiring execution clients to drop data after a given time.

There is a way to preserve historical state data: it can be based on economic incentives and, simultaneously (!), work in an altruistic way, should it happen to be the popular solution.

So pruning data according to this EIP does not necessarily mean that it is lost, only that the process of verifiability moves elsewhere.

If the data continues to grow, the rewards per unit of data become smaller given ETH’s limited supply, and historical data is at a disadvantage.

Keeping data alive for an undefined period of time is a promise. You need specific off-chain systems to keep it alive, and those systems need to work in an altruistic and incentivized fashion.