Hey V, thanks for writing those out a bit.
This seems to focus on only one type of node: the one that follows the chain and is used for either staking or normal usage. It then tries to split that type into the subcategories of light/medium/heavy.
I would split nodes further into these categories that come to mind:
- follower node. Just for normal usage and keeping up with the chain.
- staking node. Used for staking.
- historical node. Used to access historical data.
Follower node
The follower node can be either light or heavy, matching what you mentioned above. But I would assume that, in the absence of staking, the incentives are not there to run anything but a light node.
Staking node
In our experience so far, the staking node comes in two flavors: one that builds blocks locally, and one that uses mev-boost and pushes the block building to 3rd parties.
Having increasing hardware requirements for either of these is not bad imo. Anyone that puts up the capital of 32 ETH or more should be able to afford it. Having somewhat beefier machines is not a problem, so long as they stay within sensible limits and can run in a residential location.
The biggest difference between the local staker and the mev-boost staker is bandwidth. There were some measurements (I don’t have the links handy) and the difference is quite big (once the required upload bandwidth crosses 50 mbps, limitations hit). And bandwidth is the one thing we can’t easily pay for if we want stakers to run from home and keep geographical diversity, as there are many areas around the world with insufficient infrastructure.
So power and bandwidth requirements are what can hurt us here and we need to find a good balance.
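To get a feeling for why upload bandwidth is the limiting factor, here is a rough back-of-the-envelope sketch. The payload size, peer count and deadline below are purely hypothetical placeholders and not measurements; only the 50 mbps figure comes from the discussion above.

```python
def upload_seconds(payload_mb: float, peers: int, upload_mbps: float) -> float:
    """Seconds needed to push a payload of `payload_mb` megabytes to `peers`
    peers over a link with `upload_mbps` megabits per second of upload."""
    megabits_to_send = payload_mb * 8 * peers  # 1 megabyte = 8 megabits
    return megabits_to_send / upload_mbps


def max_payload_mb(deadline_s: float, peers: int, upload_mbps: float) -> float:
    """Largest payload (in megabytes) that can reach `peers` peers
    within `deadline_s` seconds on the given upload link."""
    return deadline_s * upload_mbps / (8 * peers)


if __name__ == "__main__":
    # Hypothetical numbers: a 2 MB payload sent to 8 peers on a 50 mbps uplink.
    print(upload_seconds(payload_mb=2, peers=8, upload_mbps=50))
    # Hypothetical numbers: what fits in a 4 second window on the same link.
    print(max_payload_mb(deadline_s=4, peers=8, upload_mbps=50))
```

The point is only that the time scales linearly with both payload size and the number of peers you push to, so a home uplink gets saturated quickly.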
The problem I see is that there is no incentive to run a local builder rather than a mev-boost proposer. You can still run mev-boost at home, even in places with VDSL (and not fiber), so 250 mbps down / 50 mbps up, and you make more money than you would running a local builder. Win/win.
So the incentives here keep pushing the local builders towards big actors or data centers, and towards centralization. That’s something I fear.
Historical node
The ability to query the history of the chain, and now with all the L2s the history of all these chains, is essential. You need to be able to see what the state of a given contract was on 30/04/2021. It’s essential for accounting, for taxes, for historical processing and for book-keeping in general.
But right now it’s very hard. Running an archive node per chain is super expensive and non-trivial for end-users. On top of that it’s not enough. Even if you run an archive node you can’t get the answer to some really basic questions:
- Give me a list of all the transaction hashes that an address appeared in. Essentially tell me my history.
- Give me a list of all the withdrawals to an address
- Give me a list of all the blocks proposed by an address (as a fee recipient)
- Turn a block number to timestamp and vice versa
etc.
At the moment all these require extra, expensive indexing on top of an archive node. Per chain. Even for devoted individuals this becomes too expensive and too much work to maintain.
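To illustrate what that extra indexing amounts to, here is a minimal in-memory sketch that answers three of the questions above (transactions per address, blocks per fee recipient, block number to timestamp and back). The `Tx`, `Block` and `LocalIndex` types are hypothetical and simplified, not any client's API. It assumes blocks are fed in order starting from block 0, and it only sees top-level senders and recipients; addresses touched by internal calls would additionally need traces. Withdrawals would need one more map of the same shape.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tx:
    hash: str
    sender: str
    recipient: str


@dataclass
class Block:
    number: int
    timestamp: int  # unix seconds
    fee_recipient: str
    txs: list = field(default_factory=list)


class LocalIndex:
    """Hypothetical index built by walking every block of a chain once."""

    def __init__(self) -> None:
        self.txs_by_address = defaultdict(list)           # address -> tx hashes
        self.blocks_by_fee_recipient = defaultdict(list)  # address -> block numbers
        self.timestamps = []                              # timestamps[n] = timestamp of block n

    def add_block(self, block: Block) -> None:
        # Assumption: blocks arrive in order, starting at 0, with no gaps.
        if block.number != len(self.timestamps):
            raise ValueError("blocks must be added in order without gaps")
        self.timestamps.append(block.timestamp)
        self.blocks_by_fee_recipient[block.fee_recipient].append(block.number)
        for tx in block.txs:
            # A set avoids recording a self-transfer twice for the same address.
            for address in {tx.sender, tx.recipient}:
                self.txs_by_address[address].append(tx.hash)

    def number_to_timestamp(self, number: int) -> int:
        return self.timestamps[number]

    def timestamp_to_number(self, timestamp: int) -> Optional[int]:
        """Latest block at or before `timestamp`, or None if before block 0."""
        position = bisect_right(self.timestamps, timestamp)
        return position - 1 if position > 0 else None


if __name__ == "__main__":
    index = LocalIndex()
    index.add_block(Block(0, 1000, "0xfee1", [Tx("0xa1", "0xalice", "0xbob")]))
    index.add_block(Block(1, 1012, "0xfee2", [Tx("0xa2", "0xbob", "0xcarol")]))
    print(index.txs_by_address["0xbob"])
    print(index.timestamp_to_number(1015))
```

Even this toy version has to touch every transaction of every block, which is where the cost comes from once you multiply it by the full history and by the number of chains.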
Which is why we all end up centralizing the fetching of these data around indexers such as etherscan, blockcha.in and other centralized APIs.
I don’t have a solution here. But I see this as a big problem, one I have been talking about for years now as someone writing local applications that analyze the chain.
The continuous centralization of historical queries around external centralized indexers, even if you run your own archive nodes, is scary to me. I am afraid that a few years down the line we won’t even be able to double check stuff, and whatever the one or two remaining centralized sources of data say will become the truth of what happened.
And that is too much power to put in the hands of any one or two entities.