Alternatives to rent for alleviating the 'large and growing data' problem


#1

I’ve read as deeply as I can into the not-yet-formally-proposed ‘rent’ model, and I wonder if there has been much discussion about alternative methods of alleviating the problem that rent is trying to solve.

I understand that rent is trying to solve the ‘long-term’ storage problem, but perhaps the promise of long-term storage itself is the problem. If the (implicit?) promise of long-term storage were removed, dApp developers would be forced to respond to that context by providing long-term storage themselves. In other words, they would have to write their smart contracts knowing that the data stored there had a lifetime. Instead of providing a rent mechanism, which seems wildly complicated (from a user perspective, not so much a developer perspective), perhaps a simple “old-data-will-be-evicted” policy would suffice.

If this were the case, the size of the chain data would stop growing at a certain point (once all the blocks are full, new data is added at the head and old data falls off the tail).
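To sketch what I mean (the WINDOW constant and the data structures here are made up, purely to show the bounded-size property):

```python
from collections import deque

WINDOW = 1_000_000  # hypothetical retention window, in blocks

class SlidingState:
    """State as a sliding window: total size is bounded by WINDOW blocks of writes."""

    def __init__(self):
        self.writes = deque()    # (block_number, key), oldest first
        self.last_written = {}   # key -> block of its most recent write
        self.state = {}

    def put(self, block_number, key, value):
        self.state[key] = value
        self.last_written[key] = block_number
        self.writes.append((block_number, key))
        # Evict keys whose most recent write has aged out of the window;
        # a dApp that wants its data to survive must rewrite it in time.
        while self.writes and self.writes[0][0] <= block_number - WINDOW:
            old_block, old_key = self.writes.popleft()
            if self.last_written.get(old_key) == old_block:
                del self.state[old_key]
                del self.last_written[old_key]
```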

Perhaps it’s a less useful system, but at least it won’t suffer from a problem that I foresee if a rent model prevails. My concern is that, over time, under a rent model, the entire chain will become “pay-to-play”, making usage by those with fewer resources more and more out of reach. This will create two sets of participants (haves and have-nots), which will kill any chance of building systems that maximize benefit for each individual and all participants at the same time (Pareto-optimal systems).

Yes – it’s likely I don’t know what I’m talking about, but I haven’t seen anyone else ask, so I thought I would.

Were alternatives to the rent model discussed and rejected? Or was the rent model the only solution suggested for the ‘data is growing too fast’ problem? Are there other ideas to solve it?


#2

I think state rent is simply a proposal that is being tested; I don’t think it’s officially going to be implemented on the main chain. https://en.wikipedia.org/wiki/Cache_replacement_policies has a list of cache eviction policies. If they were going to implement one, I’d imagine it would be a least-frequently-used policy, as it’s the most consumer-friendly.

That said, a blockchain is not a cache: it should be immutable, and data should never be changed or removed without a secured and recorded transaction driving it. If it doesn’t meet that standard, you can’t really trust it. There is really only one solution to this problem: distributing the data. That covers things like off-chain processing, Lightning, Plasma, sharding, IPFS, etc. It’s a hard problem, but it’s been solved in other contexts before, and it will be solved again in the crypto space.

We aren’t that far away from standards that allow blockchains to interact, cooperate, and effect change… state arbitrage is possible. Seriously, this is a smaller issue than people are making it out to be.


#3

First of all, thank you for reading the rent proposal! The next revision should be much simplified.
Alternatives to rent certainly exist. I must admit that I did not spend a lot of time searching for them, because I figured that if I did, I would not have produced the rent proposal yet. I also thought that other people would kindly point me to the alternatives (which is what you are trying to do, thank you!)

After thinking about your proposed model, I realised that I had briefly thought about it before, and came to the conclusion that it is roughly equivalent to rent, but with two differences. If someone really wants their data to be kept on the blockchain, they will keep “touching” or “modifying” it (whatever the rules for not becoming “old” require). That would be accomplished by sending transactions, effectively paying rent to miners (instead of burning it, as in the state rent proposal). That is the first difference: paying miners vs burning. The second difference is that the state rent proposal lets you pay rent upfront for a long time, saving on the number of transactions. If you use transactions to prevent things from getting old, that is equivalent to only being able to pay rent in small portions.
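To make the comparison concrete, some back-of-the-envelope arithmetic (all numbers below are invented purely for illustration, they are not from the proposal):

```python
# Invented numbers, purely to illustrate the rough equivalence.
RENT_PER_BLOCK = 100       # wei burned per block under the rent proposal
KEEP_ALIVE_GAS = 26_000    # gas for a transaction that "touches" the data
GAS_PRICE = 2              # wei per gas
TOUCH_INTERVAL = 500       # blocks between keep-alive transactions

rent_cost_per_block = RENT_PER_BLOCK                                # burned
touch_cost_per_block = KEEP_ALIVE_GAS * GAS_PRICE / TOUCH_INTERVAL  # to miners

print(rent_cost_per_block, touch_cost_per_block)  # 100 vs 104.0
# Comparable per-block cost; the differences are who receives the payment
# (burned vs paid to miners) and the granularity in which it can be paid.
```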

To the question of pricing people out of the blockchain: you are right, there is a risk of diverging from a Pareto-optimal system. What I would say to that: alas, we are prioritising the survival of the platform over demonstrating to everyone else what happens if the state grows too quickly (which would also be a very useful experiment, if you ask me). So I guess when the proposal is more well-formed and has a PoC, some specification, and some data behind it (that will take some time), we shall see if there is another viable alternative. I am sure most people will be open to that. Currently we have a plan and we are working on it, until a better plan is found :slight_smile:


#4

@samlavery Can you provide some pointers to “It’s been solved in other contexts…”?

@AlexeyAkhunov Thanks. I worry about unintended consequences, as these, I think, tend to dominate in the long run.


#5

I’m not sure if this has been proposed before, but I’ve noticed that one of the biggest issues with syncing a node is RAM size. The larger the page cache, the less the node has to go out and fetch pages from disk, alleviating I/O pressure, which is the biggest cost. Currently there is no incentive for dapp developers/users to be cache-friendly, which is why nodes need large SSDs: basically all the data needs to be in ‘warm’ storage. However, real-world machines use hierarchical storage (e.g. from hot to cold: CPU cache -> RAM -> SSD -> HDD), so creating an incentive structure to match that would let node operators make better use of their hardware. The proposal is basically to make reads that go out to colder and colder storage more and more expensive.

Here’s a simple accounting scheme meant to model a hierarchical LRU cache. Define a global value, READ_COUNTER. Each value in the state database has a piece of metadata associated with it, LAST_READ_COUNTER. Each time a value is read or written (i.e. SLOAD/SSTORE), READ_OPS is calculated as log2(READ_COUNTER - LAST_READ_COUNTER + 1). READ_COUNTER is incremented by READ_OPS, and LAST_READ_COUNTER for that value is set to the new value of READ_COUNTER. The gas cost for the access would then be 200 * (READ_OPS + 1), which matches the current cost of an SLOAD of a hot value when READ_OPS is zero (although the minimum could start out even lower, at the cost of an MLOAD, assuming the value is already in memory).
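Here’s a rough Python sketch of that accounting. One caveat I noticed while writing it: if the counter is only ever incremented by READ_OPS, a run of hot accesses (READ_OPS == 0) never advances it, so in the sketch I advance the clock by READ_OPS + 1 per access; treat that as my assumption, not part of the scheme above.

```python
import math

BASE_GAS = 200  # current cost of an SLOAD of a hot value, as above

class GasMeter:
    """Toy model of the READ_COUNTER / LAST_READ_COUNTER accounting."""

    def __init__(self):
        self.read_counter = 0   # global READ_COUNTER
        self.last_read = {}     # per-value LAST_READ_COUNTER

    def access_cost(self, key):
        """Charge an SLOAD/SSTORE of `key` and update both counters."""
        last = self.last_read.get(key, 0)
        # Cache levels that (notionally) must evict words to serve this access.
        read_ops = int(math.log2(self.read_counter - last + 1))
        # Assumption: advance by READ_OPS + 1 so hot accesses still tick the clock.
        self.read_counter += read_ops + 1
        self.last_read[key] = self.read_counter
        return BASE_GAS * (read_ops + 1)

meter = GasMeter()
meter.access_cost("a")  # 200: the clock is young, everything is still "hot"
meter.access_cost("a")  # 200: a freshly touched value stays at the floor
```

A value touched constantly stays at the 200-gas floor, while one left untouched while the counter races ahead pays logarithmically more when it is finally read, mimicking a fetch from colder and colder storage.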

READ_OPS is meant to represent the number of cache levels that need to evict words in order to read the new word, but perhaps it should have some relation to a growing block size as storage gets larger and larger. Another alternative is to model real hardware more closely: instead of assuming infinite levels of caching, assume just 3 or 4, with a specific gas cost for each of those levels (a sketch of that variant follows). What do you think? I could be going down a complete garden path here, but adjusting the gas cost for SLOAD seems very non-invasive compared to state rent, and could significantly reduce storage costs for node operators.
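Such a fixed-level variant might look something like this (the staleness thresholds and per-tier gas costs below are made-up placeholders, not calibrated against real hardware):

```python
# Hypothetical 4-tier variant: staleness buckets stand in for CPU cache,
# RAM, SSD and HDD, each with its own assumed gas cost.
TIERS = [           # (max staleness, gas cost) -- all numbers invented
    (2**8,  64),    # "CPU cache"
    (2**16, 200),   # "RAM"
    (2**24, 800),   # "SSD"
]
COLD_GAS = 3200     # anything staler falls through to the coldest tier ("HDD")

def tiered_cost(staleness):
    """Map 'counter ticks since last access' to a fixed storage tier's gas cost."""
    for max_staleness, gas in TIERS:
        if staleness < max_staleness:
            return gas
    return COLD_GAS
```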


#6

That’s kind of a neat idea, but it would be totally up to the miner to make up that cost, which automatically makes it a non-starter. Not all miners are equal: if I were sending tons of transactions, I would want my own node to have enough RAM to fit the whole Ethereum tree. And if I’m running one of these super-nodes, I certainly don’t want to mine somebody else’s transactions unless I have an open spot. This breaks the notion of Ethereum net neutrality, and of the decentralization and democratization of public blockchains.

As for ‘alternative contexts’, look into the services offered by Amazon and Google. There are lots of options for distributed/replicated databases: Cassandra, MySQL, and PostgreSQL all support some form of sharding; there is even an acronym, DDBMS. Pied Piper’s middle-out compression is allegedly amazing, but I haven’t trialed it yet.


#7

@samlavery Thanks for the feedback! I’m not quite sure I understand why it would be up to the miner to make up the cost, though. The idea is to make it more expensive for users to access cold storage, not less, so that accessed state is more likely to be in RAM/cache, making it more feasible to run a full node on off-the-shelf hardware. From a miner’s perspective, their revenue would go up. If users demand that miners have hot storage for their transactions, the miners need to purchase the hardware anyway, but in a universe where reads are more expensive they would actually get compensated for that directly.