You are thinking of pushes and pops to the stack; as memory goes via the stack, no endian conversion happens. There is also a single _mm256_permutexvar_epi8 instruction to do the conversion, whereas hashing has a dependency chain of instructions, with the input of one instruction waiting on the output of another. Memory is an array offset lookup, so it is inherently simpler than a hashtable lookup. It could also be a pointer copy to that location if that’s your thing. But this is getting into the weeds a bit on specific implementations, which can vary between clients.
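To make that concrete, a minimal Go sketch (names and types hypothetical, not any client’s actual code) of the difference between the two lookup shapes:

```go
package main

import "fmt"

// Hypothetical sketch: EVM memory is a flat byte slice, so an MLOAD-style
// read is bounds-checked slice indexing at a computed offset: no hashing,
// and trivially reducible to a pointer/copy into the stack slot.
func mload(mem []byte, off int) [32]byte {
	var word [32]byte
	copy(word[:], mem[off:off+32]) // array offset lookup
	return word
}

// Keyed storage, by contrast: the runtime must hash the key and probe the
// table's buckets before it can return a value.
func sload(storage map[[32]byte][32]byte, key [32]byte) [32]byte {
	return storage[key] // hash + probe under the hood
}

func main() {
	mem := make([]byte, 1024)
	fmt.Println(mload(mem, 64))
	fmt.Println(sload(map[[32]byte][32]byte{}, [32]byte{1}))
}
```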
The additional difference between TSTORE and MSTORE is that TSTORE also requires the key (address: byte20, slot: byte32) as well as the byte32 value; so each new entry requires more than 2x the memory of MSTORE, which just writes a single byte32 value to a position.
Hence the suggestion that it is included in the memory expansion cost at a 2x rate: e.g. memory use = memory slots + 2 x tstore slots, rather than having its own load factor.
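To make the 2x argument concrete, a rough sketch (types hypothetical) of the per-entry payloads:

```go
package main

import (
	"fmt"
	"unsafe"
)

// One MSTORE slot: a bare 32-byte word appended to a flat buffer.
type memWord [32]byte

// One TSTORE entry (hypothetical layout): the 20-byte address and 32-byte
// slot that form the key, plus the 32-byte value, i.e. 84 bytes of payload
// before any map overhead, which is more than 2x a memory word.
type transientEntry struct {
	Addr  [20]byte // contract address
	Slot  [32]byte // storage slot
	Value [32]byte // stored word
}

func main() {
	fmt.Println(unsafe.Sizeof(memWord{}))        // 32
	fmt.Println(unsafe.Sizeof(transientEntry{})) // 84
}
```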
The behavior is not conceptually different after that point.
SSTORE needs basically two lookups, one from the pre-transaction map and one from the journal:
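A minimal sketch of the shape of that, with hypothetical names (the gas-schedule branching itself is elided):

```go
package main

import "fmt"

type slotKey struct {
	addr [20]byte // contract address
	slot [32]byte // storage slot
}

// Hypothetical state layout: net gas metering (EIP-2200-style) needs both
// the value as of the start of the transaction ("original") and the value
// as currently written ("current"), so SSTORE does two keyed lookups
// before it can even decide what to charge.
type state struct {
	original map[slotKey][32]byte // pre-transaction snapshot
	journal  map[slotKey][32]byte // writes made during this transaction
}

func (s *state) sstore(k slotKey, v [32]byte) {
	orig := s.original[k]      // lookup 1: pre-transaction map
	cur, dirty := s.journal[k] // lookup 2: journal
	if !dirty {
		cur = orig
	}
	_, _ = orig, cur // the gas schedule branches on (orig, cur, v)
	s.journal[k] = v
}

func main() {
	s := &state{
		original: map[slotKey][32]byte{},
		journal:  map[slotKey][32]byte{},
	}
	s.sstore(slotKey{}, [32]byte{42})
	fmt.Println(len(s.journal)) // 1
}
```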
(Additional complexity “hidden” in SSTORE is that it results in needing to prepare and RLP-serialize the new account trie at the end of the transaction. I think this is accounted for in the current net metering schedule.)
I remember reasoning through this before and apparently forgot my conclusions. So I take it back: it’s not so easy to change the cost of SSTORE.
In comparison, TSTORE is simple; it’s just a write to a single journal data structure:
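Again as a hypothetical sketch rather than any client’s actual code:

```go
package main

import "fmt"

type slotKey struct {
	addr [20]byte
	slot [32]byte
}

// Hypothetical transient store: a single transaction-scoped map. TSTORE is
// one keyed write (no pre-transaction snapshot to consult), and the whole
// map is simply discarded when the transaction ends.
type transientStore map[slotKey][32]byte

func (t transientStore) tstore(k slotKey, v [32]byte) {
	t[k] = v // one lookup+write, no original-value check
}

func main() {
	t := transientStore{}
	t.tstore(slotKey{}, [32]byte{7})
	fmt.Println(len(t)) // 1; dropped wholesale at end of transaction
}
```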
You are thinking of pushes and pops to the stack; as memory goes via the stack, no endian conversion happens.
So I think this might be a bit implementation specific, since different runtimes represent stack items differently. E.g. py-evm keeps stack items as arbitrary-precision Python ints, vs geth, which packs each stack item into a fixed 256-bit integer made of four native 64-bit limbs:
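In the spirit of geth’s uint256 type (github.com/holiman/uint256), though not its actual code: loading a big-endian EVM word from memory into little-endian limbs is exactly where the conversion happens on LE hardware.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// A 256-bit stack item as four native uint64 limbs, least significant
// limb first (the layout holiman/uint256 uses).
type word [4]uint64

// fromBytes32 converts a big-endian 32-byte EVM word into native limbs;
// on a little-endian machine each Uint64 call is a byte swap.
func fromBytes32(b [32]byte) word {
	return word{
		binary.BigEndian.Uint64(b[24:32]), // least significant limb
		binary.BigEndian.Uint64(b[16:24]),
		binary.BigEndian.Uint64(b[8:16]),
		binary.BigEndian.Uint64(b[0:8]), // most significant limb
	}
}

func main() {
	var b [32]byte
	b[31] = 1                   // the EVM word 0x00...01
	fmt.Println(fromBytes32(b)) // [1 0 0 0]
}
```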
But this is getting into the weeds a bit on specific implementations, which can vary between clients.
Yes, definitely a little in the weeds + implementation specific. But all clients on LE architectures must do the conversion at some point (whether they do it “eagerly” on a read/write from memory, or “lazily” when something like ADD is requested, unless there are some really funky bigint implementations that I’m not aware of), so I think it’s worth mentioning. Even on a BE arch, you still need to copy 32 bytes, because memory is mutable. The very best you can do is create a stack item which is copy-on-write.
Hence the suggestion that it is included in the memory expansion cost at a 2x rate:
e.g. memory use = memory slots + 2 x tstore slots
Rather than having its own load factor
It’s a thought, although I was under the impression that the quadratic pricing for memory is evil and we wanted to move away from it; see for instance Proposals to adjust memory gas costs
Since the superlinear pricing in both memory and transient storage is intended to prevent DoS, I think another way to think about it is: how should the practical memory bound increase as the gas limit increases? I think DDR and L1-L3 cache do not scale over time a la Moore’s law the same way CPU speed and storage cost do (see references). I haven’t done the analysis, but I suspect it is something like log() or sqrt() compared to the above two variables, and this EIP bounds memory like sqrt(gas limit). We could try to bound it like log(gas limit), but I think that makes the pricing model more complicated in terms of math.
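To put numbers on the sqrt claim: under the Yellow Paper memory cost C(a) = 3a + a^2/512 (a in 32-byte words), the largest memory one frame can pay for is the positive root a = sqrt(768^2 + 512*g) - 768, which grows like sqrt(512*g). A quick sketch:

```go
package main

import (
	"fmt"
	"math"
)

// maxMemoryWords estimates the largest memory (in 32-byte words) a single
// call frame can pay for under C(a) = 3a + a*a/512: the positive root of
// a*a/512 + 3a - gas = 0, i.e. roughly sqrt(512*gas) for large gas.
func maxMemoryWords(gas float64) float64 {
	return math.Sqrt(768*768+512*gas) - 768
}

func main() {
	for _, g := range []float64{30e6, 60e6, 120e6} {
		w := maxMemoryWords(g)
		fmt.Printf("gas=%9.0f -> ~%7.0f words (~%.1f MiB)\n", g, w, w*32/(1<<20))
	}
}
```

So doubling the gas limit only grows the memory bound by about sqrt(2), which is the behavior this EIP mirrors for transient storage.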
Sounds nice, but I don’t want to reason about the interaction between memory expansion and transient storage expansion. For one thing, they have different scope, so the bounds behave differently under nested call scenarios (I guess the two important ones are “deep” nesting, i.e. recursion, and “broad” nesting, i.e. sequential calls). It’s also proven nearly impossible in practice to change memory expansion prices; I guess because they are so hard to reason about?
IMO it’s just simpler to have separate pricing formulas for them and tune them separately. I’m open to fusing the pricing functions, but transient storage is just a different beast; I think it’s fine for it to have its own pricing model.
Tangentially, we do want people to use Access Lists for other reasons; but they aren’t used much, as the penalty for including an item and not using it is very high and will outweigh all savings. So I’d withdraw the differentiation between access-list warm and actual warm (edited my entry)
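For concreteness, with the EIP-2929/2930 constants: a cold SLOAD costs 2100, while an access-list storage key costs 1900 upfront plus a 100-gas warm read, so a correctly predicted key saves only 2100 - (1900 + 100) = 100 gas, while a key that goes unused wastes the full 1900. One miss erases the savings from nineteen hits.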
I’ve received some feedback offline suggesting the base cost is too close to the memory base cost; I suspect the benchmark here is making memory look better than it really is. A better apples-to-apples benchmark would be to MLOAD from many dirty memory locations, but I have not had time to run it yet.
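The shape of benchmark I have in mind, as a hypothetical Go sketch (not the actual harness behind the numbers above): scatter reads across a large freshly-written buffer so the loads actually miss cache, instead of hammering one hot word that sits in L1.

```go
package bench

import (
	"math/rand"
	"testing"
)

// BenchmarkMloadDirty: MLOAD-style 32-byte reads at random offsets in a
// buffer far larger than L3, so most loads are cache misses, unlike a
// single-hot-word benchmark, which flatters the memory path.
func BenchmarkMloadDirty(b *testing.B) {
	mem := make([]byte, 64<<20) // 64 MiB of "dirty" memory
	rand.Read(mem)              // touch every page
	offs := make([]int, 1<<16)
	for i := range offs {
		offs[i] = rand.Intn(len(mem)-32) &^ 31 // 32-byte-aligned offsets
	}
	var word [32]byte
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		copy(word[:], mem[offs[i%len(offs)]:])
	}
	_ = word
}
```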