EIP-7609 reduce transient storage pricing

benaadams · May 10, 2024, 11:11am

You are thinking of push and pops to stack, as memory goes via stack no endian conversion happens. It is also a single _mm256_permutevar64x8_epi8 instruction to do the conversion, whereas hashing has a dependency chain of instructions, with the input of one instruction waiting on the output of another. Memory is an array offset lookup, so it is inherently simpler than a hashtable lookup. It could also be a pointer copy to that location, if that's your thing. But this is getting into the weeds a bit on specific implementations which can vary between clients.

The additional difference between TSTORE and MSTORE is that TSTORE also requires the key (address: bytes20, slot: bytes32) as well as the bytes32 value, so each new entry requires more than 2x the memory of an MSTORE, which just writes a single bytes32 value to a position.
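To make the footprint comparison concrete, here is a minimal Python sketch that models memory as a flat `bytearray` (an MSTORE is a 32-byte slice write at an offset) and transient storage as a dict keyed by `(address, slot)`. `FlatMemory`, `TransientMap` and the byte-count helpers are hypothetical names; the sketch only counts the raw bytes each entry carries and ignores client-specific overhead such as hash-table buckets or journaling.

```python
WORD = 32           # bytes in an EVM word
ADDRESS_BYTES = 20  # bytes in an account address


class FlatMemory:
    """Hypothetical model of EVM memory: a byte array indexed by offset."""

    def __init__(self) -> None:
        self.data = bytearray()

    def mstore(self, offset: int, value: bytes) -> None:
        end = offset + WORD
        if end > len(self.data):
            # Expand in whole 32-byte words, zero-filled.
            new_size = (end + WORD - 1) // WORD * WORD
            self.data.extend(b"\x00" * (new_size - len(self.data)))
        # Plain slice write at a computed position: no key is stored.
        self.data[offset:end] = value.rjust(WORD, b"\x00")[-WORD:]


class TransientMap:
    """Hypothetical model of transient storage: a hash map keyed by (address, slot)."""

    def __init__(self) -> None:
        self.entries: dict[tuple[bytes, int], bytes] = {}

    def tstore(self, address: bytes, slot: int, value: bytes) -> None:
        self.entries[(address, slot)] = value.rjust(WORD, b"\x00")[-WORD:]


def mstore_entry_bytes() -> int:
    return WORD  # only the value


def tstore_entry_bytes() -> int:
    return ADDRESS_BYTES + WORD + WORD  # address + slot key, plus the value


ratio = tstore_entry_bytes() / mstore_entry_bytes()
print(tstore_entry_bytes(), mstore_entry_bytes(), ratio)  # 84 32 2.625
```

The 2.625 ratio is where "more than 2x" comes from; any per-entry overhead of a real hash table would typically add to the TSTORE side only.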

Hence the suggestion that it is included in the memory expansion cost at a x2 rate:

e.g. memory use = memory slots + 2 x tstore slots

Rather than having its own load factor
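A minimal Python sketch of that suggestion, assuming the existing EVM memory cost function C(a) = 3·a + ⌊a²/512⌋ (a in 32-byte words) and simply feeding it memory words plus twice the number of transient slots. The suggestion doesn't say whether the combined count would be scoped per call frame or per transaction, so the sketch just takes the counts as inputs; `combined_words` and `expansion_charge` are hypothetical names.

```python
def memory_cost(words: int) -> int:
    """Existing EVM memory cost: 3 gas per word plus floor(words^2 / 512)."""
    return 3 * words + words * words // 512


def combined_words(memory_words: int, tstore_slots: int) -> int:
    """Hypothetical combined 'memory use': each transient slot counts as 2 words (key + value)."""
    return memory_words + 2 * tstore_slots


def expansion_charge(before: tuple[int, int], after: tuple[int, int]) -> int:
    """Gas charged when (memory_words, tstore_slots) grows from `before` to `after`."""
    return memory_cost(combined_words(*after)) - memory_cost(combined_words(*before))


# A frame already using 1000 words of memory writes its first new transient slot:
print(expansion_charge((1000, 0), (1000, 1)))  # 13
# The same first TSTORE in a frame with no memory in use:
print(expansion_charge((0, 0), (0, 1)))        # 6
```

Because the quadratic term is shared, the marginal price of a new transient slot under this scheme depends on how much memory is already in use.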

charles-cooper · May 10, 2024, 12:07pm

The behavior is not conceptually different after that point.

SSTORE basically needs two lookups: one in the pre-transaction map and one in the journal:

github.com/ethereum/py-evm/blob/d8df507e42c885ddeca2b64bbb8f10705f075d3e/eth/vm/logic/storage.py#L92-L137

```python
current_value = computation.state.get_storage(
    address=computation.msg.storage_address,
    slot=slot,
)

original_value = computation.state.get_storage(
    address=computation.msg.storage_address, slot=slot, from_journal=False
)

gas_refund = 0

if current_value == value:
    gas_cost = gas_schedule.sload_gas
else:
    if original_value == current_value:
        if original_value == 0:
            gas_cost = gas_schedule.sstore_set_gas
        else:
            gas_cost = gas_schedule.sstore_reset_gas
```

(excerpt truncated)
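For contrast, here is a minimal Python sketch of the SSTORE gas branch from the excerpt above, written against two plain dicts: `committed` (storage as of the start of the transaction) and `journal` (writes made during the transaction). The dict names, `sstore_gas`, and the constants are assumptions for illustration: the logic follows the EIP-2200 shape (the dirty-slot branch, which the excerpt cuts off, is taken from EIP-2200), the values are the post-Berlin ones, and refunds and the cold-access surcharge are left out.

```python
WARM_READ = 100       # EIP-2929 warm storage read cost
SSTORE_SET = 20_000   # zero -> non-zero on a clean slot
SSTORE_RESET = 2_900  # non-zero -> other value on a clean slot (5000 - 2100 after EIP-2929)

Key = tuple[bytes, int]  # (address, slot)


def sstore_gas(committed: dict[Key, int], journal: dict[Key, int], key: Key, value: int) -> int:
    # Lookup 1: the value at the start of the transaction (not journaled).
    original = committed.get(key, 0)
    # Lookup 2: the current value, from the journal if the slot was written this transaction.
    current = journal.get(key, original)

    if current == value:
        cost = WARM_READ       # no-op write
    elif original == current:  # slot is still clean in this transaction
        cost = SSTORE_SET if original == 0 else SSTORE_RESET
    else:
        cost = WARM_READ       # slot already dirty: just update the journal
    journal[key] = value
    return cost


A = b"\xaa" * 20
committed = {(A, 0): 5}
journal: dict[Key, int] = {}
print(sstore_gas(committed, journal, (A, 1), 7))  # 20000: clean zero slot is set
print(sstore_gas(committed, journal, (A, 1), 8))  # 100: slot is already dirty
print(sstore_gas(committed, journal, (A, 0), 6))  # 2900: clean non-zero slot is reset
```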

(An additional complexity “hidden” in SSTORE is that it requires preparing and RLP-serializing the new account trie at the end of the transaction. I think this is accounted for in the current net metering schedule.)

I remember reasoning through this before and apparently forgot my conclusions. So I take it back: it's not so easy to change the cost of SSTORE.

In comparison, TSTORE is simple: it’s just a write to a single journal data structure:

github.com/ethereum/py-evm/blob/d8df507e42c885ddeca2b64bbb8f10705f075d3e/eth/vm/forks/cancun/state.py#L164-L168

```python
def get_transient_storage(self, address: Address, slot: int) -> bytes:
    return self.transient_storage.get_transient_storage(address, slot)

def set_transient_storage(self, address: Address, slot: int, value: bytes) -> None:
    return self.transient_storage.set_transient_storage(address, slot, value)
```
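To show why this is simpler, here is a minimal Python sketch of a journaled transient store: one dict keyed by `(address, slot)` plus an undo log that is rolled back to a checkpoint when a call frame reverts, and cleared at the end of the transaction as EIP-1153 requires. The class and method names are hypothetical and are not py-evm's actual `transient_storage` API.

```python
class JournaledTransientStorage:
    """Hypothetical transient storage: one dict plus an undo log for call-frame reverts."""

    ZERO = b"\x00" * 32

    def __init__(self) -> None:
        self.current: dict[tuple[bytes, int], bytes] = {}
        self.undo: list[tuple[tuple[bytes, int], bytes]] = []

    def tload(self, address: bytes, slot: int) -> bytes:
        # Single lookup; unset slots read as zero.
        return self.current.get((address, slot), self.ZERO)

    def tstore(self, address: bytes, slot: int, value: bytes) -> None:
        key = (address, slot)
        # Record the previous value so a revert can restore it.
        self.undo.append((key, self.current.get(key, self.ZERO)))
        self.current[key] = value

    def checkpoint(self) -> int:
        return len(self.undo)

    def revert(self, checkpoint: int) -> None:
        # Undo writes newest-first back to the checkpoint.
        while len(self.undo) > checkpoint:
            key, previous = self.undo.pop()
            self.current[key] = previous

    def clear(self) -> None:
        # Transient storage is discarded at the end of the transaction (EIP-1153).
        self.current.clear()
        self.undo.clear()


ts = JournaledTransientStorage()
addr = b"\x11" * 20
one, two = (1).to_bytes(32, "big"), (2).to_bytes(32, "big")
ts.tstore(addr, 0, one)
cp = ts.checkpoint()  # entering a sub-call
ts.tstore(addr, 0, two)
ts.revert(cp)         # the sub-call reverts
print(ts.tload(addr, 0) == one)  # True
```

Unlike SSTORE, there is no separate "original value" map to consult and no gas branch that depends on one.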

benaadams · May 10, 2024, 12:13pm

One element of calldata is VeryLow (3) + Memory (3) => 6, so the question is whether TLOAD should be cheaper than calldata.

So maybe something like

SLOAD cold => 2100 (Unchanged)
SLOAD warm => 6 (CallData 1 Word Read)
TLOAD => 6 (CallData 1 Word Read)
TSTORE => 10 (High) + inclusion in Memory expansion cost at x2 (key+value)
SSTORE => Unchanged

While SLOAD and TSTORE are slightly more complicated than a calldata read, calldata also involves 3 pops vs 1 pop for the loads, so it evens out.
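To see what the proposed numbers mean in aggregate, here is a small Python sketch that prices a workload under the current Cancun values (TLOAD and TSTORE at 100 gas per EIP-1153, warm SLOAD at 100 per EIP-2929) and under the list above. The workload is made up, and the TSTORE memory term is a hypothetical reading of "x2 (key+value)": each new slot adds 2 words to the existing memory cost function, assuming no other memory is in use and every TSTORE hits a distinct new slot.

```python
def memory_cost(words: int) -> int:
    """Existing EVM memory cost: 3 gas per word plus floor(words^2 / 512)."""
    return 3 * words + words * words // 512


# Per-op base costs today (Cancun) vs the numbers proposed above.
CANCUN = {"SLOAD_warm": 100, "TLOAD": 100, "TSTORE": 100}
PROPOSED = {"SLOAD_warm": 6, "TLOAD": 6, "TSTORE": 10}


def workload_gas(schedule: dict[str, int], counts: dict[str, int],
                 new_tstore_slots: int = 0, tstore_memory: bool = False) -> int:
    """Sum base costs; optionally add the hypothetical 2-words-per-slot memory charge."""
    gas = sum(schedule[op] * n for op, n in counts.items())
    if tstore_memory:
        # Assumption: the frame uses no other memory, so only transient slots count.
        gas += memory_cost(2 * new_tstore_slots)
    return gas


# Illustrative workload: 50 warm SLOADs, 100 TLOADs, 100 TSTOREs to distinct new slots.
counts = {"SLOAD_warm": 50, "TLOAD": 100, "TSTORE": 100}
print(workload_gas(CANCUN, counts))               # 25000
print(workload_gas(PROPOSED, counts, 100, True))  # 2578
```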

charles-cooper · May 10, 2024, 12:35pm

You are thinking of push and pops to stack, as memory goes via stack no endian conversion happens.

So I think this might be a bit implementation-specific, since different runtimes represent stack items differently, e.g. py-evm:

github.com/ethereum/py-evm/blob/d8df507e42c885ddeca2b64bbb8f10705f075d3e/eth/vm/logic/memory.py#L12-L42

```python
def mstore(computation: ComputationAPI) -> None:
    start_position = computation.stack_pop1_int()
    value = computation.stack_pop1_bytes()

    padded_value = value.rjust(32, b"\x00")
    normalized_value = padded_value[-32:]

    computation.extend_memory(start_position, 32)

    computation.memory_write(start_position, 32, normalized_value)


def mstore8(computation: ComputationAPI) -> None:
    start_position = computation.stack_pop1_int()
    value = computation.stack_pop1_bytes()

    padded_value = value.rjust(1, b"\x00")
    normalized_value = padded_value[-1:]

    computation.extend_memory(start_position, 1)
```

(excerpt truncated)

vs geth:

github.com/ethereum/go-ethereum/blob/e5f5eaebc4c7810e640ec0f95195e76eaf67095c/core/vm/instructions.go#L490-L502

```go
func opMload(pc *uint64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) {
	v := scope.Stack.peek()
	offset := int64(v.Uint64())
	v.SetBytes(scope.Memory.GetPtr(offset, 32))
	return nil, nil
}

func opMstore(pc *uint64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) {
	// pop value of the stack
	mStart, val := scope.Stack.pop(), scope.Stack.pop()
	scope.Memory.Set32(mStart.Uint64(), &val)
	return nil, nil
}
```

But this is getting into the weeds a bit on specific implementations which can vary between clients.

Yes, definitely a little in the weeds and implementation-specific. But all clients on little-endian (LE) architectures must do the conversion at some point (whether “eagerly” on a read/write from memory or “lazily” when something like ADD is requested, unless there are some really funky bigint implementations that I’m not aware of), so I think it’s worth mentioning. Even on a big-endian (BE) arch, you still need to copy 32 bytes, because memory is mutable. The very best you can do is create a stack item which is copy-on-write.
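A minimal Python sketch of the conversion being discussed, assuming a client that keeps 256-bit stack items as four native 64-bit limbs, least-significant limb first (the layout of holiman/uint256, which geth uses for its stack). EVM memory words are big-endian, so on an LE host the limbs' in-memory bytes turn out to be the full 32-byte reversal of the memory word; that is the kind of reversal a single byte-permute instruction can do. The function names are hypothetical.

```python
import struct

WORD = 32


def word_to_limbs(word: bytes) -> tuple[int, int, int, int]:
    """Big-endian 32-byte memory word (MLOAD) -> four uint64 limbs, least significant first."""
    assert len(word) == WORD
    hi_to_lo = struct.unpack(">4Q", word)  # big-endian limbs, most significant first
    return tuple(reversed(hi_to_lo))


def limbs_to_word(limbs: tuple[int, int, int, int]) -> bytes:
    """Inverse conversion, e.g. for MSTORE of a stack item."""
    return struct.pack(">4Q", *reversed(limbs))


def native_le_bytes(limbs: tuple[int, int, int, int]) -> bytes:
    """How those limbs would sit in RAM on a little-endian machine."""
    return struct.pack("<4Q", *limbs)


word = bytes(range(WORD))  # 0x00 01 02 ... 1f
limbs = word_to_limbs(word)
# The limbs encode the same 256-bit integer as the big-endian word.
assert sum(limb << (64 * i) for i, limb in enumerate(limbs)) == int.from_bytes(word, "big")
# On an LE host, the whole conversion is one 32-byte reversal.
assert native_le_bytes(limbs) == word[::-1]
assert limbs_to_word(limbs) == word
```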

Hence the suggestion that it is included in the memory expansion cost at a x2 rate:

e.g. memory use = memory slots + 2 x tstore slots

Rather than having its own load factor

It’s a thought, although I was under the impression that the quadratic pricing for memory is evil and we wanted to move away from it; see for instance “Proposals to adjust memory gas costs”.

Since the superlinear pricing in both memory and transient storage is intended to prevent DoS, I think another way to think about it is: how should the practical memory bound increase as the gas limit increases? I think DDR and L1-L3 cache do not scale over time à la Moore’s law the same way CPU speed and storage cost do (see references). I haven’t done the analysis, but I suspect it is something like log() or sqrt() compared to the above two variables, and this EIP bounds memory like sqrt(gas limit). We could try something that bounds it like log(gas limit), but I think that makes the pricing model mathematically more complicated.

References:
Historical Memory Prices 1957+
Historical cost of computer memory and storage - Our World in Data
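To illustrate the sqrt(gas limit) intuition, here is a Python sketch that uses the existing memory cost function (3·a + ⌊a²/512⌋, a in 32-byte words) as a stand-in for quadratic pricing and finds the largest allocation a given amount of gas could pay for. This is the memory formula, not EIP-7609's transient storage formula, and it ignores all other gas spent in the transaction.

```python
def memory_cost(words: int) -> int:
    """Existing EVM memory cost: 3 gas per word plus floor(words^2 / 512)."""
    return 3 * words + words * words // 512


def max_words(gas: int) -> int:
    """Largest allocation (in 32-byte words) whose memory cost fits in `gas`."""
    lo, hi = 0, 1
    while memory_cost(hi) <= gas:  # find an upper bound by doubling
        hi *= 2
    while hi - lo > 1:             # invariant: cost(lo) <= gas < cost(hi)
        mid = (lo + hi) // 2
        if memory_cost(mid) <= gas:
            lo = mid
        else:
            hi = mid
    return lo


for gas_limit in (30_000_000, 60_000_000, 120_000_000):
    words = max_words(gas_limit)
    print(gas_limit, words, f"{words * 32 / 2**20:.1f} MiB")
```

Once the quadratic term dominates, quadrupling the gas limit only roughly doubles the bound, which is the sqrt(gas limit) behaviour.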

benaadams · May 10, 2024, 12:45pm

Then that will get resolved in an update to memory expansion prices.

charles-cooper · May 10, 2024, 12:58pm

Sounds nice, but I don’t want to reason about the interaction between memory expansion and transient storage expansion. For one thing, they have different scopes, so the bounds behave differently under nested call scenarios (I guess the two important ones are “deep” nesting, i.e. recursion, and “broad” nesting, i.e. sequential calls). It has also proven nearly impossible in practice to change memory expansion prices, I guess because they are so hard to reason about?
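A toy Python model of that scoping difference, assuming every call frame touches `k` fresh words of memory and `k` fresh transient slots: memory belongs to a call frame and is released when it returns, while transient storage persists until the end of the transaction (EIP-1153). `Frame`, `chain` and the uniform-`k` assumption are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical call frame that touches `k` new memory words and `k` new transient slots."""
    k: int
    children: list["Frame"] = field(default_factory=list)


def peak_memory(frame: Frame) -> int:
    """Memory is per frame and freed on return, so only the live call stack counts."""
    deepest_child = max((peak_memory(c) for c in frame.children), default=0)
    return frame.k + deepest_child


def transient_slots(frame: Frame) -> int:
    """Transient storage lives until the end of the transaction, so every frame counts."""
    return frame.k + sum(transient_slots(c) for c in frame.children)


def chain(depth: int, k: int) -> Frame:
    """'Deep' nesting: A calls B calls C ... (recursion)."""
    frame = Frame(k)
    for _ in range(depth - 1):
        frame = Frame(k, [frame])
    return frame


deep = chain(10, 100)
broad = Frame(100, [Frame(100) for _ in range(9)])  # 'broad': one parent, 9 sequential calls
print(peak_memory(deep), transient_slots(deep))    # 1000 1000
print(peak_memory(broad), transient_slots(broad))  # 200 1000
```

Both shapes end with the same transient footprint, but the broad one needs far less live memory, so a single fused bound would have to behave sensibly for both.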

IMO it’s just simpler to have separate pricing formulas for them and tune them separately. I’m open to fusing the pricing functions, but transient storage is just a different beast; I think it’s fine for it to have its own pricing model.

benaadams · May 10, 2024, 2:18pm

Ah, good point

Tangentially, we do want people to use Access Lists for other reasons, but they aren’t used much because the penalty for including an item and not using it is very high and outweighs all the savings. So I’d withdraw the differentiation between access-list warm and actual warm (edited my entry).
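The "penalty outweighs the savings" point can be checked with the EIP-2929/EIP-2930 constants: 1900 gas per access-listed storage key, 2100 for a cold SLOAD, 100 for a warm one. This Python sketch covers storage keys only and ignores the per-address access-list cost, which is a simplification; the function name is hypothetical.

```python
ACCESS_LIST_STORAGE_KEY_COST = 1_900  # EIP-2930
COLD_SLOAD_COST = 2_100               # EIP-2929
WARM_STORAGE_READ_COST = 100          # EIP-2929


def access_list_net_savings(keys_used: int, keys_unused: int) -> int:
    """Gas saved (positive) or lost (negative) by pre-declaring storage keys."""
    # A used key costs 1900 up front + 100 on access, instead of 2100 cold.
    saving_per_used_key = COLD_SLOAD_COST - WARM_STORAGE_READ_COST - ACCESS_LIST_STORAGE_KEY_COST
    # An unused key is pure overhead.
    return keys_used * saving_per_used_key - keys_unused * ACCESS_LIST_STORAGE_KEY_COST


print(access_list_net_savings(1, 0))   # 100
print(access_list_net_savings(19, 1))  # 0
```

A correctly listed key saves only 100 gas, so a single unused key cancels the savings of 19 used ones.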

charles-cooper · June 6, 2024, 2:59pm

I’ve received some feedback offline suggesting the base cost is too close to the memory base cost. I suspect the benchmark here makes memory look better than it really is; a better apples-to-apples benchmark would be to MLOAD from many dirty memory locations, but I have not had time to run it yet.