So I was reading through this new Create3 Factory someone built using transient storage, and noticed that there’s a bunch of additional gas overhead when using transient storage to hold bytecode, specifically because you need to copy the bytecode from transient storage into memory before returning it. I’m wondering why the RETURN opcode only allows returning data from memory rather than from transient storage or regular storage. Allowing that would probably save a decent amount of gas, both from executing fewer opcodes and from avoiding memory expansion costs. I’m thinking about some kind of EIP to replace RETURN with three more specific opcodes, all with the same parameters:
MRETURN
SRETURN
TRETURN
They would all operate basically the same way and take the same stack arguments, but return data from the corresponding data location. MRETURN and TRETURN would probably use the same gas formula as RETURN does currently. SRETURN would most likely need a more complicated formula similar to SLOAD’s today, based on warm/cold slots.
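For a concrete picture of the overhead, here’s a rough Yul sketch of the copy-then-return step as it has to be written today, next to what it could collapse to with the proposed TRETURN. The transient-slot layout (byte length in slot 0, code starting at slot 1) is just an assumption for illustration, and `treturn` is of course hypothetical:

```
// today: copy the code chunk by chunk from transient storage into memory
let len := tload(0)                         // assumed layout: byte length in slot 0
for { let i := 0 } lt(i, len) { i := add(i, 0x20) } {
    mstore(i, tload(add(1, div(i, 0x20))))  // assumed layout: code starts at slot 1
}
return(0, len)                              // pays memory expansion over len bytes

// with the proposed opcode (hypothetical, not valid Yul/EVM today):
// treturn(1, len)                          // return len bytes straight from transient slots
```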
The best way to support backwards compatibility would probably be to just rename RETURN to MRETURN, leave it at 0xF3, and then add SRETURN and TRETURN as new opcodes (maybe 0xF6 and 0xF7). There is precedent for renaming opcodes, e.g. changing SUICIDE to SELFDESTRUCT.
I’m not well versed enough in things like static analysis or compilers, so I would like to hear thoughts on how this would affect their design and operation, but it would definitely make it easier to optimize bytecode, especially when writing it by hand in assembly/Yul/Huff.
+1 on this one, sounds like a really nice optimization. MRETURN would probably be a bit different semantically from SRETURN and TRETURN, though, since storage is divided into 32-byte slots.
Would it be beneficial to also have an RRETURN opcode that returns data from the returndata buffer? Currently, if a contract A calls B and returns B’s return data unchanged, there is an unnecessary returndata copy into memory.
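To illustrate, this is the usual forwarding pattern in Yul today (assuming `b` holds B’s address), with the hypothetical RRETURN shown as a comment:

```
// today: A forwards its calldata to B, then echoes B's return data back
calldatacopy(0, 0, calldatasize())
let ok := call(gas(), b, callvalue(), 0, calldatasize(), 0, 0)
returndatacopy(0, 0, returndatasize())   // the extra copy into memory
if iszero(ok) { revert(0, returndatasize()) }
return(0, returndatasize())

// with the proposed opcode (hypothetical):
// rreturn(0, returndatasize())          // return straight from the returndata buffer
```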
Very good points. Perhaps the best way is to do the following:
MRETURN/RRETURN → Accept a starting byte offset and a length to return, exactly as RETURN does now
S/TRETURN → Accept a starting slot number and a byte length, returning whole 32-byte slots starting from (and including) that slot, so SRETURN(0x0, 0x40) returns the data in slots [0, 1]
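As a quick illustration of those semantics, SRETURN(0x0, 0x40) would replace something like the following (the `sreturn` call is of course hypothetical; everything else is how you’d write it today):

```
mstore(0x00, sload(0))   // copy slot 0 into memory
mstore(0x20, sload(1))   // copy slot 1 into memory
return(0x00, 0x40)       // return 64 bytes from memory

// proposed (hypothetical):
// sreturn(0x00, 0x40)   // slots 0 and 1 directly, no MSTOREs, no memory expansion
```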
RETURN is currently 0xF3, so to maintain backwards compatibility I would suggest keeping that value, renaming it to MRETURN, and assigning the new opcodes as follows:
SRETURN → 0xF6
TRETURN → 0xF7
RRETURN → 0xF8
I’m curious what you think about gas pricing for S/TRETURN with respect to cold and warm slots. The easiest implementation would be to have the cost depend on the number of slots and on whether each of those slots is already warm or cold, i.e. the cost of the opcode is the sum of what each slot would cost if accessed by an independent SLOAD under current rules. However, it might make more sense to have a single cost for the whole operation, where if even one slot is cold, the entire read is treated as cold and priced higher accordingly.
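For reference, the first (per-slot) option with today’s EIP-2929 SLOAD prices (2100 gas for a cold slot, 100 for a warm one) would come out to

$$\text{cost} = \sum_{i=1}^{n} c_i, \qquad c_i = \begin{cases} 2100 & \text{slot } i \text{ cold} \\ 100 & \text{slot } i \text{ warm} \end{cases}$$

so an SRETURN over three slots with one of them cold would cost 2100 + 100 + 100 = 2300 gas. Under the second option, that same read would be treated as entirely cold just because one slot is, however that premium ends up being priced.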
The optimization comes from not needing the loop that writes to memory first, which means you should be able to keep the cost the same and still come out ahead at the end simply due to fewer read/write operations, no? The RRETURN cost definitely makes sense, although I wonder if you should get some kind of discount on S/TRETURN: even though you’re accessing more slots, you’re accessing them sequentially, and a discount would incentivize that over scattered independent SLOADs.
As for EOF, I don’t see why it would break anything. As I understand it, EOF is supposed to make future EVM versions easier to integrate thanks to better versioning.
Yeah, so we can keep the pricing the same and benefit from the absence of memory expansion. To be honest, I am not super familiar with current execution client (EC) implementations or whether they support sequential storage reads.
EOF “deprecates” and introduces a lot of opcodes. It also makes the bytecode easier to statically analyze, since dynamic “JUMPs” with “JUMPDESTs” won’t be used anymore. But storage and returndata loads probably have nothing to do with this.
I’d love to hear from a static analysis dev, since I also have very limited knowledge in that area, but I would think a more explicit opcode would make analysis easier than having to reason about a more complex memory read/write pattern. I don’t think EOF deprecates anything here that would be important. I’d have to dig into the actual EC implementations, but it shouldn’t be that much additional overhead to retrieve sequential storage slots.
If the state DB is external, then reducing the number of calls through batching or sequential reads should be less computationally expensive, as I understand it, but I’d like to hear from an EC client dev as well. If this is true, then we can discuss what the discount ought to be; otherwise, leaving it priced like the per-slot SLOAD equation above seems reasonable to me.