EIP-1153: Transient storage opcodes

moodysalem · November 9, 2022, 9:39pm

Probably closer to 10 than 50, but when was the last time you saw the need for an in-memory map in solidity? Particularly one with any size? I don’t expect this to practically be used solely for in-memory maps, but regardless you can always clear the entries in the map before the end of the call.

You can’t reprice out a wasted storage load for a value that is always known to be 0 at the beginning of a transaction. @Philogy explains this rationale a few posts above.

If you’re marking some storage slots as transient via EOF, what is the practical difference? It seems to have all the same issues, except you can’t tell from the opcode alone if it’s a transient slot or not, so it seems even harder to analyze. Edit: furthermore, you cannot have mappings with keys that are determined at runtime (e.g. token addresses).

Storage refunds may stick around for original-new-original writes, but not necessarily zero-nonzero-zero writes if transient storage is available. This simplification alone would make storage refunds much easier to talk about since it removes a branching condition. Alternately, with transient storage, SSTORE/SLOAD pricing could be made completely dumb about caching and the contract can use a TSTORE/TLOAD mapping to move it to the application layer.

ekpyron · November 9, 2022, 10:12pm

As for storage pricing and refunds: the main point there is that you’re left with one cold load no matter what refunds and whatnot, right? Can a cold load from a full zero page in verkle trees be cheaper as well, hence allowing for reserving full pages as transient storage? Honestly, I have no idea, maybe not. Maybe even if, it’s messy and not a good idea to go that way. But that’s not my area of expertise. But the advantage of this direction is that the clearing is fully explicit and the EVM semantics stays simpler and there’s no implications for composability as Chris hinted at. (Of course at the cost of more complex gas accounting no matter what, but that complexity is actually less relevant for analysis and FV, since you can usually assume to have enough gas for that purpose anyways)

As for memory mappings: A full loop iteration for iterating a list I’d guess at 10-20 gas, so worst case you’re at 100 with 5 to 10 elements, average case 10 to 20 - that’s not a long way. (Granted that’s only comparing reading the thing, and writing to it before would be quite a bit more costly in a transient storage version as well) But anyways, of course you don’t see memory maps around now, since they’re costly and a pain to implement right now - which may change with the ability of abusing transient storage for them, that’s the main point :-). And if you did that, clearing the map is actually the costly part (since then you need to keep and traverse a list of keys), so you’re fairly likely to skip it. (and I mean, implementing cheap memory mappings in actual memory has been requested from us, it’s just not feasible with current memory design, but it’s not too crazy of a concept in general)
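The break-even estimate above can be checked with a few lines of arithmetic. This is only a sketch: the 10-20 gas per loop iteration is the guess from the post, the 100 gas threshold is the figure the post compares against, and `break_even_length` is a hypothetical helper name.

```python
def break_even_length(threshold_gas, gas_per_iteration, average_case=False):
    """List length at which a linear scan costs as much as `threshold_gas`."""
    # A worst-case lookup scans every element; an average lookup scans about half.
    scanned_fraction = 0.5 if average_case else 1.0
    return threshold_gas / (gas_per_iteration * scanned_fraction)


# Assumed inputs: 100 gas threshold, 10 or 20 gas per loop iteration (guesses from the post).
for per_iteration in (10, 20):
    worst = break_even_length(100, per_iteration)
    average = break_even_length(100, per_iteration, average_case=True)
    print(f"{per_iteration} gas/iteration: worst case {worst:.0f} elements, "
          f"average case {average:.0f} elements")
```

Under these assumptions the worst case reaches 100 gas at 5 to 10 elements and the average case at 10 to 20, which matches the figures quoted in the post.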

Maybe that concern is ultimately unwarranted, but I’d maintain that it’s a valid concern in any case.

To avoid this issue alone, restricting the address space would help. I.e. if I just don’t have enough transient storage available (either by restricting the address range of transient storage explicitly or implicitly by it only being possible to mark a low count of storage slots as transient via EOF), I can’t abuse it in this way. On the other hand, one of the use cases I’ve read also appears to be passing complex data structures like mappings through calls, it’s probably impossible to keep that and prevent the abuse as call-local mappings at the same time (personally, I’d still argue that passing data around like that would be better done with more flexible calldata in principle, but anyways).

What marking slots in EOF indeed doesn’t account for either is the increase in semantic complexity, that stays the same and is a clear gaping con of transient storage in general. It may be valid to conclude that the pros outweigh this - I’m personally not convinced by that, but I can relate to the opposite position. The fact that this doesn’t even occur in the list of relative cons of the EIP and I don’t exactly see it conceded as a valid concern, made me worry if it is even properly weighed in at all, though. (Also not entirely sure if people doing FV would appreciate a position of basically “FV is too hard anyways, so it doesn’t matter if it gets even harder” ;-))

ekpyron · November 9, 2022, 10:56pm

But yeah, wrt memory mappings I guess you can get away with “well, then people just shouldn’t be doing that” and maybe that’s fair enough.

Wrt static analysis, auditing and FV, I’d at least want to make sure that this is given sufficient thought, since transient storage will make things more complex - that can’t just be denied entirely, can it? If that’s generally deemed worthwhile, that doesn’t make me happy, but is also fair - it just should be properly considered at all IMHO.

ehildenb · November 9, 2022, 11:28pm

These two posts basically sum up my feelings on the issue. The goal here is to provide a fundamentally different memory region, with different scoping, longevity, and pricing than any existing memory region? Then it should be a separate memory region, rather than leaving it up to the clients to cache smarter.

I am fairly confident this will not be hard to implement in KEVM (client that RV maintains), and I don’t think it will increase the complexity of verification using KEVM significantly. I can’t speak for other tools. I think with good usage of this feature, it may even reduce the complexity of some verification efforts (many variables you’ll be able to tell immediately that they cannot alias, for example, or if an entire modifier only uses transient variables, maybe we can have modular verification of the modifier more easily?).

It seems to me as well that it is not too much complexity for clients, because several clients have been modified to handle this new opcode, and tests have been provided (though maybe this could have been done earlier in the discussion, I know that having tests increases my confidence quite a bit: EIP-1153: Transient Storage tests by moodysalem · Pull Request #1091 · ethereum/tests · GitHub).

That being said, I cannot speak for other tools. Our semantics and verification are based on symbolic execution, which is different from other tools. I also can’t speak for the Solidity compiler, but does the Solidity compiler need to support the feature immediately? Can we let devs use inline assembly, let a few examples of how it’s being used trickle in, then give people the version of the feature that has compiler-guaranteed guardrails in place?

The first users of the feature, whether via the Solidity compiler or not, are going to be taking the brunt of risk here (risk of not understanding the new feature correctly, or risk of Solidity compiler behaving unexpectedly on the new feature). I am usually a fan of the “give the devs tools, and let them figure out how to not shoot themselves in the foot” approach. I do think that the devs’ opinions here are more valuable than my own opinion, they are the ones trying to innovate here.

Also shoutout to @pcaversaccio for the diagram, I find these types of visualizations very helpful.

ekpyron · November 9, 2022, 11:34pm

Ok, fair enough. And sure, we can provide plain assembly support immediately, properly optimizing may take a bit longer, high-level language support a bit longer still, but we can manage, I’m not so much concerned with that, but with the complexity of the language semantics that inherits the increase of the complexity of the EVM semantics.
But if this is deemed a non-concern, I consider myself beat on this.

ehildenb · November 9, 2022, 11:44pm

I do think it will be easy to make pathological hard-to-analyze code here.

But I don’t think these types of examples are what people will be trying to do formal verification on, and I think you could make the same or similar examples using normal storage.

Double-edged sword I guess.

SamWilsn · November 10, 2022, 3:08pm

If the TSTORE opcode is called within the context of a STATICCALL, the call must revert.

This is different wording than how STATICCALL handles writes in static contexts as defined in EIP-214. Is the behaviour intended to be different?

moodysalem · November 10, 2022, 3:16pm

It’s not intended to be different, and I can adjust to this if it sounds more accurate:

If the TSTORE opcode is called within the context of a STATICCALL, it will result in an exception instead of performing the modification.
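As a rough illustration of that wording, here is a toy Python model (not client code) of a call frame that carries a static flag. `CallFrame` and `StaticContextError` are hypothetical names. The one behaviour taken from EIP-214 is that the static flag is inherited by every nested call; everything else is simplified.

```python
class StaticContextError(Exception):
    """Raised when a state-modifying operation runs in a static context."""


class CallFrame:
    def __init__(self, transient, address, is_static=False):
        self.transient = transient  # dict shared by all frames of one transaction
        self.address = address
        self.is_static = is_static

    def tload(self, slot):
        # Reads are allowed in any context; unset slots read as 0.
        return self.transient.get((self.address, slot), 0)

    def tstore(self, slot, value):
        # The write raises an exception instead of performing the modification.
        if self.is_static:
            raise StaticContextError("TSTORE inside a static context")
        self.transient[(self.address, slot)] = value

    def call(self, target_address):
        # A regular call inherits the static flag of its parent frame.
        return CallFrame(self.transient, target_address, self.is_static)

    def staticcall(self, target_address):
        # A static call forces the flag on for the child frame and below.
        return CallFrame(self.transient, target_address, True)
```

In this model a frame reached through `staticcall` can still `tload` values written earlier in the transaction, but any `tstore` at that depth or deeper raises.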

SamWilsn · November 10, 2022, 3:19pm

I’d certainly appreciate it, but I’m also a pedant ;3

wbl · November 14, 2022, 5:29pm

I don’t think the language semantics get much messier: this storage behaves the same as indexing a variable by the transaction. Now the transaction hasn’t appeared before, so symbolic analysis based on model exploration will have to change, but it doesn’t seem to me to be that bad.

poojaranjan · December 1, 2022, 7:55pm

PEEPanEIP #91: EIP-1153: Transient storage opcodes with @moodysalem

charles-cooper · December 3, 2022, 7:29am

So I spoke about this at some length with @moodysalem this week and I want to say I do support this proposal in principle. Here’s why: since transactions are the unit of atomicity in the EVM, it makes sense to have a data location which is also transaction scoped. It allows the developer to “reason about transaction atomicity”, which, as demonstrated by both the existence of reentrancy bugs and techniques to deal with them, is an extremely useful thing to be able to reason about. (In fact, one could argue that memory should have been transaction-scoped to begin with, although it’s a bit late for that).

By way of example, another use-case this enables is “critical sections” - scoped sections of code which, while entered, do not allow reentrancy into the contract at all (via checking a transient storage slot before entering the selector table). This is possible with regular storage of course, but it incurs the cost of an SLOAD at every single call to the contract.
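A minimal sketch of such a critical section, written as a Python toy model rather than contract code. `TransientStore`, `Vault`, `LOCK_SLOT`, and `ReentrancyError` are hypothetical names, and the `try`/`finally` stands in for the contract releasing the lock before it returns.

```python
class ReentrancyError(Exception):
    pass


class TransientStore:
    """Toy model: one mapping per transaction, keyed by (contract address, slot)."""

    def __init__(self):
        self._slots = {}

    def tload(self, address, slot):
        return self._slots.get((address, slot), 0)

    def tstore(self, address, slot, value):
        self._slots[(address, slot)] = value

    def end_transaction(self):
        # Transient values never outlive the transaction.
        self._slots.clear()


LOCK_SLOT = 0  # hypothetical slot reserved for the lock


class Vault:
    def __init__(self, address, transient):
        self.address = address
        self.transient = transient
        self.calls = 0

    def withdraw(self, callback=None):
        # Check the lock before doing anything else (the "selector table" check).
        if self.transient.tload(self.address, LOCK_SLOT) != 0:
            raise ReentrancyError("contract is inside a critical section")
        self.transient.tstore(self.address, LOCK_SLOT, 1)
        try:
            self.calls += 1
            if callback is not None:
                callback()  # external call that might try to reenter
        finally:
            self.transient.tstore(self.address, LOCK_SLOT, 0)
```

Calling `vault.withdraw(callback=vault.withdraw)` raises `ReentrancyError` from the nested call, while a plain `vault.withdraw()` succeeds.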

So if you think of transient storage as a tool for reasoning about transactions, transient storage not clearing after every call might be a feature, not a bug, since you can trace information about a txn (ex. how many times a contract has been called in a particular txn – which, if I am not mistaken, is not currently possible with existing opcodes in the EVM). I do think that the concerns voiced about making it potentially harder to reason about contracts are valid! But maybe the complexity is a basic complexity of smart contract development that needs to be reasoned about anyway, and by adding this data location to the EVM we are just making it explicit.

I do have the issues with the API that I voiced above. I also think “transient storage” is a confusing name, since the scope of the data is much more like memory than storage as far as most programmers would be concerned - the whole point of the proposal is that data is never “stored” to disk. A better name might be “long-lived memory” or simply “transaction-scoped memory”. Lastly, I am unconvinced that this proposal is strictly better than other proposals which provide some sort of transaction scoping, for instance the TXID proposal from Nikolai (rest in peace). I haven’t considered the alternatives long enough. But this proposal may indeed be the happy middle ground in terms of usability and the use-cases it enables.

As far as language implementation goes, Vyper team is happy to support it at the language level. As has been pointed out, our existing implementation is a PoC. In principle, it works! And you can probably use it to try out the feature! But we will not officially release as a language feature or put the level of effort necessary for production until the EIP is scheduled for a fork.

charles-cooper · December 3, 2022, 7:49am

I think using transient storage for “memory” mappings just because it has different addressing semantics from actual memory is an anti-pattern that can and will lead to the type of bugs that several people have raised concerns about in this thread. As I mentioned before, the lack of memory mappings in SC languages so far is a language restriction, not a VM restriction. You can see that C, C++, Python, Java, Rust, etc., all manage to implement map data structures with linear, not associative memory. As a language implementer myself - I would prefer that transient storage addresses the same way as memory, and to provide memory mappings as a language feature instead of having people fall back to transient storage mappings. Ultimately - transient storage should be used for things that require transient storage, and memory should be used for things that require memory. I think here is where the abstraction might leak, if developers are reaching for transient storage because of its addressing semantics instead of its volatility semantics.

wbl · December 6, 2022, 2:59pm

I’m not sure I really follow what these two messages say: I think the two points you want to make are that transient storage might not be the best name, and programmers reaching for this might be surprised by the semantics if they reach for it as an alternative to memory, so that we should change the transient storage semantics to match those of memory to avoid the temptation. I’d personally be happy to consider renaming to “transaction-scoped storage” if that conveys the intent better.

I don’t agree with changing the addressing model to match that of memory, especially not for the reason of mismatch leading people to misuse it. First, compilers already have code generation that works with storage as addressed today, so it would be easy to take that code and have it output TSTORE/TLOAD instead to interact with transient marked storage. Using something memorylike would be a higher lift, but I don’t maintain a compiler so happy to be corrected on this point. I do definitely want to be able to do storage like things like put structures in there. You might say in the future we can do it with memory-like addressing, but that’s not ready now.

I don’t think the temptation argument really works. If programmers want memory with storage-like addressing, the right solution is to give them what they want, not take away transaction-scoped storage for fear they will use it instead. They are adults capable of making their own tradeoff decisions and ending up with bugs as a result.

There is a strong reason to use associative addressing in Ethereum, namely cost alignment. Computation in the EVM is expensive. Having contracts reimplement in EVM associative maps when the Go code has them much more efficiently is a mistake imho. We should expose operations that are useful with costs reflective of the actual implementation cost+evaluation costs, rather than require a number of expensive operation invocations to achieve the same result.

I also think associative addressing avoids difficult allocation and reallocation problems that linear memory with its difficult sizes and costs for range spanned invites programs to run into. It’s just easier for everyone.

souptacular · December 12, 2022, 6:23pm

Can we get an update on both the client/EVM testing efforts and the DoS concerns that have been raised previously?

I mentioned it on Twitter just now (literally a minute ago, so don’t expect replies yet), but will leave the link in case anyone answers on there: https://twitter.com/hudsonjameson/status/1602366049911017496?s=20&t=6Dd1J9kgY8fBI2aEfOK8GA

@holiman: Expressing DoS concerns in April: Shanghai Planning · Issue #450 · ethereum/pm · GitHub

@moodysalem’s open PR with tests (seemingly just manual client tests, but has a few mentions of DoS stuff): https://github.com/ethereum/tests/pull/1091

Philogy · December 16, 2022, 4:09pm

The semantics of transient storage opcodes as proposed in their current state don’t do anything that the existing storage opcodes don’t already do in terms of node memory usage. Both are bound to their respective accounts, persist across successful calls, and lead to O(n) effort upon reverts. The main difference is that transient storage has a lower upfront gas cost due to it not needing to read / write to permanent storage.

This means that unlike storage there can be a larger set of changes that may need to be reverted in total (the “journal”). However this effort grows proportionally to the total number of TSTOREs possible in one transaction, meaning it is / should be priced into the opcode. Based on the threads you shared this seems to be the main root of uncertainty around whether or not EIP-1153 could be a DoS vector.
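To make the O(n) revert cost concrete, here is a toy journal in Python. It is a sketch of the general journaling idea, not the implementation of any particular client, and `JournaledTransientStorage` is a hypothetical name. Each TSTORE appends one journal entry, so undoing a frame takes work proportional to the number of writes made since its checkpoint.

```python
class JournaledTransientStorage:
    def __init__(self):
        self._current = {}   # (address, slot) -> value
        self._journal = []   # list of ((address, slot), previous value)

    def tload(self, address, slot):
        return self._current.get((address, slot), 0)

    def tstore(self, address, slot, value):
        key = (address, slot)
        # Record the old value first so the write can be undone later.
        self._journal.append((key, self._current.get(key, 0)))
        self._current[key] = value

    def checkpoint(self):
        # Taken when a call frame is entered.
        return len(self._journal)

    def revert_to(self, checkpoint):
        # Undo writes newest-first; one step per TSTORE since the checkpoint.
        while len(self._journal) > checkpoint:
            key, previous = self._journal.pop()
            self._current[key] = previous
```

A frame that performs n writes and then reverts triggers n undo steps, which is the work that the per-TSTORE gas price has to cover.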

If it does look like TSTORE is priced too low at 100 gas, then arguably the SSTORE opcode’s warm, dirty price should also be changed.

moodysalem · December 27, 2022, 12:23am

With how storage reverts are currently implemented in geth, any DoS issue that exists for TSTORE will also exist for SSTORE. However there is no DoS issue with geth. This is covered in the EIP text.

There is also a test specifically for the worst case O(N) revert scenario in the ethereum/tests PR, which is run against all clients. We are pretty certain that there is no DoS issue, but having this on a multi-client testnet will allow us to further verify. ~~The EIP is blocked from merging into 2/5 clients because it’s not yet included in a HF.~~

moodysalem · March 12, 2023, 9:13pm

Documenting another use case for transient storage I stumbled upon: Add generic parameter to IBlockhashOracle interface · Issue #15 · paradigmxyz/zk-eth-rng · GitHub

This is similar to the fourth use case in the latest draft of the EIP:

Fee-on-transfer contracts: pay a fee to a token contract to unlock transfers for the duration of a transaction

More generally it might be stated:

Unlocking actions within the same transaction: a fee-on-transfer token contract might require a fee to be paid before unlocking a certain amount of token transfers, or a specific implementation of an oracle interface might require a proof to be submitted before a value can be read
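A toy Python model of the fee-on-transfer variant, only to show the shape of the pattern. `FeeOnTransferToken`, `FLAT_FEE`, and the method names are hypothetical, and `end_transaction` stands in for the automatic clearing of transient storage.

```python
class TransferLocked(Exception):
    pass


class FeeOnTransferToken:
    FLAT_FEE = 1  # hypothetical fee charged per unlock

    def __init__(self):
        self.balances = {}
        self.fees_collected = 0
        self.unlocked = {}  # stands in for transient storage: holder -> unlocked amount

    def pay_fee(self, payer, amount_to_unlock):
        if self.balances.get(payer, 0) < self.FLAT_FEE:
            raise ValueError("insufficient balance for fee")
        self.balances[payer] -= self.FLAT_FEE
        self.fees_collected += self.FLAT_FEE
        # The unlock is written to the transient mapping, not to persistent state.
        self.unlocked[payer] = self.unlocked.get(payer, 0) + amount_to_unlock

    def transfer(self, sender, recipient, amount):
        allowance = self.unlocked.get(sender, 0)
        if amount > allowance:
            raise TransferLocked("pay the fee earlier in this transaction")
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.unlocked[sender] = allowance - amount
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount

    def end_transaction(self):
        # Unlocks never carry over into the next transaction.
        self.unlocked.clear()
```

The oracle variant would follow the same shape, with "submit proof" in place of `pay_fee` and "read value" in place of `transfer`.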

Would appreciate feedback

0xJepsen · March 13, 2023, 11:35pm

Hey, I am not sure I understand how using something like the weiroll VM would introduce security concerns? Couldn’t you also just write an off-chain DSL for the calldata and map it however you like? We built something like the Weiroll VM for our needs of transient storage at Primitive and are happy with how it works. In fact because it is a FSM we can reason about its correctness much more powerfully than we would if it was a new opcode.

For Reference: GitHub - primitivefinance/portfolio: On-chain portfolio protocol for risk and liquidity management. FVM.sol is the file where we make our own VM to handle these challenges.

blakewest · May 12, 2023, 10:00pm

Hey all, I’m the cofounder of Goldfinch. I wanted to share a potential use case that I think could be used to really open up the smart contract architectural design space, but which I don’t think I see in the EIP. I apologize if this has been discussed above.

The case I’m envisioning is a group of smart contracts that make up an on-chain app being able to work together more seamlessly, by storing certain user or transaction level data up front, and then downstream contracts being able to access it, knowing it’s correct. Sort of like a global request object might be used in a traditional web app. This pattern could allow for key data to be shared across an app’s network of smart contracts, allowing for the modularity and limitless size of the diamond pattern, but without needing any Solidity tricks or complexities, as well as better permissioning across contracts.

For example, imagine contract A is called, and msg.sender = 0xABC. Then contract A calls out to contract B in order to do some calculation. Contract B may need to verify that the original msg.sender is in fact 0xABC. But there’s no good way to do this in current Solidity. You could use tx.origin, but that has security issues from phishing, and is therefore frowned upon now. You could pass msg.sender to every single downstream function or contract, but that is pretty gross and complicates your functions.
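A toy Python model of this "request object" idea, under the assumption that contract A records the original sender in its own transient storage and exposes a read-only getter. `EntryPoint`, `Calculator`, and `SENDER_SLOT` are hypothetical names. Transient storage in EIP-1153 is private to the owning contract, so B reads the value by calling A rather than reading A's slots directly; the EIP also permits TLOAD inside a STATICCALL, so the getter itself only reads.

```python
class EntryPoint:
    """Stands in for contract A."""

    SENDER_SLOT = 0  # hypothetical slot holding the original msg.sender

    def __init__(self):
        self.transient = {}  # A's own transient storage

    def original_sender(self):
        # Read-only getter: performs a load and no write.
        return self.transient.get(self.SENDER_SLOT)

    def handle(self, msg_sender, downstream):
        # Record the sender once, up front, then call downstream contracts.
        self.transient[self.SENDER_SLOT] = msg_sender
        return downstream.compute(caller=self)

    def end_transaction(self):
        self.transient.clear()


class Calculator:
    """Stands in for contract B."""

    def __init__(self, trusted_entry_point):
        self.trusted_entry_point = trusted_entry_point

    def compute(self, caller):
        # Only accept calls routed through the known entry point.
        if caller is not self.trusted_entry_point:
            raise PermissionError("unknown caller")
        # Ask A who started the request instead of threading it through arguments.
        return self.trusted_entry_point.original_sender()
```

In this sketch `EntryPoint().handle("0xABC", calculator)` returns `"0xABC"` without the sender being passed as an argument to `compute`.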

The other option is to create a global config contract that all your other contracts have access to. Indeed, we tried this for a while with our smart contracts. The issue with this approach though is that all of your functions now require a storage slot to be written, and thus you can’t really have view functions. Which breaks a lot of things, and loses your ability to make those guarantees to others.

What I’m hoping is that if Transient Storage is guaranteed to be thrown away at the end of a transaction, then the compiler could still deem such a function to be a view function. Is that the case with the transient storage? I did not see any direct discussion of how transient storage would interact with view functions either here or in the EIP. What is the story there?

Thanks! - Blake