EIP-1153: Transient storage opcodes

Ethernian · June 23, 2018, 3:20pm

How much gas should the opcode execution cost?
I would try to compare it with the gas cost of a usual message call.

AlexeyAkhunov · June 23, 2018, 7:50pm

I suggested 8 gas per TLOAD operation. It does not cost much more to read from another contract’s transient storage. An implementation would most probably already have some kind of nested hash table (contract_address => (storage address => value)), which can be accessed by a 2-argument TLOAD.
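A minimal sketch of that nested hash table, assuming plain Python dictionaries stand in for the client's data structure. The class and method names are hypothetical, and the 8 gas constant is only the figure suggested in this post, not a settled cost.

```python
from collections import defaultdict

TLOAD_GAS = 8  # figure suggested in this post; an assumption, not a final cost


class TransientStorage:
    """contract_address => (storage address => value), wiped after each transaction."""

    def __init__(self):
        self._slots = defaultdict(dict)

    def tstore(self, current, key, value):
        # Writes always go to the executing contract's own table.
        self._slots[current][key] = value

    def tload(self, account, key):
        # 2-argument form: reading another contract's table is the same two lookups.
        return self._slots.get(account, {}).get(key, 0)

    def clear(self):
        # Called at the end of the transaction.
        self._slots.clear()


ts = TransientStorage()
ts.tstore("0xA", 0x00, 42)
own_read = ts.tload("0xA", 0x00)    # value written by 0xA
unset_read = ts.tload("0xB", 0x00)  # never written, reads as zero
```

Reading a foreign table costs one extra dictionary lookup compared with reading your own, which is the intuition behind pricing both the same.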

Ethernian · July 11, 2018, 8:09am

Sorry for the long delay, @AlexeyAkhunov.

All in all I see two proposals here:

1. introduce transient storage and TSTORE/TLOAD opcodes.
2. allow storage-reading opcodes (TLOAD, SLOAD) to read other contracts’ storage.

I think (2) is interesting, because it is much less expensive than the usual contract reading calls: we don’t need serialization/deserialization any more. It is a kind of reversal of the delegate call idea: we let our contract code read external storage.
If we think of Ethereum as storage for public data like identities or Merkle trees, offered on-chain to other contracts, this proposal will help a lot. It allows powerful searches even if they are implemented later in the user’s contract instead of in the public storage initially.
Nevertheless, (2) is possible without Transient Storage at all.
In the case of permanent storage I am also not quite sure about the 8 gas cost. Possibly we should limit the number of other contracts’ storages accessible by the user’s contract, in order to ensure all calls hit the cache and not the disk. Then we can keep it low.

For (1) I am still unclear about a meaningful, distinct use case.
Technically it means that some contract should create (not load from disk!) a reasonable amount of data (more than a few flags for signalling) and offer it for access to other contracts. What use case would that be? I don’t know.

androlo · December 4, 2018, 1:05am

RETURNDATA could have leveraged this generic memory had it existed before, instead of having to put everything in a special memory that is only used for one single thing (and requires 2 instructions). Though most of us are extremely happy that they added it, it can also be argued that return data is “wrong” compared to adding a generic transaction-spanning memory, readable and writable through VM instructions, in a manner similar to what is proposed here.

Languages could easily work around control flow and all other types of issues. Just because they can use it doesn’t mean they have to. But you know all that. Solidity & EVM could just reserve the first two words for return data position and length, for example. The point is there would still only be 2 instructions, TLOAD & TSTORE, just like RETURNDATACOPY & RETURNDATASIZE, but it can be used for a lot more. The return system could be deprecated. If the memory is different arrays mapped to addresses then maybe other transaction-spanning data could be mapped to it as well, for example, maybe CALLDATA could just be assigned to transMem[0x000…000]. It would still be read-only there.

Maybe taking it too far, but there seem to be many potential use cases.

Arachnid · December 4, 2018, 1:28am

Bear in mind that this kind of global transient memory would make static analysis difficult or even impossible in many cases.

androlo · December 4, 2018, 6:04pm

This is interesting.

What do you think about updating the EIP to make it a map of arrays? I read some of your posts and the discussion here and in the gas netting thread and I saw that suggestion, but I will formulate it again. Basically, the key would be an Ethereum address and the value a regular byte array. Informally:

Map<Address, byte[]> tStorage;

Perhaps it would require a temp version too, so nothing is added until it is certain there are no reverts.

Instructions:

TSTORE (sAddr, val) - pop 'sAddr' and 'val' and set tStore(_ADDRESS_)[sAddr] = val
TLOAD (accAddr, sAddr) - pop 'accAddr' and 'sAddr' and push tStore(accAddr)[sAddr]

_ADDRESS_ would be the address of the current account. Additionally there could be TCOPY or something similar, which could work like CALLDATACOPY and CODECOPY work now, i.e. copying data from transient storage to memory. The EVM could use _ADDRESS_ = 0 for immutable data, since no account would be able to write there, and it could decide for itself what addresses to use so it doesn’t overwrite anything itself.
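A rough sketch of these semantics, assuming the byte arrays grow on write the way EVM memory does and that reads past the end return zeros. The class name, the integer addresses and the zero-padding rule are assumptions made for illustration.

```python
WORD = 32
ZERO_ADDRESS = 0  # reserved for EVM-provided, read-only data in this proposal


class ByteArrayTStorage:
    def __init__(self, tx_input=b""):
        # Map<Address, byte[]>; only the "EVM" itself fills address 0.
        self.t = {ZERO_ADDRESS: bytearray(tx_input)}

    def tstore(self, current, s_addr, val):
        # TSTORE: write one word into the current account's own array.
        arr = self.t.setdefault(current, bytearray())
        if len(arr) < s_addr + WORD:
            arr.extend(bytes(s_addr + WORD - len(arr)))
        arr[s_addr:s_addr + WORD] = val.to_bytes(WORD, "big")

    def tload(self, acc_addr, s_addr):
        # TLOAD: read one word from any account's array.
        return int.from_bytes(self.tcopy(acc_addr, s_addr, WORD), "big")

    def tcopy(self, acc_addr, s_addr, length):
        # TCOPY: return `length` bytes to be placed in memory, zero-padded.
        arr = self.t.get(acc_addr, b"")
        return bytes(arr[s_addr:s_addr + length]).ljust(length, b"\x00")


ts = ByteArrayTStorage(tx_input=bytes.fromhex("deadbeef"))
ts.tstore(0xAA, 0x20, 5)
word = ts.tload(0xAA, 0x20)
tx_prefix = ts.tcopy(ZERO_ADDRESS, 0, 4)
```

Because TSTORE takes no account argument, no contract can target address 0, which is what makes that array effectively immutable.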

The good thing with this is that much of the EVM functionality that now uses separate instructions and storage locations could maybe be harmonized. For example, it could be used for all the things you say. It could be used for return data, and tx data - and maybe even contract-to-contract calldata.

There would be difficulties, and some of this is probably not practical, but it is at least an interesting discussion. There are plenty of instructions involved in passing data around (all the CALLs, and everything CALLDATA-related, and RETURN-related), and then there are reentrancy locks and such, so maybe it can all be part of one single EIP.

androlo · December 4, 2018, 6:10pm

The thing that makes this even worth talking about imo is that there is no good and uniform way to store data over an entire transaction execution, so everything related has to be done using things that feel like “hacks”, like using permanent storage for non permanent data, special call, calldata and return logic, and other things.

AlexeyAkhunov · December 4, 2018, 10:42pm

Yes, I suspect that when the EVM was designed, one of the perceived goals was maximum isolation for the sake of security. In hindsight, that turned out to be quite restrictive, even for the implementation of security-related primitives (like reentrancy locks).
I think a good way to go about it is to first redesign the EVM completely given what we know now about our needs, and then make a reduction back to the EVM, to see what the best modifications are. It would be quite an intense process though.
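As an illustration of the reentrancy lock mentioned above, here is a minimal Python model, assuming a transient slot that is wiped after each transaction. The `vault` contract, the slot number and all function names are hypothetical.

```python
class ReentrancyError(Exception):
    pass


transient = {}    # (contract, slot) -> value; assumed wiped after every transaction
LOCK_SLOT = 0x00  # hypothetical slot reserved for the lock


def non_reentrant(contract, body):
    if transient.get((contract, LOCK_SLOT), 0):
        raise ReentrancyError(f"{contract} re-entered")
    transient[(contract, LOCK_SLOT)] = 1
    try:
        return body()
    finally:
        transient[(contract, LOCK_SLOT)] = 0


def withdraw(callback):
    # The external call in the middle is where an attacker could re-enter.
    return non_reentrant("vault", lambda: callback())


def attacker():
    # Tries to call back into the vault while the first call is still running.
    return withdraw(lambda: "inner")


try:
    withdraw(attacker)
    outcome = "re-entered"
except ReentrancyError:
    outcome = "blocked"
```

With permanent storage the same lock needs an SSTORE to set and another to clear; a transient slot never touches disk.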

androlo · December 5, 2018, 1:47am

Seems it’s pretty complicated to do it the byte array way. At least in a programming language. “static initialization” is no problem and could just be a flag in the transient storage array (a reserved address) that is set the first time the contract code is run during a transaction, and it would be possible to use that to prepare user defined variables, but the contract storage map type seems like the only good way to actually do that efficiently. But using a map instead of an array will of course make it terrible to use for arrays like return and call-data.

androlo · December 5, 2018, 2:11am

If no dynamic arrays or mappings are allowed:

{ // start of body code

    transient uint x = 5;
    transient bytes bts;
    transient bool b;

    // tStorage
    // 0x00: static initialization flag
    // 0x20: free mem pointer

    // assembly version of what would always run when contract is called, before any functions.
    if (iszero(tload(address, 0x00))) { // is tStorage of this contract 0 at address 0x00?
        tstore(0x00, 0x01) // set init flag
        tstore(0x40, 0x05) // init x
        // 0x60 reference to 'bts'
        // 0x80 value of b
        tstore(0x20, 0x100) // update free tstore pointer
    }

}

AlexeyAkhunov · December 5, 2018, 9:22am

Yes, compilers would need to hide this problem away, by inserting the initialisation logic you showed earlier. They would also need to calculate the size of storage required, and perhaps issue an opcode to explicitly resize the storage. Although it makes life harder for compilers, it will eventually make life easier for developers. They will be able to use libraries (finally), closely integrate with eWASM, and stop using a cryptographic hash function as a cheat data structure.
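A small sketch of the bookkeeping a compiler would do here, assuming one 32-byte word per variable and the two reserved words from the earlier example (0x00 for the init flag, 0x20 for the free pointer). The function name is hypothetical. The earlier example sets the free pointer to 0x100, presumably leaving headroom; this sketch simply packs words back to back.

```python
WORD = 0x20
RESERVED = {"init_flag": 0x00, "free_ptr": 0x20}  # layout taken from the earlier example


def layout(var_names, base=0x40):
    """Assign one word per transient variable and report the first free offset."""
    offsets = {name: base + i * WORD for i, name in enumerate(var_names)}
    first_free = base + len(var_names) * WORD
    return offsets, first_free


offsets, free_ptr = layout(["x", "bts", "b"])
for name, off in offsets.items():
    print(f"{name}: {off:#04x}")
print(f"storage size needed: {free_ptr:#04x} bytes")
```

The value of `free_ptr` is what an explicit resize opcode, if one existed, would be given.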

androlo · December 6, 2018, 1:41am

I’ve implemented TLOAD/TSTORE as I suggested in a previous post (memory type array mapped to addresses).

I modified the LLL compiler and added the instructions in at 0x5C (TLOAD) and 0x5D (TSTORE), just after JUMPDEST. This is a contract doing a “static init” type routine before running:

{
	
	(def "T_INIT_ADDR" 0x00)
	(def "T_COUNTER_ADDR" 0x20)

	(def "tInit" (TLOAD (ADDRESS) T_INIT_ADDR))

	(def "tInitW" (val) (TSTORE T_INIT_ADDR val))

	(def "StaticInit" 
	    (unless tInit {
		(tInitW 1)
	    })
	)
	
	StaticInit
	
	(return tInit)
}

It compiles down to this:

6000305c600c57600160005d5b6000305c60005260206000f300

(the STOP at the end is auto injected by LLLC)
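To make the bytecode easier to follow, here is a small disassembler sketch. It only knows the handful of opcodes that appear in this program; 0x5C and 0x5D are the experimental TLOAD/TSTORE assignments from this post, the rest are the standard EVM values.

```python
# Standard EVM opcodes plus the two experimental ones from this post.
OPCODES = {
    0x00: "STOP", 0x30: "ADDRESS", 0x52: "MSTORE", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0x5C: "TLOAD", 0x5D: "TSTORE", 0xF3: "RETURN",
}


def disassemble(hex_code):
    code = bytes.fromhex(hex_code)
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 immediate bytes
            n = op - 0x5F
            out.append((pc, f"PUSH{n} 0x{code[pc + 1:pc + 1 + n].hex()}"))
            pc += 1 + n
        else:
            out.append((pc, OPCODES.get(op, f"UNKNOWN_0x{op:02x}")))
            pc += 1
    return out


listing = disassemble("6000305c600c57600160005d5b6000305c60005260206000f300")
for pc, text in listing:
    print(f"{pc:02d}  {text}")
```

The listing shows the static init pattern directly: load the flag, jump over the TSTORE to the JUMPDEST at offset 12 if it is already set, then load the flag again and return it.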

I also modified my own evm implementation to include the new opcodes. This is the output after running that code:

{
	"errno": 0,
	"errpc": 24,
	"returnData": "0000000000000000000000000000000000000000000000000000000000000001",
	"mem": "0000000000000000000000000000000000000000000000000000000000000001",
	"stack": [],
	"accounts": [
		{
			"address": "cd1722f2947def4cf144679da39c4c32bdc35681",
			"balance": "0",
			"nonce": 1,
			"code": "",
			"storage": [],
			"destroyed": false
		},
		{
			"address": "0f572e5295c57f15886f9b263e2f6d2d6c7b5ec6",
			"balance": "0",
			"nonce": 0,
			"code": "6000305c600c57600160005d5b6000305c60005260206000f300",
			"storage": [],
			"destroyed": false
		}
	],
	"logs": []
}
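For readers who want to reproduce the result without the modified EVM, here is a toy interpreter covering only the opcodes this program uses. It is a sketch, not the implementation described in this post: transient storage is modelled as a dictionary of words rather than a memory-style array, and gas is ignored.

```python
def run(code_hex, address, tstorage):
    """Execute code as `address`; returns (return data, pc of the halting opcode)."""
    code = bytes.fromhex(code_hex)
    stack, mem, pc = [], bytearray(), 0
    while True:
        op = code[pc]
        if op == 0x60:    # PUSH1: the next byte is the immediate
            stack.append(code[pc + 1])
            pc += 2
            continue
        if op == 0x30:    # ADDRESS
            stack.append(address)
        elif op == 0x5C:  # TLOAD (accAddr, sAddr)
            acc, key = stack.pop(), stack.pop()
            stack.append(tstorage.get(acc, {}).get(key, 0))
        elif op == 0x5D:  # TSTORE (sAddr, val), always into our own storage
            key, val = stack.pop(), stack.pop()
            tstorage.setdefault(address, {})[key] = val
        elif op == 0x57:  # JUMPI
            dest, cond = stack.pop(), stack.pop()
            if cond:
                pc = dest
                continue
        elif op == 0x52:  # MSTORE
            off, val = stack.pop(), stack.pop()
            if len(mem) < off + 32:
                mem.extend(bytes(off + 32 - len(mem)))
            mem[off:off + 32] = val.to_bytes(32, "big")
        elif op == 0xF3:  # RETURN
            off, size = stack.pop(), stack.pop()
            return bytes(mem[off:off + size]), pc
        elif op == 0x00:  # STOP
            return b"", pc
        elif op != 0x5B:  # JUMPDEST is a no-op; anything else is unsupported here
            raise ValueError(f"unsupported opcode 0x{op:02x} at pc {pc}")
        pc += 1


CODE = "6000305c600c57600160005d5b6000305c60005260206000f300"
SELF = int("0f572e5295c57f15886f9b263e2f6d2d6c7b5ec6", 16)
tstorage = {}
ret, halt_pc = run(CODE, SELF, tstorage)
print(ret.hex(), halt_pc)
```

The return data is a single word holding 1 and execution halts at the RETURN at pc 24, in line with the "returnData" and "errpc" fields of the output above.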

The EVM implementation was actually not hard, since I just created TStorage by modifying a copy of the data structure I use for normal storage, using memory structs as values (instead of just 32 byte ints). This of course would not be as simple to do in an actual fully featured EVM like the one in geth or parity…

Either way, I will continue to experiment a bit.

androlo · December 6, 2018, 2:12am

Here’s a more interesting example:

{
	
	(def "T_INIT_ADDR" 0x00)
	(def "T_COUNTER_ADDR" 0x20)

	(def "tInit" (TLOAD (ADDRESS) T_INIT_ADDR))
	(def "tCounter" (TLOAD (ADDRESS) T_COUNTER_ADDR))

	(def "tInitW" (val) (TSTORE T_INIT_ADDR val))
	(def "tCounterW" (val) (TSTORE T_COUNTER_ADDR val))

	(def "StaticInit" 
	    (unless tInit {
		(tInitW 0x01)
		(tCounterW 0x00)
	    })
	)
	
	StaticInit

	(unless (= tCounter 5) {
		(tCounterW (+ tCounter 1))
		(msg (ADDRESS) 0)
	})
	
	(return tCounter)
}

Calling itself a number of times, using the static counter to keep track of how many.

Output:

{
	"errno": 0,
	"errpc": 76,
	"returnData": "0000000000000000000000000000000000000000000000000000000000000005",
	"mem": "0000000000000000000000000000000000000000000000000000000000000005",
	"stack": [],
	"accounts": [
		{
			"address": "cd1722f2947def4cf144679da39c4c32bdc35681",
			"balance": "0",
			"nonce": 1,
			"code": "",
			"storage": [],
			"destroyed": false
		},
		{
			"address": "0f572e5295c57f15886f9b263e2f6d2d6c7b5ec6",
			"balance": "0",
			"nonce": 0,
			"code": "6000305c601157600160005d600060205d5b60056020305c1460405760016020305c0160205d6000600052602060006020600060003060155a03f150600051505b6020305c60005260206000f300",
			"storage": [],
			"destroyed": false
		}
	],
	"logs": []
}
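For those who don't read LLL, the control flow of this contract can be modelled in a few lines of Python. This is only an illustration: a plain dictionary stands in for the contract's transient storage and a recursive function call stands in for the message call to itself.

```python
transient = {}  # slot -> value, shared by every call frame within one transaction

T_INIT_ADDR = 0x00
T_COUNTER_ADDR = 0x20


def static_init():
    # Runs on every call, but only has an effect the first time in a transaction.
    if transient.get(T_INIT_ADDR, 0) == 0:
        transient[T_INIT_ADDR] = 1
        transient[T_COUNTER_ADDR] = 0


def contract():
    static_init()
    if transient[T_COUNTER_ADDR] != 5:
        transient[T_COUNTER_ADDR] += 1
        contract()  # stands in for (msg (ADDRESS) 0)
    return transient[T_COUNTER_ADDR]


result = contract()
print(result)
```

Each nested call sees the counter left by its caller, which ordinary memory cannot do because every call frame starts with fresh memory.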

I haven’t added in any revert protection yet, but that won’t be hard since all it needs is logic similar to how dirty account modifications are discarded, which I have.
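One possible shape for that revert logic, sketched with a write journal and checkpoints. This is an assumption about how it could be done, not a description of the implementation mentioned here; all names are hypothetical.

```python
class JournaledTStorage:
    def __init__(self):
        self.data = {}     # (account, key) -> value
        self.journal = []  # (slot, previous value or None) for every write

    def tstore(self, account, key, value):
        slot = (account, key)
        self.journal.append((slot, self.data.get(slot)))
        self.data[slot] = value

    def tload(self, account, key):
        return self.data.get((account, key), 0)

    def checkpoint(self):
        # Taken when a call frame starts.
        return len(self.journal)

    def revert_to(self, checkpoint):
        # Undo every write made since the checkpoint, newest first.
        while len(self.journal) > checkpoint:
            slot, old = self.journal.pop()
            if old is None:
                self.data.pop(slot, None)
            else:
                self.data[slot] = old


ts = JournaledTStorage()
ts.tstore("A", 0x00, 1)
cp = ts.checkpoint()      # a nested call begins
ts.tstore("A", 0x00, 2)
ts.tstore("B", 0x20, 7)
ts.revert_to(cp)          # the nested call reverts
```

After the revert, slot 0x00 of A holds 1 again and B's write is gone, while writes made before the checkpoint survive.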

androlo · December 6, 2018, 3:01am

An even more interesting one: deploying a contract and doing static init in both init and body. The code should increment the counter twice, once in init and once in body (because of the call). The value written to storage address 0x00 in the first contract is the value read from the other contract’s transient storage, i.e. a message has been passed between contracts without using return.

{
	(def "T_INIT_ADDR" 0x00)
	(def "T_COUNTER_ADDR" 0x20)

	(def "tInit" (TLOAD (ADDRESS) T_INIT_ADDR))
	(def "tCounter" (TLOAD (ADDRESS) T_COUNTER_ADDR))

	(def "tInitW" (val) (TSTORE T_INIT_ADDR val))
	(def "tCounterW" (val) (TSTORE T_COUNTER_ADDR val))

	(def "StaticInit" 
	    (unless tInit {
		(tInitW 0x01)
		(tCounterW 0x00)
	    })
	)
	
	[0x20] (create {
	
		StaticInit
		(tCounterW (+ tCounter 1))

		(returnlll {
			StaticInit
			(tCounterW (+ tCounter 1))
			(return 0)
		})
	})
	
	(msg @0x20 0)
	[[0x00]] (TLOAD @0x20 T_COUNTER_ADDR)
}

The contract with address 0f572e5295c57f15886f9b263e2f6d2d6c7b5ec6 is the deploying one, using the default contract address for evm invocations like these. The one being deployed is 5ecfbe86fcd903321c505cb5c8a5de6331e2e7b1.

{
	"errno": 0,
	"errpc": 64,
	"returnData": "",
	"mem": "00000000000000000000000000000000000000000000000000000000000000000000000000000000000000005ecfbe86fcd903321c505cb5c8a5de6331e2e7b1296000396000f300fe6000305c601157600160005d600060205d5b60016020305c0160205d600060005260206000f30000000000000000000000000000000000",
	"stack": [],
	"accounts": [
		{
			"address": "cd1722f2947def4cf144679da39c4c32bdc35681",
			"balance": "0",
			"nonce": 1,
			"code": "",
			"storage": [],
			"destroyed": false
		},
		{
			"address": "0f572e5295c57f15886f9b263e2f6d2d6c7b5ec6",
			"balance": "0",
			"nonce": 1,
			"code": "6000600052596000526050806042600051396000516000f060205260006000526020600060206000600060205160155a03f1506000515060206020515c60005500fe6000305c601157600160005d600060205d5b60016020305c0160205d60278060296000396000f300fe6000305c601157600160005d600060205d5b60016020305c0160205d600060005260206000f300",
			"storage": [
				{
					"address": "0",
					"value": "2"
				}
			],
			"destroyed": false
		},
		{
			"address": "5ecfbe86fcd903321c505cb5c8a5de6331e2e7b1",
			"balance": "0",
			"nonce": 0,
			"code": "6000305c601157600160005d600060205d5b60016020305c0160205d600060005260206000f300",
			"storage": [],
			"destroyed": false
		}
	],
	"logs": []
}

Here’s an even more advanced one. The first contract writes a message into its transient storage, then calls the other. The other contract looks at the transient storage of (CALLER) to see if they left them a message. If so, and if it’s the correct one, it will respond. The first contract checks the target’s transient storage after the call and writes the response into its regular storage.

{
	(def "T_INIT_ADDR" 0x00)
	(def "T_MSG_ADDR" 0x20)

	(def "tInit" (TLOAD (ADDRESS) T_INIT_ADDR))
	(def "tMsg" (TLOAD (ADDRESS) T_MSG_ADDR))

	(def "tInitW" (val) (TSTORE T_INIT_ADDR val))
	(def "tMsgW" (val) (TSTORE T_MSG_ADDR val))

	(def "StaticInit" 
	    (unless tInit {
		(tInitW 0x01)
		(tMsgW 0x00)
	    })
	)
	
	[0x20] (create {
	
		StaticInit

		(returnlll {
			StaticInit
			(when (= (TLOAD (CALLER) T_MSG_ADDR) "Here's ur message.") (tMsgW "Thanks, bro."))
			(return 0)
		})
	})

	(tMsgW "Here's ur message.")
	(msg @0x20 0)
	[[0x00]] (TLOAD @0x20 T_MSG_ADDR)
}
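The exchange can be modelled at a high level in Python. This is a sketch only: the account names are placeholders, strings stand in for 32-byte words, and a function call stands in for the message call.

```python
tstorage = {}  # account -> {slot: value}, shared across the whole transaction
storage = {}   # (account, slot) -> value, the regular persistent storage

T_MSG_ADDR = 0x20


def tload(account, slot):
    return tstorage.get(account, {}).get(slot, 0)


def tstore(current, slot, value):
    tstorage.setdefault(current, {})[slot] = value


def deployed_body(self_addr, caller):
    # Look in the caller's transient storage for a message and answer in our own.
    if tload(caller, T_MSG_ADDR) == "Here's ur message.":
        tstore(self_addr, T_MSG_ADDR, "Thanks, bro.")


def deployer(self_addr, target):
    tstore(self_addr, T_MSG_ADDR, "Here's ur message.")
    deployed_body(target, caller=self_addr)             # (msg @0x20 0)
    storage[(self_addr, 0x00)] = tload(target, T_MSG_ADDR)  # [[0x00]] (TLOAD ...)


deployer("deployer", "deployed")
print(storage[("deployer", 0x00)])
```

No call data or return data is passed in either direction; both halves of the conversation travel through transient storage.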

andreas@AndreasLT:~/solevm/bin$ node run.js
{
	"errno": 0,
	"errpc": 101,
	"returnData": "",
	"mem": "00000000000000000000000000000000000000000000000000000000000000000000000000000000000000005ecfbe86fcd903321c505cb5c8a5de6331e2e7b16000305c601157600160005d600060205d5b7f486572652773207572206d6573736167652e00000000000000000000000000006020335c14156060577f5468616e6b732c2062726f2e000000000000000000000000000000000000000060205d5b600060005260206000f3000000000000000000000000000000000000000000",
	"stack": [],
	"accounts": [
		{
			"address": "cd1722f2947def4cf144679da39c4c32bdc35681",
			"balance": "0",
			"nonce": 1,
			"code": "",
			"storage": [],
			"destroyed": false
		},
		{
			"address": "0f572e5295c57f15886f9b263e2f6d2d6c7b5ec6",
			"balance": "0",
			"nonce": 1,
			"code": "600060005259600052608c80610067600051396000516000f06020527f486572652773207572206d6573736167652e000000000000000000000000000060205d60006000526020600060206000600060205160155a03f1506000515060206020515c60005500fe6000305c601157600160005d600060205d5b606c806100206000396000f300fe6000305c601157600160005d600060205d5b7f486572652773207572206d6573736167652e00000000000000000000000000006020335c14156060577f5468616e6b732c2062726f2e000000000000000000000000000000000000000060205d5b600060005260206000f300",
			"storage": [
				{
					"address": "0",
					"value": "5468616e6b732c2062726f2e0000000000000000000000000000000000000000"
				}
			],
			"destroyed": false
		},
		{
			"address": "5ecfbe86fcd903321c505cb5c8a5de6331e2e7b1",
			"balance": "0",
			"nonce": 0,
			"code": "6000305c601157600160005d600060205d5b7f486572652773207572206d6573736167652e00000000000000000000000000006020335c14156060577f5468616e6b732c2062726f2e000000000000000000000000000000000000000060205d5b600060005260206000f300",
			"storage": [],
			"destroyed": false
		}
	],
	"logs": []
}
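The 32-byte words in the output are the two string literals, left-aligned and zero-padded. A short helper to decode them (the function name is just for illustration):

```python
def decode_word(hex_word):
    """Strip the zero padding from a 32-byte word and read it as ASCII."""
    return bytes.fromhex(hex_word).rstrip(b"\x00").decode("ascii")


# Constant pushed by PUSH32 (0x7f) in the deployed contract's code.
request = decode_word("486572652773207572206d6573736167652e" + "00" * 14)
# Value found at storage address 0 of the deploying contract.
response = decode_word("5468616e6b732c2062726f2e" + "00" * 20)
print(request)
print(response)
```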

AlexeyAkhunov · December 6, 2018, 9:50am

androlo:

Instructions:

TSTORE (sAddr, val) - pop 'sAddr' and 'val' and set tStore(_ADDRESS_)[sAddr] = val
TLOAD (accAddr, sAddr) - pop 'accAddr' and 'sAddr' and push tStore(accAddr)[sAddr]

_ADDRESS_ would be the address of the current account

Great work! I need some time to catch up with you. So TLOAD above allows reading from another contract’s transient storage? So that it allows arbitrary message passing across contracts in a frame?

androlo · December 6, 2018, 3:25pm

Yes. I think I saw you write in some other post that this could be a goal? Maybe you did not put it exactly like this, but it seems like a good start.

The way I implemented it as of right now, every account gets an additional memory that it can read and write to, and the data will remain throughout an entire transaction (i.e. in between contract-to-contract calls). Accounts can also read from other contracts’ transient storage “memory”.

The last example I posted above is essentially a call that involves calldata + returndata, but uses only the transient storage.

I am making a repo which I will upload soon, with some stuff from this thread and ideas on various things like how to structure call/returndata. It also shows instructions on how to run my modified solidity evm (until a better alternative is done, maybe using pyeth or something). Maybe I will upload it tomorrow or today.

BTW the modified LLLC can be found at: https://github.com/androlo/solidity

It is in the ‘tstore’ branch, just build like normal (with -DLLL=ON) and it will understand:

TLOAD accountAddress storageAddress (0x5C)

TSTORE storageAddress value (0x5D)

TCOPY accountAddress storageAddress memoryAddress length (0x5E)

Copy is just a simple way to move data to memory, since it would be very useful for return/calldata, but I guess for reentrancy checks and such it is optional.

EDIT:

repo is here: https://github.com/androlo/tstorage

androlo · December 6, 2018, 7:23pm

Honestly, I don’t understand why this storage would be worse, or why it would make code analysis more difficult. If a general purpose memory could simplify or even deprecate several other instructions, along with their special purpose memories it should imo make analysis easier and not harder. Shouldn’t it make the model of computation simpler? This could probably be proven I’m guessing.

Sure, it would make return and calldata “volatile” in that it could change when making a call in the code, but that’s already the case for returndata afaik. Also, I don’t understand why regular calldata needs to be some holy data that can’t be edited. tStorage for 0x000…000 could be used by the EVM to store the transaction input, which means it would be accessible throughout the entire transaction, from all contracts, sort of like argv. It would also be immutable. The data passed between one contract and another could just be managed with TLOAD/TSTORE using a standard; for example, in the repo I show how it can have standard reserved addresses for length and location.
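To give a feel for such a standard, here is a sketch in which the first two words of an account's transient memory hold the position and length of its return data. The slot numbers, the default payload offset and the helper names are all hypothetical and are not taken from the repo.

```python
WORD = 32
RET_POS, RET_LEN = 0x00, 0x20  # hypothetical reserved words: position and length

tmem = {}  # account -> bytearray of transient memory


def _write(account, offset, data):
    arr = tmem.setdefault(account, bytearray())
    if len(arr) < offset + len(data):
        arr.extend(bytes(offset + len(data) - len(arr)))
    arr[offset:offset + len(data)] = data


def _read(account, offset, length):
    arr = tmem.get(account, b"")
    return bytes(arr[offset:offset + length]).ljust(length, b"\x00")


def set_return(callee, payload, at=0x40):
    # Callee side: store the payload, then publish where it is and how long.
    _write(callee, at, payload)
    _write(callee, RET_POS, at.to_bytes(WORD, "big"))
    _write(callee, RET_LEN, len(payload).to_bytes(WORD, "big"))


def read_return(callee):
    # Caller side: two word reads, then one copy of `length` bytes.
    pos = int.from_bytes(_read(callee, RET_POS, WORD), "big")
    length = int.from_bytes(_read(callee, RET_LEN, WORD), "big")
    return _read(callee, pos, length)


set_return("callee", b"hello")
result = read_return("callee")
```

The caller side maps onto two TLOADs followed by one TCOPY, so no dedicated return data opcodes would be needed under this convention.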

AlexeyAkhunov · December 6, 2018, 7:31pm

I believe this type of storage would be better. And I do think reading from another contract’s transient storage/memory could be a useful resource. I just need some time to process what you have done so far. These opcodes might be quite useful when we start integrating eWASM into Ethereum 1.0 (or Ethereum 1x).

androlo · December 6, 2018, 11:41pm

Added some Solidity versions to the repo too. They compile with the modified compiler I link to in the docs and run in the solevm I link to as well.

Below is an example of an already posted LLL contract that has a static counter. A bit contrived, but it maybe shows a bit better what the idea is to people who don’t know LLL.

pragma solidity ^0.5.0;

// compile bin-runtime and use '364497e4' as input argument after the bytecode with ./bin/run

contract Test {

    function __STATIC_INIT() private view {
        assembly {
            if iszero(tload(address, 0x0)) {
                tstore(0x00, 1)
                tstore(0x20, 0)
            }
        }
    }

    function counterIncrease() private view {
        assembly {
            tstore(0x20, add(tload(address, 0x20), 1))
        }
    }

    function counterGet() private view returns (uint) {
        uint ctr;
        assembly {
            ctr := tload(address, 0x20)
        }
        return ctr;
    }

    constructor() public {
        __STATIC_INIT();
    }

    function body() public view returns (uint) {
        __STATIC_INIT();

        if (counterGet() < 5) {
            counterIncrease();
            this.body();
        }

        return counterGet();
    }
}

AlexeyAkhunov · December 7, 2018, 11:37am

Looks good! I am really grateful to you for researching these ideas. What do you think the main advantage of using transient storage/memory would be? I thought about libraries - in the current form, and not just libraries for computing some functions, but more importantly, libraries for dealing with data structures (hash tables, balanced trees, skip lists). Of course, it would be great to have the non-transient storage be linear too, so that you can map part of the storage to the transient memory, pass it to the library (for example, to a balanced tree library), which will modify the structure, and then the owner of the storage commits it back to the storage. I am not suggesting you do all this work, but I am really intrigued about what Ethereum would look like if the EVM had been designed with linear storage, and how much more efficient and convenient it would be. That is one of the reasons I attempted to introduce linear cross-contract storage in the State Rent proposal. It is likely that it won’t be implemented as part of State Rent, but I still think the idea of linear storage is important. And perhaps it will be a requirement for proper integration with eWASM.
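The map, modify, commit flow can be sketched as follows. This is purely illustrative: a sorted array stands in for the balanced tree, the word size is shrunk to 4 bytes for readability, and byte arrays stand in for linear storage and transient memory.

```python
WORD = 4  # small word size to keep the example readable; the EVM uses 32


def lib_sorted_insert(region, count, value):
    """Library routine: insert `value` into a sorted run of `count` words, in place."""
    words = [int.from_bytes(region[i * WORD:(i + 1) * WORD], "big") for i in range(count)]
    words.append(value)
    words.sort()
    for i, w in enumerate(words):
        region[i * WORD:(i + 1) * WORD] = w.to_bytes(WORD, "big")
    return count + 1


def read_words(region, count):
    return [int.from_bytes(region[i * WORD:(i + 1) * WORD], "big") for i in range(count)]


# Persistent linear storage with room for four words, three of them in use.
persistent = bytearray(4 * WORD)
for i, w in enumerate([10, 20, 30]):
    persistent[i * WORD:(i + 1) * WORD] = w.to_bytes(WORD, "big")

transient_view = bytearray(persistent)                # 1. owner maps storage into transient memory
new_count = lib_sorted_insert(transient_view, 3, 25)  # 2. library modifies the structure
persistent[:] = transient_view                        # 3. owner commits it back

print(read_words(persistent, new_count))
```

The library never touches persistent storage itself; only the owner decides whether the modified region is committed.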