I agree. That’s why EIP-615 adds a DATA section and disallows invalid contract code elsewhere. I think @holiman’s concern is that in current contracts there may be bytes after a leading STOP opcode which look like a valid version identifier followed by executable bytecode, so the contract will no longer stop immediately, but will run the code.
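To make the concern concrete, here is a minimal sketch of the ambiguity, assuming a purely hypothetical header layout (a leading STOP byte followed by a magic version byte; this is an illustration, not the actual EIP-1707 format):

```python
STOP, MAGIC = 0x00, 0x01  # hypothetical header: STOP byte + version byte


def entry_point(code: bytes) -> int:
    """Where execution would start under the hypothetical header rule."""
    if len(code) >= 2 and code[0] == STOP and code[1] == MAGIC:
        return 2  # looks versioned: skip the header and run the body
    return 0      # legacy: execution starts at the first byte (and STOPs)


# An old contract that meant "STOP, then data" whose data byte happens to
# equal MAGIC is now executed from byte 2 instead of stopping immediately.
old_contract = bytes([STOP, 0x01, 0x60, 0x00])
```

So under this kind of rule, whether a pre-existing contract keeps its old behaviour depends entirely on what bytes happen to follow the leading STOP.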
Yeah, so my second argument in the previous post is that the risk of this wouldn’t be higher than the risk of adding new opcodes. When adding new opcodes, there is always a similar risk that a contract has “data” bytes that accidentally equal the new opcodes.
We have been adding new opcodes just fine, so from a risk-assessment point of view, it also shouldn’t be a problem for a versioning prefix.
I understand your argument. I’m not sure whether to ban the practice, or simply admit that such programs are and will be in trouble. With EIP-615 they’ll be deprecated, probably banned later.
But I also think there are contracts that immediately stop on purpose.
Edit: One reason would be to store data after the leading null.
No, I see now I must have been too vague. So my concern is that the EIP (note: this may be a misunderstanding on my part, I might just have missed something) vaguely says things like “… then the contract/code is invalid”. And it emerges that at some point, there is a validation performed, saying “yes ok, this code is fine for deploying”.
- So the EIP premise is that you can’t throw any code up there, only ‘valid’ code.
Now, here’s my concern: at any time before this validity check becomes enforced, I can place valid-looking code (magic bytes, hashes, the works) on chain, but the actual code contains ‘invalid’ things, like static jumps into data sections, or generally breaks any of the invariants that the EIP promises.
And, needless to say, unless this behaviour is well defined, we have an immediate chain split here.
So, if an attacker does put such code there, it’s not sufficient to say “oh that’s invalid” – because how will we know it is invalid? The only way to detect that we just jumped into a data-section, violating the invariants promised by this eip, is … jumpdest analysis! PLUS basically redoing the entire validity-check that supposedly was done at deploy-time.
So, basically, as far as I see it, that removes any speed-gains that the new EIP static jumps would have brought to the EVM. For program flow analysis during development, this EIP offers nothing (because a superior flow analysis can be done using AST), and the only tangible gain is analysis/decompilation of evm bytecode.
Now, there are ways to solve this, but the only way I can think of is to modify the state storage; add flags or something to signify that this is ‘validated code’. The same bitflags could be used to signify ‘this is ewasm’.
So, please let me know what I missed, because I really don’t understand how this is intended to work.
Oh, and one last thing: I don’t think the current jumpdest analysis is that bad. It’s a one-time pass over the code, and the actual analysis needs only be one eighth of the code size – if using a bitmap for code/data sections. Checking that the destination is a JUMPDEST can be done at the time of the jump. In geth, a jumpdest analysis is far faster than e.g. calculating the code hash. The entire jumpdest analysis is also done lazily, at the first jump.
For reference – the jumpdest analysis in geth: https://github.com/ethereum/go-ethereum/blob/master/core/vm/analysis.go
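For illustration, the one-pass bitmap analysis described above can be sketched like this (a rough Python rendering of the approach, not geth’s actual code; opcode values are from the Yellow Paper):

```python
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def code_bitmap(code: bytes) -> bytearray:
    """One bit per code byte: 1 = opcode position, 0 = PUSH immediate data.

    The bitmap is one eighth of the code size, built in a single pass.
    """
    bits = bytearray((len(code) + 7) // 8)
    pc = 0
    while pc < len(code):
        op = code[pc]
        bits[pc >> 3] |= 1 << (pc & 7)  # mark this byte as an opcode
        # skip the 1..32 immediate bytes of PUSH1..PUSH32
        pc += 1 + (op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0)
    return bits


def valid_jumpdest(code: bytes, dest: int, bits: bytearray) -> bool:
    """Checked at jump time: destination must be a JUMPDEST opcode byte,
    not a 0x5B that sits inside PUSH data."""
    return (dest < len(code)
            and bits[dest >> 3] & (1 << (dest & 7)) != 0
            and code[dest] == JUMPDEST)
```

For example, in `PUSH1 0x5B; JUMPDEST` the byte at offset 1 is 0x5B but is push data, so only offset 2 is a valid jump destination.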
Not commenting on the rest of the argument provided by @holiman, but I fully agree that jumpdest analysis is not bad. The bitmap can even be cached in state.
One thing I want to point out is that a lot of the reason why we lack more EVM optimizations is not that the EVM cannot be optimized, but that there is an implicit intuition shared among teams that it’s I/O, not the EVM, that is the actual bottleneck for performance.
I’m not much concerned about the cost of JUMPDEST analysis, although it will be done statically in EIP-615.
This was your original concern. I saw here a more general concern that we wouldn’t know whether this contract was valid, but would run it anyway, with indeterminate results. So I started looking at schemes for identifying whether a contract was an EIP-615 contract, which led into general schemes for identifying and versioning contracts.
I think @sorpaas argues that this is just a special case of a bigger problem: that adding opcodes can change the execution of any program that contains them, so there is no reason to worry about this case in particular.
This is likely true, though part of my work on optimizing the C++ interpreter was eventually stymied by dynamic jumps. And we hear many complaints from formal analysts about things like recognizing the contortions Solidity goes through to implement subroutines.
Edit: Also, compilers can produce good machine code from unstructured bytecode, but given structured bytecode (like Wasm) they can produce it in a single n log(n) pass. Compilers that can go quadratic are an attack surface.
Ok good, then we’re on the same page. I thought we were on different tracks when you wrote “accidentally looking like…” and “the odds of the last 20 bytes just happening to be the right hash”, then it sounded like you were not addressing intentional attacks.
So then I guess the EIP is missing a lot of details on exactly how to deal with invalid “new” code.
Exactly. I think we need some way to distinguish old and new code. Are we on the same page there?
None of the existing proposals for doing that with versioning work, except for adding a field to the account state, which has other problems.
Thus I’m asking @holiman whether a scheme that uses a hash of the bytecode can serve the purpose.
I was about to answer “No, because how can you prevent me from adding the same hash to my malicious contract?” … but then I figured out a scheme. Note, though, that it’s a hacky scheme that I wouldn’t recommend. But it would work, so I’ll present it even so.
So if we fork at block `N`, we could do `hash = keccak(code_hash + hx)`, where `hx` is the blockhash of block `N-1`. The attacker wouldn’t know beforehand what `hx` is, and can’t put the right hash in place. It’s butt-ugly because we’ll always have to remember/look up `hx` every time we execute a contract (eventually hardcode it). Also, the `codeHash` in the trie won’t help us here, since the `codeHash` covers the entire thing, including the prepended hash. So we’d basically have to hash `code[32:]` at every execution to check whether it’s ‘legit’ or not.
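A rough sketch of this scheme, using `hashlib.sha3_256` as a stand-in for keccak256 (which is not in the Python standard library), and with `hx` standing for the blockhash of block `N-1`:

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in for keccak256; the real EVM uses keccak256, not SHA3-256.
    return hashlib.sha3_256(data).digest()


def tag_code(code: bytes, hx: bytes) -> bytes:
    """At deploy time (after fork block N): prepend keccak(code_hash + hx)."""
    return h(h(code) + hx) + code


def is_legit(stored: bytes, hx: bytes) -> bool:
    """At execution time: re-hash code[32:] and compare with the prefix.

    An attacker deploying before block N cannot predict hx, so they
    cannot have placed the right 32-byte prefix.
    """
    tag, code = stored[:32], stored[32:]
    return tag == h(h(code) + hx)
```

Note that `is_legit` is exactly the per-execution hashing cost complained about above.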
My overall impression at this point though, is that the very high complexity of this EIP overshadows the gains. But then I’m coming from the evm-perspective, where I don’t see that it will speed things up that much (for the reason @sorpaas pointed out). Perhaps there are other perspectives than evm speed that are very important for other people – if so, I’d very much like to learn about those usecases more in-depth.
Just throwing this random idea out there: what if we use a 31-byte `codeHash` for new code, where `codeHash := version(byte) ++ keccak256(code)[:30]`? That means we’d lessen the actual hash strength from 32 bytes to 30, but we’d get a versioning method that can contain 256 variants. And we’d sacrifice one byte to signify that this is a `versioned_codehash` and not a regular code hash.
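A minimal sketch of that `versioned_codehash` idea, with `hashlib.sha3_256` as a stand-in for keccak256 (keccak is not in the Python standard library):

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in for keccak256, which is not in the Python standard library.
    return hashlib.sha3_256(data).digest()


def versioned_codehash(code: bytes, version: int) -> bytes:
    """version(byte) ++ hash(code)[:30] -> 31 bytes total.

    One byte buys 256 version variants at the cost of two bytes of
    hash strength (32 -> 30).
    """
    assert 0 <= version <= 255
    return bytes([version]) + h(code)[:30]


def version_of(codehash: bytes) -> int:
    """A 31-byte hash is distinguishable from a plain 32-byte one."""
    assert len(codehash) == 31
    return codehash[0]
```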
This isn’t just a block-N thing. It’s “was it deployed with the EIP-615 validator?” If it was, then it is interpreted by those rules. Simple enough.
So the problem is how to tell if it was. This is a general problem that needs to be solved regardless. One solution is EIP-1707, which says “after block N contracts will be deployed with a header containing a version identifier.” Simple enough.
But EIP-1707 may have a problem: old code that begins with STOP followed by data bytes can be mistaken for new code and executed with indeterminate results. One solution is to append a footer with a hash of the code. Whatever the scheme, it’s a bit of complexity. I think it’s not that much complexity in practice: we are pretty used to dealing with hashes, and the clients have a few at hand anyway.
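The footer idea could look roughly like this (the header layout is hypothetical, and `hashlib.sha3_256` stands in for keccak256, which is not in the Python standard library):

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in for keccak256, which is not in the Python standard library.
    return hashlib.sha3_256(data).digest()


def with_footer(header: bytes, body: bytes) -> bytes:
    """Hypothetical layout: header ++ body ++ hash(header ++ body)."""
    payload = header + body
    return payload + h(payload)


def is_new_code(code: bytes) -> bool:
    """Old STOP-prefixed code is vanishingly unlikely to happen to end
    with the matching 32-byte hash of everything before it."""
    return len(code) > 32 and code[-32:] == h(code[:-32])
```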
If we are on the same page then getting EIP-1707 in place takes care of the version problem, so we know how to interpret the code. And it pushes this complexity out of EIP-615.
How are those two not the same thing? Anything after block `N` is deployed with the EIP-615 validator, no? (`N` has nothing to do with the contract deployment block; it’s the fork block number.)
The problem with 1707 is that it’s not exclusive. Any contract can opt in to it. And EIP-615 requires exclusivity. Hashes won’t give you that, unless you use a scheme like mine where the `forkblock-1` hash becomes magic.
So IMO 1707 does not help 615 (at least not the aspect I’m concerned about).
Aha. No, the spec must allow for unvalidated new code, if only to support old code that deploys unvalidated code itself. It currently allows users to deploy unvalidated code in order to allow for a smooth transition.
I’m not sure what you mean by exclusivity, but I like magic. Please explain?
It seems that any sort of hash lets you tell whether code was deployed with 1707 versioning or not.
I think I may understand what @holiman might mean by exclusivity (correct me if I’m wrong!). The issue is that we validate EIP-615 contract code on contract deployment, but an attacker can pre-deploy something that looks like it has that version, but is actually invalid.
Throwing an idea here: https://github.com/ethereum/EIPs/pull/1891
I think if we want to make sure version bytes cannot be faked, then we just cannot save them in account `code`. Changing the account RLP structure to have an extra item definitely works (like EIP-1702 or some such), but why don’t we just store those extra items in a known contract’s storage?
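The storage idea might look roughly like this, with a plain dict standing in for the known contract’s storage (all names here are hypothetical):

```python
# Hypothetical version registry, standing in for a well-known system
# contract's storage: instead of changing the account RLP, the registry
# maps code hash -> version, written once by the deployment mechanism.
VERSION_REGISTRY = {}  # code_hash (bytes) -> version (int)


def record_version(code_hash: bytes, version: int) -> None:
    # Done at deploy time, so it cannot be faked by pre-deployed code.
    VERSION_REGISTRY[code_hash] = version


def version_for(code_hash: bytes) -> int:
    # Anything not in the registry predates versioning: treat as legacy (0).
    return VERSION_REGISTRY.get(code_hash, 0)
```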
Let’s say that the deployment mechanism always prepends a version identifier, including for code deployed at runtime by contracts that have already been deployed.
So the runtime can know that anything deployed after blocknumber N has a version identifier and run the appropriate VM.
No hash needed. Am I missing something?
Storing info in state or storage works for deployed code, but not otherwise. Seems good if we can just extend the existing Wasm header.
I’d suggest starting at `evm.1.0.0` for unvalidated code and `evm.1.5.0` for validated EIP-615 code. So we have asm.1.0.0, evm.1.0.0 and evm.1.5.0 to start with, and plenty of room for more.
We are into parts of the runtime that I don’t understand. My assumption was that the runtime would have some way to know the block number of the transaction doing the creation of a contract. Apparently I’m wrong.
In which case you are right. EIP-1702 makes the most sense to me, though I think that also maintaining a version header in the source code (as Wasm does anyway) makes sense.
TBH if you let me choose between EIP-1702 and EIP-1891, I would choose EIP-1891. They both accomplish the same thing, but EIP-1702 is comparatively more complicated to implement.