EIP-615: Subroutines and Static Jumps for the EVM

gcolvin · February 25, 2019, 5:32pm

I don’t think you get path explosion with this proposal, (or with Wasm) @nevillegrech. The stack must be the same at control-flow joins, so you can just mark a path as taken. But I’ll leave it to @expede to better understand. The Abstract and perhaps a bit more might need to be more clear on this distinction.

nevillegrech · February 25, 2019, 5:52pm

No problem. To be clear, I didn’t suggest that path explosion had any thing to do with the proposal (or lack of it). It is just a characteristic of symbolic execution, as a technique.

gcolvin · February 25, 2019, 6:11pm

Got it. For this proposal symbolic execution is guaranteed to take linear time. I’m not sure about other static analysis approaches.

fulldecent · February 25, 2019, 11:12pm

I saw Brooklyn’s call for comments, so I’ll send a few. But posting into https://github.com/ethereum/EIPs/issues/615 because that is the official Discussion-To location.

gcolvin · February 26, 2019, 1:08am

Thanks! Comments are welcome in either place. The issue thread is older, and mostly a detailed review of the semantics by Sydney, who has since published work with Yoichi’s Lem formalization of the EVM, and also did a Lem formalization of this proposal. I should probably change the draft to point here.

gcolvin · February 26, 2019, 2:12am

Why we want static analysis:

"I want you to write a program that has to run in a concurrent environment under Byzantine circumstances where any adversary can invoke your program with any arguments of their choosing. The environment in which your program executes (and hence any direct or indirect environmental dependencies) is also under adversary control. If you make a single exploitable mistake or oversight in the implementation, or even in the logical design of the program, then either you personally or perhaps the users of your program could lose a substantial amount of money. Where your program will run, there is no legal recourse if things go wrong. Oh, and once you release the first version of your program, you can never change it. It has be right first time.

“I don’t think there are many experienced programmers that would fancy taking on this challenge. But call it ‘writing a smart contract’ and programmers are lining up around the block to have a go! Most of them it seems, get it wrong…”

https://blog.acolyer.org/2018/03/08/zeus-analyzing-safety-of-smart-contracts/

gcolvin · February 26, 2019, 2:18am

@fulldecent and others have complained about the size of this proposal. I’m pulling this response from the EIP issue. A fair comparison might be to eWasm, which provides essentially the same functionality.

This spec is about 15 pages, and it took about 300 lines of C++ to add it to the aleth interpreter. It is backwards-compatible by default.

There is no current eWasm EIP and implementation for direct comparison, but the Wasm spec itself is 150 pages, and only floating point and a few other non-deterministic operations aren’t relevant. eWasm also requires an Environment Interface between the VM and the client, and a Transcompiler of EVM to eWasm bytecode for backwards compatibility. The ewasm/design repo, which is currently mostly empty, gives a feel for the scope of the effort remaining.

expede · February 26, 2019, 2:30am

Thank you! Yes, good to keep the deeper discussion together in one thread

latrasis · February 26, 2019, 9:04pm

I would second @arachnid’s request to keep instruction set representation constant.

Unless I misunderstand the proposal, as a user, having multibyte encodings would make on-chain code validation and decoding harder to deal with since it would not be possible to know the immediate stack values. Example Validation.

gcolvin · February 26, 2019, 11:03pm

Nick convinced me too, and the current draft moves the jump tables into the data section. So all of the proposed instructions now take a fixed number of inline bytes, if any.

chfast · February 28, 2019, 3:55pm

I don’t agree that having this new encoding scheme for opcodes make it “a lot” more complex. It’s trivial to implement. Using PUSH as some kind of prefix might look that you don’t have to do any changes to EVM implementations, but if you want a fast EVM you probably will have to do something about it anyway. I’d like to see code diffs implementing both variants.

So maybe we will save some developers time in EVM code, but this pattern would have be recognized by all analysis tools, tracers, and finally smart contract developers. I’m not sure it is worth it.

That wasn’t the case for me before, but now I prefer to add new opcodes instead of trying to extend the semantics of existing ones as we did e.g. with SSTORE or CALL.

Anyway, I’d like to propose to leave the opcode encoding issue until the end. I’m sure we will find a way to encode new opcodes in a way that satisfies everyone.

gcolvin · February 28, 2019, 4:52pm

I’m recommending the same scheme that Wasm uses, which has been thoroughly reviewed and tested by now. And it satisfies @Arachnid’s problems with multibyte encodings. And we are getting very near the end

nevillegrech · February 28, 2019, 5:14pm

A few more thoughts:

I also think the complexity of having this feature is manageable, especially compared to something like ewasm. E.g., one shouldn’t underestimate the complexity of building a transpiler (which ideally is backed by formal proof) that will convert existing EVM to eWASM and preserve exactly the same semantics (incl. gas).

It would be interesting to document the required effort to implement EIP-615 for a static analysis framework like we use in contract-library.com and also quantify the benefit in fidelity we get at some point on a few large experimental programs. My estimate is that it would around a week to implement and test properly (with bytecode parsing being insignificant in terms of effort). The improvement in fidelity of the analysis would be well worth the effort in larger contracts.

Interestingly, it also seems like semantic-preserving conversion from EVM bytecode to YUL and vice versa should be easier after this change. My impression is there is hardly much difference in the abstraction level after EIP-615, except for types.

MrChico · February 28, 2019, 5:16pm

Can you explain what this encoding looks like?

gcolvin · February 28, 2019, 7:23pm

Which encoding? The encoding of the instructions is laid out in the [proposal] (https://github.com/ethereum/EIPs/blob/master/EIPS/eip-615.md) I’ve changed JUMPV and JUMPSUBV so that they have fixed-length immediate data.

Arachnid · March 1, 2019, 5:24am

I like the change; it provides for reuse of jump tables too, which is nice.

The vector is stored inline at the jump_targets offset after the BEGINDATA bytecode as MSB-first, two’s-complement, two-byte positive integers.

Why specify that they’re twos-complement if they must always be positive?

chfast · March 1, 2019, 5:08pm

They want to reserve the top most bit for future use. I think it should be “two’s-complement, signed, two-byte positive integers”.

Arachnid · March 1, 2019, 7:04pm

Two’s complement implies signed. But if you want to reserve the most significant bit, why not say so, instead of specifying signed numbers and then requiring them to be always positive?

expede · March 1, 2019, 7:06pm

I am inclined to agree with @Arachnid here. IMO reserving the top bit is more straightforward. Thoughts @gcolvin?

charles-cooper · March 1, 2019, 8:06pm

If the index is greater than or equal to n - 1 the last (default) offset is used.

Why not have the same behavior as INVALID or STOP? I suppose that would make it harder to prove the absence of an exceptional halting state, but it might be better than having a more nuanced bug.