EIP-615: Subroutines and Static Jumps for the EVM

gcolvin · February 24, 2019, 8:47am

BEGINSUB, like the PUSHns, is a one-byte opcode with fixed-length immediate data. Seems no big deal. Variable-length immediate data is a big deal that I want to be rid of if it’s safe.

gcolvin · February 24, 2019, 9:16am

Probably a good idea, which is why I’m discussing it here before going to Last Call, and why Boris and Brooke have been meeting with implementers and auditors for feedback. And at this point Brooke has volunteered to do the Parity implementation and maybe someone (does @boris know?) has volunteered for geth.

My main fear of going to the core devs with less than a rock-solid proposal is getting sent away with instructions to perform unwanted surgery. The last time I did that C++ got auto_ptr, which then took 5 years to be deprecated and replaced by the original classes that the committee thought were too complex. (Those classes and their associated philosophy are now the bedrock of Rust’s memory management too. Much less complex than garbage collection. And consolation for years of rejection

expede · February 24, 2019, 11:18pm

DISCLAIMER: I just got off a 14-hour flight, about to step onto the next plane in a few minutes, and an jet lagged AF (ie: I’m not at my sharpest at the moment). Quick thoughts that will possibly get expended later:

One of the reasons it might fail is because it’s a large, monolithic change

I’m inclined to agree. A few times I’ve started breaking this proposal up into several sub proposals linked by the requires metadata field. As @gcolvin mentioned, EIP-615 has been in process since 2016(!), is referenced in Gavin Wood’s book, people know the number, it’s been discussed with lots of people that like it, &c &c &c. I do agree that some parts may be more controversial (the push/pop optimizations), and more granularity makes it easier to show progress on the portions that are absolute no-brainers.

Yes, they’re all part of a single strategy, but I’d call adding any number of them a win.

True, though compared to eWasm it’s tiny

Oh yeah: like a completely different scale of change! I think that the difference is that there is already political will to advance eWasm. Obviously I believe that there should be effort into improving the EVM, and the base changes (literally just subroutines and static jumps) are essentially uncontroversial.

Be careful; that assumption is what bought us the issues with net gas metering.

Yes, everything that goes into the spec should be solid. My views on having a formally-verified canonical spec are well known at this point Now if there was funding for these initiatives, that would be amazing!

expede · February 24, 2019, 11:28pm

It seems to me that this proposal complicates the EVM a lot compared to its existing status

I agree that the number of changes per proposal (“10 cpp” ) is high. Would it be more palatable spread over multiple EIPs? (honest question!)

My two cents worth: the current spec is deeeeeeeeeeeeep in the Turing Tarpit. The existing spec is possibly so simple that it’s causing issues. Most(?) existing clients are so simple that they’re not doing even the most straightforward performance optimizations. This is part of a strategy to improve that.

Simple doesn’t always mean small; in VM and PLT good design is generally accepted as following principles like orthogonality and extensibility. As much as I agree that less code is easier to maintain, there’s a balance to be struck between few moving parts, and a machine that’s easy to mechanistically optimize and verify.

expede · February 24, 2019, 11:33pm

I mean, ideally we get funding to do this There’s only so many unpaid projects that SPADE Co can take on. This one is near and dear to my heart, yes, but surely this type of core infrastructure is fundable! Let’s not fall into the tragedy of the commons!

gcolvin · February 25, 2019, 1:10am

Actually, no. If you remove dynamic jumps but don’t add indirect jumps you make switch statements and virtual functions slower, larger, and cost more gas than they do now. And providing stack frames for arguments and local variables but no instructions for directly accessing them is just silly. There really is a reason that most every CPU with a stack has all of these instructions.

gcolvin · February 25, 2019, 1:12am

Of course. I think testing the existing aleth implementation is sufficient, but the more merrier.

nevillegrech · February 25, 2019, 11:52am

Hi, Neville from contract-library.com. I support your proposal!

We routinely perform static analysis of all programs deployed on the mainnet. It is very hard for a scalable analysis to precisely figure out the jump targets of some of the dynamic jumps introduced by the Solidity compiler (particularly for implementing nested returns or call-with-continuation) especially after optimizations. A good static analysis tool needs to figure out the most complete, yet also the most precise subset of jump targets. The latter reduces false positives when running security analyses.

Introducing more structured jumps (private call and return) to the EVM bytecode language will facilitate the development of static analysis tools for EVM programs and will enable these tools to figure out a precise subset of jump targets. When dynamic jumps are eliminated, the bar for implementing static analysis tools for the EVM will be significantly lower. I guess most bytecode analysis tools today probably use symbolic execution techniques rather than static analysis (meaning abstract-interpretation and similar techniques) because of dynamic jumps.

Another change that would facilitate the development of static analysis tools for EVM bytecode is the balancing of stack depths at control-flow joins.

gcolvin · February 25, 2019, 3:02pm

Thanks, Neville. Could you clarify what you mean by static analysis versus symbolic execution?

I believe this is already a validity condition.

8 For every instruction in the code the frame size is constant.

nevillegrech · February 25, 2019, 4:16pm

Sorry, I missed condition 8. I’ve looked at validate_subroutine and indeed that condition should hold

In symbolic execution, each path is independently executed which allows targets of dynamic jumps in the case of function returns to be identified precisely in every path. Symbolic execution however misses many program behaviors (e.g. due to the problem of path explosion) and so it is limited in its applications. In other static analysis approaches, the state (e.g. values present in each stack position) is combined at control-flow joins. (Approaches exist to mitigate the loss in precision in this case, primarily context sensitivity).

gcolvin · February 25, 2019, 5:32pm

I don’t think you get path explosion with this proposal, (or with Wasm) @nevillegrech. The stack must be the same at control-flow joins, so you can just mark a path as taken. But I’ll leave it to @expede to better understand. The Abstract and perhaps a bit more might need to be more clear on this distinction.

nevillegrech · February 25, 2019, 5:52pm

No problem. To be clear, I didn’t suggest that path explosion had any thing to do with the proposal (or lack of it). It is just a characteristic of symbolic execution, as a technique.

gcolvin · February 25, 2019, 6:11pm

Got it. For this proposal symbolic execution is guaranteed to take linear time. I’m not sure about other static analysis approaches.

fulldecent · February 25, 2019, 11:12pm

I saw Brooklyn’s call for comments, so I’ll send a few. But posting into https://github.com/ethereum/EIPs/issues/615 because that is the official Discussion-To location.

gcolvin · February 26, 2019, 1:08am

Thanks! Comments are welcome in either place. The issue thread is older, and mostly a detailed review of the semantics by Sydney, who has since published work with Yoichi’s Lem formalization of the EVM, and also did a Lem formalization of this proposal. I should probably change the draft to point here.

gcolvin · February 26, 2019, 2:12am

Why we want static analysis:

"I want you to write a program that has to run in a concurrent environment under Byzantine circumstances where any adversary can invoke your program with any arguments of their choosing. The environment in which your program executes (and hence any direct or indirect environmental dependencies) is also under adversary control. If you make a single exploitable mistake or oversight in the implementation, or even in the logical design of the program, then either you personally or perhaps the users of your program could lose a substantial amount of money. Where your program will run, there is no legal recourse if things go wrong. Oh, and once you release the first version of your program, you can never change it. It has be right first time.

“I don’t think there are many experienced programmers that would fancy taking on this challenge. But call it ‘writing a smart contract’ and programmers are lining up around the block to have a go! Most of them it seems, get it wrong…”

gcolvin · February 26, 2019, 2:18am

@fulldecent and others have complained about the size of this proposal. I’m pulling this response from the EIP issue. A fair comparison might be to eWasm, which provides essentially the same functionality.

This spec is about 15 pages, and it took about 300 lines of C++ to add it to the aleth interpreter. It is backwards-compatible by default.

There is no current eWasm EIP and implementation for direct comparison, but the Wasm spec itself is 150 pages, and only floating point and a few other non-deterministic operations aren’t relevant. eWasm also requires an Environment Interface between the VM and the client, and a Transcompiler of EVM to eWasm bytecode for backwards compatibility. The ewasm/design repo, which is currently mostly empty, gives a feel for the scope of the effort remaining.

expede · February 26, 2019, 2:30am

Thank you! Yes, good to keep the deeper discussion together in one thread

latrasis · February 26, 2019, 9:04pm

I would second @arachnid’s request to keep instruction set representation constant.

Unless I misunderstand the proposal, as a user, having multibyte encodings would make on-chain code validation and decoding harder to deal with since it would not be possible to know the immediate stack values. Example Validation.

gcolvin · February 26, 2019, 11:03pm

Nick convinced me too, and the current draft moves the jump tables into the data section. So all of the proposed instructions now take a fixed number of inline bytes, if any.