EIP-615: Subroutines and Static Jumps for the EVM

fubuloubu · February 24, 2019, 12:18am

Lol, no comment™

I wouldn’t? With the gas model, there’s diminishing returns to more complex functionality since the expense of execution only makes certain coordination functions practical. Let’s not get too far down the rabbit hole of what’s possible and take the win here if we can get this implemented.

gcolvin · February 24, 2019, 12:19am

Solidity has virtual functions.

gcolvin · February 24, 2019, 12:24am

It’s the gas model that makes the four “extra” opcodes so valuable. They can be implemented with one cheap interpreter instruction, or compiled to one wasm or machine instruction, but require expensive sequences of primitives otherwise. Long chains of comparisons and jumps. Long chains of dups and swaps. Lots of gas.
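To make the cost difference concrete, here is a minimal sketch comparing a switch emulated with a chain of comparisons and conditional jumps against a single table-indexed jump in the style of JUMPV. The function names and the step counts are illustrative assumptions. They count abstract steps, not real EVM gas.

```python
# Illustrative only: "steps" are abstract units, not real EVM gas costs.

def dispatch_with_chain(selector, targets):
    """Emulate a switch with one compare + one conditional jump per case."""
    steps = 0
    for case, target in enumerate(targets):
        steps += 2  # one comparison plus one conditional jump per case tried
        if selector == case:
            return target, steps
    return None, steps  # no case matched


def dispatch_with_table(selector, targets):
    """Emulate a JUMPV-style switch: one indexed lookup into a jump vector."""
    steps = 1  # a single table-indexed jump, regardless of table size
    if 0 <= selector < len(targets):
        return targets[selector], steps
    return None, steps


targets = [0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80]
print(dispatch_with_chain(7, targets))  # worst case walks every comparison
print(dispatch_with_table(7, targets))  # same target, constant work
```

The chain grows linearly with the number of cases while the table lookup stays constant, which is the gap the gas model turns into real cost.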

gcolvin · February 24, 2019, 1:18am

I’m incorporating changes for a later PR into the original proposal. Including this motivation:

Especially important is efficient translation to eWasm. To that end we maintain a close correspondence between the operations proposed here and Wasm.

| Wasm | EIP-615 |
| --- | --- |
| br | JUMPTO |
| br_if | JUMPIF |
| br_table | JUMPV |
| call | JUMPSUB |
| call_indirect | JUMPSUBV |
| return | RETURN |
| get_local | GETLOCAL |
| set_local | PUTLOCAL |
| unreachable | DATA |

gcolvin · February 24, 2019, 1:38am

@Arachnid I think the biggest problems for your disassembler are JUMPV and JUMPSUBV, which have not just multiple arguments–like PUSHn–but a variable number of arguments. Wasm’s corresponding br_table and call_indirect avoid that problem by maintaining the tables of indirections separately from the instructions–not inline. I kept them inline for fear that one could write an exploit that used one table and lots of indirect jumps. If I’m being overly cautious we can copy Wasm and solve that problem.

Arachnid · February 24, 2019, 3:43am

Be careful; that assumption is what bought us the issues with net gas metering.

I hadn’t noticed that.

It seems to me that this proposal complicates the EVM a lot compared to its existing status. I agree with the goals, but I also wonder if the complexity is worthwhile, especially with plans to migrate to new VMs in the future.

gcolvin · February 24, 2019, 6:16am

@Arachnid I’m specifically wondering if (and why) BEGINSUB 1 2 would be harder to disassemble than, say, PUSH1 3? You recognize the opcode, you skip the requisite number of bytes.

I can for sure see that JUMPV n 1 2 3 ... N is harder, and suggest it’s easy to fix unless that opens a security hole.
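A rough sketch of what the disassembler's "skip the immediates" step looks like under each scheme. The opcode values for BEGINSUB and JUMPV and the immediate widths (one byte per BEGINSUB argument, a one-byte count followed by two-byte targets for JUMPV) are assumptions made for illustration, not taken from this thread.

```python
# Assumed encodings for illustration; only the PUSH range is existing EVM.
PUSH1, PUSH32 = 0x60, 0x7F  # existing PUSHn opcodes
BEGINSUB = 0xB5             # assumed value; two 1-byte immediates
JUMPV = 0xB2                # assumed value; 1-byte count n, then n 2-byte targets


def instruction_length(code, pc):
    """Return the total length in bytes of the instruction starting at pc."""
    op = code[pc]
    if PUSH1 <= op <= PUSH32:
        return 1 + (op - PUSH1 + 1)  # length is fixed by the opcode itself
    if op == BEGINSUB:
        return 1 + 2                 # also fixed by the opcode itself
    if op == JUMPV:
        if pc + 1 >= len(code):
            raise ValueError("truncated JUMPV: missing count byte")
        n = code[pc + 1]             # length depends on data after the opcode
        return 1 + 1 + 2 * n
    return 1                         # every other opcode is a single byte


def instruction_starts(code):
    """List the offsets at which instructions begin."""
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        pc += instruction_length(code, pc)
    return starts


# PUSH1 3; BEGINSUB 1 2; JUMPV 2 0x0010 0x0020; STOP
code = bytes([0x60, 0x03, 0xB5, 0x01, 0x02,
              0xB2, 0x02, 0x00, 0x10, 0x00, 0x20, 0x00])
print(instruction_starts(code))
```

PUSHn and BEGINSUB can be handled by a lookup keyed on the opcode alone. JUMPV is the only case here that has to read into the immediate data to know how far to skip.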

Arachnid · February 24, 2019, 6:22am

Because presently all opcodes except pushn are one byte long, and all the push opcodes can be handled with one special case clause. Surely you must see that this complicates the EVM ISA substantially.

gcolvin · February 24, 2019, 8:14am

Yes, the power and simplicity of JUMP and JUMPI are such that they can be used in complex ways to emulate 7 of these 10 opcodes. Solidity does so use them. The price of this power is that many kinds of useful analyses become difficult or impossible. Better, I keep arguing, to just provide the same opcodes Wasm does.

As for plans to move the mainchain to other VMs: the only eWasm EIP is a three-year-old issue that was never implemented, never submitted to the EIP editors, never publicly submitted to the core devs, and is now closed. This simpler, much less disruptive EIP draft and its implementation have been sidelined for over two years now by eWasm plans. Last I talked to the eWasm team, they thought it was time to get it moving again.

Arachnid · February 24, 2019, 8:35am

Can I suggest putting this as an agenda item on a future All Core Devs for discussion before Last Call / Final status? I think it’d be good to get input from implementers, and I don’t think it will get enough attention as a discussion thread here.

gcolvin · February 24, 2019, 8:47am

BEGINSUB, like the PUSHns, is a one-byte opcode with fixed-length immediate data. Seems no big deal. Variable-length immediate data is a big deal that I want to be rid of if it’s safe.

gcolvin · February 24, 2019, 9:16am

Probably a good idea, which is why I’m discussing it here before going to Last Call, and why Boris and Brooke have been meeting with implementers and auditors for feedback. And at this point Brooke has volunteered to do the Parity implementation and maybe someone (does @boris know?) has volunteered for geth.

My main fear of going to the core devs with less than a rock-solid proposal is getting sent away with instructions to perform unwanted surgery. The last time I did that, C++ got auto_ptr, which then took 5 years to be deprecated and replaced by the original classes that the committee thought were too complex. (Those classes and their associated philosophy are now the bedrock of Rust’s memory management too. Much less complex than garbage collection. And consolation for years of rejection.)

expede · February 24, 2019, 11:18pm

DISCLAIMER: I just got off a 14-hour flight, am about to step onto the next plane in a few minutes, and am jet-lagged AF (i.e. I’m not at my sharpest at the moment). Quick thoughts that will possibly get expanded later:

One of the reasons it might fail is because it’s a large, monolithic change

I’m inclined to agree. A few times I’ve started breaking this proposal up into several sub proposals linked by the requires metadata field. As @gcolvin mentioned, EIP-615 has been in process since 2016(!), is referenced in Gavin Wood’s book, people know the number, it’s been discussed with lots of people that like it, &c &c &c. I do agree that some parts may be more controversial (the push/pop optimizations), and more granularity makes it easier to show progress on the portions that are absolute no-brainers.

Yes, they’re all part of a single strategy, but I’d call adding any number of them a win.

True, though compared to eWasm it’s tiny

Oh yeah: like a completely different scale of change! I think that the difference is that there is already political will to advance eWasm. Obviously I believe that there should be effort into improving the EVM, and the base changes (literally just subroutines and static jumps) are essentially uncontroversial.

Be careful; that assumption is what bought us the issues with net gas metering.

Yes, everything that goes into the spec should be solid. My views on having a formally-verified canonical spec are well known at this point. Now if there were funding for these initiatives, that would be amazing!

expede · February 24, 2019, 11:28pm

It seems to me that this proposal complicates the EVM a lot compared to its existing status

I agree that the number of changes per proposal (“10 cpp”) is high. Would it be more palatable spread over multiple EIPs? (honest question!)

My two cents worth: the current spec is deeeeeeeeeeeeep in the Turing Tarpit. The existing spec is possibly so simple that it’s causing issues. Most(?) existing clients are so simple that they’re not doing even the most straightforward performance optimizations. This is part of a strategy to improve that.

Simple doesn’t always mean small; in VM and PLT good design is generally accepted as following principles like orthogonality and extensibility. As much as I agree that less code is easier to maintain, there’s a balance to be struck between few moving parts, and a machine that’s easy to mechanistically optimize and verify.

expede · February 24, 2019, 11:33pm

I mean, ideally we get funding to do this. There’s only so many unpaid projects that SPADE Co can take on. This one is near and dear to my heart, yes, but surely this type of core infrastructure is fundable! Let’s not fall into the tragedy of the commons!

gcolvin · February 25, 2019, 1:10am

Actually, no. If you remove dynamic jumps but don’t add indirect jumps you make switch statements and virtual functions slower, larger, and cost more gas than they do now. And providing stack frames for arguments and local variables but no instructions for directly accessing them is just silly. There really is a reason that most every CPU with a stack has all of these instructions.

gcolvin · February 25, 2019, 1:12am

Of course. I think testing the existing aleth implementation is sufficient, but the more the merrier.

nevillegrech · February 25, 2019, 11:52am

Hi, Neville from contract-library.com. I support your proposal!

We routinely perform static analysis of all programs deployed on the mainnet. It is very hard for a scalable analysis to precisely figure out the jump targets of some of the dynamic jumps introduced by the Solidity compiler (particularly for implementing nested returns or call-with-continuation) especially after optimizations. A good static analysis tool needs to figure out the most complete, yet also the most precise subset of jump targets. The latter reduces false positives when running security analyses.

Introducing more structured jumps (private call and return) to the EVM bytecode language will facilitate the development of static analysis tools for EVM programs and will enable these tools to figure out a precise subset of jump targets. When dynamic jumps are eliminated, the bar for implementing static analysis tools for the EVM will be significantly lower. I guess most bytecode analysis tools today probably use symbolic execution techniques rather than static analysis (meaning abstract-interpretation and similar techniques) because of dynamic jumps.

Another change that would facilitate the development of static analysis tools for EVM bytecode is the balancing of stack depths at control-flow joins.

gcolvin · February 25, 2019, 3:02pm

Thanks, Neville. Could you clarify what you mean by static analysis versus symbolic execution?

I believe this is already a validity condition.

8 For every instruction in the code the frame size is constant.
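For readers who want to see what that condition buys an analysis, here is a simplified sketch of a check that every instruction is reached with the same stack depth on every path. It is not the EIP's validate_subroutine. The program representation (a list of `(pops, pushes, successors)` tuples) and the function name are assumptions for illustration, and stack depth stands in for frame size.

```python
def has_constant_depth(program, entry=0):
    """Check that each reachable instruction sees one stack depth on all paths.

    program[i] is (pops, pushes, successors), a hypothetical encoding where
    successors lists the indices control can flow to next.
    """
    depth_at = {entry: 0}
    worklist = [entry]
    while worklist:
        pc = worklist.pop()
        pops, pushes, successors = program[pc]
        depth = depth_at[pc]
        if depth < pops:
            return False  # stack underflow
        out = depth - pops + pushes
        for nxt in successors:
            if nxt not in depth_at:
                depth_at[nxt] = out      # first arrival fixes the depth
                worklist.append(nxt)
            elif depth_at[nxt] != out:
                return False             # two paths disagree at a join
    return True


# A diamond: both arms push one value before rejoining at instruction 5.
balanced = [
    (0, 1, [1]),     # 0: push condition
    (1, 0, [2, 4]),  # 1: conditional jump, consumes condition
    (0, 1, [3]),     # 2: push (fall-through arm)
    (0, 0, [5]),     # 3: jump to join
    (0, 1, [5]),     # 4: push (taken arm)
    (1, 0, []),      # 5: join point, pops the value and stops
]
# Same shape, but the taken arm pushes two values.
unbalanced = list(balanced)
unbalanced[4] = (0, 2, [5])

print(has_constant_depth(balanced), has_constant_depth(unbalanced))
```

Because each instruction's depth is fixed on first arrival, the check visits every instruction at most once.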

nevillegrech · February 25, 2019, 4:16pm

Sorry, I missed condition 8. I’ve looked at validate_subroutine and indeed that condition should hold.

In symbolic execution, each path is independently executed which allows targets of dynamic jumps in the case of function returns to be identified precisely in every path. Symbolic execution however misses many program behaviors (e.g. due to the problem of path explosion) and so it is limited in its applications. In other static analysis approaches, the state (e.g. values present in each stack position) is combined at control-flow joins. (Approaches exist to mitigate the loss in precision in this case, primarily context sensitivity).
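A toy sketch of the two trade-offs, with hypothetical names and values throughout. Path-by-path exploration needs one run per combination of branch outcomes, while a joining analysis merges states slot by slot and so can no longer tell which return address belongs to which call site.

```python
from itertools import product


def enumerate_paths(num_branches):
    """Path-by-path exploration: one run per combination of branch outcomes."""
    return list(product((False, True), repeat=num_branches))


def join(state_a, state_b):
    """Merge two abstract stacks slot by slot, as sets of possible values."""
    if len(state_a) != len(state_b):
        raise ValueError("stack depths differ at the join")
    return [a | b for a, b in zip(state_a, state_b)]


# Ten independent branches already give 2**10 separate paths to execute.
print(len(enumerate_paths(10)))

# Two call sites enter the same function with different return addresses
# (0x40 and 0x80 are made-up values) on top of the stack.
from_site_a = [{0x40}]
from_site_b = [{0x80}]
merged = join(from_site_a, from_site_b)
print(merged)  # the return jump now appears to have two possible targets
```

Keeping the two entries separate per call site, rather than joining them, is the kind of context sensitivity mentioned above. The `join` also only works when both stacks have the same depth, which is why balanced stack depths at control-flow joins help.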