EIP-615: Subroutines and Static Jumps for the EVM

fubuloubu · February 23, 2019, 4:41pm

I definitely appreciate this as a political argument!

gcolvin · February 23, 2019, 4:44pm

I do see this proposal as a logical whole, better as one proposal than three. None of the features are visible to a high-level programmer, and the analytic and performance gains are there only for programs that do not use dynamic jumps. But high-level compilers and assembly coders can use these features in concert to produce much better code.

AlexeyAkhunov · February 23, 2019, 6:11pm

The idea is to execute the transactions of one block concurrently, stopping at a barrier whenever there is a read from or a write to the state. On each read or write, one transaction successfully acquires an exclusive lock on the item being accessed, while any other transaction wanting to access the same item has to wait. That exclusive lock is held until the transaction completes execution. Deadlocks need to be detected, and need to result in one of the deadlocked transactions being aborted and restarted. This model would lead to what they call “Serialisable” transactional isolation.
I thought that symbolic execution coupled with control flow analysis can help to elide some of the locking, but I have not spent enough time thinking about it.
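
A minimal sketch of the locking scheme described above, assuming each transaction blocks on at most one item at a time. The class and method names (LockTable, acquire, release_all) are hypothetical and not taken from any client; the scheduler that actually parks and restarts transactions is left out.

```python
class LockTable:
    """Exclusive per-item locks, held until the owning transaction finishes."""

    def __init__(self):
        self.owner = {}      # state item -> transaction holding its lock
        self.waits_for = {}  # blocked transaction -> transaction it waits on

    def acquire(self, tx, item):
        """Return 'granted', 'wait', or 'abort' for tx reading or writing item."""
        holder = self.owner.get(item)
        if holder is None or holder == tx:
            self.owner[item] = tx
            return "granted"
        # Walk the wait-for chain from the holder; reaching tx means a cycle.
        seen, current = set(), holder
        while current is not None and current not in seen:
            if current == tx:
                return "abort"  # deadlock: the requester is aborted and restarted
            seen.add(current)
            current = self.waits_for.get(current)
        self.waits_for[tx] = holder
        return "wait"

    def release_all(self, tx):
        """Drop every lock held by tx once it completes or aborts."""
        for item in [i for i, t in self.owner.items() if t == tx]:
            del self.owner[item]
        self.waits_for.pop(tx, None)
        # Transactions that were waiting on tx are free to retry.
        for waiter in [w for w, t in self.waits_for.items() if t == tx]:
            del self.waits_for[waiter]
```

With two transactions A and B, A locking x and B locking y, a request by A for y returns 'wait', and a following request by B for x returns 'abort' because B would then be waiting on itself through A.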

Arachnid · February 23, 2019, 6:22pm

I’m really not a fan of the new complexity that this introduces to the instruction set representation. Currently, every instruction takes one byte, with the exception of PUSHn, which depends on the value of n.

This EIP introduces 10(!) new instructions, all but two of which have multibyte encodings.

As an alternative suggestion, why not instead take these arguments from the stack, but require that they were PUSHed immediately before? In that event, BEGINSUB n_args, n_results would be encoded as PUSHn n_args PUSHn n_results BEGINSUB, and removes the need for everyone to adopt new instruction decoding code. It would also remove the need for two of the new instructions - JUMPTO and JUMPIF can be represented using the existing JUMP and JUMPI instructions, but with the new restrictions.
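
As a rough illustration of that encoding, here is a sketch of a check that every BEGINSUB is immediately preceded by two PUSHes and that recovers n_args and n_results from them. The opcode value used for BEGINSUB is an assumption for illustration only, and the function names are hypothetical. The sketch only looks at instruction order; it does not consider jumps that land between the pushes and the BEGINSUB.

```python
PUSH1, PUSH32 = 0x60, 0x7F
BEGINSUB = 0xB5  # assumed opcode value, for illustration only


def decode(code):
    """Split bytecode into (offset, opcode, immediate bytes) tuples."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        # Only PUSHn carries inline data in this encoding.
        width = op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0
        out.append((pc, op, code[pc + 1 : pc + 1 + width]))
        pc += 1 + width
    return out


def beginsub_signatures(code):
    """Return {offset: (n_args, n_results)} for each BEGINSUB.

    Raises ValueError if a BEGINSUB is not directly preceded by two PUSHes.
    """
    instrs = decode(code)
    sigs = {}
    for i, (pc, op, _) in enumerate(instrs):
        if op != BEGINSUB:
            continue
        prev = instrs[max(0, i - 2) : i]
        if len(prev) != 2 or not all(PUSH1 <= p[1] <= PUSH32 for p in prev):
            raise ValueError(f"BEGINSUB at {pc} not preceded by two PUSHes")
        n_args, n_results = (int.from_bytes(p[2], "big") for p in prev)
        sigs[pc] = (n_args, n_results)
    return sigs
```

For example, the bytes 60 01 60 02 b5 (PUSH1 1, PUSH1 2, BEGINSUB under the assumed opcode value) yield the signature (1, 2) at offset 4.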

PUTLOCAL and GETLOCAL introduce an entirely new type of memory and don’t seem to have any direct connection to the rest of this EIP. I think they should be in a separate EIP.

gcolvin · February 23, 2019, 8:04pm

@Arachnid So, sort of a reverse Polish notation with extra PUSHes. A tiny bit verbose, and an unusual constraint on an instruction set. It might also complicate validation a little, as it would have to look backwards from these instructions to be sure the previous pushes were valid. Still, I’m open to the change, if we thought the complication would help enough users.

gcolvin · February 23, 2019, 8:07pm

@Arachnid I don’t see how PUTLOCAL and GETLOCAL introduce new kinds of memory, they just provide an alternative to multiple DUPs and SWAPs for getting values where you want them on the stack. So not necessary, but useful and efficient. But as with JUMPV and JUMPSUBV they can be emulated with slower sequences of other instructions, despite being directly supported by Wasm and most all CPUs. If reducing the size of the proposal would make the difference to its acceptance then these would be the instructions to postpone.

Arachnid · February 23, 2019, 8:40pm

Validators can do this fairly easily by calculating provenance on stack elements. Executors don’t need to care, and can just treat them as stack arguments.

I misunderstood how they work, sorry. I thought they accessed a ‘local variable storage’, but they access elements further down the stack at a location specified by a frame pointer.

I do still think that this EIP specifies several different modifications, and should be split into smaller, more concise EIPs. It would make it easier to review and approve them independently.

gcolvin · February 23, 2019, 10:44pm

Fair enough. I’m still not sure I can write a regular grammar to express your idea, or how to put it in the Yellow Paper. I guess it would take a back reference from the appendix where BEGINSUB is described to an extra exceptional halting state, for the case that BEGINSUB would be executed with arguments on the stack that were not the results of one of the PUSHn operations.

And yes, these could be three EIPs, with the condition that the second two depend on the first. I don’t know if that makes it easier or harder to evaluate the facility as a whole. Which is to say: This EIP offers the control-flow primitives provided by Wasm and by most every CPU ever. Shall we just put them all in now, or spend the next two years at it?

I should maybe add a table of corresponding EVM+615/Wasm/8086/ARM operations to clarify.

fubuloubu · February 23, 2019, 10:48pm

I think one interesting way to think about it is as 3 EIPs that exist atomically.

For example, if we get the first one done for Istanbul, but not the others, that’s good. If we get both the dependent ones in there for Istanbul, that’s great. If we get all 3 in time… That’s fantastic!

It’ll be good to have break points to de-risk the implementation steps and engineering (and social coordination of a fork)

gcolvin · February 23, 2019, 11:03pm

Technical arguments aside, this has been EIP issue 615 since December of 2016, and EIP-615 Draft since April of 2017. I designed it as a whole and implemented it as a whole. I’d rather move it as a whole and decide what to do if it fails, depending on why it fails.

fubuloubu · February 23, 2019, 11:07pm

One of the reasons it might fail is that it’s a large, monolithic change. I think I like the political calculation of rolling out all of it at the same time and attempting to get community buy-in to make the change, because it reduces the amount of coordination effort long term.

As a backup plan though, I am liking the 3 step approach for those of a more moderate risk appetite.

fubuloubu · February 23, 2019, 11:09pm

I want to see this proposal succeed, because I’ve heard a lot of great feedback, but 10 opcodes definitely gives one pause, especially when we’ve had months of trouble getting half that many to work lol

gcolvin · February 24, 2019, 12:02am

True, though compared to eWasm it’s tiny

I know the core devs have taken to arguing at length over individual opcodes, most of them variants on CALL with subtle security implications. They are not accustomed to discussing a computational facility with several opcodes and no security implications except gas costs. And even less accustomed to bringing a deficient VM up to the minimal state of the art.

I would like to hear from language implementers how they would implement virtual functions without JUMPSUBV or similar.

fubuloubu · February 24, 2019, 12:18am

Lol, no comment™

I wouldn’t? With the gas model, there’s diminishing returns to more complex functionality since the expense of execution only makes certain coordination functions practical. Let’s not get too far down the rabbit hole of what’s possible and take the win here if we can get this implemented.

gcolvin · February 24, 2019, 12:19am

Solidity has virtual functions.

gcolvin · February 24, 2019, 12:24am

It’s the gas model that makes the four “extra” opcodes so valuable. They can be implemented with one cheap interpreter instruction, or compiled to one wasm or machine instruction, but require expensive sequences of primitives otherwise. Long chains of comparisons and jumps. Long chains of dups and swaps. Lots of gas.
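
To put rough numbers on the "long chains of comparisons and jumps", here is a small worked sketch. It assumes a compiler emulates a jump table by testing one case at a time with the sequence DUP1, PUSH case, EQ, PUSH dest, JUMPI; that sequence is an illustrative assumption, not the output of any particular compiler. The per-opcode gas figures are the existing ones (3 for DUP, PUSH, and EQ; 10 for JUMPI).

```python
# Existing per-opcode gas costs for the instructions used below.
GAS = {"DUP1": 3, "PUSH": 3, "EQ": 3, "JUMPI": 10}

# Assumed emulation of one jump-table case: compare the selector, branch on match.
CASE_TEST = ["DUP1", "PUSH", "EQ", "PUSH", "JUMPI"]


def compare_chain_gas(cases_tested):
    """Gas spent when `cases_tested` cases are checked before one matches."""
    return cases_tested * sum(GAS[op] for op in CASE_TEST)
```

Under these assumptions each case tested costs 22 gas, so reaching the last entry of a 16-way dispatch costs 352 gas before the target code runs, and the cost grows linearly with the table size.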

gcolvin · February 24, 2019, 1:18am

I’m incorporating changes for a later PR into the original proposal. Including this motivation:

Especially important is efficient translation to eWasm. To that end we maintain a close correspondence between the operations proposed here and Wasm.

Wasm	EIP-615
br	JUMPTO
br_if	JUMPIF
br_table	JUMPV
call	JUMPSUB
call_indirect	JUMPSUBV
return	RETURN
get_local	GETLOCAL
set_local	PUTLOCAL
unreachable	DATA

gcolvin · February 24, 2019, 1:38am

@Arachnid I think the biggest problems for your disassembler are JUMPV and JUMPSUBV, which have not just multiple arguments, like PUSHn, but a variable number of arguments. Wasm’s corresponding br_table and call_indirect avoid that problem by maintaining the tables of indirections separately from the instructions, not inline. I kept them inline for fear that one could write an exploit that used one table and lots of indirect jumps. If I’m being overly cautious we can copy Wasm and solve that problem.

Arachnid · February 24, 2019, 3:43am

Be careful; that assumption is what bought us the issues with net gas metering.

I hadn’t noticed that.

It seems to me that this proposal complicates the EVM a lot compared to its existing status. I agree with the goals, but I also wonder if the complexity is worthwhile, especially with plans to migrate to new VMs in the future.

gcolvin · February 24, 2019, 6:16am

@Arachnid I’m specifically wondering if (and why) BEGINSUB 1 2 would be harder to disassemble than, say, PUSH1 3? You recognize the opcode, you skip the requisite number of bytes.

I can for sure see that JUMPV n 1 2 3 ... N is harder, and suggest it’s easy to fix unless that opens a security hole.
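
The difference can be sketched as an instruction-length function for a disassembler. The opcode values and immediate widths below are assumptions for illustration (one byte each for the two BEGINSUB arguments, a one-byte count followed by two-byte targets for JUMPV); the actual encoding is whatever the EIP specifies.

```python
PUSH1, PUSH32 = 0x60, 0x7F
BEGINSUB, JUMPV = 0xB5, 0xB2  # assumed opcode values, for illustration only
TARGET_WIDTH = 2              # assumed number of bytes per jump target


def instruction_length(code, pc):
    """Total length in bytes of the instruction starting at offset pc."""
    op = code[pc]
    if PUSH1 <= op <= PUSH32:
        # Length follows from the opcode alone: PUSHn carries n bytes.
        return 1 + (op - PUSH1 + 1)
    if op == BEGINSUB:
        # Fixed immediates (n_args, n_results): again known from the opcode.
        return 1 + 2
    if op == JUMPV:
        # Variable: the count byte has to be read before the length is known.
        n = code[pc + 1]
        return 1 + 1 + n * TARGET_WIDTH
    return 1
```

PUSHn and BEGINSUB can be skipped by looking at the opcode only, whereas JUMPV requires reading into the immediate data to find where the next instruction starts.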