EIP-615: Subroutines and Static Jumps for the EVM

I definitely appreciate this as a political argument!


I do see this proposal as a logical whole, better as one proposal than three. None of the features are visible to a high-level programmer, and the analytic and performance gains are there only for programs that do not use dynamic jumps. But high-level compilers and assembly coders can use these features in concert to produce much better code.


The idea is to execute the transactions of one block concurrently, stopping at a barrier whenever there is a read from or a write to the state. Each read or write results in one transaction acquiring an exclusive lock on the item being read or written, while any other transaction wanting to access the same item has to wait. That exclusive lock is held until the transaction completes execution. Deadlocks need to be detected and must result in one of the deadlocked transactions being aborted and restarted. This model would lead to what is called “Serialisable” transactional isolation.
I thought that symbolic execution coupled with control flow analysis can help to elide some of the locking, but I have not spent enough time thinking about it.
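To make the locking model above concrete, here is a minimal sketch of a lock manager with exclusive per-item locks and deadlock detection over a wait-for chain. The class and method names (`LockManager`, `acquire`, `release_all`) are hypothetical, and the sketch ignores scheduling, restarts, and any lock elision from static analysis.

```python
class LockManager:
    """Exclusive per-item locks, held until the owning transaction finishes."""

    def __init__(self):
        self.owner = {}      # state item -> transaction holding its lock
        self.waits_for = {}  # blocked transaction -> transaction it waits on

    def acquire(self, tx, item):
        """Return 'granted', 'wait', or 'abort' (a deadlock was detected)."""
        holder = self.owner.get(item)
        if holder is None or holder == tx:
            self.owner[item] = tx
            return "granted"
        # Walk the wait-for chain from the holder; reaching tx means a cycle.
        seen = holder
        while seen is not None:
            if seen == tx:
                return "abort"
            seen = self.waits_for.get(seen)
        self.waits_for[tx] = holder
        return "wait"

    def release_all(self, tx):
        """Call when tx completes or is aborted; frees its locks and waiters."""
        for item in [i for i, t in self.owner.items() if t == tx]:
            del self.owner[item]
        self.waits_for.pop(tx, None)
        for waiter in [w for w, t in self.waits_for.items() if t == tx]:
            del self.waits_for[waiter]


lm = LockManager()
first = lm.acquire("A", "x")   # A locks x
second = lm.acquire("B", "y")  # B locks y
third = lm.acquire("A", "y")   # A must wait for B
fourth = lm.acquire("B", "x")  # B would wait for A, which waits for B
```

In this sketch the transaction whose request closes the cycle is the one told to abort; a real scheduler could pick a different victim.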


I’m really not a fan of the new complexity that this introduces to the instruction set representation. Currently, every instruction takes one byte, with the exception of PUSHn, which depends on the value of n.

This EIP introduces 10(!) new instructions, all but two of which have multibyte encodings.

As an alternative suggestion, why not instead take these arguments from the stack, but require that they were PUSHed immediately before? In that event, BEGINSUB n_args, n_results would be encoded as PUSHn n_args PUSHn n_results BEGINSUB, which removes the need for everyone to adopt new instruction decoding code. It would also remove the need for two of the new instructions: JUMPTO and JUMPIF can be represented using the existing JUMP and JUMPI instructions, but with the new restrictions.

PUTLOCAL and GETLOCAL introduce an entirely new type of memory and don’t seem to have any direct connection to the rest of this EIP. I think they should be in a separate EIP.


@Arachnid So, sort of a reverse Polish notation with extra PUSHes. A tiny bit verbose, and an unusual constraint on an instruction set. It might also complicate validation a little, as it would have to look backwards from these instructions to be sure the previous pushes were valid. Still, I’m open to the change, if we thought the complication would help enough users.

@Arachnid I don’t see how PUTLOCAL and GETLOCAL introduce new kinds of memory, they just provide an alternative to multiple DUPs and SWAPs for getting values where you want them on the stack. So not necessary, but useful and efficient. But as with JUMPV and JUMPSUBV they can be emulated with slower sequences of other instructions, despite being directly supported by Wasm and most all CPUs. If reducing the size of the proposal would make the difference to its acceptance then these would be the instructions to postpone.

Validators can do this fairly easily by calculating provenance on stack elements. Executors don’t need to care, and can just treat them as stack arguments.
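A rough sketch of what that provenance check could look like for the PUSHn n_args PUSHn n_results BEGINSUB encoding suggested above. The opcode value for BEGINSUB is an assumption for illustration, the function name is hypothetical, and the check is deliberately conservative: any instruction other than a PUSH discards the known constants.

```python
PUSH1, PUSH32 = 0x60, 0x7F
BEGINSUB = 0xB5  # assumed opcode value, for illustration only


def validate_beginsub_args(code: bytes):
    """Return (offset, n_args, n_results) for each BEGINSUB, or raise ValueError."""
    consts = []  # constants from the run of PUSHn directly before this point
    subs = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if PUSH1 <= op <= PUSH32:
            width = op - PUSH1 + 1
            if pc + 1 + width > len(code):
                raise ValueError(f"truncated PUSH at {pc}")
            consts.append(int.from_bytes(code[pc + 1:pc + 1 + width], "big"))
            pc += 1 + width  # skip push data so it is never read as an opcode
        elif op == BEGINSUB:
            if len(consts) < 2:
                raise ValueError(f"BEGINSUB at {pc} lacks two pushed constants")
            n_results = consts.pop()  # pushed last, so on top of the stack
            n_args = consts.pop()
            subs.append((pc, n_args, n_results))
            consts = []
            pc += 1
        else:
            consts = []  # provenance unknown after any other instruction
            pc += 1
    return subs


# PUSH1 1, PUSH1 2, BEGINSUB, STOP
ok = validate_beginsub_args(bytes([0x60, 1, 0x60, 2, 0xB5, 0x00]))
```

An executor would not need any of this; it would simply pop the two values as ordinary stack arguments.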

I misunderstood how they work, sorry. I thought they accessed a ‘local variable storage’, but they access elements further down the stack at a location specified by a frame pointer.

I do still think that this EIP specifies several different modifications, and should be split into smaller, more concise EIPs. It would make it easier to review and approve them independently.

Fair enough. I’m still not sure I can write a regular grammar to express your idea, or how to put it in the Yellow Paper. I guess it would take a back reference from the appendix where BEGINSUB is described to an extra exceptional halting state, for the case where BEGINSUB would be executed with arguments on the stack that were not the results of PUSHn operations.

And yes, these could be three EIPs, with the condition that the second two depend on the first. I don’t know if that makes it easier or harder to evaluate the facility as a whole. Which is to say: This EIP offers the control-flow primitives provided by Wasm and by most every CPU ever. Shall we just put them all in now, or spend the next two years at it?

I should maybe add a table of corresponding EVM+615/Wasm/8086/ARM operations to clarify.


I think one interesting way to think about it is as 3 EIPs that exist atomically.

For example, if we get the first one done for Istanbul, but not the others, that’s good. If we get both the dependent ones in there for Istanbul, that’s great. If we get all 3 in time… That’s fantastic!

It’ll be good to have break points to de-risk the implementation steps and engineering (and social coordination of a fork).

Technical arguments aside, this has been EIP issue 615 since December of 2016, and EIP-615 Draft since April of 2017. I designed it as a whole and implemented it as a whole. I’d rather move it as a whole and decide what to do if it fails, depending on why it fails.

One of the reasons it might fail is because it’s a large, monolithic change. I think I like the political calculation of rolling out all of it at the same time and attempting to get community buy in to make the change because it reduces the amount of coordination effort long term.

As a backup plan though, I am liking the 3 step approach for those of a more moderate risk appetite.


I want to see this proposal succeed, because I’ve heard a lot of great feedback, but 10 opcodes definitely gives one pause, especially when we’ve had months of trouble getting half that many to work lol


True, though compared to eWasm it’s tiny :wink:

I know the core devs have taken to arguing at length over individual opcodes, most of them variants on CALL with subtle security implications. They are not accustomed to discussing a computational facility with several opcodes and no security implications except gas costs. And even less accustomed to bringing a deficient VM up to the minimal state of the art.

I would like to hear from language implementers how they would implement virtual functions without JUMPSUBV or similar.


Lol, no comment™


I wouldn’t? With the gas model, there’s diminishing returns to more complex functionality since the expense of execution only makes certain coordination functions practical. Let’s not get too far down the rabbit hole of what’s possible and take the win here if we can get this implemented. :stuck_out_tongue_winking_eye:

Solidity has virtual functions.

It’s the gas model that makes the four “extra” opcodes so valuable. They can be implemented with one cheap interpreter instruction, or compiled to one wasm or machine instruction, but require expensive sequences of primitives otherwise. Long chains of comparisons and jumps. Long chains of dups and swaps. Lots of gas.
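As a back-of-the-envelope illustration of the point above, the sketch below compares the gas for an n-way dispatch done as a linear chain of DUP1 / PUSH / EQ / PUSH / JUMPI tests against a single jump-table instruction. The 3-gas and 10-gas figures are the current schedule’s costs for PUSH/DUP/EQ and JUMPI; the cost assigned to JUMPV is purely an assumption, and the function names are hypothetical.

```python
G_VERYLOW = 3   # PUSHn, DUPn, EQ in the current gas schedule
G_HIGH = 10     # JUMPI in the current gas schedule
JUMPV_GAS = 10  # assumption: priced like JUMPI; the EIP would set the real cost


def compare_chain_gas(case_index: int) -> int:
    """Gas to reach case `case_index` (0-based) via DUP1 PUSH EQ PUSH JUMPI tests."""
    per_test = 4 * G_VERYLOW + G_HIGH  # four cheap ops plus one conditional jump
    return (case_index + 1) * per_test


def jump_table_gas(case_index: int) -> int:
    """Gas for a single table jump, independent of which case is taken."""
    return JUMPV_GAS


first_case = compare_chain_gas(0)  # 22
tenth_case = compare_chain_gas(9)  # 220
table_case = jump_table_gas(9)     # 10 under the assumed JUMPV cost
```

A compiler could cut the chain down with a binary search over the cases, but the cost would still grow with the number of cases rather than staying constant.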


I’m incorporating changes for a later PR into the original proposal. Including this motivation:

Especially important is efficient translation to eWasm. To that end we maintain a close correspondence between the operations proposed here and Wasm.

| Wasm | EIP-615 |
|---|---|
| br | JUMPTO |
| br_if | JUMPIF |
| br_table | JUMPV |
| call | JUMPSUB |
| call_indirect | JUMPSUBV |
| return | RETURN |
| get_local | GETLOCAL |
| set_local | PUTLOCAL |
| unreachable | DATA |

@Arachnid I think the biggest problems for your disassembler are JUMPV and JUMPSUBV, which have not just multiple arguments–like PUSHn–but a variable number of arguments. Wasm’s corresponding br_table and call_indirect avoid that problem by maintaining the tables of indirections separately from the instructions–not inline. I kept them inline for fear that one could write an exploit that used one table and lots of indirect jumps. If I’m being overly cautious we can copy Wasm and solve that problem.


Be careful; that assumption is what bought us the issues with net gas metering.

I hadn’t noticed that.

It seems to me that this proposal complicates the EVM a lot compared to its existing status. I agree with the goals, but I also wonder if the complexity is worthwhile, especially with plans to migrate to new VMs in the future.

@Arachnid I’m specifically wondering if (and why) BEGINSUB 1 2 would be harder to disassemble than, say, PUSH1 3? You recognize the opcode, you skip the requisite number of bytes.

I can for sure see that JUMPV n 1 2 3 ... N is harder, and suggest it’s easy to fix unless that opens a security hole.
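To illustrate the difference, here is a minimal disassembler sketch. It assumes, purely for illustration, that BEGINSUB carries two 1-byte immediates and that JUMPV carries a 1-byte count followed by that many 2-byte targets; the opcode values are also assumptions, and truncated input is not handled.

```python
BEGINSUB, JUMPV = 0xB5, 0xB2  # assumed opcode values, for illustration only


def disassemble(code: bytes):
    """Return a list of (offset, mnemonic, operands) tuples."""
    out, pc = [], 0
    while pc < len(code):
        op, start = code[pc], pc
        pc += 1
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32: operand width follows from the opcode itself
            width = op - 0x5F
            value = int.from_bytes(code[pc:pc + width], "big")
            out.append((start, f"PUSH{width}", [value]))
            pc += width
        elif op == BEGINSUB:
            # Fixed length: always skip exactly two immediate bytes
            out.append((start, "BEGINSUB", [code[pc], code[pc + 1]]))
            pc += 2
        elif op == JUMPV:
            # Variable length: the count byte must be read before skipping
            n = code[pc]
            pc += 1
            targets = [int.from_bytes(code[pc + 2 * i:pc + 2 * i + 2], "big")
                       for i in range(n)]
            out.append((start, "JUMPV", targets))
            pc += 2 * n
        else:
            out.append((start, f"0x{op:02x}", []))
    return out


# BEGINSUB 1 2, PUSH1 3, JUMPV 2 0x0010 0x0020
listing = disassemble(bytes([0xB5, 1, 2, 0x60, 3, 0xB2, 2, 0x00, 0x10, 0x00, 0x20]))
```

The BEGINSUB branch looks just like the PUSH branch, while the JUMPV branch is the only one whose length depends on the data that follows the opcode.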