EIP-2315 Simple Subroutines for the EVM

It seems everyone has been assuming otherwise. Why are they wrong?

Why the 0x00ff?

Taking my own bait. Extending the turbo-geth code to handle JUMPSUB (original spec - to JUMPDESTs) I get the following sketch. I can’t see any problem – this jumpdest analysis should find and mark all of the immediate data for the PUSH* opcodes and any others we introduce, so at runtime these will not be considered valid destinations.

	// bits marks each byte of immediate data, so that jumpdest
	// analysis rejects it as a destination at runtime.
	bits := make([]bool, len(code))
	for pc := 0; pc < len(code); {
		op := OpCode(code[pc])
		pc++
		if op >= PUSH1 && op <= PUSH32 {
			numbits := int(op - PUSH1 + 1) // bytes of immediate data
			for i := 0; i < numbits && pc+i < len(code); i++ {
				bits[pc+i] = true
			}
			pc += numbits
		} else if op == JUMPSUB {
			// two bytes of immediate destination data
			for i := 0; i < 2 && pc+i < len(code); i++ {
				bits[pc+i] = true
			}
			pc += 2
		} // else if op == any other op taking immediate data
	}

But way back over here I find an example of the problem I don’t really understand, from @gumb0

@chriseth put it as:

If the argument to a newly-introduced multi-byte opcode is the byte value of JUMPDEST, then this was a valid jump destination before the hard fork, but it is no longer a valid jump destination after the hard fork.

I think this can happen only where code preceding the JUMPDEST wasn’t executable – at some point I could scan the blockchain to see if the danger has ever manifested. This danger could be eliminated going forward by demanding that unreachable code be 0 (or INVALID, or something) so that new opcodes will never occur outside of immediate data.

And this is the problem with my analysis algorithm - it can’t tell whether it’s seeing old data or a new bytecode. Perhaps it can do better.

Not sure exactly what you are suggesting or how it helps.

What strikes me this morning is that if the problem is that the immediate data might look like a JUMPDEST, the answer is just to never use 0x5b == 0b01011011 as immediate data. That could be fixed by always setting the high bit of immediate data to 1 and masking it away before use. That does reduce the number of bits for specifying immediate jumps to 14, and 2^14 is only 16384 – not enough to span an entire code array. We can either live with this restriction, use three bytes of immediate data, or use a UTF-8 style bit-ladder.
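As a concrete sketch of that masking scheme (function names are mine, not from any client): a 14-bit offset packed into two bytes with the high bit of each byte forced on can never contain the JUMPDEST byte, since 0x5b has its high bit clear.

```go
// encodeImm packs a 14-bit offset into two immediate-data bytes with
// the high bit of each byte set, so neither byte can equal JUMPDEST
// (0x5b, whose high bit is clear). decodeImm masks the bits away again.
// Names are illustrative only.
func encodeImm(v uint16) [2]byte {
	return [2]byte{0x80 | byte(v>>7), 0x80 | byte(v&0x7f)}
}

func decodeImm(b [2]byte) uint16 {
	return uint16(b[0]&0x7f)<<7 | uint16(b[1]&0x7f)
}
```

Any offset up to 2^14-1 = 16383 round-trips; larger offsets would need the three-byte or bit-ladder variants mentioned above.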

The remaining problem would be a JUMPDEST occurring before the new opcode, which in the past would have aborted with an error, but instead would execute. I think this is just another case of the fact that adding a bytecode always risks causing an invalid program to become valid - a risk we have taken many times before.

A few months down the road and I still like this idea. I suspect some base-64 variation on a VLQ (variable-length quantity) would be best, setting bit 7 on to mark immediate data bytes (and prevent collisions with JUMPDEST), using bit 6 as a continuation bit, and the remaining 6 bits for data. (Various schemes can be used for negative numbers, including using the leading data bit as a sign bit and using 2’s complement.) It’s backwards-compatible, not needing any sort of versioning. And it’s future-proof, allowing for the size of the bytecode addressed to increase over time. It does require more decoding time than a fixed two-byte offset. It may use more or less space, depending on whether the offset requires more than 12 or fewer than 7 bits to represent.
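A sketch of that bit-ladder for unsigned offsets (the encoding choices here are mine, and sign handling is omitted): bit 7 is always on, bit 6 says another byte follows, and the low 6 bits carry data, most significant group first.

```go
// encodeVLQ writes an unsigned offset as immediate-data bytes: bit 7 is
// always set (so no byte can collide with JUMPDEST, 0x5b), bit 6 is a
// continuation flag, and the low 6 bits carry data, most significant
// group first. Negative offsets are left out of this sketch.
func encodeVLQ(v uint32) []byte {
	groups := []byte{byte(v & 0x3f)}
	for v >>= 6; v != 0; v >>= 6 {
		groups = append([]byte{byte(v & 0x3f)}, groups...)
	}
	for i := range groups {
		groups[i] |= 0x80
		if i < len(groups)-1 {
			groups[i] |= 0x40 // more bytes follow
		}
	}
	return groups
}

// decodeVLQ reads one such quantity, returning the value and the
// number of immediate bytes consumed.
func decodeVLQ(code []byte) (v uint32, n int) {
	for n < len(code) {
		b := code[n]
		n++
		v = v<<6 | uint32(b&0x3f)
		if b&0x40 == 0 {
			break
		}
	}
	return v, n
}
```

Offsets up to 63 fit in one byte and up to 4095 in two, matching the space trade-off noted above against a fixed two-byte immediate.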

And a little further down the line and I’ve moved the validation code to its own PR, as I realized it does not depend on these new opcodes, only on deprecating dynamic use of JUMP and JUMPI.

Since this EIP is up for discussion for Shanghai again, maybe it’s good to clarify the following to avoid misunderstandings:

It is still my position (and I suspect this is shared by the Solidity team in general) that gas savings cannot serve as justification for this EIP, and that the gas analysis in the EIP is a bit misleading. (For example, the subroutine examples use RJUMP, which is per se independent of this EIP and would also reduce the gas costs of the non-subroutine versions if used there; the swap for the return address can be nicely integrated into the overall stack layout of more complex functions, in which a globally swap-free stack layout cannot be achieved most of the time anyway, so this is no major issue for us; etc. But I don’t actually think there is much merit in arguing these points in detail.) More importantly, any gas savings this EIP may theoretically bring will be insignificant in practice, since internal function calls occur at very low frequency in real-world Solidity-generated bytecode (among other reasons due to inlining). So the overall real-world gas advantage of subroutines (for Solidity-generated bytecode) will be negligible in practice.

That is not to say that I oppose this EIP. I just want to make sure that if it is considered, it is considered for the right reasons. In combination with EIP-4200, the subroutines of this EIP can serve to completely eliminate the need for dynamic jumps, which may have significant merit independently of gas savings – but this merit has to be argued for independently, and different approaches like EIP-4750 have to be considered. And such arguments and considerations will be largely independent of the use in Solidity (and of any theoretical gas savings).

And also: sorry if I have missed or misunderstood any particular developments – we have quite a high workload these days and I did not find the time to review the updated EIP in full detail, but maybe the general sentiment expressed above is helpful.

Thanks for your helpful review.

The gas savings I show are very large for low level code, and I don’t think it’s at all irrelevant or misleading to show that. Quite the opposite.

And I don’t think the tail-optimization example I give is at all misleading. I’m properly comparing using the subroutines and static jumps that I require here – both RJUMP and RJUMPSUB – to the current practice of using dynamic JUMP. (Note that EIP-2315 requires EIP-4200).

It’s good that inlining Solidity code helps. And it’s interesting that Solidity does not usually generate many small routines from the code it’s given. (That may change given these instructions.) But Solidity is not the only compiler, and people do hand-code assembly when performance is critical – even when mostly writing Solidity.

As this proposal states, it’s the whole package of EIP-2315, EIP-4200, and EIP-3779 that provides for a complete, static, and verifiably safe control-flow facility. But even without EIP-3779 I do think my arguments for performance improvements are valid, and that improvements as large as I report are well worth it for the sort of code that needs them. It’s the critical code I’m targeting here.

Note: EIP-4750 is a very new PR that I just now took a glance at. Off-hand it looks to have excess runtime overhead – larger return-stack slots to support runtime checks for stack height – that EIP-2315 moves to validation time. And by restricting code sections to a single function it makes optimizations more difficult. I might be misreading, will study it more later.


I would not expect that looking at hand-written assembly code for performance-critical situations will yield significantly different results in that regard. Any function call – no matter the calling convention – will incur overhead, and especially in hand-optimized performance-critical code I would very rarely expect to find dense calls, since in such situations inlining (whether done by automatic optimization or by hand) will in general result in lower overall gas cost. But yeah: really all I want to say is that I suggest putting more focus on the other aspects of this EIP (and its combination with the others) rather than on the gas savings for its evaluation.

I don’t fully follow you here. I agree performance is not the only argument, but I do think it’s a stronger one than you give it credit for :wink:

Turing designed his calling convention to be cheap enough that inlining was less often necessary – inlining to avoid an RJUMPSUB/RETURNSUB would save 8 gas, compared to saving about 26 gas for the current approach. That moves the line for the speed/space tradeoff a fair distance.

It’s generally good software engineering to keep subroutines small and reusable, so making them a lot cheaper to call has to be a win.


Of course you always want to have small and reusable subroutines initially. That’s why even low-level code is not written directly as opcodes, but rather e.g. in Yul, in which you still have functions you can subject to inlining by optimization. In case you don’t want to run any optimization, but optimize solely by hand, you usually optimize for every single piece of gas, so the size of the function call overhead won’t matter, you will still inline by hand.
What I’m saying is that the usual case is to have few and relatively large functions in deployed bytecode. If somebody wants to refute that: solidity will annotate jumps into and out of functions, so in any deployed contract with sources available it should be straightforward to calculate (a good approximation of) the ratio of overall gas cost to function calls (this applies to both solidity generated code and inline assembly code, i.e. “hand-optimized cases”).
Or to put it differently: most meaningful transactions will involve at least one state change, so 20000 gas. Even if you save 20 gas per function call, you need hundreds of calls to make a difference that comes even close to the percentages in the EIP’s gas analysis.
Also the main advantage of inlining itself often is not really saving the function call overhead, but rather allowing to fuse and optimize the code further based on the concrete arguments and surroundings.

So I’m rather worried about gas savings being an exaggeration rather than not giving it too much credit :-).

But yeah, I think I made my point about this now, so I’ll leave it at that for the time being.

I think I see your point of view as a Solidity compiler writer. (E.g. inlining isn’t just about pulling small routines in line to avoid call overhead, it’s also about pulling in large routines to subject them to further optimization. We went down that road 100% in C++ – many libraries exist entirely as inline code.) I still think that if you had these operations you could make good use of them, and would appreciate it if you could think on that a little.

My point of view is that of a machine designer who is relatively agnostic about where the code is coming from – it’s hard to fully anticipate all of the ways a general-purpose machine will be used. So I look at the history of computing machines and find that most of them provide subroutine operations going all the way back to Turing, 1946. And I look at the history of virtual machines going back to Forth, 1970 and earlier, and find they are mostly stack machines that use Turing’s design: a stack of return addresses. And I can’t see any good reason for our VM not to do the same.

So… I’m proposing two opcodes that follow an industry-standard design and provide the most efficient call/return mechanism I know of. Calling takes 5 gas for an RJUMPSUB, returning takes 3 gas for a RETURNSUB. 8 gas total.

By contrast, currently: Calling takes 14 gas for a PUSH <return address> PUSH <destination> JUMP. Returning takes 11 gas for a SWAP JUMP. 25 gas total.

So, 25 - 8 = 17 gas, and 17 ÷ 25 = 68% savings. YMMV.

That seems to me to be too large of a difference to ignore, even if maximally efficient subroutines aren’t on the critical path for Solidity.

(Yes, sometimes the SWAP isn’t needed: 22-8=14, 14/22=64%. And if we also had RJUMP then calling could only take PUSH <return address> RJUMP <destination>. 17-8=9, 9/17=53%. The savings are still substantial.)
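The arithmetic above can be condensed into one small helper (using the gas costs assumed in this thread: PUSH=3, JUMP=8, SWAP=3, proposed RJUMPSUB=5 and RETURNSUB=3):

```go
// savings compares a calling convention costing `current` gas per
// call/return pair against the proposed `proposed` gas, returning the
// gas saved and the percentage saved, rounded to the nearest percent.
func savings(current, proposed int) (saved, percent int) {
	saved = current - proposed
	percent = (saved*100 + current/2) / current // rounded integer percent
	return saved, percent
}
```

savings(25, 8) gives the 17-gas / 68% case above, savings(22, 8) the 64% case without the SWAP, and savings(17, 8) the 53% case with RJUMP.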

I think what @ekpyron refers to is that with EIP-4200 this would look like PUSH <return address> RJUMP <destination> for calling in and JUMP for returning (swaps are only needed during returning when there are return parameters). The prices depend on what prices are decided for RJUMP.


I think I covered the cases you mention?

(Yes, sometimes the SWAP isn’t needed: 22-8=14, 14/22=64%. And if we also had RJUMP then calling could only take PUSH <return address> RJUMP <destination>. 17-8=9, 9/17=53%. The savings are still substantial.)

So yes, the exact savings depend on the exact gas costs, but I think the savings would remain substantial. In the proposal I also examine an alternative that combines the PUSH/RJUMP into one 6-gas operation. That saves 5 more gas – 12-8=4, 4/12=33% – which is the best I think we can do without a return stack.

Note: I neglected to count the JUMPDEST needed by each JUMP, so all of the savings are slightly better than shown. But I’m wondering if JUMP really needs a JUMPDEST now – it seems all the jumpdest analysis needs to do is mark all immediate data as invalid destinations so JUMP can check for that at runtime.

Some suggestions for improving current EIP text.

  1. Mention how the EOF validation procedure is changed. It should check that each argument of RJUMPSUB is within code section bounds and doesn’t point to push data or another RJUMPSUB immediate.
    Compare to the similar paragraph in EIP-4200:

We also extend the validation algorithm of EIP-3670 to verify that each RJUMP/RJUMPI has a relative_offset pointing to an instruction. This means it cannot point to an immediate data of PUSHn/RJUMP/RJUMPI. It cannot point outside of code bounds. It is allowed to point to a JUMPDEST, but is not required to.

  2. It would be best to provide python code for the new validation procedure, as in the EIP-4200 Reference Implementation section. In fact I think validation for this EIP will be very similar.

  • If a resulting PC to be executed is beyond the last instruction then the opcode is implicitly a STOP, which is not an error.

This shouldn’t happen at all thanks to validation of all RJUMPSUBs, so I think this note can be removed.

  3. Specify what happens when RETURNSUB is called with an empty return stack.

  4. Paragraphs with justification of costs at the end of Specification should be moved to Rationale.
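A sketch of the validation pass described in point 1 above might look like the following – in Go, matching the earlier sketch in this thread, though the EIP itself would use Python. The opcode values and the offset base (relative to the byte after the immediate, as in EIP-4200) are my assumptions:

```go
// Illustrative opcode values; 0x5f for RJUMPSUB is from the draft EIP.
const (
	opPush1    = 0x60
	opPush32   = 0x7f
	opRjumpsub = 0x5f
)

// validRJUMPSUBs checks that every RJUMPSUB immediate lands inside the
// code section and does not point into the immediate data of a PUSHn
// or of another RJUMPSUB.
func validRJUMPSUBs(code []byte) bool {
	isImm := make([]bool, len(code))
	var targets []int
	for pc := 0; pc < len(code); {
		op := code[pc]
		pc++
		switch {
		case op >= opPush1 && op <= opPush32:
			n := int(op-opPush1) + 1
			for i := 0; i < n && pc+i < len(code); i++ {
				isImm[pc+i] = true
			}
			pc += n
		case op == opRjumpsub:
			if pc+2 > len(code) {
				return false // truncated immediate
			}
			isImm[pc], isImm[pc+1] = true, true
			// signed offset, relative to the byte after the immediate
			off := int(int16(uint16(code[pc])<<8 | uint16(code[pc+1])))
			targets = append(targets, pc+2+off)
			pc += 2
		}
	}
	for _, t := range targets {
		if t < 0 || t >= len(code) || isImm[t] {
			return false
		}
	}
	return true
}
```

A code section passing this check can only ever transfer control to an instruction boundary via RJUMPSUB, which is what makes the implicit-STOP note above removable.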

NOTE: An apology, I wrote most of this many days ago and left it unfinished…

Thanks for the careful review, Andrei. Pretty much all your points I agree with, and will work them into the spec when I can.

The harder part is the validation algorithm in EIP-3779, which provides safety rules for this EIP and others. I currently think it’s wrong, or at least can no longer convince myself it’s right for subroutines. Am working with a mathematician friend on that. We think we have a correct algorithm, but so far it’s more complex than I would like.

From a discussion with Pawel in Amsterdam I am OK with leaving this proposal as a draft in favor of EIP-4750: EOF - Functions. I’ll comment there later, but my concerns are that – being very close to EIP-615: Subroutines and Static Jumps for the EVM – it needs to include comparable safety constraints and an algorithm for validating them. In general the only runtime checks needed should be for stack overflow and OOG.

Also, I’d prefer that 4750 be specified in such a way that this EIP can be seamlessly introduced later as a performance optimization. I think that is already the case.


Ready for Review:
Proposal: EIP-2315 Simple Subroutines for the EVM
Requires: EIP-4200: Static relative jumps

This proposal introduces two opcodes to support calling and returning from subroutines:

  • RJUMPSUB relative_offset – relative jump to subroutine
  • RETURNSUB – return to PC after most recent RJUMPSUB.

It deprecates JUMP and JUMPI and ensures, at initialization time, that valid code will not execute invalid instructions or jump to invalid locations, will not underflow the stack, will maintain consistent numbers of inputs and outputs for subroutines, and will have bounded stack height in the absence of recursion.

These are very nearly the same guarantees as EIP-615 and EIP-4750 with less complexity and better opportunities for optimized code.

This proposal does not impose any syntax – a subroutine is not a contiguous sequence of bytecode, it is a subgraph of the control-flow graph. This provides more opportunities for optimization, especially for compact layout of code. Since we wish to support one-pass compilers of EVM code to machine code it is crucial that the EVM code be as well optimized as possible.

I still prefer that EOF code sections represent Modules containing multiple procedures rather than being a single Function. This allows for low-level optimizations within a module, but no control flow between modules except via defined interfaces. But this is a discussion for elsewhere.

My priorities are:

  1. EIP-4200
  2. EIP-2315
  3. EIP-4750

I will certainly support whatever kind of subroutines we can get. We cannot write one-pass compilers without them.


I would like to raise a few points:

It deprecates JUMP and JUMPI.

Can you please clarify?

RJUMPSUB (0x5f) relative_offset: uint16

I assume it should be signed 16-bit?

(as it is “partially” demonstrated in the example case: 0xe5ff = RJUMPSUB -1)

The test cases seem to be inconsistent (bytecode vs tables). Would you like to have some cleanup PRs?

I see there is a PR for fixing points 2 and 3, thanks.

Yes, as explained in the motivation, dynamic jumps make traversal of the code vulnerable to quadratic-time attacks. Anything we want to do at initialization time can’t take time and space more than linear in the size of the initialization code. That includes validating the code and compiling it to machine code. And I didn’t go into it, but dynamic jumps can also make it impossible to validate that the stack doesn’t underflow. So dynamic jumps must go.