EIP-2315 Simple Subroutines for the EVM

A new Draft of this EIP is up at

https://eips.ethereum.org/EIPS/eip-2315

It lays out a few simple rules and a validation algorithm for safe, statically analyzable code using the EVM with this EIP.


Thanks, @ekpyron. I stand corrected. In your example it looks like most of the problem is just that JUMPSUB costs more than JUMP, and all the extra gas buys is readability. How would your optimized MULTIPLY look if it were to be called in multiple places? And we should use a more realistic leaf routine.

In the current draft I’ve removed the appendix for now, as what is needed is a more complete analysis of the current Solidity calling conventions compared to what is possible with this EIP (for Solidity and other languages), and I’ll need help with that.

For what it’s worth this EIP intentionally does not force you to use subroutines at all, so they don’t prevent you from optimizing your code in the ways you have.

A good summary of the history of subroutines, from Lovelace and Turing onward

https://people.cs.clemson.edu/~mark/subroutines.html

and the subroutine facilities provided by many CPUs

https://people.cs.clemson.edu/~mark/subroutines/

including the following:

  • 704
  • 709x
  • alpha
  • amd29k
  • b5000
  • cdc6600
  • i960
  • itanium
  • m68k
  • m88k
  • mips
  • pdp8
  • pdp11
  • ppc
  • s360
  • sparc
  • vax
  • x86

@ekpyron @chriseth @holiman @AlexeyAkhunov

I’ve been working through this basic example more carefully, writing hand optimized code for all cases. I’m finding that JUMPSUB-based code simplifies calling conventions and saves gas compared to JUMP-based code in the general and optimized cases. Please do check my work.

TEST_DIV:
beginsub
0x02        3
0x03        3
DIVIDE      3
jumpsub    10
returnsub   5

DIVIDE:
beginsub
div         5
returnsub   5

Total 34 gas.

The same code, using JUMP.

TEST_DIV:
   jumpdest    1
   RTN_DIV     3
   0x02        3
   0x03        3
   DIVIDE      3
   jump        8
RTN_DIV:
   jumpdest    1
   swap1       3
   jump        8

DIVIDE:
   jumpdest    1
   div         5
   swap1       3
   jump        8

50 gas total. Both approaches need 11 gas to push two arguments and divide, so control flow costs 39 gas using JUMP versus 23 using JUMPSUB.

  • That’s (39-23)/39 = 41% savings in control flow cost.
  • That’s (50-34)/50 = 32% savings in total cost.
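As a sanity check on the totals above, the per-instruction costs can be tallied mechanically. This is just a sketch in Go; the 10/5 costs for JUMPSUB/RETURNSUB are the proposed values used in these examples.

```go
package main

import "fmt"

func main() {
	sum := func(costs []int) (total int) {
		for _, g := range costs {
			total += g
		}
		return
	}
	// JUMPSUB version: TEST_DIV (push, push, push, jumpsub, returnsub)
	// followed by DIVIDE (div, returnsub).
	jumpsubCosts := []int{3, 3, 3, 10, 5, 5, 5}
	// JUMP version: TEST_DIV, RTN_DIV, and DIVIDE as listed above.
	jumpCosts := []int{1, 3, 3, 3, 3, 8, 1, 3, 8, 1, 5, 3, 8}
	const compute = 11 // two pushes and a DIV, common to both versions
	fmt.Println(sum(jumpsubCosts), sum(jumpsubCosts)-compute) // 34 23
	fmt.Println(sum(jumpCosts), sum(jumpCosts)-compute)       // 50 39
}
```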

In the general case of one routine calling another I don’t think the JUMP version can do better. Of course we can optimize the tail call, so that the final jump in DIVIDE actually returns from TEST_DIV.

TEST_DIV:
   jumpdest  1
   0x02      3
   0x03      3
   DIVIDE    3
   jump      8

DIVIDE:
   div       5
   swap1     3
   jump      8

Total 35 gas, which is still worse than with JUMPSUB.

We could even take advantage of DIVIDE happening to directly follow TEST_DIV:

TEST_DIV:
   jumpdest  1
   0x02      3
   0x03      3
DIVIDE:
   jumpdest  1
   div       5
   swap1     3
   jump      8

Total 24. A savings of 10 gas in an atypical case.

However, JUMPSUB can do better with the same optimizations.

TEST_DIV:
   beginsub
   0x02      3
   0x03      3
   DIV_DEST  3
   jump      8

DIV_DEST:
   jumpdest  1
DIVIDE:
   beginsub  1
   div       5
   returnsub 5

Total 29.

TEST_DIV:
   beginsub
   0x02      3
   0x03      3
DIVIDE:
   beginsub  1
   div       5
   returnsub 5

Total 17.

And of course these would all do better if JUMPs and JUMPSUBs could take an immediate.

So is saving gas the main goal of this EIP? Would it make sense to make such an analysis on real-world code instead of 2-3 hand-crafted examples?

If gas saving is the main goal, then please note that there are still tons of places to optimize existing code without the need to change the EVM.

Furthermore, unless this saves more than 10% gas on real-world examples, I would prefer our workforce to be directed at scaling solutions instead which allow a factor of 10 to 100 in gas savings.

I haven’t looked through those examples in all detail, but: shouldn’t JUMPDEST only cost gas if walked over, not if jumped to? If so, the tail call optimized version has equal gas cost already.
The other more optimized examples now seem to assume that it’s possible to walk into subroutines and to JUMP to BEGINSUB, do I read that correctly? I guess also the opcode order e.g. in the 35 and 29 gas examples is garbled a bit? In any case those two implicit changes to the spec would indeed allow more optimizations (not considering any other implications they might have). The 21 gas example I find rather confusing - I guess that’s meant to be a BEGINSUB, not a JUMPDEST in the beginning? And it probably shouldn’t push DIVIDE? That might make more sense.

In any case with those modifications to the spec that are implicit in those examples (I don’t see them reflected in the spec, though), the situation definitely changes a bit and subroutines would probably indeed save some gas on average - but I tend to agree with @chriseth that a big picture view might be more beneficial than arguing about saving a handful of gas or not in hand-crafted examples.

Simpler calling conventions and gas savings have always been the primary goals of this EIP.

Two safety properties result. Since return addresses are not on the stack it is not possible to overwrite them or use the wrong one, and it is not necessary to use jumps dynamically for subroutine returns.

I am trying to compare estimates of the best that compilers can do, which is what @ekpyron was trying to do.

Furthermore, unless this saves more than 10% gas on real-world examples, I would prefer our workforce to be directed at scaling solutions instead which allow a factor of 10 to 100 in gas savings.

For this example we get a 32% performance improvement for routines that are not tail calls. (For tail calls it’s better.) Obviously the percentage improvement would be less as the amount of actual computation gets larger. To get from 32% to 10% the actual computation in this example would have to increase from 11 gas to 120 gas.

I’m not sure what “scaling solutions” you have in mind - please explain. 10X to 100X improvements don’t sound possible just by improved optimization.

Thanks for catching the opcodes out of order. I’ve fixed them and some other things. The overall results don’t change much.

JUMPDEST costs 1 gas regardless. It should probably be 0, but that only means that a tail-optimized JUMP-based routine is as cheap as an un-optimized JUMPSUB-based routine.

Yes, the current draft allows for walking into subroutines. Preventing it didn’t seem to provide safety worth sacrificing tail-call elimination. It doesn’t (yet?) allow jumping to subroutines, so I’ve fixed my example to use an extra gas for a leading JUMPDEST. I’m not sure that I can optimize things much more, but would be glad to be proven wrong.

More realistic examples would be good, but I don’t know the details of the Solidity code generator, and comparing maximally-optimized examples is as close to apples-to-apples as I know how to get.

Ok, allowing back tail call optimizations definitely means that this does less harm to optimization than before. There are still head call optimizations, but I’m not sure how significant they are.

Still, to account for actual real-world gas saving one has to consider that all of these toy functions are generally inlined in practice, because the call overhead far exceeds the cost of executing them in-place. For that matter even a single sstore would still be inlined; if it were not, the savings would still only be (20045-20029)/20045 = 0.08%.
I.e. control flow is only a fraction of the actual gas cost of a contract, so a fractional saving in control flow is minimal gain in practice.

So even if, with tail call optimizations allowed back, this might in fact save a few gas on average, I’d still say that that alone doesn’t warrant changing the EVM for it.

And allowing walking into or jumping to subroutines seems to me like it’s moving away from the independent goal of better analyzability (even though I’d still argue that none of this really makes a difference in analyzing anything anyways).

One more quick note about the tail call optimized example: I think 1+3+3+3+8+5+3+8 is 34 not 35 ;-).
Ah, no, 35 is correct, but it’s now missing a JUMPDEST.
But yeah, in general, I’d say there are a few more cases that one might want to look at, but given the change to the spec, I would no longer say that this EIP breaking optimization opportunities is a strong argument against it.

@ekpyron I’m using a non-tail-call “toy function” that uses only 11 gas for computation to isolate the overhead of making calls from the overhead of making computations - I show that RETURNSUB improves gas use by 32%, and that an example that uses over 120 gas for computation is still 10% better. I also show that RETURNSUB gives superior tail call optimization. I don’t know if that is “realistic.”

Not all calls are tail calls, and not all tail-calls can be inlined without increasing contract size. So I don’t see that improving normal calls with a fairly easy (and already implemented) change to the EVM - adding a common facility - is a waste.

I’m not hard-set on making BEGINSUB execution invalid - tail-call optimization can be done either way - but code generators wouldn’t execute it by mistake, and humans hand-coding assembly should also be that skillful. And my analysis is that it doesn’t provide any extra safety, and neither does restricting “jumping into functions.”

I have in mind some of the same safety characteristics as EIP-615 – no stack underflows, no invalid jumps or invalid instructions, and linear-time, acyclic traversal of the control-flow graph. EIP-615 makes these guarantees difficult to violate and easy to enforce by eliminating dynamic jumps. EIP-2315 only makes it possible to avoid the use of dynamic jumps. Currently they are necessary because returns must be implemented by jumping dynamically to an address on the stack (or in memory), which RETURNSUB renders unnecessary.

Just to argue the point: you can still have calls like this:

function F(a) {
  G(a)
  ...[F]...
}
function G(a) {
  H(a)
  ...[G]...
}
function H(a) {
  ...[H]...
}

F(42)
...

in which I can pull out the heads, i.e. generate

ret					3
42					3
F					3
G					3
H					3
jump				8
ret:
...

F:
	jumpdest		1
	...[F]...
	jump			8
G:
	jumpdest		1
	...[G]...
	jump			8
H:
	jumpdest		1
	...[H]...
	jump			8

50 gas

compared to:

42					3
F					3
jumpsub				10
...

F:
	beginsub
	G				3
	jumpsub			10
	...[F]...
	returnsub		5
G:
	beginsub
	H				3
	jumpsub			10
	...[G]...
	returnsub		5
H:
	beginsub
	...[H]...
	returnsub		5

57 gas

and this I’m pretty sure you cannot beat using the subroutines, since it actually relies on the “dynamic” return jumps (which I’d not call dynamic, since they still always have fixed values for each call - they are just on the normal stack instead of hidden in the return stack).
That’s of course also highly artificial and other cases will behave differently. It’s possible to construct artificial cases that tip either way, if one really wants to. But I still maintain that gas savings alone are not a strong argument here.

EIP-2315 doesn’t prevent you from using JUMP in the way that you have. But indeed it is a highly artificial example, not likely to be generated by Solidity or most any other compiler. (I don’t think you could write it that way in Wasm either.) So I’m trying to use examples that are not artificial - just one routine calling another.

The raw improvement in gas cost and the complexity of calling conventions seem obvious to me.

  • 47 gas, 11 instructions reduced to 31 gas, 8 instructions – for an optimal ordinary routine.
  • 32 gas, 7 instructions reduced to 29 gas, 7 instructions – for a tail-optimized routine.

Whether that is enough of a difference to make a difference depends on the program being compiled and how well it is optimized, which is not just a matter of the “typical” Solidity program.

Anyway, given the schedule of upcoming upgrades this EIP has no hope of going in for many months, and would be most valuable as part of a more integrated set of EIPs. Thus I’ve opened discussions at EVM evolution Working Group Formation - #3 by gcolvin.


It’s not that artificial actually - it wouldn’t be directly generated nor handwritten like that, but an optimizer can easily recognize the patterns of “PUSH(F) JUMP” and “F: [something cheap] JUMP” and partially inline it depending on the precise cost of “something cheap”, the number of such patterns occurring, etc. (which becomes more complicated with several competing calling conventions). The artificial part is rather that I specifically chose one such example that actually has the gas cost tip this way, while others will not :-).
But yeah, if this EIP has additional merit as natural part of an integrated set of EIPs things can easily look quite different in general!

It seems everyone has been assuming otherwise. Why are they wrong?

Why the 0x00ff?

Taking my own bait. Extending the turbo-geth code to handle JUMPSUB (original spec - to JUMPDESTs) I get the following sketch. I can’t see any problem – this jumpdest analysis should find and mark all of the immediate data for the PUSH* opcodes and any others we introduce, so at runtime these will not be considered valid destinations.

	// Mark every byte of immediate data so those positions are never
	// treated as valid destinations at runtime. The bitmap and setBits
	// helper below are a sketch of the elided "set bits" steps.
	bits := make([]byte, (len(code)+7)/8)
	setBits := func(pos, n int) {
		for i := pos; i < pos+n && i < len(code); i++ {
			bits[i/8] |= 1 << uint(i%8)
		}
	}
	for pc := 0; pc < len(code); {
		op := OpCode(code[pc])
		pc++
		if op >= PUSH1 && op <= PUSH32 {
			numbits := int(op - PUSH1 + 1)
			setBits(pc, numbits)
			pc += numbits
		} else if op == JUMPSUB {
			setBits(pc, 2) // two-byte immediate destination
			pc += 2
		} // else if op == any other op taking immediate data
	}

But way back over here I find an example of the problem I don’t really understand, from @gumb0

@chriseth put it as:

If the argument to a newly-introduced multi-byte opcode is the byte value of JUMPDEST, then this was a valid jump destination before the hard fork, but it is no longer a valid jump destination after the hard fork.

I think this can happen only where code preceding the JUMPDEST wasn’t executable – at some point I could scan the blockchain to see if the danger has ever manifested. This danger could be eliminated going forward by demanding that unreachable code be 0 (or INVALID, or something) so that new opcodes will never occur outside of immediate data.

And this is the problem with my analysis algorithm - it can’t tell whether it’s seeing old data or a new bytecode. Perhaps it can do better.

Not sure exactly what you are suggesting or how it helps.

What strikes me this morning is that if the problem is that the immediate data might look like a JUMPDEST the answer is just to never use 0x5b == 0b01011011 as immediate data. That could be fixed by always setting the high bit of immediate data to 1 and masking it away before use. That does reduce the number of bits for specifying immediate jumps to 14, and 2^14 is only 16384 – not enough to span an entire code array. We can either live with this restriction, use three bytes of immediate data, or use a UTF-8 style bit-ladder.
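A sketch of that fixed two-byte scheme, with hypothetical helper names: force the high bit of each immediate byte on, so no immediate byte can equal 0x5b, and mask it away before use, leaving 7 data bits per byte and 14 per offset.

```go
package main

import "fmt"

// encode14 and decode14 are hypothetical names sketching the scheme
// described above. Because both encoded bytes have their high bit set,
// neither can ever equal 0x5b (JUMPDEST).
func encode14(off uint16) [2]byte {
	hi := 0x80 | byte(off>>7)&0x7f
	lo := 0x80 | byte(off)&0x7f
	return [2]byte{hi, lo}
}

func decode14(b [2]byte) uint16 {
	return uint16(b[0]&0x7f)<<7 | uint16(b[1]&0x7f)
}

func main() {
	// Even an offset whose low byte would otherwise be 0x5b encodes safely.
	for _, off := range []uint16{0, 0x5b, 16383} {
		enc := encode14(off)
		fmt.Printf("%5d -> %02x %02x -> %d\n", off, enc[0], enc[1], decode14(enc))
	}
}
```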

The remaining problem would be a JUMPDEST occurring before the new opcode, which in the past would have aborted with an error, but instead would execute. I think this is just another case of the fact that adding a bytecode always risks causing an invalid program to become valid - a risk we have taken many times before.

A few months down the road and I still like this idea. I suspect some base-64 variation on VLQ Variable-length quantity - Wikipedia would be best, setting bit 7 on to mark immediate data bytes (and prevent collisions with JUMPDEST), using bit 6 as a continuation bit, and the remaining 6 bits for data. (Various schemes can be used for negative numbers, including using the leading data bit as a sign bit and using 2’s complement.) It’s backwards-compatible, not needing any sort of versioning. And it’s future-proof, allowing for the size of the bytecode addressed to increase over time. It does require more decoding time than a fixed two-byte offset. It may use more or less space, depending on whether the offset requires more than 12 or less than 7 bits to represent.
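For concreteness, a sketch of such a base-64 VLQ, handling unsigned offsets only and using hypothetical function names: bit 7 of every encoded byte is set, so no byte can collide with 0x5b; bit 6 is the continuation flag; the low 6 bits carry data, most significant group first.

```go
package main

import "fmt"

// encodeVLQ encodes a non-negative offset: every byte has bit 7 set,
// bit 6 marks "more groups follow", and the low 6 bits carry data.
func encodeVLQ(n uint64) []byte {
	// Collect 6-bit groups, least significant first.
	groups := []byte{byte(n & 0x3f)}
	for n >>= 6; n > 0; n >>= 6 {
		groups = append(groups, byte(n&0x3f))
	}
	// Emit most significant group first.
	out := make([]byte, 0, len(groups))
	for i := len(groups) - 1; i >= 0; i-- {
		b := 0x80 | groups[i]
		if i > 0 {
			b |= 0x40 // continuation bit
		}
		out = append(out, b)
	}
	return out
}

// decodeVLQ reads one quantity, returning the value and bytes consumed.
func decodeVLQ(code []byte) (uint64, int) {
	var n uint64
	for i, b := range code {
		n = n<<6 | uint64(b&0x3f)
		if b&0x40 == 0 {
			return n, i + 1
		}
	}
	return n, len(code)
}

func main() {
	// Offsets up to 63 fit in one byte; 16384 needs three.
	for _, v := range []uint64{0, 63, 64, 16384} {
		enc := encodeVLQ(v)
		dec, k := decodeVLQ(enc)
		fmt.Printf("%5d -> % x -> %d (%d bytes)\n", v, enc, dec, k)
	}
}
```

Relative to a fixed two-byte immediate this trades decoding time for density: small offsets shrink to one byte, while the format still scales to arbitrarily large code.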

And a little further down the line and I’ve moved the validation code to its own PR, as I realized it does not depend on these new opcodes, only on deprecating dynamic use of JUMP and JUMPI.