EIP-2315 Simple Subroutines for the EVM

We are also setting bad precedents.

  1. Don’t bother authoring core EIPs - your work may be in vain.
  2. Don’t bother implementing core EIPs - your work may be in vain.
  3. If you wish to stop an EIP - make your blocking objections known as late as possible.

In this way you can maximize the amount of time and resources you waste and the amount of discouragement you create for authors and developers. Thank you.

In the Forth model arguments other than constants are not generally pushed on the stack: expression evaluation up to the subroutine call leaves arguments on the stack, ready for evaluation by the called subroutine, which again leaves its results on the stack as return values. I’m not sure how to have the return address on the data stack without it interfering - Forth has a return stack for this reason. There is a safety benefit as well - it is not possible to modify the return address.

I reply here to some of the detailed points made by @AlexeyAkhunov at Proposal to remove EIP-2315 (simple subroutines) from Berlin · Issue #263 · ethereum/pm · GitHub

I think from what @lightclient posted above (great work, BTW), it is clear that there were 2 points of contention with regards to the properties of the subroutines:

  1. Whether the JUMPSUB takes its jump location from the stack, or from an immediate that is part of the opcode. Eventually the stack solution was used, and I agree with Chris (after my own analysis) that this effectively made this variant of subroutines non-beneficial for one of the initially stated goals – ease of static analysis due to the lack of dynamic jumps. It actually makes that stated goal harder to achieve, because to do so the JUMPSUB opcode would need to be deprecated as well as JUMP and JUMPI.

Ease of static analysis was never a stated goal of EIP-2315. The goal was to make “almost the smallest change that provides native subroutines without breaking backwards compatibility.”

More on deprecation below.

  2. Whether jumping from the middle of one subroutine into the middle of another subroutine is allowed. Eventually this was allowed, I believe to simplify the implementation and therefore expedite the acceptance of the EIP. I believe (after my own analysis) that this decision has unfortunately mostly eroded another stated benefit of the subroutines – strict “first in/first out” behaviour of subroutines calling each other (as we are used to as programmers). As a result, simple emulation of the implemented behaviour by using a combination of PUSH and JUMP opcodes seems to suffice.

Again, this was never a stated benefit of EIP-2315.

EIP-615 imposed syntax on the bytecode to enforce safety.

EIP-2315 is pure mechanism – the core mechanism of EIP-615.

(I don’t think “jumping in and out” is the real issue anyway. It’s whether the stack is always aligned at the JUMPs, and the JUMPSUBs and RETURNSUBs balance.)

There were other observations made after my analysis:

  1. The claim (now apparently retracted) by the EIP author that subroutines are present in every modern virtual machine

My claim was, “These are provided in some form by most all physical and virtual machines going back at least 50 years.” I stand by it, though “some form” and “most all” leave room for disagreement.

I chose a simple form for EIP-2315 – I state that, “Our design is modeled on the original Forth two-stack machine of 1970. The data stack is supplemented with a return stack to provide simple support for subroutines, as specified below.”

There are of course other models, both simpler and more complex. These include the JVM method invocation and Wasm function table models.

EIP-2315 is also compatible with Nick Johnson’s EIP-3337: Frame pointer support for memory load and store operations. Together they can be used to provide a richer model of subroutine more similar to EIP-615 and to Vax, Intel, and other CPUs.

Much of the rest of your critique seems to be details of your disagreement as to what should count as a subroutine.

  1. … I find the argument for still using subroutines in the current EIP-2315 form (without cross-jump restriction) as a safety feature, much weaker and perhaps non-existent.

Again, such safety was not a goal or a promise.

The only safety directly provided by EIP-2315 is that it is impossible to overwrite the return address for a subroutine call.

Future EIPs might impose stricter requirements, but by now I do not think that deprecating opcodes is an option – too many years have passed, and backwards-compatibility is too important.

At best we can restrict the JUMP* opcodes to constant arguments or some form of restricted arguments, such as loading them from a data table (per EIP-2327: BEGINDATA opcode).

  1. Reference implementation of the currently proposed form of this EIP was done in go-ethereum. After the review of the implementation, I concluded that it does not incur extra performance overhead on other opcodes. However, in the evmone implementation, EIP-2315 seems to be introducing overhead to all the execution …
    … this issue has not been flagged until the evmone implementation was benchmarked.

I have not seen a full report of this testing. My impression was that the performance hit was small, that some of it was just the increased code size of adding any new opcodes at all, and that work was still underway to see whether it could be reduced. It’s unfortunate that the evmone implementation was so delayed.

I don’t know whether a slight performance hit in one implementation is a showstopper for this proposal.

A new Draft of this EIP is up at

EIP-2315: Simple Subroutines for the EVM

It lays out a few simple rules and a validation algorithm for safe, statically analyzable code using the EVM with this EIP.


Thanks, @ekpyron. I stand corrected. In your example it looks like most of the problem is just that JUMPSUB costs more than JUMP, and all the extra gas buys is readability. How would your optimized MULTIPLY look if it were to be called in multiple places? And we should use a more realistic leaf routine.

In the current draft I’ve removed the appendix for now; what is needed is a more complete analysis of the current Solidity calling conventions compared to what is possible with this EIP (for Solidity and other languages), and I’ll need help with that.

For what it’s worth this EIP intentionally does not force you to use subroutines at all, so they don’t prevent you from optimizing your code in the ways you have.

A good summary of the history of subroutines, from Lovelace and Turing onward

https://people.cs.clemson.edu/~mark/subroutines.html

and the subroutine facilities provided by many CPUs

https://people.cs.clemson.edu/~mark/subroutines/

including the following:

  • 704
  • 709x
  • alpha
  • amd29k
  • b5000
  • cdc6600
  • i960
  • itanium
  • m68k
  • m88k
  • mips
  • pdp8
  • pdp11
  • ppc
  • s360
  • sparc
  • vax
  • x86

@ekpyron @chriseth @holiman @AlexeyAkhunov

I’ve been working through this basic example more carefully, writing hand optimized code for all cases. I’m finding that JUMPSUB-based code simplifies calling conventions and saves gas compared to JUMP-based code in the general and optimized cases. Please do check my work.

TEST_DIV:
   beginsub
   0x02        3
   0x03        3
   DIVIDE      3
   jumpsub    10
   returnsub   5

DIVIDE:
   beginsub
   div         5
   returnsub   5
Total 34 gas.

The same code, using JUMP.

TEST_DIV:
   jumpdest    1
   RTN_DIV     3
   0x02        3
   0x03        3
   DIVIDE      3
   jump        8
RTN_DIV:
   jumpdest    1
   swap1       3
   jump        8

DIVIDE:
   jumpdest    1
   div         5
   swap1       3
   jump        8

50 gas total. Both approaches need to push two arguments and divide = 11 gas, so control flow gas is 39 using JUMP versus 23 using JUMPSUB.

  • That’s (39-23)/39 = 41% savings in control flow cost.
  • That’s (50-34)/50 = 32% savings in total cost.

In the general case of one routine calling another I don’t think the JUMP version can do better. Of course we can optimize the tail call, so that the final jump in DIVIDE actually returns from TEST_DIV.

TEST_DIV:
   jumpdest  1
   0x02      3
   0x03      3
   DIVIDE    3
   jump      8

DIVIDE:
   div       5
   swap1     3
   jump      8

Total 35 gas, which is still worse than with JUMPSUB.

We could even take advantage of DIVIDE happening to directly follow TEST_DIV:

TEST_DIV:
   jumpdest  1
   0x02      3
   0x03      3
DIVIDE:
   jumpdest  1
   div       5
   swap1     3
   jump      8

Total 24. A savings of 10 gas in an atypical case.

However, JUMPSUB can do better with the same optimizations.

TEST_DIV:
   beginsub
   0x02      3
   0x03      3
   DIV_DEST  3
   jump      8

DIV_DEST:
   jumpdest  1
DIVIDE:
   beginsub  1
   div       5
   returnsub 5

Total 29.

TEST_DIV:
   beginsub
   0x02      3
   0x03      3
DIVIDE:
   beginsub  1
   div       5
   returnsub 5

Total 17.

And of course these would all do better if JUMPs and JUMPSUBs could take an immediate.

So is saving gas the main goal of this EIP? Would it make sense to make such an analysis on real-world code instead of 2-3 hand-crafted examples?

If gas saving is the main goal, then please note that there are still tons of places to optimize existing code without the need to change the EVM.

Furthermore, unless this saves more than 10% gas on real-world examples, I would prefer our workforce to be directed at scaling solutions instead which allow a factor of 10 to 100 in gas savings.

I haven’t looked through those examples in all detail, but: shouldn’t JUMPDEST only cost gas if walked over, not if jumped to? If so, the tail call optimized version has equal gas cost already.
The other more optimized examples now seem to assume that it’s possible to walk into subroutines and to JUMP to BEGINSUB, do I read that correctly? I guess also the opcode order e.g. in the 35 and 29 gas examples is garbled a bit? In any case those two implicit changes to the spec would indeed allow more optimizations (not considering any other implications they might have). The 21 gas example I find rather confusing - I guess that’s meant to be a BEGINSUB, not a JUMPDEST in the beginning? And it probably shouldn’t push DIVIDE? That might make more sense.

In any case with those modifications to the spec that are implicit in those examples (I don’t see them reflected in the spec, though), the situation definitely changes a bit and subroutines would probably indeed save some gas on average - but I tend to agree with @chriseth that a big picture view might be more beneficial than arguing about saving a handful of gas or not in hand-crafted examples.

Simpler calling conventions and gas savings have always been the primary goals of this EIP.

Two safety properties result. Since return addresses are not on the stack it is not possible to overwrite them or use the wrong one, and it is not necessary to use jumps dynamically for subroutine returns.

I am trying to compare estimates of the best that compilers can do, which is what @ekpyron was trying to do.

Furthermore, unless this saves more than 10% gas on real-world examples, I would prefer our workforce to be directed at scaling solutions instead which allow a factor of 10 to 100 in gas savings.

For this example we get a 32% performance improvement for routines that are not tail calls. (For tail calls it’s better.) Obviously the percentage improvement would be less as the amount of actual computation gets larger. To get from 32% to 10% the actual computation in this example would have to increase from 11 gas to 120 gas.

I’m not sure what “scaling solutions” you have in mind - please explain. 10X to 100X improvements don’t sound possible just by improved optimization.

Thanks for catching the opcodes out of order. I’ve fixed them and some other things. The overall results don’t change much.

JUMPDEST costs 1 gas regardless. It should probably be 0, but that only means that a tail-optimized JUMP-based routine is as cheap as an un-optimized JUMPSUB-based routine.

Yes, the current draft allows for walking into subroutines. Preventing it didn’t seem to provide safety worth sacrificing tail-call elimination. It doesn’t (yet?) allow jumping to subroutines, so I’ve fixed my example to use an extra gas for a leading JUMPDEST. I’m not sure that I can optimize things much more, but would be glad to be proven wrong.

More realistic examples would be good, but I don’t know the details of the Solidity code generator, and comparing maximally-optimized examples is as close to apples-to-apples as I know how to get.

Ok, allowing back tail call optimizations definitely means that this does less harm to optimization than before. There are still head call optimizations, but I’m not sure how significant they are.

Still, to account for actual real-world gas savings one has to consider that all of these toy functions are generally inlined in practice, because the call overhead far exceeds the cost of executing them in place. For that matter even a single SSTORE would still be inlined; if it weren’t, the savings would only be (20045-20029)/20045 = 0.08%.
I.e. control flow is only a fraction of the actual gas cost of a contract, so a fractional saving in control flow is a minimal gain in practice.

So even if with allowing back tail call optimizations this might in fact save a few gas on average, I’d still say that that alone doesn’t warrant changing the evm for it.

And allowing walking into or jumping to subroutines seems to me like it’s moving away from the independent goal of better analyzability (even though I’d still argue that none of this really makes a difference in analyzing anything anyways).

One more quick note about the tail call optimized example: I think 1+3+3+3+8+5+3+8 is 34 not 35 ;-).
Ah, no, 35 is correct, but it’s now missing a JUMPDEST.
But yeah, in general, I’d say, there is a few more cases that one might want to look at, but given the change to the spec, I would no longer say that this EIP breaking optimization opportunities is a strong argument against it anymore.

@ekpyron I’m using a non-tail-call “toy function” that uses only 11 gas for computation to isolate the overhead of making calls from the overhead of making computations - I show that RETURNSUB improves gas use by 32%, and that an example that uses over 120 gas for computation is still 10% better. I also show that RETURNSUB gives superior tail call optimization. I don’t know if that is “realistic.”

Not all calls are tail calls, and not all tail-calls can be inlined without increasing contract size. So I don’t see that improving normal calls with a fairly easy (and already implemented) change to the EVM - adding a common facility - is a waste.

I’m not hard-set on making BEGINSUB execution invalid - tail-call optimization can be done either way - but code generators wouldn’t execute it by mistake, and humans hand-coding assembly should also be that skillful. And my analysis is that it doesn’t provide any extra safety, and neither does restricting “jumping into functions.”

I have in mind some of the same safety characteristics as EIP-615 – no stack underflows, no invalid jumps or invalid instructions, and linear-time, acyclic traversal of the control-flow graph. EIP-615 makes these guarantees difficult to violate and easy to enforce by eliminating dynamic jumps. EIP-2315 only makes it possible to avoid the use of dynamic jumps. Currently they are necessary because returns must be implemented by jumping dynamically to an address on the stack (or in memory), which RETURNSUB renders unnecessary.

Just to argue the point: you can still have calls like this:

function F(a) {
  G(a)
  ...[F]...
}
function G(a) {
  H(a)
  ...[G]...
}
function H(a) {
  ...[H]...
}

F(42)
...

in which I can pull out the heads, i.e. generate

ret					3
42					3
F					3
G					3
H					3
jump				8
ret:
...

F:
	jumpdest		1
	...[F]...
	jump			8
G:
	jumpdest		1
	...[G]...
	jump			8
H:
	jumpdest		1
	...[H]...
	jump			8

50 gas

compared to:

42					3
F					3
jumpsub				10
...

F:
	beginsub
	G				3
	jumpsub			10
	...[F]...
	returnsub		5
G:
	beginsub
	H				3
	jumpsub			10
	...[G]...
	returnsub		5
H:
	beginsub
	...[H]...
	returnsub		5

57 gas

and this I’m pretty sure you cannot beat using the subroutines, since it actually relies on the “dynamic” return jumps (which I’d not call dynamic, since they still always have fixed values for each call - they are just on the normal stack instead of hidden in the return stack).
That’s of course also highly artificial and other cases will behave differently. It’s possible to construct artificial cases that tip either way, if one really wants to. But I still maintain that gas savings alone are not a strong argument here.

EIP-2315 doesn’t prevent you from using JUMP in the way that you have. But indeed it is a highly artificial example, not likely to be generated by Solidity or most any other compiler. (I don’t think you could write it that way in Wasm either.) So I’m trying to use examples that are not artificial - just one routine calling another.

The raw improvement in gas cost and the complexity of calling conventions seem obvious to me.

  • 47 gas, 11 instructions reduced to 31 gas, 8 instructions – for an optimal ordinary routine.
  • 32 gas, 7 instructions reduced to 29 gas, 7 instructions – for a tail-optimized routine.

Whether that is enough of a difference to make a difference depends on the program being compiled and how well it is optimized, which is not just a matter of the “typical” Solidity program.

Anyway, given the schedule of upcoming upgrades this EIP has no hope of going in for many months, and would be most valuable as part of a more integrated set of EIPs. Thus I’ve opened discussions at EVM evolution Working Group Formation - #3 by gcolvin.


It’s not that artificial actually - it wouldn’t be directly generated nor handwritten like that, but an optimizer can easily recognize the patterns of “PUSH(F) JUMP” and “F: [something cheap] JUMP” and partially inline it depending on the precise cost of “something cheap”, the number of such patterns occurring, etc. (which becomes more complicated with several competing calling conventions). The artificial part is rather that I specifically chose one such example that actually has the gas cost tip this way, while others will not :-).
But yeah, if this EIP has additional merit as natural part of an integrated set of EIPs things can easily look quite different in general!

It seems everyone has been assuming otherwise. Why are they wrong?

Why the 0x00ff?

Taking my own bait. Extending the turbo-geth code to handle JUMPSUB (original spec - to JUMPDESTs) I get the following sketch. I can’t see any problem – this jumpdest analysis should find and mark all of the immediate data for the PUSH* opcodes and any others we introduce, so at runtime these will not be considered valid destinations.

	for pc := 0; pc < len(code); {
		op := OpCode(code[pc])
		pc++
		if op >= PUSH1 && op <= PUSH32 {
			numbits := int(op - PUSH1 + 1)
			// ... set bits
			pc += numbits
		} else if op == JUMPSUB {
			// ... set bits
			pc += 2
		} // else if op == any other op taking immediate data
	}

But way back over here I find an example of the problem I don’t really understand, from @gumb0

@chriseth put it as:

If the argument to a newly-introduced multi-byte opcode is the byte value of JUMPDEST, then this was a valid jump destination before the hard fork, but it is no longer a valid jump destination after the hard fork.

I think this can happen only where code preceding the JUMPDEST wasn’t executable – at some point I could scan the blockchain to see if the danger has ever manifested. This danger could be eliminated going forward by demanding that unreachable code be 0 (or INVALID, or something) so that new opcodes will never occur outside of immediate data.

And this is the problem with my analysis algorithm - it can’t tell whether it’s seeing old data or a new bytecode. Perhaps it can do better.