EIP-2315: Simple Subroutines for the EVM

EIP-2327, as far as I’m concerned, is a good thing, but it’s pretty much independent of the issues with this EIP (and also doesn’t depend on this EIP), so it won’t help here, unfortunately. So far I don’t see a way to change much in this EIP in the future to that end: defining subroutines as a low-level feature simply makes certain low-level jump optimizations impossible (unless one just doesn’t use subroutines, of course). So we will have to evaluate whether it still makes sense to use them in some cases but not in others; that may turn out to be too complicated, though, so it may also happen that we won’t use them at all.
There may of course be some hidden way to get the benefits of subroutines as suggested here while still allowing the optimizations I am concerned about, but unfortunately I don’t see it, so far at least.
EIP-615 in its entirety, or generally moving towards a more wasm-like style, might be a different story, but I wasn’t under the impression that that was on the table on top of this, even in the foreseeable future.

Also, the de facto benefits this EIP can bring (i.e. avoiding some stack shuffling) could potentially have been achieved by different means that would have been less restrictive otherwise, and I’m not sure whether the existence of subroutines will make it less likely for such alternative mechanisms to be introduced in the future… but all of this would require a lot of research and analysis in general.

The main point is that it’s not as simple as "this EIP will save you some gas, so it’s useful and beneficial, and that’s it." It’s more complicated than that, and ideally, of course, that would have been considered earlier. Whether it can or cannot be adjusted in the future to mitigate the downsides is hard to tell off the top of my head, but I’m not overly optimistic.

1 Like

Our process can always be improved, but this EIP (like others) was submitted for review here - in January of last year. It has been listed on the agenda for almost every ACD call since February, as up for discussion, EFI, accepted, or scheduled for Berlin. Anyone interested in the development of Ethereum has had plenty of time to contribute to this discussion (and others) in well-known forums. But it seems to be a tradition to wait until just before deployment to notice which EIPs are going in.

And a meta point – if we can’t get from proposal to deployment in over a year’s time then we are moving way too slowly to compete in the 21st century.

1 Like

I would agree that if this EIP is not really a thing in itself but paves the way for future changes, it would be very helpful to communicate that clearly, if not as formal future EIPs, then at least as some informal writeup. It feels like its benefits are perceived differently because of some miscommunication of its value.

1 Like

I have benchmarked the EIP-2315 overhead in evmone, using the EIP-2315 implementation proposed by @yperbasis from Silkworm. The implementation adds EIP-2315 support to both of evmone’s interpreters: advanced and baseline. This is not the final version (i.e. not merged yet).

Partly aggregated data (Haswell CPU at 4.0 GHz, Clang 11) are available in this spreadsheet.

In summary, for the "main" benchmark set:

  • advanced analysis time has increased by +2.3%,
  • advanced total execution time (including analysis) has increased by +1.4%,
  • baseline total execution time (including jumpdest analysis) has increased by +4.1%.

The aggregation uses the arithmetic average instead of the geometric one - there is no such option in Google Sheets pivot tables.

Just to strengthen my point a bit: I had another quick look at the "Appendix: Comparative costs" example in the EIP. The proposed version from the EIP, annotated with gas costs as specified by the EIP, is the following:

TEST:
     beginsub
     0x02			3
     0x03			3
     TEST_MUL		3
     jumpsub		10
     returnsub		5
TEST_MUL:
     beginsub
     MULTIPLY		3
     jumpsub		10
     returnsub		5
MULTIPLY:
     beginsub
     mul			(5)
     returnsub		5

It claims that calling this will cost 32 gas plus 5 for the multiplication. If I add this up, though, I get 47 plus 5: (3+3+3+10+5) for TEST, (3+10+5) for TEST_MUL, and 5 for the returnsub in MULTIPLY.

Now the EIP compares this to some Solidity-generated code, which provides far more guarantees, e.g. that function arguments stay alive in fixed stack slots until function exit for easier debugging, etc. (We will probably optimize this further eventually, but hunting down a few swaps and dups is not the priority compared to, for example, reducing storage loads and writes where possible.) So this is comparing apples and oranges.
If I write the same code optimally myself, using jumps without subroutines, I get the following instead:

TEST:
     jumpdest 
     0x02		3
     0x03		3
     TEST_MUL	3
     jump		8
TEST_MUL:
     jumpdest	
     MULTIPLY	3
     jump		8
MULTIPLY:
     mul		(5)
     swap1		3
     jump		8

This is 39 gas plus 5 for the multiplication. To be fair, there is an additional hidden 3 gas outside the function for pushing the return address of the whole thing, making it 42 plus 5. Or put differently: calling the whole thing once costs 3+3+8 gas without subroutines (push return address, push TEST, jump), giving a total of 53+5 gas; in the subroutine version the call is 3+10 gas (push TEST, jumpsub), so 60+5 gas in total.

So if I see this correctly, the EIP’s own example of gas savings, when analysed correctly, actually has the subroutine version being more expensive. And this case does not even hit any of the optimizations I was mainly concerned about earlier, which subroutines would rule out. Of course the numbers shift if I assume different numbers of calls to the functions, etc., but the gas difference is minimal in general. It’s easy to construct artificial cases in which one or the other is cheaper by some small margin, but I’d argue that gas savings are not a suitable argument for this EIP. That’s not to say there may not be other merits: if the plan is to extend this into something else in the future, or if static analysis experts agree that this is hugely beneficial to them, or anything like that.

1 Like

A very quick take.

With this proposal, calling a subroutine looks like this:

  1. get arguments onto stack
  2. push subroutine address
  3. JUMPSUB

And writing a subroutine looks like this:

  1. use arguments on stack
  2. leave return value(s) on stack
  3. RETURNSUB

Without them, calling a subroutine can look like this:

  1. get arguments onto stack
  2. push return address
  3. push subroutine address
  4. JUMP

And writing a subroutine would look like this:

  1. move arguments over return address on stack
  2. use arguments on stack
  3. leave return value(s) on stack
  4. move return address over return value(s)
  5. JUMP

Other calling conventions are possible. I think they all have the overhead of dealing with the return address explicitly. The exact gas cost differences will of course depend on the calling conventions and the subroutine called. But it seems there must be some cost in complexity and gas.
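
For concreteness, here are both sequences in the same annotated style as the examples above. MY_SUB, RET, and the argument 0x2a are placeholders of mine, not taken from the EIP:

With subroutines:

     0x2a		3	(argument)
     MY_SUB		3
     jumpsub		10
     ...			(return value now on stack)
MY_SUB:
     beginsub
     ...			(use argument, leave return value)
     returnsub		5

Without subroutines, with this naive convention:

     0x2a		3	(argument)
     RET		3	(return address)
     MY_SUB		3
     jump		8
RET:
     jumpdest
     ...			(return value now on stack)
MY_SUB:
     jumpdest
     swap1		3	(move argument over return address)
     ...			(use argument, leave return value)
     swap1		3	(move return address over return value)
     jump		8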

The interesting question is really about optimal calling conventions. I could of course push the return address after the arguments, creating the need to then move the arguments over the return address, but why would one do that? (Solidity currently doesn’t handle calls optimally, but even now it doesn’t do this particular thing :-).) One can also define weird calling conventions when using the subroutine opcodes, so yes: the optimal case is the interesting one.

A more efficient version of calls without subroutines to consider is this:

  1. push return address
  2. push arguments
  3. push subroutine address
  4. JUMP

And writing a subroutine looks like:

  1. use arguments on stack
  2. produce return values in a clever order (i.e. the first/deepest return value is on the stack top)
  3. use one swap (independently of the number of return values; granted, Solidity is currently lacking some cleverness here, but that’s beside the point) to exchange the first return value and the return address
  4. JUMP

So the additional complexity in the worst case boils down to one swap. Three gas.
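
A sketch of that convention in the same annotated style, again with placeholder labels of mine and a single return value:

     RET		3	(return address, pushed first)
     0x2a		3	(argument)
     MY_SUB		3
     jump		8
RET:
     jumpdest
     ...			(return value now on stack)
MY_SUB:
     jumpdest
     ...			(consume argument, produce return value on top)
     swap1		3	(the one extra swap: return value <-> return address)
     jump		8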

On the other hand, there is an abundance of cases in which not using subroutines allows for head-call or tail-call optimizations that eliminate the need for this swap, etc., resulting in cheaper code (that is what makes the example given in the EIP actually more expensive with subroutines than without them, given optimal calling conventions).
In general it will depend on the case, but I don’t see much reason to believe that this tips in favor of subroutines without hard analysis over a large array of actual real-world cases. In fact, if I were to dare a guess, I’d expect the opposite in practice.

I’m mainly coming from the Forth point of view, Forth being the archetypal stack machine. It has a separate return stack because that swap really is a PITA. The idiom there is to arrange things so that almost all calculations leave the stack ready for function calls, and all function calls leave the stack with results in place for easy use.

Well, I’d never disagree that that swap is far from aesthetically pleasing ;-). But it’s not a significant cost. Cleverly arranging things is needed to get optimal results and is desirable in either case, with and without subroutines; there’s not much difference there. On the other hand, the second stack for the return addresses is a burden on the clients, and this needs to be reflected in the gas cost of the opcodes manipulating it (as is of course done in the EIP), which already pretty much levels out the plain gas costs.

And then tail-call optimizations alone are really powerful and can be used all over the place. So if one really wants to push for gas savings, I’m pretty sure that’s the way to do it. And on top of that there is even more advanced stuff, like partially inlining functions with head calls, etc.
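
To illustrate the tail-call case with a placeholder example of mine (F ends by calling G; G finishes with the usual swap/jump return sequence):

Without the optimization:

F_TAIL:
     RET		3	(return address for the call to G)
     G			3
     jump		8
RET:
     jumpdest
     swap1		3	(move result over F’s own return address)
     jump		8	(return from F)

With the optimization:

F_TAIL:
     G			3
     jump		8	(G’s final swap/jump returns directly to F’s caller)

That saves 3+8+3 = 14 gas per level, while the straightforward jumpsub/returnsub encoding (with no comparable trick) pays the full 10+5 at every level of the chain.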

But I may be repeating myself a bit there.
I’m just saying that the argument that this EIP allows gas savings in general does not hold up to scrutiny from where I’m standing, and I think I have quite a solid basis for saying so.

Yes, I understand you now. I will probably remove the Appendix, but it doesn’t much change my motivation, and doesn’t change anything for implementors.

I will ask - how much do such optimizations defeat Alexey’s arguments that EVM code is already statically analyzable?

And for Forth-style code it’s not just aesthetics - it’s a major performance obstacle.

1 Like

It is not EVM code in general that is statically analysable, but Solidity-generated EVM code (without assembly tricks that compute jump destinations).

Ok, thank you, that’s already some weight lifted from my heart ;-).

With regard to static analyzability, I must say I’m not an expert.
But on the one hand, code under this EIP is not really statically analyzable in general either. I could still jump dynamically as long as I make sure that I hit a BEGINSUB, just as I may jump dynamically now, as long as I make sure that I hit a JUMPDEST.
So static analyzability still relies on the "good will" and good practice of the generated code.
I’m not saying that this couldn’t be restricted in some way in the future, but I’m not aware of a rigorous, thought-through plan for moving towards that which has general support. In fact I was under the impression that the rejection of EIP-615 means that this is not on the table (don’t get me wrong, I actually liked EIP-615, but that’s a whole different topic and I haven’t looked into it in all detail) - if this is on the general agenda, it might be nice to document that in an accessible manner (and if it is and I’m just not aware, I’d be thankful for being pointed in the right direction).
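
To illustrate with a hypothetical snippet of mine (not from the EIP) - the subroutine address can be computed at runtime, so the jump target is not statically known:

     0x00		3
     calldataload	3	(subroutine address supplied in calldata)
     jumpsub		10	(valid as long as it lands on a BEGINSUB)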

So the big question is the return address in the case without subroutines. But since we’re assuming benevolent code anyway, the code will have some important properties:

  1. Any jump will be of the form PUSH(addr) JUMP, except jumps returning from subroutines.
  2. For any jump that isn’t immediately preceded by a push, there will only be code paths (moving backwards) with traceable stack balance, culminating in all call sites of the subroutine and specifically in a PUSH(returnaddress).

Doing the analysis needed for property 2 is of course a burden. It’s complex, and nobody will be excited about having to do it, but I would expect it to be possible in all cases (don’t shoot me if I’m wrong about that, though; as I said, I’m not an expert and this hasn’t been the focus of my attention so far).
But it would appear to me that static analysis is still theoretically possible in both cases (assuming "benevolently generated code without trickery", of course - but in both cases). So without further arguments, or a thorough plan for the future that shifts things, I would say that saving cost and complexity in the EVM and in client implementations weighs more heavily than saving complexity in off-chain static analysis (even though I agree that the latter may hurt).
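
Concretely, an analyzer working under these assumptions only ever encounters two shapes of jump (hypothetical labels of mine):

Shape 1 - a direct jump, target statically visible:

     TARGET		3	(PUSH(addr) immediately before the jump)
     jump		8

Shape 2 - a return jump, not immediately preceded by a push:

     swap1		3
     jump		8	(trace stack balance backwards to the PUSH(returnaddress) at each call site)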

Right. I think we are speaking of highly optimized Solidity output.

My question is - can we do the optimizations you mention while maintaining these invariants?

Yes, I would think so.
The analysis may, for example, have to fuse a function that tail-calls another function into one function, but I don’t see that being a problem; it all remains static.
There are probably crazy things one can do optimization-wise that really break static analysis, but it’s possible to avoid those (as part of "benevolent code generation").

What I see is a solution looking for a problem. I think changes to the EVM should be a last resort, reserved for things that cannot effectively be addressed in software, which I cannot see being the case here at all. Maybe Forth is great, maybe it’s not; I don’t really care. What problem is this solving? It sounds like the problem it is solving is the EVM not being enough like Forth. Or is it about saving a few gas at call sites? Let’s be honest: if we were to make a prioritized list of ways to save gas on mainnet, this would basically come last.

In general I find this conversation pretty fascinating: it involves both EVM and compiler developers raising concerns about a change to the Ethereum protocol, and then someone basically replying "too bad, so sad, what’s done is done". As far as I am aware, Berlin has not happened yet. Hardforks are a process exactly for this reason. Not everyone has time to implement every single proposal, but once things are slated for a hardfork, everyone starts looking at them: implementors start implementing, and people start making realizations. The compiler people might realize "hmm, maybe this isn’t so useful after all". The EVM people might realize "wow, this actually makes the code a lot more complicated, and we can no longer do an optimization we used to be able to do". The only answer I have seen to that is "sorry not sorry, you should have spoken up earlier". So how exactly do we change this hardfork process so that it’s not always "too late" to incorporate feedback from the people actually doing the work of implementing the hardforks?

This discussion seems to raise a ton of valid issues, and if I had veto power it would be a clear no. It also seems that, if we took the politics of the process out of this, there are very few technical arguments for why this is important or even desired for Berlin.

5 Likes

We are now on a very slippery slope: stopping EIPs, without legitimate concerns, that have already been approved and implemented in clients ready for a fork. The claim that it should not be included because nobody will use it has been proven wrong. This is not the first time a small group of people has hijacked the implementation of an EIP; both times the same community has orchestrated these interventions. Going forward we need to recognize more quickly the influence of special interests in hijacking EIPs.

Without legitimate concerns? Technical debt and performance regressions seem like pretty real concerns to me. The "small group of people" "hijacking" the EIP are, as far as I can tell, the developers working on the very thing the EIP is about… Interesting perspective. Is the standards organization some more elite class of contributor that hands down EIPs from the sky as Truth, while the peasant developers should be very careful about having any opinions or rising up against bad ideas, aka "hijacking"? There is clearly some tension here between people who think they know how things should be, handing down orders, and people who don’t see it the same way, and now expressing clearly legitimate technical concerns is somehow "hijacking" and a whole lot of politics.

We are on a very slippery slope if we dogmatically accept bad ideas just because we previously agreed to them, and if, even with new information, we cover our ears and stick to the plan out of some moral notion that sticking to plans takes priority over outcomes.

3 Likes

As a heads-up, more discussion took place here: https://github.com/ethereum/pm/issues/263

And on the AllCoreDevs channels and on today’s call.