EIP-2315 Simple Subroutines for the EVM

ekpyron · March 1, 2021, 5:06pm

I’m not really convinced by this EIP. I identified three supposed arguments for it in the discussion:

1. This is supposed to be some kind of standard and other architectures have this, thus it is worth reproducing.
2. This is supposed to make static analysis easier.
3. This is supposed to save gas.

With regard to these I would say:

1. This point seems a bit questionable to me. I don’t know of any actively used modern architecture that has and actually uses native low-level subroutines. x86 may look like it has them, but in fact it just has opcodes that implicitly push return addresses onto the stack, so effectively it doesn’t. ARM doesn’t, RISC-V doesn’t, etc. If anything, I find the absence of native subroutine support in all modern architectures noteworthy. So I’m not sure where the idea comes from that all architectures have converged on a consensus that this is a good thing. But I generally find it rather moot to argue this point: if subroutines were generally beneficial, it should be possible to argue their merits directly.
2. I’m not an expert, but I don’t think that reading subroutines that follow a calling convention like the one Solidity uses is particularly hard in the absence of dynamic jumps, and I don’t think this EIP will really help either. It’s not like we will end up with plain straight opcode blocks starting with BEGINSUB and ending in RETURNSUB, each of which clearly belongs to a function, anyway. In practice, first of all, optimized code will have deduplicated blocks, i.e. tails of functions or branches will be shared, etc. Secondly, one would still need to verify the stack layout before a RETURNSUB to make sure it matches across returns and fits the expectations at the call site. But as I said, I’m not an expert on this, so if people doing a lot of static analysis agree that this is beneficial, I won’t argue with it. Do they?
3. This is the main point I have concerns about. Solidity code generation and optimization has maybe been a bit lacking in this area, but I don’t think that’s a good basis for a premature change to the EVM. That being said, we recently introduced, for example, a jump-based inliner as part of the Solidity optimizer (https://github.com/ethereum/solidity/pull/10761 as a base version, with the plan to extend it further) that can move code blocks behind known jump destinations. This can yield considerable gas savings, but this EIP can actually make it harder in some cases.

For example, consider a function jumping to another function at the beginning.

  MAIN_CONT // return address
  F1
  JUMP // jump to F1

F2:
  ...
  JUMP // return from F2

F1:
  0x42 // potential function argument of F2
  F1_CONT // return address
  F2
  JUMP // call to F2
F1_CONT:
  ...
  JUMP // return from F1

MAIN_CONT:
  STOP

// This is basically the following situation in Yul
//
// function f1() {
//   f2(42)
//   ...
// }
// function f2(a) { ... }
// f1()
//
// And the optimization has the call to f1 directly jump to f2.
// Situations like these for example occur in the ABIEncoderV2 code in nearly every contract.

We can inline the head call and transform this to:

  MAIN_CONT // return address
  0x42 // potential function argument of F2
  F1_CONT // return address
  F2
  JUMP // call to F2

F2:
  ...
  JUMP // return from F2

F1_CONT:
  ...
  JUMP // return from F1


MAIN_CONT:
  STOP

From this stage there are further optimization opportunities (like removing the jump to F2 and instead falling through), which again would become impossible if subroutines were used.
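
To get a rough feeling for what the head-call inlining and the fall-through step buy, here is a toy model in Python. It is only a sketch and not how the Solidity optimizer works: the `run` helper and the tuple encoding are made up for illustration, the elided `...` body of F2 is assumed to just discard its argument (`SWAP1 POP`), the rest of F1 is assumed to be empty, and the per-opcode costs are the usual static ones (PUSH 3, JUMP 8, JUMPDEST 1, SWAP1 3, POP 2, STOP 0).

```python
# Toy stack machine: labels are strings, JUMPDEST carries its label name.
GAS = {"PUSH": 3, "JUMP": 8, "JUMPDEST": 1, "SWAP1": 3, "POP": 2, "STOP": 0}


def run(program):
    """Execute a toy program and return (gas_used, final_stack)."""
    # Map each label to the index of its JUMPDEST instruction.
    labels = {op[1]: i for i, op in enumerate(program) if op[0] == "JUMPDEST"}
    stack, gas, pc = [], 0, 0
    while True:
        op = program[pc]
        name = op[0]
        gas += GAS[name]
        if name == "STOP":
            return gas, stack
        if name == "PUSH":
            stack.append(op[1])
        elif name == "POP":
            stack.pop()
        elif name == "SWAP1":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif name == "JUMP":
            pc = labels[stack.pop()]  # land on the JUMPDEST itself
            continue
        pc += 1


# Assumed stand-in for F2: drop the argument, then return.
F2_BODY = [("JUMPDEST", "F2"), ("SWAP1",), ("POP",), ("JUMP",)]
# F1's continuation (assumed empty) and the final STOP.
TAIL = [("JUMPDEST", "F1_CONT"), ("JUMP",), ("JUMPDEST", "MAIN_CONT"), ("STOP",)]

# First listing: jump to F1, which immediately calls F2.
ORIGINAL = (
    [("PUSH", "MAIN_CONT"), ("PUSH", "F1"), ("JUMP",)]
    + F2_BODY
    + [("JUMPDEST", "F1"), ("PUSH", 0x42), ("PUSH", "F1_CONT"),
       ("PUSH", "F2"), ("JUMP",)]
    + TAIL
)

# Second listing: the head of F1 is inlined at the call site.
INLINED = (
    [("PUSH", "MAIN_CONT"), ("PUSH", 0x42), ("PUSH", "F1_CONT"),
     ("PUSH", "F2"), ("JUMP",)]
    + F2_BODY
    + TAIL
)

# Further step: drop the jump to F2 and fall through into it.
FALLTHROUGH = (
    [("PUSH", "MAIN_CONT"), ("PUSH", 0x42), ("PUSH", "F1_CONT")]
    + F2_BODY
    + TAIL
)

for title, prog in [("original", ORIGINAL), ("inlined", INLINED),
                    ("fall-through", FALLTHROUGH)]:
    print(title, run(prog)[0])
```

Under these assumptions the model reports 56, 44 and 33 gas: inlining removes one PUSH, one JUMP and one JUMPDEST from the executed path, and falling through removes another PUSH and JUMP. The absolute numbers mean little, since real function bodies dominate, but the differences are what the optimization saves per call.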

That’s one (and maybe not the best) example of an optimization that wouldn’t be possible if we used subroutines, and I don’t think it’s the only case.
In other cases, of course, using subroutines to avoid having to rotate the return address up in the stack may also save gas, but I don’t think it is easy to say which weighs more heavily in practice without extensive analysis. I’m also not sure that subroutines are really the easiest way to avoid this shuffling cost. (For example, opcodes for stack rotations were proposed earlier as a comment here. Or if we had just one or two general-purpose registers, none of this would be necessary - and those really have been standard and consensus among architectures for decades, if an argument like that amounts to anything…)

I would have loved to look into this further before posting, but since this EIP is being considered for Berlin, I found it worthwhile to share some concerns now.
I’m not necessarily saying that subroutines and this EIP are definitively a bad thing - but I find the argument for it to be a bit lacking so far and am not convinced that it’s readily apparent that this will bring sufficient merits to justify the change at this point.