There is some inconsistency between the test case and the spec in EIP-2315.
The spec says that BEGINSUB is not supposed to be executed and its execution will cause error. JUMPSUB will land on the next instruction after BEGINSUB.
However, the test case (and the current open-ethereum implementation) assumes BEGINSUB can be executed as a noop. JUMPSUB will land on the BEGINSUB instead of the next.
Note that I believe the spec makes more sense from the security perspective. It prevents unintended control flow behavior in EVM crossing routine boundaries.
List of issues:
BEGINSUB should not be executed (or return OUT_OF_GAS). But the last test case in the EIP spec (Subroutine at end of code) counts it executed with gas 1.
The test case in openthereum also assumes BEGINSUB can be executed. If following the spec, it should be something like this:
A further comment on this: I think there is some logic behind putting Cost and Codes under Implementation in https://eips.ethereum.org/EIPS/eip-2315, but usually EIPs put these definitions into the Specification and that is where most people expect to find them.
(Cost and Codes currently contains the opcode numbers and the gas costs.)
Current costs are defined as "BEGINSUB be base (2) , JUMPSUB be mid (8) , and RETURNSUB be verylow (3)"
@holiman suggests: "I don’t see why RETURNSUB should be so cheap. I’d actually prefer it to be same as JUMPSUB – or, more specifically, that cost of(JUMPSUB+RETURNSUB) == cost of (JUMP + JUMP) . Which currently would put it at mid == 8"
I suggest that since JUMPSUB and RETURNSUB both need to push to/pop from the return_stack, they should be more expensive than JUMP (mid). Maybe the difference is not measurable too much, but still they should not be the same. I suggest mid + 1 or mid + 2 as a hunch.
I would argue that a RETURNSUB is inherently cheaper, since it doesn’t have to validate the destination. It just needs to pop a stack and set PC. I mean POP is cheap (pops one). So my hunch is 10 for JUMPSUB and maybe 3 or 5 for RETURNSUB.
That means JUMSUB+RETURNSUB == 15, slighlty advantageous over JUMP+JUMP=16.
If the proposal from @chriseth et al goes through, maybe we need to adjust something.
@xwvvvvwx@holiman The spec takes it to be implied by the Yellow Paper, but on a closer read it isn’t. Not sure whether the Yellow Paper should change to make it clear that no instructions - not just JUMPDEST - are valid inside of PUSH data, or whether this spec (and eventually the Yellow Paper) should make that clear that BEGINSUB is not valid there.
I’m not really convinced by this EIP. I identified three supposed arguments for it in the discussion:
This is supposed to be some kind of standard and other architectures have this, thus it is worth reproducing.
This is supposed to make static analysis easier.
This is supposed to save gas.
With regards to these I would say:
This point seems a bit questionable to me. I don’t know of any actively used modern architecture that has and actually uses native low level subroutines. x86 may look like it has them, but in fact it just has opcodes that implicitly push return addresses on the stack, so effectively it doesn’t. ARM doesn’t, RISC-V doesn’t, etc. Actually I rather find the absence of native subroutine support in all modern architectures noteworthy. So I’m not sure where the idea of all architectures converging on some consensus that this is a good thing is coming from. But I generally find it rather mute to argue this point. If it was generally beneficial to have them, it should be possible to argue the merits directly.
I’m not an expert, but I neither think that reading subroutines using any calling convention like the one solidity uses in the absence of dynamic jumps is particularly hard and I don’t think this EIP will really help. It’s not like we will end up with plain straight opcode blocks starting with BEGINSUB ending in RETURNSUB, all of which clearly belonging to a function, anyways. In practice first of all optimized code will have deduplicated blocks, i.e. tails of functions or branches will be shared, etc. Secondly, one would still need to verify the stack layout before a RETURNSUB to make sure it matches across returns and fits the expectations on the call site anyways. But as I said, I’m not an expert on this, so if people doing a lot of static analysis agree that this is beneficial, I won’t argue with it. Do they?
This is the main point I have concerns about. Solidity code generation and optimization has maybe been a bit lacking in this area, but I don’t think that’s a good basis for a premature change to the EVM. That being said, we for example recently introduced a jump-based inliner as part of the solidity optimizer (https://github.com/ethereum/solidity/pull/10761 as a base version with the plan to extend it further) that can move code blocks behind known jump destinations. This can yield quite some gas savings, but can actually be made harder by this EIP in some cases.
For example, consider a function jumping to another function at the beginning.
MAIN_CONT // return address
F1
JUMP // jump to F1
F2:
...
JUMP // return from F2
F1:
0x42 // potential function argument of F2
F1_CONT // return address
F2
JUMP // call to F2
F1_CONT:
...
JUMP // return from F1
MAIN_CONT:
STOP
// This is basically the following situation in Yul
//
// function f1() {
// f2(42)
// ...
// }
// function f2(a) { ... }
// f1()
//
// And the optimization has the call to f1 directly jump to f2.
// Situations like these for example occur in the ABIEncoderV2 code in nearly every contract.
We can inline the head call and transform this to:
MAIN_CONT // return address
0x42 // potential function argument of F2
F1_CONT // return address
F2
JUMP // call to F2
F2:
...
JUMP // return from F2
F1_CONT:
...
JUMP // return from F1
MAIN_CONT:
STOP
From this stage there are further optimization opportunities (like removing the jump to F2 and instead falling through), which again will become impossible if subroutines were used.
That’s one (and maybe not the best) example of an optimization that wouldn’t be possible if we used subroutines and I don’t think it’s the only case.
In other cases, of course, avoiding having to rotate the return address up in the stack using subroutines may of course also save gas cost, but I don’t think it is easy to say which weighs more heavily in practice without extensive analysis. I’m also not sure that subroutines are really the easiest way to avoid this shuffling cost (For example, opcodes for stack rotations were proposed earlier as a comment here. Or if we had just one or two general purpose registers, none of this would be necessary - and thosereally are standard and consensus among architectures for decades, if an argument like that amounts to anything…).
I would have loved to look into this further before posting, but since this EIP is considered for Berlin I found it worthwhile to share some concerns now.
I’m not necessarily saying that subroutines and this EIP are definitively a bad thing - but I find the argument for it to be a bit lacking so far and am not convinced that it’s readily apparent that this will bring sufficient merits to justify the change at this point.
Thanks for you critique. I’m not too concerned with physical machines. The EVM is a virtual machine. This proposal models the Forth subroutine mechanism. which has seen fifty years of success. Other standard VMs support subroutine calls, including JVM and Wasm. And regardless, this proposal is going into Berlin now.
Well, given that in its current form the EIP has extremely limited practical use, if any at all, and, if used at all, has limited value, while it on the other hand proactively complicates other optimization opportunities, as far as I’m concerned, this should never have been accepted to Berlin in the first place, especially on such a thin basis. In fact it makes me seriously question the process of acceptance, even more so if there appears to be a tendency to dogmatically stick to it after the fact.
Slightly off-topic, but also sort of on-topic since it is part of your opposition: I don’t know any contract authors that optimize for deployment cost over runtime cost, so this seems like a bad optimization (adds a JMP+RET instruction at runtime in order to save some deployment bytes).
Clearly, I disagree with your harsh assessment of the proposal, as have many others. And I don’t see following through on previous commitments as dogmatism. After months of discussion we decided to accept, clients have implemented it, it has been running on test nets, and it’s scheduled for Berlin.
I don’t understand your point Micah. You can use these opcodes to substantially reduce the gas burned calling subroutines. Anyway, it’s done. Discussions of how to use it, when (not) to use it, and EIPs to increase its usefulness are more productive at this point.
Well, maybe my assessment was indeed (at least formulated) a bit overly harsh, sorry about that, I didn’t mean to offend.
Still, it is unfortunate that there hasn’t been more real-world use case deliberations about this in the discussion so far. To be honest I have seen this EIP earlier than now, but never thought it to be ready or a likely candidate for inclusion at all in its current form and was rather shocked when I realized that it was staged for Berlin. There was clearly some communication errors involved here.
And if we really have to go through with it now, it at the very least creates a rather unfortunate situation for us, because while there is some reason for using it in Solidity, there is sufficient reason for not using it outweighing the benefits as far as I can judge so far, so this is of course some source of frustration.
E.g. I may save some swapping, but I pay for it by potentially not getting rid of entire jumps. It’s not only deploy time cost savings, but very much runtime code savings that are at stake here.
Are any of your concerns about “practical use” addressed by EIP-2327: BEGINDATA opcode, which is being proposed for the fork after Berlin? 2315 was the result of “stripping down” an earlier proposal, 615, because it was deemed too complex. I suspect that adding complexity back to it 2315 may result in us re-hashing the entire debate we had around 615.
There was clearly some communication errors involved here.
Agreed, working on improving this for the next upgrade.