EIP-2315 "Simple Subroutines for the EVM" - Analysis

I think this is exactly the right solution. Have any unfixed conceptual problems been found? How hard is it to implement?

[quote=“gcolvin, post:30, topic:4229, full:true”]

I think this is exactly the right solution. Have any unfixed conceptual problems been found? How hard is it to implement?

To be clear, the section that you refer to is “prevent walking into subroutine”. This is pretty trivial to implement.

So, does that mean you are not favouring the “prevent jumping into another subroutine” ? That one is the more difficult one, but which I think is fully doable, both from an evm-perspective and from a backwards-compatibility perspective.

I am not favoring more than the minimal “prevent walking into another subroutine”. It’s a clear improvement that presents no difficulties. Beyond that I’m less much less convinced, as I’ve stated at intervals above.

1 Like

@chriseth, @holiman,

Maybe that was missed, but I suggested to limit “create init” code to twice the deployed code size limit, i.e. 2x 24576 == 49152. This should significantly bound the possible number of jumpdests and subroutines. Thoughts?

Yes I think that’s a very good suggestion. Should definitely help

I’m editing the spec now. How would that go?

If this were to be applied to the network I would really suggest to make it into a separate EIP, required by 2315, otherwise mixing different restrictions into a feature EIP sounds weird.

I could be convinced on the contrary though.

2 Likes

What’s the current thoughts about tail calls and e.g. @chfast’s suggestion of TAILJUMPSUB? I’ve personally been wanting to implement tail call optimizations in solidity for a while (although I think there’s no consensus about how beneficial they woud be - however, e.g. potential more functional languages might suffer even more from lacking the possibility of such an optimization). So I’m wondering, since the current proposal reads like that will no longer be possible when using subroutines.
(or wait: actually BEGINSUB JUMPDEST ... RETURNSUB and then jumping to that jumpdest from another subroutine would work? If so, then that would encourage jumping into subroutines for this, which @axic wants to avoid; I need to read the proposal a bit more carefully)
EDIT: ok, reading the proposal a bit more carefully makes me realize that the only benefit tail call optimizations would bring would be to save one slot in the return_stack - apart from that the combination of JUMPSUB RETURNSUB should do exactly what I want already. And the size of return_stack is probably a reasonable limitation, so further support for tail call optimizations might really not be needed.

So far as preventing jumping into subroutines goes – my problem remains that it has no semantics. It’s syntax, not mechanism, and the Rationale for this proposal is the least amount of mechanism needed to implement subroutines.

Throughout this discussion my take becomes stronger that syntactic restrictions should be no part of the specification of the opcodes.

  • Opcodes control state-machine transitions, and should be constrained only by the validity of the states.

This lack of syntax should pose no obstacles to compilers from high level languages to EVM code – they can enforce whatever syntax they like. I understand that when compiling from EVM to lower level code this matters, but I’ve generally seen compilers just “throw up their hands” and generate bad code when their input has too little structure. We can avoid this if we want …

We may later want to propose a separate EIP for a more-constrained EVM. It can enforce code structure at deployment time that is better suited for streaming compilation and other purposes. So along the lines of EIP-615, taking recent conversations and experience with EIP-2315 into account.

Subroutine map with capped executable code size

Current max size of deployable code is 24576 (0x6000). We propose to also limit the size of executable code to 2x the above limit, i.e. 49152 (0xc000).

This gives nice properties:

  • instruction offset in code fits 16-bit value,
  • code size fits 16-bit value.

Possible implementations of subroutine maps

Shadow array

Max memory footprint: 0xc000

We create a byte array of the size of the code shadowing the instructions in the code. The array is initially filled with zeros.

  • a JUMPDEST instruction is indicated by 0x80 mask in the shadow array.
    To validate jump destination: (shadow_array[location] & 0x80) != 0.
  • a BEGINSUB instruction is indicated by 0x40 mask in the shadow array.
  • the size of a subroutine is encoded in the remaining 6-bits using variable-length encoding analogous to LEB128. Examples:
    • 40 - subroutine of length zero,
    • 5f - subroutine of length 31 (0x1f), the max length that fits single byte encoding,
    • 6001 - subroutine of length 32 (0x00 + 2^5 * 0x01)
    • 7f - subroutine of length 1023 (0x1f + 2^5 * 0x1f), the max length that fits two-byte encoding,
    • 6081 - subroutine of length 32 followed by a JUMPDEST instruction.

Bitset with linear search

Max memory footprint: 0xc000 / 4 == 12288

We create a bitset indicating an instruction is JUMPDEST and another bitset indicating an instruction is BEGINSUB. During execution we keep track of where the current subroutine starts.

To validate if a jump stays within a subroutine:

  • when jumping backwards, just check if juspdest_location >= current_subroutine_begin,
  • when jumping forward, find the beginning of the next subroutine starting from current PC (where the jump is). This is done by linear search in subroutine bitset. We can process 64 bits at a time by using 64-bit load and checking if value is not zero. This may be optimized further using SSE2 (128-bit) or AVX (256-bit).

In worst case we need 0xc000 / 64 == 768 comparisons.

Span map

Max memory footprint: 0xc000 * 2 == 98304 + jumpdest map.

We keep an array of all BEGINSUB instructions offsets. Single value is uint16 and values are already sorted.

To verify JUMPSUB check if location is in the array using binary search.

When entering a subroutine remember subroutines begin and end location. The end location is the next value in the array.

To verify if a jump stays within a subroutine check if jumpdest_location is between subroutine’s begin and end.


I’m not able to prove it, but that was exactly my intention.

The deploy-time code validation is the precise boundary I don’t want to cross. See EIP-2315 "Simple Subroutines for the EVM" - Analysis - #19 by AlexeyAkhunov.

One detail that you don’t mention explicitly, but might be worth to note: I believe all three implementations would require that JUMPSUB pushes current subroutine start into control stack togetter with return location, so that after RETURNSUB we can remember to which one we returned.

Here is a complete (untested) C++ implementation of a variant of @chfast’s idea of the shadow array. It does not use a complicated variable-length encoding and needs less memory, though. It needs two bit fields with code_size number of bits: The first stores at bit i whether code[i] is push data or not. The second stores the length of each subroutine such that the length of the subroutine that starts at byte i is stored in binary starting at the ith bit.

It does not need a stop marker for the length encoding because it reads the length bit-by-bit and checks if the end of the subroutine has been reached already. This can easily be removed by making the size of the bitset twice as large or by using a more complicated (prefix-free) number encoding.

This PR incorporates the fix to prevent the EVM from stepping (as opposed to jumping) to BEGINSUB – JUMPSUB transfers control to the instruction after BEGINSUB; BEGINSUB itself aborts. This PR does not prevent “jumping into subroutines.”
PR
EIP

@chfast @gumb0 @axic @chriseth @holiman

it looks like the example is wrong https://github.com/ethereum/EIPs/blob/1e3659c86d284473fde869a7a8129c5e7af6fe7e/EIPS/eip-2315.md#subroutine-at-end-of-code. BeginSub shouldn’t be present. Right?

Here is a specification for the restrictions on the jump and jumpi opcodes: https://github.com/ethereum/EIPs/pull/2663

(the “do not walk into subroutines” part is already merged into EIP-2315)

Subroutine maps help answering following questions (sometimes also by using information in the code itself):

  • is this instruction a JUMPDEST?
  • is this instruction a BEGINSUB?
  • what is the length of this subroutine?

For subroutine maps we have two classes of implementations (as identified with @chriseth).

  1. “Code shadow array”
    The size is proportional to the code length. We believe the variant with the lowest memory footprint would use 2 bits per byte of code (proposed by @chriseth). Up to one byte per byte of code presented as “Shadow array” in EIP-2315 "Simple Subroutines for the EVM" - Analysis - #41 by chfast. This class of implementations is generic and does not depend on the discussed executable code length cap.
  2. “Span map”
    This is a map/set of subroutine begins/ends but can be efficiently stored as an array and binary search can be used to lookup values in the map. The size is proportional to the number of subroutines. But in worst case the code may be only 1-byte long subroutines.
    This depends on the code length cap because with the cap we can use uint16 type, otherwise we need uint32. In the worst case with uint32 type the memory footprint is 4x the code length.

True. I actually did not realized that I need to remember the all the subroutines on the control stack.

This is however not needed for “span map”. One binary search is enough to find both begin and end of a subroutine.

That’s what I meant here: