EVM Immediates

All current EVM instructions other than the push opcodes take all of their operands from the stack. Many instructions have been proposed in the past that would’ve liked to use immediate operands, where the operand value is a constant hardcoded in the bytecode instead.

No such instruction has been accepted yet because of concerns about breaking existing EVM bytecode. EOFv1 (EIP-7692) would have addressed this, but the proposal was ultimately rejected.

This post looks at the problem anew and describes options available for instructions that would like to use immediates. Only backward-compatible options are considered.

Problem: Backward compatibility

Since SWAPN/DUPN were first proposed as EIP-663 in 2017, it’s been known that introducing instructions with immediates is a breaking change if their immediates must be masked for jumpdest analysis, i.e., if a JUMPDEST opcode (5b) found inside an immediate should not be considered a valid jump target, just as a 5b inside PUSH data is not.

To understand this breaking change, consider the bytecode sequence e6 5b, which currently disassembles and behaves as INVALID JUMPDEST. If an upgrade turns the opcode e6 into an instruction “OP” with a 1-byte immediate, the same sequence would then disassemble to OP 0x5b. The JUMPDEST disappears, invalidating a jump target the contract may need to function. A similar thing can happen with push opcodes: for example, e6 60 5b would change from INVALID PUSH1 0x5b to OP 0x60 JUMPDEST, creating a new valid jump target. This may also have a cascading effect on arbitrarily many subsequent instructions (consider e6 60 60 …).
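The two cases above can be reproduced with a small sketch of jumpdest analysis. This is an illustration only, not client code: the function name and the immediate_sizes parameter are made up for this post, and e6 as "OP" with a 1-byte immediate is the hypothetical instruction from the example.

```python
from typing import Optional


def valid_jumpdests(code: bytes, immediate_sizes: Optional[dict] = None) -> set:
    """Return the offsets of valid JUMPDESTs in `code`.

    PUSH1..PUSH32 data is always skipped. `immediate_sizes` (hypothetical)
    maps extra opcodes to the number of immediate bytes they would mask.
    """
    immediate_sizes = immediate_sizes or {}
    dests = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0x5B:  # JUMPDEST
            dests.add(pc)
        if 0x60 <= op <= 0x7F:  # PUSHn carries n = op - 0x5f data bytes
            pc += op - 0x5F
        else:
            pc += immediate_sizes.get(op, 0)
        pc += 1
    return dests


# Hypothetical upgrade: e6 becomes "OP" with a 1-byte masked immediate.
WITH_OP = {0xE6: 1}

print(valid_jumpdests(bytes.fromhex("e65b")))             # {1}: INVALID JUMPDEST
print(valid_jumpdests(bytes.fromhex("e65b"), WITH_OP))    # set(): the JUMPDEST is gone
print(valid_jumpdests(bytes.fromhex("e6605b")))           # set(): INVALID PUSH1 0x5b
print(valid_jumpdests(bytes.fromhex("e6605b"), WITH_OP))  # {2}: a new JUMPDEST appears
```

The same bytes give different jump target sets before and after the hypothetical upgrade, which is exactly the incompatibility.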

Since the EVM doesn’t offer a dedicated section for data, arbitrary data may be embedded in code and serve as immutable data storage accessed via CODECOPY. Therefore, we must assume that arbitrary byte sequences may be found in contract code that should not be broken.

Option 1: No immediates

It would be possible for the EVM to keep PUSH* as the only immediate-carrying instructions and to continue requiring that all operands go through the stack.

This raises the question of why we might want immediates instead. I believe this comes down to two reasons:

  1. Efficiency: If operands are always known statically, it is more efficient for the VM to skip stack manipulation and gas accounting.
  2. Static analyzability: Values on the stack can come from user input and thus take arbitrary values. This can make static analysis of bytecode exponentially more difficult. Static analyses that are instead easy could be safely used in the execution layer without introducing DoS attack vectors, or could be used to improve the security of the application layer.

Option 2: Immediates

Option 2.1: Changes to jumpdest analysis

The simplest way to introduce immediates without breaking existing code is to use EOF-like bytecode versioning as enabled by EIP-3541. A prefix such as ef00 at the beginning of the code might indicate that a variant of jumpdest analysis should be used instead of the original one, where certain opcodes would imply additional masking.

Option 2.2: No changes to jumpdest analysis

Option 2.2.1: Disallowed immediate bytes

Observe that issues only arise when the JUMPDEST or PUSH* opcodes are found in the immediates. A new instruction can be designed so that this never happens in functional code. The bytes 5b and 60 to 7f are banned as immediates; using one only fails at runtime, with no validation during contract creation. The immediate is encoded at assembly or compile time and decoded during execution in a way that works around the banned bytes, so that a contiguous range of operand values remains usable.

For a worked-out example of this approach, see EIP-8024.

The downside of this option is the complexity of decoding, which must be carefully designed to balance efficiency and expressivity of the instruction, and arguably that it obfuscates the true immediate value in bytecode that is read without the help of a disassembler.
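To make the idea concrete, here is a minimal sketch of one possible encoding for a 1-byte immediate. It is not the encoding used by EIP-8024; it is a deliberately naive mapping, and all names in it are made up for illustration. The operand value n is stored as the n-th byte that is not banned, which yields a contiguous operand range of 0 to 222.

```python
# Bytes that may not appear as an immediate: JUMPDEST (5b) and PUSH1..PUSH32 (60..7f).
BANNED = {0x5B} | set(range(0x60, 0x80))

# Illustrative mapping (an assumption, not any EIP's scheme):
# operand n <-> the n-th allowed byte, in increasing order.
ALLOWED = [b for b in range(256) if b not in BANNED]
INDEX = {b: i for i, b in enumerate(ALLOWED)}


def encode_immediate(value: int) -> int:
    """Assembler side: turn an operand value into a safe immediate byte."""
    if not 0 <= value < len(ALLOWED):
        raise ValueError("operand out of range")
    return ALLOWED[value]


def decode_immediate(byte: int) -> int:
    """Interpreter side: recover the operand, failing on a banned byte."""
    if byte in BANNED:
        raise ValueError("banned immediate byte (would fail at runtime)")
    return INDEX[byte]


print(len(ALLOWED))                # 223 usable operand values
print(hex(encode_immediate(90)))   # 0x5a
print(hex(encode_immediate(91)))   # 0x5c, skipping the banned 5b
print(hex(encode_immediate(95)))   # 0x80, skipping 60..7f
print(decode_immediate(0x80))      # 95
```

Even in this toy version, the byte in the code (0x80) no longer reads as the operand (95), which is the obfuscation mentioned above. A lookup table also hides the decoding cost that a real design would need to keep small.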

Option 2.2.2: PUSH prefix

Since the PUSH* instructions are already masked for jumpdest analysis, they can be used for immediates by mandating that an instruction be preceded by a PUSH (probably of a specific size) in the code.

An example of this approach is found in EIP-7912.

The downside is that the operand must still go through the stack and gas accounting, especially for an EVM implementation that executes bytecode directly without a preceding parsing step. As a result this approach only provides the benefit of static analyzability.
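As a rough illustration of the static analyzability this buys, the sketch below walks bytecode and recovers the operand of every occurrence of a hypothetical opcode e6 that is required to follow a PUSH1. The opcode, the choice of PUSH1, and the function name are assumptions for this example and are not taken from EIP-7912.

```python
def static_operands(code: bytes, op: int = 0xE6) -> list:
    """Return (offset, operand) for each `op`, which must directly follow a PUSH1.

    Raises ValueError if an `op` is not preceded by PUSH1. Bytes inside PUSH
    data are skipped, exactly as in ordinary jumpdest analysis.
    """
    found = []
    prev_push1 = None  # operand of the immediately preceding PUSH1, if any
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        if opcode == op:
            if prev_push1 is None:
                raise ValueError(f"opcode at offset {pc} lacks a PUSH1 prefix")
            found.append((pc, prev_push1))
        # Remember the operand only if this instruction is a complete PUSH1.
        if opcode == 0x60 and pc + 1 < len(code):
            prev_push1 = code[pc + 1]
        else:
            prev_push1 = None
        if 0x60 <= opcode <= 0x7F:  # skip PUSHn data
            pc += opcode - 0x5F
        pc += 1
    return found


print(static_operands(bytes.fromhex("6005e6")))  # [(2, 5)]: PUSH1 0x05, OP
print(static_operands(bytes.fromhex("615be6")))  # []: e6 is PUSH2 data here
```

The operand is known without executing anything, but an interpreter that runs the bytecode directly still executes the PUSH1 and pops the value in OP.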

Option 2.2.3: PUSH postfix

Alternatively, an instruction could require that it be followed by a PUSH, which thereby becomes part of the instruction encoding. The PUSH is not executed normally; instead, the preceding instruction reads its data directly and advances the program counter past it.
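A minimal sketch of how an interpreter step might handle this, assuming a hypothetical opcode that must be followed by a PUSH1 (the function name and the error behavior are illustrative assumptions):

```python
def step_postfix(code: bytes, pc: int) -> tuple:
    """Execute the hypothetical OP at `pc`, which must be followed by PUSH1.

    Returns (immediate, next_pc). The PUSH1 is never executed: its data byte
    is read directly and the program counter skips over both of its bytes.
    """
    if pc + 2 >= len(code) or code[pc + 1] != 0x60:
        raise ValueError("OP must be followed by a complete PUSH1")
    immediate = code[pc + 2]
    return immediate, pc + 3


# e6 60 5b 00: OP with immediate 0x5b, then STOP.
print(step_postfix(bytes.fromhex("e6605b00"), 0))  # (91, 3)
```

Unmodified jumpdest analysis already treats the 5b here as PUSH1 data, so no jump target is created or destroyed, and nothing passes through the stack.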


I’m generally in favor of 2.1. I think it’s the cleanest option so far in terms of easy decoding and readability. The postfix PUSH is also good in that respect, and the extra byte “future-proofs” the immediate argument.

I propose the PUSH prefix in EIP-7979 because I am restricting validated uses of JUMP to PUSH const JUMP (and adding PUSH const CALLSUB) and don’t want to change the syntax.

I do propose to use immediate arguments in EIP-8013 which is all new opcodes intended to work only in MAGIC (0xE0) code.
