All current EVM instructions other than the push opcodes take all of their operands from the stack. Many instructions have been proposed in the past that would’ve liked to use immediate operands, where the operand value is a constant hardcoded in the bytecode instead.
No such instruction has been accepted yet because of concerns about breaking existing EVM bytecode. EOFv1 (EIP-7692) would have addressed this, but the proposal was ultimately rejected.
This post looks at the problem anew and describes options available for instructions that would like to use immediates. Only backward-compatible options are considered.
Problem: Backward compatibility
Since SWAPN/DUPN were first proposed as EIP-663 in 2017, it’s been known that introducing instructions with immediates is a breaking change if their immediates must be masked for jumpdest analysis, i.e., if a JUMPDEST opcode (5b) found as part of an immediate should not be considered a valid jump target, just as PUSH operands are masked.
To understand this breaking change, consider the bytecode sequence e6 5b that currently disassembles and behaves as INVALID JUMPDEST. If an upgrade makes the opcode e6 into an instruction “OP” with a 1-byte immediate, the same bytecode sequence would then disassemble to OP 0x5b, with no JUMPDEST, thus invalidating a jump target the contract may need to function. A similar thing can happen with push opcodes: for example, e6 60 5b would transform from INVALID PUSH1 0x5b to OP 0x60 JUMPDEST, creating a new valid jump target. And this may have a cascading effect on arbitrarily many subsequent instructions (consider e6 60 60 …).
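The two examples above can be reproduced with a small sketch of jumpdest analysis. This is a simplified model rather than any client's implementation; the `immediates` parameter is an assumption introduced here to describe a hypothetical upgrade in which some opcode gains masked immediate bytes.

```python
from typing import Dict, Optional, Set

PUSH1, PUSH32, JUMPDEST = 0x60, 0x7F, 0x5B


def jumpdests(code: bytes, immediates: Optional[Dict[int, int]] = None) -> Set[int]:
    """Return the offsets of valid JUMPDESTs in `code`.

    `immediates` (hypothetical) maps a new opcode to the number of
    immediate bytes that would be masked after it.
    """
    immediates = immediates or {}
    valid = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            valid.add(pc)
            pc += 1
        elif PUSH1 <= op <= PUSH32:
            # Skip the opcode and its 1..32 bytes of push data.
            pc += 1 + (op - PUSH1 + 1)
        else:
            # Skip the opcode and any masked immediate bytes.
            pc += 1 + immediates.get(op, 0)
    return valid


today = {}            # e6 is an invalid opcode with no immediate
upgraded = {0xE6: 1}  # e6 becomes "OP" with a 1-byte immediate

# e6 5b: a valid jump target disappears.
print(jumpdests(bytes.fromhex("e65b"), today))       # {1}
print(jumpdests(bytes.fromhex("e65b"), upgraded))    # set()

# e6 60 5b: a new valid jump target appears.
print(jumpdests(bytes.fromhex("e6605b"), today))     # set()
print(jumpdests(bytes.fromhex("e6605b"), upgraded))  # {2}
```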
Since the EVM doesn’t offer a dedicated section for data, arbitrary data may be embedded in code and serve as immutable data storage accessed via CODECOPY. Therefore, we must assume that arbitrary byte sequences may be found in contract code that should not be broken.
Option 1: No immediates
It would be possible for the EVM to keep PUSH* as the only immediate-carrying instructions and to continue requiring that all operands go through the stack.
This raises the question of why we might want immediates instead. I believe this comes down to two reasons:
- Efficiency: If operands are always known statically, it is more efficient for the VM to skip stack manipulation and gas accounting.
- Static analyzability: Values in the stack can come from user input and thus take arbitrary values. This can make static analysis of bytecode exponentially more difficult. Static analyses that are instead easy could be safely used in the execution layer without introducing DoS attack vectors, or could be used to improve the security of the application layer.
Option 2: Immediates
Option 2.1: Changes to jumpdest analysis
The simplest way to introduce immediates without breaking existing code is to use EOF-like bytecode versioning as enabled by EIP-3541. A prefix such as ef00 at the beginning of the code might indicate that a variant of jumpdest analysis should be used instead of the original one, where certain opcodes would imply additional masking.
Option 2.2: No changes to jumpdest analysis
Option 2.2.1: Disallowed immediate bytes
Observe that issues only arise when the JUMPDEST or PUSH* opcodes are found in the immediates. A new instruction can be designed so that this never happens in functional code: the bytes 5b and 60 to 7f are banned as immediates (only failing at runtime, with no validation during contract creation). The immediate bytes are encoded at assembly or compile time and decoded during execution in a way that works around the banned bytes, so that a contiguous range of operand values can still be used.
For a worked-out example of this approach, see EIP-8024.
The downside of this option is the complexity of decoding, which must be carefully designed to balance the efficiency and expressivity of the instruction. Arguably, it also obfuscates the true immediate value when bytecode is read without the help of a disassembler.
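To make the idea concrete, here is one possible encoding for a 1-byte immediate. It is an illustration invented for this post and is not the encoding specified in EIP-8024: operand values 0 to 222 are mapped, in order, onto the 223 byte values that are not banned, so the operand range stays contiguous while the encoded byte never equals 5b or 60 to 7f.

```python
# Bytes that must never appear as an immediate: JUMPDEST and PUSH1..PUSH32.
BANNED = {0x5B} | set(range(0x60, 0x80))
# The 223 byte values that remain usable, in ascending order.
ALLOWED = [b for b in range(256) if b not in BANNED]


def encode(operand: int) -> int:
    """Assembler side: map an operand in 0..222 to a non-banned byte."""
    if not 0 <= operand < len(ALLOWED):
        raise ValueError("operand out of range")
    return ALLOWED[operand]


def decode(byte: int) -> int:
    """Execution side: recover the operand, failing on banned bytes."""
    if byte in BANNED:
        raise ValueError("banned immediate byte")  # would be a runtime failure
    if byte < 0x5B:
        return byte        # 00..5a map to themselves
    if byte < 0x60:
        return byte - 1    # 5c..5f: one banned byte (5b) lies below
    return byte - 33       # 80..ff: all 33 banned bytes lie below


# Every operand round-trips and never produces a banned byte.
assert all(decode(encode(n)) == n for n in range(len(ALLOWED)))
assert all(encode(n) not in BANNED for n in range(len(ALLOWED)))
```

Even in this simple scheme the byte seen in the bytecode differs from the operand for most values (operand 95 is encoded as 80, for instance), which is the obfuscation cost mentioned above.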
Option 2.2.2: PUSH prefix
Since the PUSH* instructions are already masked for jumpdest analysis, they can be used for immediates by mandating that an instruction be preceded by a PUSH (probably of a specific size) in the code.
An example of this approach is found in EIP-7912.
The downside is that the operand must still go through the stack and gas accounting, especially for an EVM implementation that executes bytecode directly without a preceding parsing step. As a result this approach only provides the benefit of static analyzability.
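The static-analyzability benefit can be sketched as a single linear pass that recovers the operand of every such instruction without executing anything. This is a simplified illustration and not the mechanism specified in EIP-7912; the opcode e6 for "OP" and the choice of PUSH1 as the mandated prefix are assumptions made for the example.

```python
PUSH1, PUSH32 = 0x60, 0x7F
OP = 0xE6  # hypothetical instruction that must be preceded by PUSH1


def prefixed_operands(code: bytes) -> dict:
    """Map each OP offset to the value pushed by the PUSH1 right before it.

    An OP that lacks a complete PUSH1 prefix is mapped to None.
    """
    found = {}
    prev = None  # (opcode, push data) of the previous instruction
    pc = 0
    while pc < len(code):
        op = code[pc]
        size = op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0
        if op == OP:
            ok = prev is not None and prev[0] == PUSH1 and len(prev[1]) == 1
            found[pc] = prev[1][0] if ok else None
        prev = (op, code[pc + 1 : pc + 1 + size])
        pc += 1 + size
    return found


print(prefixed_operands(bytes.fromhex("6005e6")))  # {2: 5}
print(prefixed_operands(bytes.fromhex("5be6")))    # {1: None}
print(prefixed_operands(bytes.fromhex("60e6")))    # {} (e6 is push data here)
```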
Option 2.2.3: PUSH postfix
Alternatively, an instruction could require that it be followed by a PUSH, which thus becomes part of the instruction encoding: the PUSH is not executed normally; instead, its operand is read directly by the preceding instruction, and the program counter is adjusted to skip over it.
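A minimal sketch of how an interpreter might handle such an instruction follows. The opcode e6 for "OP", the choice of PUSH1 as the required postfix, and the decision to fail when the postfix is missing or truncated are all assumptions made for illustration.

```python
PUSH1 = 0x60
OP = 0xE6  # hypothetical instruction that must be followed by PUSH1


def execute_op(code: bytes, pc: int):
    """Decode the OP at `pc`; return (operand, next_pc).

    The PUSH1 that follows is treated as part of OP's encoding: its data
    byte is read as the operand and nothing is pushed onto the stack.
    """
    if code[pc] != OP:
        raise ValueError("not an OP instruction")
    if pc + 2 >= len(code) or code[pc + 1] != PUSH1:
        raise ValueError("OP must be followed by a complete PUSH1")
    operand = code[pc + 2]
    # Skip OP (1 byte), the PUSH1 opcode (1 byte), and its data (1 byte).
    return operand, pc + 3


# e6 60 5b 00: the operand is 0x5b, and execution resumes at offset 3.
print(execute_op(bytes.fromhex("e6605b00"), 0))  # (91, 3)
```

Because the operand byte is PUSH1 data, unmodified jumpdest analysis already masks it: in e6 60 5b the 5b is not a valid jump target before or after the upgrade.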