EIP-8024: Backward compatible SWAPN, DUPN, EXCHANGE

frangio · September 16, 2025, 7:56pm

Discussion topic for EIP-8024.

This is a fork of EIP-663 without a dependency on EOF.

Update Log

2025-09-06: initial proposal to update EIP-663
2025-09-15: decision to fork as new EIP
2025-11-04: updated encode_pair, decode_pair to fix inconsistency between spec and examples

frangio · September 27, 2025, 9:55pm

Discussion of alternatives:

charles-cooper · December 17, 2025, 7:28pm

I want to voice opposition against including this EIP in Glamsterdam for the same reasons that I opposed EIP-663 inclusion with EOF (cf. the Compiler Complexity Reduction section in EOF: When Complexity Outweighs Necessity - HackMD).

First of all, I want to point out that Vyper does not have the stack-too-deep problem. We have two pipelines, legacy and venom, with different tradeoffs (overuse of memory vs sometimes inefficient stack spilling), but we are making rapid progress on them. I believe Solidity is also making progress on this as well. Therefore, adding more opcodes to the VM is solving an application level problem at the VM level, needlessly increasing complexity, as enumerated below.

The solution presented by this EIP has several problems, as mentioned in the hackmd article, but I’ll enumerate them again below:

it hurts EVM-to-native compilation (either JIT or AOT, but both have to execute within a defined performance envelope) due to the register allocation problem. Linear-time register allocation also exists, but it is suboptimal and increases memory pressure. Memory pressure should be paid for explicitly(!) by actually using EVM memory!
it burns valuable L1 cache space – the “hot” portion of the stack grows from 512 bytes to 235 * 32 = 7.5KB, which is a substantial portion of the L1 budget (most CPUs have between 32kb and 64kb of L1 cache). allowing user-access to more stack space increases cache thrashing / stack space that is likely to be kept in cache
the backwards-compatibility encoding to avoid JUMPDEST analysis is a complexity that clients will pay forever

It also burns opcodes needlessly, while solving the wrong problem – the reason the Solidity compiler needs deep stack access is because they can’t predict where their dynamically allocated memory will end.

EIP-7923 proposes a more general solution which solves multiple problems at once:

heap vs call stack separation, allowing better EVM compiler design
EVM languages can allocate reserved areas of memory for stack spilling which they can guarantee do not get trampled by other allocators
developers get a modern memory model, not one from the 80s
it is a general fix that improves the design space for compilers and low-level developers without adding EVM complexity besides a pricing change

norswap · December 17, 2025, 9:32pm

I second Charles’ analysis, fixing the memory model would be a much more elegant solution that avoids the downsides.

frangio · December 17, 2025, 11:17pm

I agree that “stack too deep” can be solved without these opcodes! However, solc hasn’t been able to solve it yet, and we can’t wait forever. This error is not acceptable in what should by now be a mature tech stack, and with Solidity being the most popular language for the EVM, I don’t think it’s unreasonable for Ethereum to take this measure at the VM level.

I think the listed downsides are only minor issues or can be mitigated:

Cache: This is a fair point that needs to be measured and reflected in the gas schedule. For example, if SWAPN/DUPN are likely to read from L2 cache instead of L1 cache, the cost might be 10x that of SWAP/DUP (which would be 10 gas if repriced according to EIP-7904).
Decoding complexity: We’re talking about 25 lines of self-contained code consisting of simple arithmetic and conditionals. However, if this is deemed too complex, the EIP can very easily be switched to use push-postfix immediates, which require no decoding.
Compilation to native: I think this requires a more careful statement and discussion. I’m personally not convinced that the addition of SWAPN/DUPN makes compilation any more difficult than it already is. If I understand correctly, the argument is that these opcodes can potentially increase the number of simultaneously live values, but IMO they don’t: if a sequence of instructions grows the stack to height N, we can assume there are N live values, whether they are accessed via SWAPN/DUPN or only accessed when they’re in the top 16 stack slots. A compiler has to figure out what to do with the other N-16 values, with or without SWAPN/DUPN. Most likely, they will be spilled to memory, and that’s where SWAPN/DUPN should read them from. This would also support a higher gas cost, but as far as I know compilation to native is not a consideration that goes into choosing the gas schedule.
Opcode use: I believe there are currently 106 free opcodes, so definitely not a shortage. If this is a real concern, the same impact can be achieved by just including DUPN.

I’m supportive of changing the memory model. I agree that it’d be a good improvement with broader impact than this EIP. It wouldn’t be as immediate a fix for “stack too deep” as this EIP though.

charles-cooper · December 18, 2025, 6:33am

Maybe try a different language or compiler, which does solve it!

I totally agree

Thanks to the EF pushing and providing the bulk of its compiler/languages budget to it all these years. For example, ethereum.org makes no mention of other languages besides Solidity. Why don’t they start promoting languages which don’t have stack too deep errors?

I disagree. It increases complexity of the instruction set and VM, and it’s unnecessary given that a simpler solution has presented which not only increases reasonability of the VM’s memory model, but also improves the design space for compilers and users in a more general way.

Which is more expensive than memory at current prices, at which point applications should just use memory.

Doesn’t this increase decoding time? OTOH, using push-postfix immediates would increase the instruction size to 3 bytes per SWAPN or DUPN, which is think is undesirable from a codesize perspective.

frangio:

Compilation to native: I think this requires a more careful statement and discussion. I’m personally not convinced that the addition of SWAPN/DUPN makes compilation any more difficult than it already is. If I understand correctly, the argument is that these opcodes can potentially increase the number of simultaneously live values, but IMO they don’t: if a sequence of instructions grows the stack to height N, we can assume there are N live values, whether they are accessed via SWAPN/DUPN or only accessed when they’re in the top 16 stack slots. A compiler has to figure out what to do with the other N-16 values, with or without SWAPN/DUPN. Most likely, they will be spilled to memory, and that’s where SWAPN/DUPN should read them from. This would also support a higher gas cost, but as far as I know compilation to native is not a consideration that goes into choosing the gas schedule.

With only 16 addressable stack slots, compilation to native on an architecture is kind of super straightforward – you map stack slots to registers (on x86 you can even map to zmm0-15), when they go out of scope (depth 17 or deeper) you write them to memory, when they go back into scope you recover them from memory. SWAP instructions even map to register renames, which is extremely efficient on modern CPUs. No register allocation is needed.

OTOH if you have 235 addressable values, you need to run a register allocation routine, which finds the live ranges of each stack slot, spills them to memory when there are not enough registers, and then pops them back when there is enough “register space” again. As I’ve mentioned before, the best known algorithms are polynomial time; linear-time algorithms also exist but the quality of the code will be poorer. I highly recommend anybody interested in this problem and who is wondering about the effect of EIP-8024 (and 663) on EVM-to-native read the wikipedia article on register allocation: Register allocation - Wikipedia.

As I understand, you are saying that the 17th+ stack items should be stored in memory as in the first scheme I have described above (where when they go out of scope they get written to memory). That’s fine, but there is already an abstraction modeling memory access in the EVM, which is memory, users should just use that. (And yes, the memory pricing model should be fixed! As I described in my previous post and also below.)

I don’t think this is actually even true. Let me explain: with EIP-8024, the “fix” for stack too deep would be to emit deeper dup/swap instructions, rather than spilling.

However, EIP-8024 also only pushes the problem out. You need to fall back to spilling in the case that the user has more than 235 variables on the stack (which can easily happen if functions are inlined or more kinds of variables are lifted to the stack). So you need to make sure you have a correct spilling algorithm anyways!

With EIP-7923, the solution instead would be to spill at a high address. This solution is actually even easier to implement in the compiler (at least in Vyper) - calculating the safe spill point is trivial with EIP-7923, it’s just a high number. (As mentioned we still have to implement spilling anyways, so it’s not like EIP-8024 lets us skip that).

chfast · February 9, 2026, 11:26am

The immediate bytes decoding with the [0x5b–0x7f] forbidden range (current spec) can by significantly improved by using different normalization mapping: consider the byte to be a signed 8-bit integer.

Proof of Concept: Compiler Explorer

I can send the spec update. However, it is unclear to me what is the status of Update EIP-8024: Switch ecoding to push-postfix by frangio · Pull Request #11094 · ethereum/EIPs · GitHub.

frangio · February 9, 2026, 2:06pm

Good idea! Please submit a PR and we can bring it up on ACDE this Thursday. I’d like to merge it. I don’t think the signed arithmetic is doing any work here other than simplifying the domain guard (although the assembly for that part is the same), I think we should just express this as modular 8 bit unsigned arithmetic.

Have you looked into ways to simplify the EXCHANGE immediate decoding?

Re: push-postfix, I’ve become conviced that it’s not a good idea. No official decision has been made to reject it but no one objected to my comment in the PR saying so. We could bring this up at the same time.

chfast · February 9, 2026, 2:24pm

I haven’t considered EXCHANGE yet, because I don’t understand what is the desired range of arguments (n, m). It looks like the current spec chooses something dictated by the decoding capabilities.

If you ignore decoding, what is your ideal split for the values n, m considering the ranges:

[0–218] for the single-byte immediate,
[0–48179] for the two-byte immediate?

But the new approach separates the concerns. The decode_single_v2() function from the PoC gives you a value in the range [0–218]. How you later use it in any specific instruction is independent. E.g. for DUPN you offset it by +17.

For EXCHANGE this is no different. You have to decide how to split the value into two. By reusing divmod(_, 16) approach:

q, r = divmod(x, 16)  # [0–13], [0–15]
n = r + 1             # [1–16]
m = n + q + 1         # [2–30]
swap(stack[top-n], stack[top-m])

frangio · February 9, 2026, 2:57pm

See EIP-8024 Rationale: EXCHANGE immediate

The range of the current formula has this shape:

(The axes should be n, m, sorry.)

Using addition like your example results in this shape:

This means elements are swappable if they’re close enough to each other, while we prefer being able to move deeper elements closer to the top of the stack.

But we could swap out k = x if x <= 79 else x - 48 for k = (x + 0x80) & 0xff and preserve the rest of the formula, which arrives at the same range:

Colors indicate what half of the domain a pair comes from.

frangio · February 13, 2026, 12:38pm

Update proposed here.

One thing I realized is that the operation that maps the valid immediate domain to a contiguous one is simply XOR with 0x80.

GCC/LLVM are aware of all of these equivalences and they produce addition or XOR depending on the target.

chfast · February 13, 2026, 3:05pm

I was playing with EXCHANGE uint8 → (int, int) mapping a bit. One idea is to modify one line toreturn r + 1, 29 - q.

============================================================
 Original: b = 29 - q, x ∈ [0,79] ∪ [128,255]
 208 unique pairs from 208 inputs
============================================================

29 |  80
28 |  90  91
27 |  A0  A1  A2
26 |  B0  B1  B2  B3
25 |  C0  C1  C2  C3  C4
24 |  D0  D1  D2  D3  D4  D5
23 |  E0  E1  E2  E3  E4  E5  E6
22 |  F0  F1  F2  F3  F4  F5  F6  F7
21 |  00  01  02  03  04  05  06  07  08
20 |  10  11  12  13  14  15  16  17  18  19
19 |  20  21  22  23  24  25  26  27  28  29  2A
18 |  30  31  32  33  34  35  36  37  38  39  3A  3B
17 |  40  41  42  43  44  45  46  47  48  49  4A  4B  4C
16 |  8F  9F  AF  BF  CF  DF  EF  FF  0F  1F  2F  3F  4F
15 |  8E  9E  AE  BE  CE  DE  EE  FE  0E  1E  2E  3E  4E
14 |  8D  9D  AD  BD  CD  DD  ED  FD  0D  1D  2D  3D  4D
13 |  8C  9C  AC  BC  CC  DC  EC  FC  0C  1C  2C  3C
12 |  8B  9B  AB  BB  CB  DB  EB  FB  0B  1B  2B
11 |  8A  9A  AA  BA  CA  DA  EA  FA  0A  1A
10 |  89  99  A9  B9  C9  D9  E9  F9  09
 9 |  88  98  A8  B8  C8  D8  E8  F8
 8 |  87  97  A7  B7  C7  D7  E7
 7 |  86  96  A6  B6  C6  D6
 6 |  85  95  A5  B5  C5
 5 |  84  94  A4  B4
 4 |  83  93  A3
 3 |  82  92
 2 |  81
 1 |
   +──────────────────────────────────────────────────────→ a
       1   2   3   4   5   6   7   8   9  10  11  12  13


============================================================
 Modified: b = 30 - q, x ∈ [0,90] ∪ [128,255]
 219 unique pairs from 219 inputs
============================================================

30 |  80
29 |  90  91
28 |  A0  A1  A2
27 |  B0  B1  B2  B3
26 |  C0  C1  C2  C3  C4
25 |  D0  D1  D2  D3  D4  D5
24 |  E0  E1  E2  E3  E4  E5  E6
23 |  F0  F1  F2  F3  F4  F5  F6  F7
22 |  00  01  02  03  04  05  06  07  08
21 |  10  11  12  13  14  15  16  17  18  19
20 |  20  21  22  23  24  25  26  27  28  29  2A
19 |  30  31  32  33  34  35  36  37  38  39  3A  3B
18 |  40  41  42  43  44  45  46  47  48  49  4A  4B  4C
17 |  50  51  52  53  54  55  56  57  58  59  5A
16 |  8F  9F  AF  BF  CF  DF  EF  FF  0F  1F  2F  3F  4F
15 |  8E  9E  AE  BE  CE  DE  EE  FE  0E  1E  2E  3E  4E
14 |  8D  9D  AD  BD  CD  DD  ED  FD  0D  1D  2D  3D  4D
13 |  8C  9C  AC  BC  CC  DC  EC  FC  0C  1C  2C  3C
12 |  8B  9B  AB  BB  CB  DB  EB  FB  0B  1B  2B
11 |  8A  9A  AA  BA  CA  DA  EA  FA  0A  1A
10 |  89  99  A9  B9  C9  D9  E9  F9  09
 9 |  88  98  A8  B8  C8  D8  E8  F8
 8 |  87  97  A7  B7  C7  D7  E7
 7 |  86  96  A6  B6  C6  D6
 6 |  85  95  A5  B5  C5
 5 |  84  94  A4  B4
 4 |  83  93  A3
 3 |  82  92
 2 |  81
 1 |
   +──────────────────────────────────────────────────────→ a
       1   2   3   4   5   6   7   8   9  10  11  12  13

frangio · February 13, 2026, 3:50pm

This is what I meant by the discontinuity here. I don’t think this would be good for compilers to deal with.

Check out the new one I’ve proposed in that PR, I was able to add two swappable pairs.

chfast · February 13, 2026, 8:59pm

============================================================
 XOR: k = x ^ 143, b = 29 - q, x ∈ [0,81] ∪ [128,255]
 210 unique pairs from 210 inputs
============================================================

29 |  8F
28 |  9F  9E
27 |  AF  AE  AD
26 |  BF  BE  BD  BC
25 |  CF  CE  CD  CC  CB
24 |  DF  DE  DD  DC  DB  DA
23 |  EF  EE  ED  EC  EB  EA  E9
22 |  FF  FE  FD  FC  FB  FA  F9  F8
21 |  0F  0E  0D  0C  0B  0A  09  08  07
20 |  1F  1E  1D  1C  1B  1A  19  18  17  16
19 |  2F  2E  2D  2C  2B  2A  29  28  27  26  25
18 |  3F  3E  3D  3C  3B  3A  39  38  37  36  35  34
17 |  4F  4E  4D  4C  4B  4A  49  48  47  46  45  44  43
16 |  80  90  A0  B0  C0  D0  E0  F0  00  10  20  30  40  50
15 |  81  91  A1  B1  C1  D1  E1  F1  01  11  21  31  41  51
14 |  82  92  A2  B2  C2  D2  E2  F2  02  12  22  32  42
13 |  83  93  A3  B3  C3  D3  E3  F3  03  13  23  33
12 |  84  94  A4  B4  C4  D4  E4  F4  04  14  24
11 |  85  95  A5  B5  C5  D5  E5  F5  05  15
10 |  86  96  A6  B6  C6  D6  E6  F6  06
 9 |  87  97  A7  B7  C7  D7  E7  F7
 8 |  88  98  A8  B8  C8  D8  E8
 7 |  89  99  A9  B9  C9  D9
 6 |  8A  9A  AA  BA  CA
 5 |  8B  9B  AB  BB
 4 |  8C  9C  AC
 3 |  8D  9D
 2 |  8E
 1 |
   +──────────────────────────────────────────────────────────→ a
       1   2   3   4   5   6   7   8   9  10  11  12  13  14

chfast · March 11, 2026, 1:49pm

SWAP Chain Analysis: Opportunities for the EXCHANGE Instruction

Methodology

We instrumented Zilkworm to track consecutive sequences of SWAP instructions during execution of Ethereum mainnet blocks. A “SWAP chain” is a run of two or more SWAP instructions with no other opcodes in between.

Dataset: ~2,720 mainnet blocks (block range 24398374–24621109), representing typical recent Ethereum transaction workloads.

Collection: Runtime instrumentation — we count how many times each unique consecutive SWAP pattern is executed, not just how many exist in bytecode. This reflects actual gas expenditure.

Key Results

Each replacement of 3 SWAPs (3 bytes, 9 gas) with 1 EXCHANGE (2 bytes, 3 gas) eliminates 2 instructions, saves 1 byte and 6 gas.

Metric	Value
Total SWAP chain occurrences	67,359,504
Total SWAP instructions in chains	160,252,881
Unique chain patterns	4,599
Gas spent in chains (at 3 gas/SWAP)	480,758,643
Instructions eliminated by EXCHANGE	25,947,384 (16.2%)
Bytes saved	12,973,692
Gas saved	77,842,152 (16.2%)
Per-block average instructions eliminated	~9,539
Per-block average bytes saved	~4,769
Per-block average gas saved	~28,618

EXCHANGE Instruction Background (EIP-8024)

The EXCHANGE instruction (0xe8) is proposed in EIP-8024 for legacy EVM (not EOF-only). It swaps two stack items that are not the top of stack:

Encoding: 0xe8 imm where imm is a single byte
The immediate is decoded via decode_pair(x) yielding (n, m): k = x ^ 143; q, r = divmod(k, 16); if q < r: (n, m) = (q+1, r+1), else (n, m) = (r+1, 29-q)
Constraints: 1 <= n < m, n + m <= 30
Effect: swap the (n+1)th stack item with the (m+1)th stack item (i.e., stack[n] with stack[m], 0-indexed from top)
Valid n range: 1–14, valid m range: 2–29
Gas cost: 3 (same as SWAP)
Certain immediate values (0x5b, 0x60–0x7f) are disallowed to preserve JUMPDEST and PUSH semantics in legacy bytecode

Unlike SWAP (which always involves stack[0]), EXCHANGE operates on deeper stack positions. This is exactly what many SWAP chains achieve: swapping two non-top elements by temporarily routing through stack[0].

Pattern Categories

1. Exact EXCHANGE Replacements (3 SWAPs → 1 EXCHANGE)

The pattern SWAP_a SWAP_b SWAP_a (where a != b) has the net effect of swapping stack[min(a,b)] with stack[max(a,b)] while leaving stack[0] unchanged. This is exactly one EXCHANGE instruction. Each replacement eliminates 2 instructions, saves 1 byte and 6 gas.

130 unique patterns, 12,046,816 occurrences. 24,093,632 instructions eliminated, 12,046,816 bytes saved, 72,280,896 gas saved.

Top patterns:

Count	Pattern	Replacement	Effect
4,100,981	SWAP2 SWAP1 SWAP2	EXCHANGE(1,1)	stack[1] ↔ stack[2]
3,553,495	SWAP1 SWAP2 SWAP1	EXCHANGE(1,1)	stack[1] ↔ stack[2]
1,186,570	SWAP4 SWAP1 SWAP4	EXCHANGE(1,3)	stack[1] ↔ stack[4]
1,169,171	SWAP3 SWAP1 SWAP3	EXCHANGE(1,2)	stack[1] ↔ stack[3]
619,504	SWAP6 SWAP1 SWAP6	EXCHANGE(1,5)	stack[1] ↔ stack[6]
382,698	SWAP5 SWAP1 SWAP5	EXCHANGE(1,4)	stack[1] ↔ stack[5]
147,086	SWAP3 SWAP2 SWAP3	EXCHANGE(2,1)	stack[2] ↔ stack[3]
112,397	SWAP2 SWAP3 SWAP2	EXCHANGE(2,1)	stack[2] ↔ stack[3]
101,031	SWAP9 SWAP1 SWAP9	EXCHANGE(1,8)	stack[1] ↔ stack[9]
95,989	SWAP7 SWAP1 SWAP7	EXCHANGE(1,6)	stack[1] ↔ stack[7]
85,593	SWAP8 SWAP1 SWAP8	EXCHANGE(1,7)	stack[1] ↔ stack[8]
63,039	SWAP10 SWAP1 SWAP10	EXCHANGE(1,9)	stack[1] ↔ stack[10]
62,252	SWAP13 SWAP1 SWAP13	EXCHANGE(1,12)	stack[1] ↔ stack[13]
61,830	SWAP12 SWAP1 SWAP12	EXCHANGE(1,11)	stack[1] ↔ stack[12]
60,708	SWAP11 SWAP1 SWAP11	EXCHANGE(1,10)	stack[1] ↔ stack[11]

The dominant sub-pattern is SWAP_x SWAP1 SWAP_x — swapping stack[1] with a deeper element. This accounts for the vast majority of exact replacements.

2. Partial EXCHANGE Replacements (within longer chains)

Longer chains often contain SWAP_a SWAP_b SWAP_a sub-patterns that can be extracted and replaced, reducing the chain length.

1,047 unique patterns, 926,876 sub-pattern replacements. 1,853,752 instructions eliminated, 926,876 bytes saved, 5,561,256 gas saved.

Top examples:

Count	Original	After EXCHANGE
101,181	`SWAP3 SWAP6 SWAP3 SWAP5` [4 ops]	`EXCHANGE(3,3) SWAP5` [2 ops]
77,153	`SWAP3 SWAP1 SWAP2 SWAP1` [4 ops]	`SWAP3 EXCHANGE(1,1)` [2 ops]
52,174	`SWAP1 SWAP3 SWAP2 SWAP3` [4 ops]	`SWAP1 EXCHANGE(2,1)` [2 ops]
44,443	`SWAP1 SWAP3 SWAP1 SWAP2` [4 ops]	`EXCHANGE(1,2) SWAP2` [2 ops]
39,550	`SWAP3 SWAP2 SWAP1 SWAP2` [4 ops]	`SWAP3 EXCHANGE(1,1)` [2 ops]

3. Staircase Patterns (not directly replaceable)

Descending staircases (SWAPn SWAPn-1 ... SWAP1) perform a left rotation, bringing a deep element to the top. Ascending staircases (SWAP1 SWAP2 ... SWAPn) push the top element down to depth n.

These cannot be replaced by a single EXCHANGE since they fundamentally change stack[0]. However, they could benefit from a hypothetical ROTATE n instruction.

Descending staircases: 59 patterns, 1,424,636 occurrences

Count	Pattern	Effect
528,154	SWAP3 SWAP2 SWAP1	Rotate top 4 left
478,047	SWAP4 SWAP3 SWAP2 SWAP1	Rotate top 5 left
175,318	SWAP4 SWAP3 SWAP2	Rotate top 3 of [1..4] left
61,955	SWAP6 SWAP5 SWAP4 SWAP3 SWAP2 SWAP1	Rotate top 7 left
59,247	SWAP5 SWAP4 SWAP3 SWAP2 SWAP1	Rotate top 6 left

Ascending staircases: ~186,870 occurrences

Count	Pattern
78,674	SWAP1 SWAP2 SWAP3 SWAP4 SWAP5
56,268	SWAP1 SWAP2 SWAP3
30,609	SWAP1 SWAP2 SWAP3 SWAP4
14,192	SWAP1 SWAP2 SWAP3 SWAP4 SWAP5 SWAP6

4. Two-Element Pairs (dominant by volume)

Pairs account for 60.7% of all SWAP chain instructions (97.3M instructions). These cannot be reduced by EXCHANGE since they are only 2 instructions and EXCHANGE requires 3 SWAPs to justify a replacement.

The most common pairs are adjacent descending pairs forming rotations:

Count	Pattern	Semantic
14,926,659	SWAP2 SWAP1	Rotate top 3 left
13,086,432	SWAP3 SWAP2	Rotate top 4 left (partial)
5,359,213	SWAP4 SWAP3	Rotate top 5 left (partial)
4,810,215	SWAP1 SWAP2	Rotate top 3 right
2,129,201	SWAP1 SWAP3	Move stack[3] toward top

Chain Length Distribution

Length	Patterns	Occurrences	Instructions	% of Total
2	224	48,659,466	97,318,932	60.7%
3	807	14,933,556	44,800,668	28.0%
4	1,101	2,457,743	9,830,972	6.1%
5	793	591,118	2,955,590	1.8%
6	593	275,495	1,652,970	1.0%
7+	1,081	442,126	3,694,749	2.3%

EXCHANGE Usage Distribution

The following table shows which EXCHANGE(n, m) pairs would be used and how often, combining both exact and partial replacements. 104 unique pairs out of 210 possible are utilized.

The distribution is heavily skewed toward n=1 (swapping stack[1] with a deeper element), which accounts for 91.0% of all replacements. EXCHANGE(1,2) alone covers 60.6%.

Top 20 pairs:

EXCHANGE(n,m)	Effect	Count	%
EXCHANGE(1,2)	stack[1] ↔ stack[2]	7,859,407	60.58%	`████████████████████████████████████████`
EXCHANGE(1,3)	stack[1] ↔ stack[3]	1,271,267	9.80%	`██████`
EXCHANGE(1,4)	stack[1] ↔ stack[4]	1,222,069	9.42%	`██████`
EXCHANGE(1,6)	stack[1] ↔ stack[6]	643,284	4.96%	`███`
EXCHANGE(1,5)	stack[1] ↔ stack[5]	408,369	3.15%	`██`
EXCHANGE(2,3)	stack[2] ↔ stack[3]	344,159	2.65%	`█`
EXCHANGE(1,7)	stack[1] ↔ stack[7]	118,183	0.91%	`▌`
EXCHANGE(3,6)	stack[3] ↔ stack[6]	113,528	0.88%	`▌`
EXCHANGE(2,5)	stack[2] ↔ stack[5]	110,596	0.85%	`▌`
EXCHANGE(1,9)	stack[1] ↔ stack[9]	105,314	0.81%	`▌`
EXCHANGE(1,8)	stack[1] ↔ stack[8]	104,035	0.80%	`▌`
EXCHANGE(1,10)	stack[1] ↔ stack[10]	73,893	0.57%
EXCHANGE(3,5)	stack[3] ↔ stack[5]	66,863	0.52%
EXCHANGE(1,11)	stack[1] ↔ stack[11]	66,763	0.51%
EXCHANGE(3,4)	stack[3] ↔ stack[4]	66,499	0.51%
EXCHANGE(1,13)	stack[1] ↔ stack[13]	66,419	0.51%
EXCHANGE(1,12)	stack[1] ↔ stack[12]	66,381	0.51%
EXCHANGE(5,6)	stack[5] ↔ stack[6]	39,275	0.30%
EXCHANGE(4,5)	stack[4] ↔ stack[5]	39,084	0.30%
EXCHANGE(2,4)	stack[2] ↔ stack[4]	35,085	0.27%

Heatmap of EXCHANGE usage across the full EIP-8024 encoding space (values are log10 of count, . = valid but unused):

n\m    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26   27   28   29
 1   6.9  6.1  6.1  5.6  5.8  5.1  5.0  5.0  4.9  4.8  4.8  4.8  3.2  2.4  0.7    .    .    .    .    .    .    .    .    .    .    .    .    .
 2        5.5  4.5  5.0  3.7  3.4  3.5  3.2  2.2  2.4  1.9  2.9  1.6  2.7  1.1    .    .    .    .    .    .    .    .    .    .    .    .
 3             4.8  4.8  5.1  3.8  3.8  3.5  2.7  2.5  2.1  3.0  1.3    .  1.0    .    .    .    .    .    .    .    .    .    .    .
 4                  4.6  4.2  3.9  3.3  3.2  0.8  3.0  2.3  0.3  2.5  1.1    .    .    .    .    .    .    .    .    .    .    .
 5                       4.6  3.7  4.1  3.5  3.3  1.8  0.0  3.0  2.7  0.0    .    .    .    .    .    .    .    .    .    .
 6                            3.8  3.5  2.6  2.0  2.6  2.1  1.6    .  1.6  0.0    .    .    .    .    .    .    .    .
 7                                 4.5  3.5  2.8  0.3  1.0  1.6  3.3  1.2    .    .    .    .    .    .    .    .
 8                                      3.4  3.2  2.2  0.5  2.6  1.7    .    .    .    .    .    .    .    .
 9                                           3.8  1.9  0.8    .  1.3    .    .    .    .    .    .    .
10                                                3.3  1.3  3.5  0.0  1.6  2.7    .    .    .    .
11                                                     2.7  1.7  1.6    .    .    .    .    .
12                                                          2.3  0.7  0.3    .    .    .
13                                                               2.5  1.4    .    .
14                                                                    2.7    .

All usage is concentrated in the left portion (m <= 16) since SWAP instructions only reach stack position 16. The top-left corner (low n, low m) is the hottest region. The first row (n=1) dominates, confirming that the most common use case is swapping stack[1] with deeper positions. The diagonal (m = n+1, adjacent swaps) also shows elevated usage. The right half of the encoding space (m > 16) is entirely unused — these positions are unreachable from SWAP chain patterns.

EXCHANGE Immediate Range Coverage

All 130 exact replacement patterns (mapping to 90 unique (n, m) pairs) fit within the EIP-8024 EXCHANGE constraints (1 <= n < m, n + m <= 30):

Maximum n used: 14
Maximum m used: 15
Maximum n + m: 29

Since SWAP only addresses positions 1–16 and EXCHANGE can reach up to n + m = 30, the EIP-8024 encoding covers 100% of cases arising from SWAP chain optimization.

Conclusions

16.2% of SWAP chain instructions (25.9M) can be eliminated by replacing SWAP_a SWAP_b SWAP_a patterns with a single EXCHANGE, saving 77.8M gas and 13.0M bytes across the dataset. This is a mechanical pattern substitution requiring no algorithmic changes.
The EIP-8024 EXCHANGE encoding is sufficient. All observed patterns fit within 1 <= n < m, n + m <= 30. No patterns exceed the encoding limits.
The dominant exact pattern is SWAP_x SWAP1 SWAP_x — swapping stack[1] with a deeper element. This accounts for the overwhelming majority of replaceable occurrences.
60.7% of chain instructions are in 2-element pairs that cannot be reduced by EXCHANGE. These are fundamentally rotations that move elements through stack[0]. A dedicated rotation instruction would be needed to optimize these.
Staircase patterns (descending/ascending) represent another significant class (~1.6M occurrences) that EXCHANGE alone cannot address. These perform full rotations of the top N stack elements. A ROTATE n instruction could replace an entire staircase with a single opcode.

frangio · March 11, 2026, 2:22pm

Thank you for this analysis.

I think this is true of stack layout-preserving bytecode optimization, but we’ll find that compilers are able to emit EXCHANGE where they would have emitted 2 SWAP instructions, when a value needs to be moved to a certain stack depth (for an instruction or procedure call).