EIP-8024: Backward compatible SWAPN, DUPN, EXCHANGE

Discussion topic for EIP-8024.

This is a fork of EIP-663 without a dependency on EOF.

Update Log

  • 2025-09-06: initial proposal to update EIP-663
  • 2025-09-15: decision to fork as new EIP
  • 2025-11-04: updated encode_pair, decode_pair to fix inconsistency between spec and examples
1 Like

Discussion of alternatives:

I want to voice opposition against including this EIP in Glamsterdam for the same reasons that I opposed EIP-663 inclusion with EOF (cf. the Compiler Complexity Reduction section in EOF: When Complexity Outweighs Necessity - HackMD).

First of all, I want to point out that Vyper does not have the stack-too-deep problem. We have two pipelines, legacy and venom, with different tradeoffs (overuse of memory vs sometimes inefficient stack spilling), but we are making rapid progress on them. I believe Solidity is also making progress on this as well. Therefore, adding more opcodes to the VM is solving an application level problem at the VM level, needlessly increasing complexity, as enumerated below.

The solution presented by this EIP has several problems, as mentioned in the hackmd article, but I’ll enumerate them again below:

  • it hurts EVM-to-native compilation (either JIT or AOT, but both have to execute within a defined performance envelope) due to the register allocation problem. Linear-time register allocation also exists, but it is suboptimal and increases memory pressure. Memory pressure should be paid for explicitly(!) by actually using EVM memory!
  • it burns valuable L1 cache space – the β€œhot” portion of the stack grows from 512 bytes to 235 * 32 = 7.5KB, which is a substantial portion of the L1 budget (most CPUs have between 32kb and 64kb of L1 cache). allowing user-access to more stack space increases cache thrashing / stack space that is likely to be kept in cache
  • the backwards-compatibility encoding to avoid JUMPDEST analysis is a complexity that clients will pay forever

It also burns opcodes needlessly, while solving the wrong problem – the reason the Solidity compiler needs deep stack access is because they can’t predict where their dynamically allocated memory will end.

EIP-7923 proposes a more general solution which solves multiple problems at once:

  • heap vs call stack separation, allowing better EVM compiler design
  • EVM languages can allocate reserved areas of memory for stack spilling which they can guarantee do not get trampled by other allocators
  • developers get a modern memory model, not one from the 80s
  • it is a general fix that improves the design space for compilers and low-level developers without adding EVM complexity besides a pricing change
3 Likes

I second Charles’ analysis, fixing the memory model would be a much more elegant solution that avoids the downsides.

1 Like

I agree that β€œstack too deep” can be solved without these opcodes! However, solc hasn’t been able to solve it yet, and we can’t wait forever. This error is not acceptable in what should by now be a mature tech stack, and with Solidity being the most popular language for the EVM, I don’t think it’s unreasonable for Ethereum to take this measure at the VM level.

I think the listed downsides are only minor issues or can be mitigated:

  • Cache: This is a fair point that needs to be measured and reflected in the gas schedule. For example, if SWAPN/DUPN are likely to read from L2 cache instead of L1 cache, the cost might be 10x that of SWAP/DUP (which would be 10 gas if repriced according to EIP-7904).
  • Decoding complexity: We’re talking about 25 lines of self-contained code consisting of simple arithmetic and conditionals. However, if this is deemed too complex, the EIP can very easily be switched to use push-postfix immediates, which require no decoding.
  • Compilation to native: I think this requires a more careful statement and discussion. I’m personally not convinced that the addition of SWAPN/DUPN makes compilation any more difficult than it already is. If I understand correctly, the argument is that these opcodes can potentially increase the number of simultaneously live values, but IMO they don’t: if a sequence of instructions grows the stack to height N, we can assume there are N live values, whether they are accessed via SWAPN/DUPN or only accessed when they’re in the top 16 stack slots. A compiler has to figure out what to do with the other N-16 values, with or without SWAPN/DUPN. Most likely, they will be spilled to memory, and that’s where SWAPN/DUPN should read them from. This would also support a higher gas cost, but as far as I know compilation to native is not a consideration that goes into choosing the gas schedule.
  • Opcode use: I believe there are currently 106 free opcodes, so definitely not a shortage. If this is a real concern, the same impact can be achieved by just including DUPN.

I’m supportive of changing the memory model. I agree that it’d be a good improvement with broader impact than this EIP. It wouldn’t be as immediate a fix for β€œstack too deep” as this EIP though.

2 Likes

Maybe try a different language or compiler, which does solve it!

I totally agree

Thanks to the EF pushing and providing the bulk of its compiler/languages budget to it all these years. For example, ethereum.org makes no mention of other languages besides Solidity. Why don’t they start promoting languages which don’t have stack too deep errors?

I disagree. It increases complexity of the instruction set and VM, and it’s unnecessary given that a simpler solution has presented which not only increases reasonability of the VM’s memory model, but also improves the design space for compilers and users in a more general way.

Which is more expensive than memory at current prices, at which point applications should just use memory.

Doesn’t this increase decoding time? OTOH, using push-postfix immediates would increase the instruction size to 3 bytes per SWAPN or DUPN, which is think is undesirable from a codesize perspective.

With only 16 addressable stack slots, compilation to native on an architecture is kind of super straightforward – you map stack slots to registers (on x86 you can even map to zmm0-15), when they go out of scope (depth 17 or deeper) you write them to memory, when they go back into scope you recover them from memory. SWAP instructions even map to register renames, which is extremely efficient on modern CPUs. No register allocation is needed.

OTOH if you have 235 addressable values, you need to run a register allocation routine, which finds the live ranges of each stack slot, spills them to memory when there are not enough registers, and then pops them back when there is enough β€œregister space” again. As I’ve mentioned before, the best known algorithms are polynomial time; linear-time algorithms also exist but the quality of the code will be poorer. I highly recommend anybody interested in this problem and who is wondering about the effect of EIP-8024 (and 663) on EVM-to-native read the wikipedia article on register allocation: Register allocation - Wikipedia.

As I understand, you are saying that the 17th+ stack items should be stored in memory as in the first scheme I have described above (where when they go out of scope they get written to memory). That’s fine, but there is already an abstraction modeling memory access in the EVM, which is memory, users should just use that. (And yes, the memory pricing model should be fixed! As I described in my previous post and also below.)

I don’t think this is actually even true. Let me explain: with EIP-8024, the β€œfix” for stack too deep would be to emit deeper dup/swap instructions, rather than spilling.

However, EIP-8024 also only pushes the problem out. You need to fall back to spilling in the case that the user has more than 235 variables on the stack (which can easily happen if functions are inlined or more kinds of variables are lifted to the stack). So you need to make sure you have a correct spilling algorithm anyways!

With EIP-7923, the solution instead would be to spill at a high address. This solution is actually even easier to implement in the compiler (at least in Vyper) - calculating the safe spill point is trivial with EIP-7923, it’s just a high number. (As mentioned we still have to implement spilling anyways, so it’s not like EIP-8024 lets us skip that).

2 Likes

The immediate bytes decoding with the [0x5b–0x7f] forbidden range (current spec) can by significantly improved by using different normalization mapping: consider the byte to be a signed 8-bit integer.

Proof of Concept: Compiler Explorer

I can send the spec update. However, it is unclear to me what is the status of Update EIP-8024: Switch ecoding to push-postfix by frangio Β· Pull Request #11094 Β· ethereum/EIPs Β· GitHub.

Good idea! Please submit a PR and we can bring it up on ACDE this Thursday. I’d like to merge it. I don’t think the signed arithmetic is doing any work here other than simplifying the domain guard (although the assembly for that part is the same), I think we should just express this as modular 8 bit unsigned arithmetic.

Have you looked into ways to simplify the EXCHANGE immediate decoding?

Re: push-postfix, I’ve become conviced that it’s not a good idea. No official decision has been made to reject it but no one objected to my comment in the PR saying so. We could bring this up at the same time.

I haven’t considered EXCHANGE yet, because I don’t understand what is the desired range of arguments (n, m). It looks like the current spec chooses something dictated by the decoding capabilities.

If you ignore decoding, what is your ideal split for the values n, m considering the ranges:

  • [0–218] for the single-byte immediate,
  • [0–48179] for the two-byte immediate?

But the new approach separates the concerns. The decode_single_v2() function from the PoC gives you a value in the range [0–218]. How you later use it in any specific instruction is independent. E.g. for DUPN you offset it by +17.

For EXCHANGE this is no different. You have to decide how to split the value into two. By reusing divmod(_, 16) approach:

q, r = divmod(x, 16)  # [0–13], [0–15]
n = r + 1             # [1–16]
m = n + q + 1         # [2–30]
swap(stack[top-n], stack[top-m])

See EIP-8024 Rationale: EXCHANGE immediate

The range of the current formula has this shape:

(The axes should be n, m, sorry.)

Using addition like your example results in this shape:

This means elements are swappable if they’re close enough to each other, while we prefer being able to move deeper elements closer to the top of the stack.

But we could swap out k = x if x <= 79 else x - 48 for k = (x + 0x80) & 0xff and preserve the rest of the formula, which arrives at the same range:

Colors indicate what half of the domain a pair comes from.

Update proposed here.

One thing I realized is that the operation that maps the valid immediate domain to a contiguous one is simply XOR with 0x80.

GCC/LLVM are aware of all of these equivalences and they produce addition or XOR depending on the target.

I was playing with EXCHANGE uint8 β†’ (int, int) mapping a bit. One idea is to modify one line toreturn r + 1, 29 - q.

============================================================
 Original: b = 29 - q, x ∈ [0,79] βˆͺ [128,255]
 208 unique pairs from 208 inputs
============================================================

29 |  80
28 |  90  91
27 |  A0  A1  A2
26 |  B0  B1  B2  B3
25 |  C0  C1  C2  C3  C4
24 |  D0  D1  D2  D3  D4  D5
23 |  E0  E1  E2  E3  E4  E5  E6
22 |  F0  F1  F2  F3  F4  F5  F6  F7
21 |  00  01  02  03  04  05  06  07  08
20 |  10  11  12  13  14  15  16  17  18  19
19 |  20  21  22  23  24  25  26  27  28  29  2A
18 |  30  31  32  33  34  35  36  37  38  39  3A  3B
17 |  40  41  42  43  44  45  46  47  48  49  4A  4B  4C
16 |  8F  9F  AF  BF  CF  DF  EF  FF  0F  1F  2F  3F  4F
15 |  8E  9E  AE  BE  CE  DE  EE  FE  0E  1E  2E  3E  4E
14 |  8D  9D  AD  BD  CD  DD  ED  FD  0D  1D  2D  3D  4D
13 |  8C  9C  AC  BC  CC  DC  EC  FC  0C  1C  2C  3C
12 |  8B  9B  AB  BB  CB  DB  EB  FB  0B  1B  2B
11 |  8A  9A  AA  BA  CA  DA  EA  FA  0A  1A
10 |  89  99  A9  B9  C9  D9  E9  F9  09
 9 |  88  98  A8  B8  C8  D8  E8  F8
 8 |  87  97  A7  B7  C7  D7  E7
 7 |  86  96  A6  B6  C6  D6
 6 |  85  95  A5  B5  C5
 5 |  84  94  A4  B4
 4 |  83  93  A3
 3 |  82  92
 2 |  81
 1 |
   +──────────────────────────────────────────────────────→ a
       1   2   3   4   5   6   7   8   9  10  11  12  13


============================================================
 Modified: b = 30 - q, x ∈ [0,90] βˆͺ [128,255]
 219 unique pairs from 219 inputs
============================================================

30 |  80
29 |  90  91
28 |  A0  A1  A2
27 |  B0  B1  B2  B3
26 |  C0  C1  C2  C3  C4
25 |  D0  D1  D2  D3  D4  D5
24 |  E0  E1  E2  E3  E4  E5  E6
23 |  F0  F1  F2  F3  F4  F5  F6  F7
22 |  00  01  02  03  04  05  06  07  08
21 |  10  11  12  13  14  15  16  17  18  19
20 |  20  21  22  23  24  25  26  27  28  29  2A
19 |  30  31  32  33  34  35  36  37  38  39  3A  3B
18 |  40  41  42  43  44  45  46  47  48  49  4A  4B  4C
17 |  50  51  52  53  54  55  56  57  58  59  5A
16 |  8F  9F  AF  BF  CF  DF  EF  FF  0F  1F  2F  3F  4F
15 |  8E  9E  AE  BE  CE  DE  EE  FE  0E  1E  2E  3E  4E
14 |  8D  9D  AD  BD  CD  DD  ED  FD  0D  1D  2D  3D  4D
13 |  8C  9C  AC  BC  CC  DC  EC  FC  0C  1C  2C  3C
12 |  8B  9B  AB  BB  CB  DB  EB  FB  0B  1B  2B
11 |  8A  9A  AA  BA  CA  DA  EA  FA  0A  1A
10 |  89  99  A9  B9  C9  D9  E9  F9  09
 9 |  88  98  A8  B8  C8  D8  E8  F8
 8 |  87  97  A7  B7  C7  D7  E7
 7 |  86  96  A6  B6  C6  D6
 6 |  85  95  A5  B5  C5
 5 |  84  94  A4  B4
 4 |  83  93  A3
 3 |  82  92
 2 |  81
 1 |
   +──────────────────────────────────────────────────────→ a
       1   2   3   4   5   6   7   8   9  10  11  12  13

This is what I meant by the discontinuity here. I don’t think this would be good for compilers to deal with.

Check out the new one I’ve proposed in that PR, I was able to add two swappable pairs.

============================================================
 XOR: k = x ^ 143, b = 29 - q, x ∈ [0,81] βˆͺ [128,255]
 210 unique pairs from 210 inputs
============================================================

29 |  8F
28 |  9F  9E
27 |  AF  AE  AD
26 |  BF  BE  BD  BC
25 |  CF  CE  CD  CC  CB
24 |  DF  DE  DD  DC  DB  DA
23 |  EF  EE  ED  EC  EB  EA  E9
22 |  FF  FE  FD  FC  FB  FA  F9  F8
21 |  0F  0E  0D  0C  0B  0A  09  08  07
20 |  1F  1E  1D  1C  1B  1A  19  18  17  16
19 |  2F  2E  2D  2C  2B  2A  29  28  27  26  25
18 |  3F  3E  3D  3C  3B  3A  39  38  37  36  35  34
17 |  4F  4E  4D  4C  4B  4A  49  48  47  46  45  44  43
16 |  80  90  A0  B0  C0  D0  E0  F0  00  10  20  30  40  50
15 |  81  91  A1  B1  C1  D1  E1  F1  01  11  21  31  41  51
14 |  82  92  A2  B2  C2  D2  E2  F2  02  12  22  32  42
13 |  83  93  A3  B3  C3  D3  E3  F3  03  13  23  33
12 |  84  94  A4  B4  C4  D4  E4  F4  04  14  24
11 |  85  95  A5  B5  C5  D5  E5  F5  05  15
10 |  86  96  A6  B6  C6  D6  E6  F6  06
 9 |  87  97  A7  B7  C7  D7  E7  F7
 8 |  88  98  A8  B8  C8  D8  E8
 7 |  89  99  A9  B9  C9  D9
 6 |  8A  9A  AA  BA  CA
 5 |  8B  9B  AB  BB
 4 |  8C  9C  AC
 3 |  8D  9D
 2 |  8E
 1 |
   +──────────────────────────────────────────────────────────→ a
       1   2   3   4   5   6   7   8   9  10  11  12  13  14
1 Like

SWAP Chain Analysis: Opportunities for the EXCHANGE Instruction

Methodology

We instrumented Zilkworm to track consecutive sequences of SWAP instructions during execution of Ethereum mainnet blocks. A β€œSWAP chain” is a run of two or more SWAP instructions with no other opcodes in between.

Dataset: ~2,720 mainnet blocks (block range 24398374–24621109), representing typical recent Ethereum transaction workloads.

Collection: Runtime instrumentation β€” we count how many times each unique consecutive SWAP pattern is executed, not just how many exist in bytecode. This reflects actual gas expenditure.

Key Results

Each replacement of 3 SWAPs (3 bytes, 9 gas) with 1 EXCHANGE (2 bytes, 3 gas) eliminates 2 instructions, saves 1 byte and 6 gas.

Metric Value
Total SWAP chain occurrences 67,359,504
Total SWAP instructions in chains 160,252,881
Unique chain patterns 4,599
Gas spent in chains (at 3 gas/SWAP) 480,758,643
Instructions eliminated by EXCHANGE 25,947,384 (16.2%)
Bytes saved 12,973,692
Gas saved 77,842,152 (16.2%)
Per-block average instructions eliminated ~9,539
Per-block average bytes saved ~4,769
Per-block average gas saved ~28,618

EXCHANGE Instruction Background (EIP-8024)

The EXCHANGE instruction (0xe8) is proposed in EIP-8024 for legacy EVM (not EOF-only). It swaps two stack items that are not the top of stack:

  • Encoding: 0xe8 imm where imm is a single byte
  • The immediate is decoded via decode_pair(x) yielding (n, m): k = x ^ 143; q, r = divmod(k, 16); if q < r: (n, m) = (q+1, r+1), else (n, m) = (r+1, 29-q)
  • Constraints: 1 <= n < m, n + m <= 30
  • Effect: swap the (n+1)th stack item with the (m+1)th stack item (i.e., stack[n] with stack[m], 0-indexed from top)
  • Valid n range: 1–14, valid m range: 2–29
  • Gas cost: 3 (same as SWAP)
  • Certain immediate values (0x5b, 0x60–0x7f) are disallowed to preserve JUMPDEST and PUSH semantics in legacy bytecode

Unlike SWAP (which always involves stack[0]), EXCHANGE operates on deeper stack positions. This is exactly what many SWAP chains achieve: swapping two non-top elements by temporarily routing through stack[0].

Pattern Categories

1. Exact EXCHANGE Replacements (3 SWAPs β†’ 1 EXCHANGE)

The pattern SWAP_a SWAP_b SWAP_a (where a != b) has the net effect of swapping stack[min(a,b)] with stack[max(a,b)] while leaving stack[0] unchanged. This is exactly one EXCHANGE instruction. Each replacement eliminates 2 instructions, saves 1 byte and 6 gas.

130 unique patterns, 12,046,816 occurrences. 24,093,632 instructions eliminated, 12,046,816 bytes saved, 72,280,896 gas saved.

Top patterns:

Count Pattern Replacement Effect
4,100,981 SWAP2 SWAP1 SWAP2 EXCHANGE(1,1) stack[1] ↔ stack[2]
3,553,495 SWAP1 SWAP2 SWAP1 EXCHANGE(1,1) stack[1] ↔ stack[2]
1,186,570 SWAP4 SWAP1 SWAP4 EXCHANGE(1,3) stack[1] ↔ stack[4]
1,169,171 SWAP3 SWAP1 SWAP3 EXCHANGE(1,2) stack[1] ↔ stack[3]
619,504 SWAP6 SWAP1 SWAP6 EXCHANGE(1,5) stack[1] ↔ stack[6]
382,698 SWAP5 SWAP1 SWAP5 EXCHANGE(1,4) stack[1] ↔ stack[5]
147,086 SWAP3 SWAP2 SWAP3 EXCHANGE(2,1) stack[2] ↔ stack[3]
112,397 SWAP2 SWAP3 SWAP2 EXCHANGE(2,1) stack[2] ↔ stack[3]
101,031 SWAP9 SWAP1 SWAP9 EXCHANGE(1,8) stack[1] ↔ stack[9]
95,989 SWAP7 SWAP1 SWAP7 EXCHANGE(1,6) stack[1] ↔ stack[7]
85,593 SWAP8 SWAP1 SWAP8 EXCHANGE(1,7) stack[1] ↔ stack[8]
63,039 SWAP10 SWAP1 SWAP10 EXCHANGE(1,9) stack[1] ↔ stack[10]
62,252 SWAP13 SWAP1 SWAP13 EXCHANGE(1,12) stack[1] ↔ stack[13]
61,830 SWAP12 SWAP1 SWAP12 EXCHANGE(1,11) stack[1] ↔ stack[12]
60,708 SWAP11 SWAP1 SWAP11 EXCHANGE(1,10) stack[1] ↔ stack[11]

The dominant sub-pattern is SWAP_x SWAP1 SWAP_x β€” swapping stack[1] with a deeper element. This accounts for the vast majority of exact replacements.

2. Partial EXCHANGE Replacements (within longer chains)

Longer chains often contain SWAP_a SWAP_b SWAP_a sub-patterns that can be extracted and replaced, reducing the chain length.

1,047 unique patterns, 926,876 sub-pattern replacements. 1,853,752 instructions eliminated, 926,876 bytes saved, 5,561,256 gas saved.

Top examples:

Count Original After EXCHANGE
101,181 SWAP3 SWAP6 SWAP3 SWAP5 [4 ops] EXCHANGE(3,3) SWAP5 [2 ops]
77,153 SWAP3 SWAP1 SWAP2 SWAP1 [4 ops] SWAP3 EXCHANGE(1,1) [2 ops]
52,174 SWAP1 SWAP3 SWAP2 SWAP3 [4 ops] SWAP1 EXCHANGE(2,1) [2 ops]
44,443 SWAP1 SWAP3 SWAP1 SWAP2 [4 ops] EXCHANGE(1,2) SWAP2 [2 ops]
39,550 SWAP3 SWAP2 SWAP1 SWAP2 [4 ops] SWAP3 EXCHANGE(1,1) [2 ops]

3. Staircase Patterns (not directly replaceable)

Descending staircases (SWAPn SWAPn-1 ... SWAP1) perform a left rotation, bringing a deep element to the top. Ascending staircases (SWAP1 SWAP2 ... SWAPn) push the top element down to depth n.

These cannot be replaced by a single EXCHANGE since they fundamentally change stack[0]. However, they could benefit from a hypothetical ROTATE n instruction.

Descending staircases: 59 patterns, 1,424,636 occurrences

Count Pattern Effect
528,154 SWAP3 SWAP2 SWAP1 Rotate top 4 left
478,047 SWAP4 SWAP3 SWAP2 SWAP1 Rotate top 5 left
175,318 SWAP4 SWAP3 SWAP2 Rotate top 3 of [1..4] left
61,955 SWAP6 SWAP5 SWAP4 SWAP3 SWAP2 SWAP1 Rotate top 7 left
59,247 SWAP5 SWAP4 SWAP3 SWAP2 SWAP1 Rotate top 6 left

Ascending staircases: ~186,870 occurrences

Count Pattern
78,674 SWAP1 SWAP2 SWAP3 SWAP4 SWAP5
56,268 SWAP1 SWAP2 SWAP3
30,609 SWAP1 SWAP2 SWAP3 SWAP4
14,192 SWAP1 SWAP2 SWAP3 SWAP4 SWAP5 SWAP6

4. Two-Element Pairs (dominant by volume)

Pairs account for 60.7% of all SWAP chain instructions (97.3M instructions). These cannot be reduced by EXCHANGE since they are only 2 instructions and EXCHANGE requires 3 SWAPs to justify a replacement.

The most common pairs are adjacent descending pairs forming rotations:

Count Pattern Semantic
14,926,659 SWAP2 SWAP1 Rotate top 3 left
13,086,432 SWAP3 SWAP2 Rotate top 4 left (partial)
5,359,213 SWAP4 SWAP3 Rotate top 5 left (partial)
4,810,215 SWAP1 SWAP2 Rotate top 3 right
2,129,201 SWAP1 SWAP3 Move stack[3] toward top

Chain Length Distribution

Length Patterns Occurrences Instructions % of Total
2 224 48,659,466 97,318,932 60.7%
3 807 14,933,556 44,800,668 28.0%
4 1,101 2,457,743 9,830,972 6.1%
5 793 591,118 2,955,590 1.8%
6 593 275,495 1,652,970 1.0%
7+ 1,081 442,126 3,694,749 2.3%

EXCHANGE Usage Distribution

The following table shows which EXCHANGE(n, m) pairs would be used and how often, combining both exact and partial replacements. 104 unique pairs out of 210 possible are utilized.

The distribution is heavily skewed toward n=1 (swapping stack[1] with a deeper element), which accounts for 91.0% of all replacements. EXCHANGE(1,2) alone covers 60.6%.

Top 20 pairs:

EXCHANGE(n,m) Effect Count %
EXCHANGE(1,2) stack[1] ↔ stack[2] 7,859,407 60.58% β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
EXCHANGE(1,3) stack[1] ↔ stack[3] 1,271,267 9.80% β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
EXCHANGE(1,4) stack[1] ↔ stack[4] 1,222,069 9.42% β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
EXCHANGE(1,6) stack[1] ↔ stack[6] 643,284 4.96% β–ˆβ–ˆβ–ˆ
EXCHANGE(1,5) stack[1] ↔ stack[5] 408,369 3.15% β–ˆβ–ˆ
EXCHANGE(2,3) stack[2] ↔ stack[3] 344,159 2.65% β–ˆ
EXCHANGE(1,7) stack[1] ↔ stack[7] 118,183 0.91% β–Œ
EXCHANGE(3,6) stack[3] ↔ stack[6] 113,528 0.88% β–Œ
EXCHANGE(2,5) stack[2] ↔ stack[5] 110,596 0.85% β–Œ
EXCHANGE(1,9) stack[1] ↔ stack[9] 105,314 0.81% β–Œ
EXCHANGE(1,8) stack[1] ↔ stack[8] 104,035 0.80% β–Œ
EXCHANGE(1,10) stack[1] ↔ stack[10] 73,893 0.57%
EXCHANGE(3,5) stack[3] ↔ stack[5] 66,863 0.52%
EXCHANGE(1,11) stack[1] ↔ stack[11] 66,763 0.51%
EXCHANGE(3,4) stack[3] ↔ stack[4] 66,499 0.51%
EXCHANGE(1,13) stack[1] ↔ stack[13] 66,419 0.51%
EXCHANGE(1,12) stack[1] ↔ stack[12] 66,381 0.51%
EXCHANGE(5,6) stack[5] ↔ stack[6] 39,275 0.30%
EXCHANGE(4,5) stack[4] ↔ stack[5] 39,084 0.30%
EXCHANGE(2,4) stack[2] ↔ stack[4] 35,085 0.27%

Heatmap of EXCHANGE usage across the full EIP-8024 encoding space (values are log10 of count, . = valid but unused):

n\m    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26   27   28   29
 1   6.9  6.1  6.1  5.6  5.8  5.1  5.0  5.0  4.9  4.8  4.8  4.8  3.2  2.4  0.7    .    .    .    .    .    .    .    .    .    .    .    .    .
 2        5.5  4.5  5.0  3.7  3.4  3.5  3.2  2.2  2.4  1.9  2.9  1.6  2.7  1.1    .    .    .    .    .    .    .    .    .    .    .    .
 3             4.8  4.8  5.1  3.8  3.8  3.5  2.7  2.5  2.1  3.0  1.3    .  1.0    .    .    .    .    .    .    .    .    .    .    .
 4                  4.6  4.2  3.9  3.3  3.2  0.8  3.0  2.3  0.3  2.5  1.1    .    .    .    .    .    .    .    .    .    .    .
 5                       4.6  3.7  4.1  3.5  3.3  1.8  0.0  3.0  2.7  0.0    .    .    .    .    .    .    .    .    .    .
 6                            3.8  3.5  2.6  2.0  2.6  2.1  1.6    .  1.6  0.0    .    .    .    .    .    .    .    .
 7                                 4.5  3.5  2.8  0.3  1.0  1.6  3.3  1.2    .    .    .    .    .    .    .    .
 8                                      3.4  3.2  2.2  0.5  2.6  1.7    .    .    .    .    .    .    .    .
 9                                           3.8  1.9  0.8    .  1.3    .    .    .    .    .    .    .
10                                                3.3  1.3  3.5  0.0  1.6  2.7    .    .    .    .
11                                                     2.7  1.7  1.6    .    .    .    .    .
12                                                          2.3  0.7  0.3    .    .    .
13                                                               2.5  1.4    .    .
14                                                                    2.7    .

All usage is concentrated in the left portion (m <= 16) since SWAP instructions only reach stack position 16. The top-left corner (low n, low m) is the hottest region. The first row (n=1) dominates, confirming that the most common use case is swapping stack[1] with deeper positions. The diagonal (m = n+1, adjacent swaps) also shows elevated usage. The right half of the encoding space (m > 16) is entirely unused β€” these positions are unreachable from SWAP chain patterns.

EXCHANGE Immediate Range Coverage

All 130 exact replacement patterns (mapping to 90 unique (n, m) pairs) fit within the EIP-8024 EXCHANGE constraints (1 <= n < m, n + m <= 30):

  • Maximum n used: 14
  • Maximum m used: 15
  • Maximum n + m: 29

Since SWAP only addresses positions 1–16 and EXCHANGE can reach up to n + m = 30, the EIP-8024 encoding covers 100% of cases arising from SWAP chain optimization.

Conclusions

  1. 16.2% of SWAP chain instructions (25.9M) can be eliminated by replacing SWAP_a SWAP_b SWAP_a patterns with a single EXCHANGE, saving 77.8M gas and 13.0M bytes across the dataset. This is a mechanical pattern substitution requiring no algorithmic changes.

  2. The EIP-8024 EXCHANGE encoding is sufficient. All observed patterns fit within 1 <= n < m, n + m <= 30. No patterns exceed the encoding limits.

  3. The dominant exact pattern is SWAP_x SWAP1 SWAP_x β€” swapping stack[1] with a deeper element. This accounts for the overwhelming majority of replaceable occurrences.

  4. 60.7% of chain instructions are in 2-element pairs that cannot be reduced by EXCHANGE. These are fundamentally rotations that move elements through stack[0]. A dedicated rotation instruction would be needed to optimize these.

  5. Staircase patterns (descending/ascending) represent another significant class (~1.6M occurrences) that EXCHANGE alone cannot address. These perform full rotations of the top N stack elements. A ROTATE n instruction could replace an entire staircase with a single opcode.

2 Likes

Thank you for this analysis.

I think this is true of stack layout-preserving bytecode optimization, but we’ll find that compilers are able to emit EXCHANGE where they would have emitted 2 SWAP instructions, when a value needs to be moved to a certain stack depth (for an instruction or procedure call).