Can we completely remove SELFDESTRUCT?

SELFDESTRUCT usage report—post-Cancun Mainnet, blocks 20M–25M

Published dataset (472,641 events): https://gist.github.com/chfast/7923cc97710e6e788b2fcd840c17ffb0

1. Data format

The dataset is CSV with one header line and one row per SD-SAMETX event. Rows are appended in collection order, which is roughly chronological by block. All hex fields are lowercase without a 0x prefix.

Header

block,tx,addr,ben,balance,origin,caller,initcode,codesize,codehash

Columns

| Column | Type | Meaning |
|---|---|---|
| block | decimal uint64 | block number containing the SELFDESTRUCT |
| tx | 32-byte hex | transaction hash |
| addr | 20-byte hex | address of the contract being destroyed |
| ben | 20-byte hex | beneficiary (recipient of addr’s ETH balance) |
| balance | decimal wei | ETH balance at SD opcode time |
| origin | 20-byte hex | top-level signer EOA (evm.TxContext.Origin) |
| caller | 20-byte hex | immediate caller of the destroyed contract: typically the factory in the deploy-and-die-in-constructor pattern; equals origin for direct contract-creation transactions |
| initcode | 0 or 1 | 1 ⇒ SD fired from inside the constructor (runtime code never persisted); 0 ⇒ SD fired after construction (runtime code is currently in state) |
| codesize | decimal bytes (only when initcode=0; empty otherwise) | size of the persisted runtime code |
| codehash | 32-byte hex (only when initcode=0; empty otherwise) | keccak256 of the persisted runtime code |

Detection note for initcode: the flag is derived from IntraBlockState.GetCodeSize(self) at the SD opcode. During a constructor frame the runtime code has not yet been deposited (SetCode runs after evm.Run() returns), so a code size of 0 at that point reliably marks the in-constructor case. A persistent contract with empty runtime code can’t reach SD (calls to empty-code addresses don’t dispatch any opcode), so the equivalence initcode == 1 ⇔ in-constructor-frame holds in practice.
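As a reading aid, here is a minimal Python sketch of loading the CSV into typed records, assuming only the header and column rules described above. The `SdEvent` and `read_events` names are hypothetical, and the two sample rows are synthetic placeholders, not rows from the published dataset.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, Optional, TextIO


@dataclass(frozen=True)
class SdEvent:
    block: int
    tx: str
    addr: str
    ben: str
    balance: int              # wei; Python ints do not overflow
    origin: str
    caller: str
    initcode: bool            # True => SD fired inside the constructor
    codesize: Optional[int]   # None when the CSV field is empty
    codehash: Optional[str]   # None when the CSV field is empty


def read_events(f: TextIO) -> Iterator[SdEvent]:
    """Yield one SdEvent per CSV row, using the header line for field names."""
    for row in csv.DictReader(f):
        yield SdEvent(
            block=int(row["block"]),
            tx=row["tx"],
            addr=row["addr"],
            ben=row["ben"],
            balance=int(row["balance"]),
            origin=row["origin"],
            caller=row["caller"],
            initcode=row["initcode"] == "1",
            codesize=int(row["codesize"]) if row["codesize"] else None,
            codehash=row["codehash"] or None,
        )


# Synthetic rows for illustration only (not taken from the dataset).
SAMPLE = (
    "block,tx,addr,ben,balance,origin,caller,initcode,codesize,codehash\n"
    + ",".join(["20000001", "aa" * 32, "11" * 20, "22" * 20,
                "1000000000000000000", "33" * 20, "44" * 20, "1", "", ""]) + "\n"
    + ",".join(["20000002", "bb" * 32, "55" * 20, "22" * 20,
                "0", "33" * 20, "44" * 20, "0", "45", "cc" * 32]) + "\n"
)

events = list(read_events(io.StringIO(SAMPLE)))
```

For the real file, passing an open text-mode file handle to `read_events` in place of the `io.StringIO` wrapper should work the same way.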

Scope: SD-SAMETX only

This dataset contains only SD-SAMETX events: every SELFDESTRUCT of a same-tx-created contract post-Cancun, i.e., the path where EIP-6780 actually destroys the account. The Erigon instrumentation that produced this data also emits three other tags (SD-BURN, SD-BROKEN-BURN, FINAL-BURN) covering orthogonal phenomena (burn accounting, EIP-6780-blocked burns, deferred destruction). They are not in this CSV and not analyzed here, because they don’t bear on the question this report investigates: what would removing the EIP-6780 same-tx destruction path cost in state bloat?

2. Headline numbers (SD-SAMETX only)

| Metric | Value |
|---|---|
| Total SD-SAMETX events | 472,641 |
| Block range | 20,000,000 → 24,999,999 (5.00 M blocks, ~695 days @ 12s) |
| Unique CREATE2 addresses (= wasted accounts if SD disabled) | 413,508 |
| Hard-fail events (events − unique addrs; CREATE2 collisions on redeploy) | 59,133 (12.51%) |
| Wasted code bytes (Σ runtime codesize over unique addrs, initcode=0) | 3,268,302 B (≈ 3.27 MB) |
| Unique runtime codehashes among initcode=0 contracts | 305 |
| Wasted code bytes deduplicated by codehash (one copy per unique hash) | 909,207 B (≈ 909 KB) |

How each metric is computed

The model is: in a no-SD world, the first deploy at each unique CREATE2 address succeeds and leaves a persistent account; every later deploy at the same address hard-fails (the CREATE2 collision check rejects deployment when the target address has nonce != 0 or non-empty codehash).

wasted_accounts        =  N_unique_addrs
                          (each unique CREATE2 address leaves exactly one
                           persistent account — class and initcode don't matter
                           for the count, only for what's *in* the account)

hard_fail_events       =  N_total_events − N_unique_addrs
                          (events at addrs that already exist in state would
                           hit the CREATE2 collision and revert)

wasted_code_bytes      =  Σ codesize  over unique addrs with initcode=0
                          (initcode=1 means runtime code never gets deposited
                           — the persistent account would carry empty code,
                           so it doesn't contribute to code bytes)

wasted_code_bytes_dedup = Σ codesize  over unique codehashes with initcode=0
                          (Erigon's state stores Code keyed by codehash, so
                           identical templates deployed at many addresses
                           share a single code blob in the trie)

The 3.27 MB → 909 KB compression ratio (≈ 3.6×) reflects that the same Deposit/Wrapper templates are deployed at many distinct addresses. State would actually only gain ≈ 909 KB of new code but ≈ 414k new account records.
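The four formulas above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the script that produced the report; the `Row` type and `headline_metrics` function are hypothetical names, the sample rows are synthetic, and it assumes that an address seen several times is counted once with the code size of its first initcode=0 sighting.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Row(NamedTuple):
    addr: str
    initcode: int              # 1 => destroyed inside the constructor
    codesize: Optional[int]    # None when initcode == 1
    codehash: Optional[str]    # None when initcode == 1


def headline_metrics(rows: Iterable[Row]) -> Dict[str, int]:
    total = 0
    addrs = set()
    size_by_addr: Dict[str, int] = {}   # unique addr -> runtime code size
    size_by_hash: Dict[str, int] = {}   # unique codehash -> runtime code size
    for r in rows:
        total += 1
        addrs.add(r.addr)
        if r.initcode == 0 and r.codesize is not None and r.codehash is not None:
            # Assumption: keep the first sighting for a repeated addr or hash.
            size_by_addr.setdefault(r.addr, r.codesize)
            size_by_hash.setdefault(r.codehash, r.codesize)
    return {
        "events": total,
        "wasted_accounts": len(addrs),
        "hard_fail_events": total - len(addrs),
        "wasted_code_bytes": sum(size_by_addr.values()),
        "wasted_code_bytes_dedup": sum(size_by_hash.values()),
    }


# Synthetic rows: one template (h1) at two addresses, one redeploy at a1,
# one in-constructor destroy, and one distinct template (h2).
rows = [
    Row("a1", 0, 100, "h1"),
    Row("a2", 0, 100, "h1"),
    Row("a1", 0, 100, "h1"),
    Row("a3", 1, None, None),
    Row("a4", 0, 40, "h2"),
]
metrics = headline_metrics(rows)
```

On the report's own totals the same arithmetic gives 472,641 − 413,508 = 59,133 hard-fail events and 3,268,302 / 909,207 ≈ 3.6 for the deduplication ratio.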

3. Classes of usage

Each event is classified into one of six classes by the following decision tree (first match wins):

A_DirectScript            origin == caller
                          (top-level CREATE tx; no factory contract)

B_SharedFactory           caller has ≥ 2 distinct origins
                          (one factory contract used by multiple operators)

C_SybilBatch              origin uses ≥ 3 distinct factories AND events/tx ≥ 3
                          (single operator rotating across many factories,
                           batch-deploying many ephemerals per tx)

D_MultiTenantAggregator   (origin, caller) has ≥ 5 distinct bens
                          AND events/ben < 50
                          (one operator + one factory routing to many
                           customer wallets, light per-ben volume)

E_DepositForwarder        the event's addr appears ≥ 2 times in the log
                          (CEX-style: same CREATE2 address redeployed)

F_OneShot                 otherwise
                          (single-tenant deposit forwarder without address
                           reuse — fresh CREATE2 each time)

The cutoffs (f ≥ 3 && ept ≥ 3 for sybil, where f is the number of distinct factories an origin uses and ept is events per transaction; bens ≥ 5 && events/ben < 50 for multi-tenant) were tuned on a smaller sample so the multi-tenant detector wouldn’t trip on single-tenant operators rotating across a handful of hot wallets.
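The decision tree can be sketched as a two-pass classifier in Python. This is a hedged illustration rather than the classifier used for the report: the `Event` type and `classify` function are hypothetical, the sample events are synthetic, and two readings of the rules are assumptions on my part (events/tx is taken as an origin's total events divided by its distinct transactions, and events/ben as a pair's total events divided by its distinct beneficiaries).

```python
from collections import Counter, defaultdict
from typing import List, NamedTuple


class Event(NamedTuple):
    tx: str
    addr: str
    ben: str
    origin: str
    caller: str


def classify(events: List[Event]) -> List[str]:
    """Return one class label per event; the first matching rule wins."""
    origins_of_caller = defaultdict(set)   # factory -> distinct origins
    callers_of_origin = defaultdict(set)   # origin -> distinct factories
    txs_of_origin = defaultdict(set)       # origin -> distinct tx hashes
    events_of_origin = Counter()
    bens_of_pair = defaultdict(set)        # (origin, caller) -> distinct bens
    events_of_pair = Counter()
    addr_count = Counter()                 # occurrences of each addr in the log

    # Pass 1: whole-log aggregates that the rules refer to.
    for e in events:
        pair = (e.origin, e.caller)
        origins_of_caller[e.caller].add(e.origin)
        callers_of_origin[e.origin].add(e.caller)
        txs_of_origin[e.origin].add(e.tx)
        events_of_origin[e.origin] += 1
        bens_of_pair[pair].add(e.ben)
        events_of_pair[pair] += 1
        addr_count[e.addr] += 1

    # Pass 2: apply the rules in the order listed above.
    labels = []
    for e in events:
        pair = (e.origin, e.caller)
        ept = events_of_origin[e.origin] / len(txs_of_origin[e.origin])
        n_bens = len(bens_of_pair[pair])
        if e.origin == e.caller:
            labels.append("A_DirectScript")
        elif len(origins_of_caller[e.caller]) >= 2:
            labels.append("B_SharedFactory")
        elif len(callers_of_origin[e.origin]) >= 3 and ept >= 3:
            labels.append("C_SybilBatch")
        elif n_bens >= 5 and events_of_pair[pair] / n_bens < 50:
            labels.append("D_MultiTenantAggregator")
        elif addr_count[e.addr] >= 2:
            labels.append("E_DepositForwarder")
        else:
            labels.append("F_OneShot")
    return labels


# Synthetic events for illustration only.
sample = [
    Event("t1", "a1", "b1", "o1", "o1"),   # origin == caller
    Event("t2", "a2", "b1", "o2", "f1"),   # f1 serves o2 and o3
    Event("t3", "a3", "b1", "o3", "f1"),
    Event("t4", "x", "b2", "o4", "f2"),    # addr x appears twice
    Event("t5", "x", "b2", "o4", "f2"),
    Event("t6", "y", "b3", "o5", "f3"),    # nothing else matches
]
labels = classify(sample)
```

Because the aggregates are computed over the whole log before any event is labeled, the result depends on the full dataset; classifying a subset can shift events between classes.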

Per-class breakdown (combined log, 472,641 events)

| Class | Events | %ev | Uniq addrs | Hard-fail | %hf | Addrs w/code | Code bytes | Origins | Callers |
|---|---|---|---|---|---|---|---|---|---|
| A_DirectScript | 34,384 | 7.3% | 34,339 | 45 | 0.13% | 172 | 180,069 | 2,098 | 2,098 |
| B_SharedFactory | 98,972 | 20.9% | 97,120 | 1,852 | 1.87% | 1,849 | 1,140,264 | 604 | 79 |
| C_SybilBatch | 37,262 | 7.9% | 34,638 | 2,624 | 7.04% | 264 | 45,634 | 28 | 190 |
| D_MultiTenantAggregator | 10,618 | 2.2% | 3,764 | 6,854 | 64.55% | 90 | 1,914 | 7 | 7 |
| E_DepositForwarder | 59,768 | 12.6% | 12,010 | 47,758 | 79.91% | 496 | 107,282 | 112 | 129 |
| F_OneShot | 231,637 | 49.0% | 231,637 | 0 | 0.00% | 1,048 | 1,793,139 | 504 | 3,251 |
| Total | 472,641 | 100.0% | 413,508 | 59,133 | 12.51% | 3,919 | 3,268,302 | | |

%hf = hard-fail rate within the class (hard_fail / events); the value in the Total row is the dataset-wide rate.

Reading the table

  • F_OneShot dominates volume (49% of events) and accounts for 56% of unique wasted accounts. These are operators who already use fresh addresses every time — they don’t need address reuse, just want to avoid leaving persistent contracts. Pure state-bloat absorbers.
  • B_SharedFactory is unexpectedly large (21% of events, 1.14 MB of code bytes). 79 distinct shared-factory contracts collectively serve 604 distinct EOAs — significant SaaS-style usage of the pattern.
  • E_DepositForwarder concentrates the hard-fails (47,758 of 59,133 = 80.8% of all hard-fail events). 112 operators across 129 factories. This is the slice that genuinely depends on EIP-6780 same-tx redeploy semantics.
  • D_MultiTenantAggregator is tiny in event count (2.2%) but has the second-highest hard-fail rate (65%, behind E_DepositForwarder at 80%): every customer redeposits, so most events are at reused addrs. Only 7 distinct operators in this class.
  • C_SybilBatch at 7.9% fits the airdrop/mint-farming pattern: 28 operators each rotating across ~7 factories on average (190 callers ÷ 28 origins), batch-deploying ephemeral helpers.
  • A_DirectScript at 7.3% — over 2k distinct EOAs each running their own one-shot init-code-as-script pattern. No factory at all.

4. Frequency per 100k-block bucket

Computed by bucket = floor(block / 100_000) * 100_000; events per bucket are simply the count of SD-SAMETX lines whose block falls in [bucket, bucket + 100_000).

 20,000,000   5,392  #####
 20,100,000   4,255  ####
 20,200,000   4,272  ####
 20,300,000   6,087  ######
 20,400,000   5,129  #####
 20,500,000   4,803  ####
 20,600,000   5,013  #####
 20,700,000   5,308  #####
 20,800,000   4,613  ####
 20,900,000   4,849  ####
 21,000,000   6,090  ######
 21,100,000   6,539  ######
 21,200,000   4,414  ####
 21,300,000   3,761  ###
 21,400,000   3,412  ###
 21,500,000   4,120  ####
 21,600,000   4,185  ####
 21,700,000   5,195  #####
 21,800,000   5,335  #####
 21,900,000   4,714  ####
 22,000,000   5,651  #####
 22,100,000   6,143  ######
 22,200,000   6,928  ######
 22,300,000   6,735  ######
 22,400,000   5,649  #####
 22,500,000   6,599  ######
 22,600,000   7,057  #######
 22,700,000   9,515  #########
 22,800,000   9,049  #########
 22,900,000   8,573  ########
 23,000,000 100,530  ####################################################################################################
 23,100,000   7,647  #######
 23,200,000   7,522  #######
 23,300,000   7,291  #######
 23,400,000   8,049  ########
 23,500,000   8,847  ########
 23,600,000   8,389  ########
 23,700,000   8,121  ########
 23,800,000   7,787  #######
 23,900,000  11,747  ###########
 24,000,000   8,907  ########
 24,100,000   7,757  #######
 24,200,000   9,079  #########
 24,300,000  17,025  #################
 24,400,000  17,207  #################
 24,500,000  15,887  ###############
 24,600,000  13,394  #############
 24,700,000  13,835  #############
 24,800,000  10,026  ##########
 24,900,000  14,209  ##############

(1 char ≈ 1,000 events.)
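A minimal Python sketch of the bucketing and the text histogram, assuming only the bucket formula stated above. The function names are hypothetical and the block numbers in the example are synthetic. The bars round down here, which appears to match the chart (for example 6,928 events drawn as six characters).

```python
from collections import Counter
from typing import Iterable

BUCKET = 100_000


def bucket_counts(blocks: Iterable[int]) -> Counter:
    """Count events per bucket, keyed by the bucket's first block number."""
    return Counter(b // BUCKET * BUCKET for b in blocks)


def render(counts: Counter, per_char: int = 1_000) -> str:
    """One line per bucket: start block, event count, and a bar of '#'."""
    lines = []
    for start in sorted(counts):
        n = counts[start]
        lines.append(f"{start:>11,} {n:>8,}  {'#' * (n // per_char)}")
    return "\n".join(lines)


# Synthetic block numbers for illustration only.
counts = bucket_counts([20_000_000, 20_099_999, 20_100_000])
print(render(counts, per_char=1))
```

Buckets with zero events are simply absent from the `Counter`, so a gap-free chart over a fixed range would need to iterate over the expected bucket starts instead of `sorted(counts)`.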

Trend reading

  • Background regime (20.0 M – 22.6 M): flat at ≈ 4–7 k events / 100k blocks. At 12 s per block, 100k blocks is ≈ 13.9 days, so this is roughly 290–500 events/day. Stable post-Cancun adoption.
  • First step-up (≈ 22.7 M): ramps to ≈ 7–10 k.
  • Anomaly at 23.0 M: 100,530 events in a single 100k-block window — 13× the surrounding rate. Likely one operator / one batch tx dominating; flagged for separate investigation.
  • Plateau (23.1 M – 24.2 M): settles at ≈ 7–9 k, apart from a brief bump to ≈ 11.7 k at 23.9 M.
  • Second step-up (≈ 24.3 M): roughly doubles to ≈ 13–17 k, with one dip to ≈ 10 k at 24.8 M. This regime persists through the end of the sample.

Net trajectory: usage roughly tripled between the start and end of the window (5 k → ~14 k per 100k blocks), with one exceptional spike at 23.0 M.
