Do we need a bitmasking opcode? 23.3% of AAVE V4 Core bytecode is bitmasking

An astonishing amount of all deployed Ethereum bytecode is used for bitmasking. WETH9 is 32.9% bitmasking. The recently deployed AAVE V4 Core uses 23.3% of its bytecode for bitmasking. Working from the Zellic dataset of all unique Ethereum contract bytecode collected in 2025, a single pattern for masking out a 20-byte address is responsible for 8.16% of all bytecode on Ethereum. Adding the next three most common bitmasking patterns brings the total up to 10.4% of all bytecode. Newer versions of the Solidity compiler often interleave bitmasking with other instructions, which means actually tracking bitmasking requires static analysis of each contract. However, I believe the end total is probably somewhere between 12% and 20% of all unique contract bytecode.

Visualization of a portion of AAVE V4 core bytecode. Bit masking operations marked in red. 23.3% bitmasking:

Why is bitmasking such a huge part of deployed EVM bytecode? It’s because the stack is made up of 256-bit values, and when loading types that are smaller than that onto the stack, the high bits need to be cleared after loading. This applies to packed data from storage or memory, and to untrusted data from calldata. Contracts that make heavy use of addresses get hit hard (which is most contracts), as do contracts that optimize their storage layout with packing (which is most contracts by prominent teams).
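A minimal Python sketch of the problem; the address value and the packing layout are made up for illustration:

```python
# The EVM stack holds 256-bit words. A storage slot that packs an
# address (160 bits) together with other fields must be masked after
# loading, or the neighboring bits leak into the value.

ADDRESS_MASK = (1 << 160) - 1  # rightmost 20 bytes set

# Hypothetical packed slot: a one-bit flag stored above a 160-bit address.
address = 0xDEADBEEF00000000000000000000000000DEADBEEF & ADDRESS_MASK
flag = 1
slot = (flag << 160) | address

raw = slot                            # what SLOAD puts on the stack
assert raw != address                 # high bits are dirty
assert raw & ADDRESS_MASK == address  # AND with the mask cleans them
```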

Visualization of a portion of USDC bytecode. 14.7% bitmasking:

The two common approaches to bitmasking sit on opposite sides of the tradeoff between smart contract size and runtime gas usage.

The most gas-efficient approach is to just PUSH the bitmask, then AND. For a boolean, this is fine: 3 bytes, 6 gas. The most common mask bytecode used, 0x73ffffffffffffffffffffffffffffffffffffffff16, masks an address and is 22 bytes long, 6 gas. Sometimes when using packed data from storage, a full 32 bytes will be pushed, for 34 bytes, 6 gas.

The storage-efficient approach is to compute the mask by shifting a bit left and then subtracting 1 to turn all the lower bits on. This produces any number of right-aligned bits for a fixed cost. The pattern looks like PUSH1 1, PUSH1 1, PUSH1 160, SHL, SUB, AND. This makes an address mask (or any other right-aligned mask) 9 bytes long and 18 gas.
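As a sanity check, both patterns produce the same mask value; a small Python sketch:

```python
# Gas-efficient pattern: the literal pushed by PUSH20 in the common
# 22-byte sequence 0x73ff...ff16.
literal_mask = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

# Storage-efficient pattern: PUSH1 1, PUSH1 1, PUSH1 160, SHL, SUB,
# i.e. (1 << 160) - 1.
computed_mask = (1 << 160) - 1

assert literal_mask == computed_mask
assert computed_mask.bit_length() == 160  # exactly 20 bytes of ones
```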

As said before, Solidity may spread these opcodes out and interleave them with other operations.

Visualization of a portion of an early deployed Morpho Blue contract, using the high-gas, low-space bitmasking approach and not much storage packing. 6.3% masking bytes:

If we wanted to improve this, what could we do?

  1. The absolute silliest thing would be to add a MASKADDRESS opcode that keeps the rightmost 20 bytes of the top stack item. This would create the biggest savings, since address masking is both the most common masking and one of the largest in terms of bytes, and it would shrink to 1 byte, 3 gas.
  2. We could add a MASKBITS opcode that takes immediate values, in EIP-8024 style. This would turn address masking into 2 bytes, 3 gas, and would also work for other sizes of right-aligned values (for example, 128-bit numbers). Immediate values certainly increase the effort of working with bytecode, but if EIP-8024 were already in place, this would share code.
  3. We could add a MASKBITS opcode that takes a value from the stack. The rightmost byte of this would be the number of bits to keep, and the second byte would be an optional left shift of the mask. So to mask an address would be PUSH1 160, MASKBITS, 3 bytes, 6 gas. To mask a left-aligned uint128 would be PUSH2 128 128, MASKBITS, 4 bytes, 6 gas. This has the advantage of being general and a simple opcode to implement and work with.
  4. Without making EVM changes, the current compact pattern used by the Solidity compiler could be improved from 9 bytes, 18 gas to 4 bytes, 9 gas by using MLOAD to select a slice of prestaged memory. The standard Solidity preamble would move the free memory pointer further on, then a PUSH0, NOT, PUSH1, MSTORE would place an all-ones run of 256 bits after the reserved zero memory slot, followed by another 256 bits of zero memory. This would allow selecting any byte-aligned mask, left- or right-aligned, just by loading the correct range of memory with PUSH1, MLOAD, AND.
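Options 3 and 4 can be sanity-checked in Python; the MASKBITS encoding and the staging offsets below are my own reading of the proposals, not a spec:

```python
# Option 3: MASKBITS taking a stack value. Low byte = number of bits to
# keep, second byte = optional left shift of the mask.
def maskbits(arg: int, value: int) -> int:
    bits = arg & 0xFF
    shift = (arg >> 8) & 0xFF
    return value & (((1 << bits) - 1) << shift)

DIRTY = (1 << 256) - 1  # a word with every bit set

# PUSH1 160, MASKBITS -> right-aligned address mask.
assert maskbits(160, DIRTY) == (1 << 160) - 1
# PUSH2 128 128 (i.e. 0x8080), MASKBITS -> left-aligned uint128 mask.
assert maskbits(0x8080, DIRTY) == ((1 << 128) - 1) << 128

# Option 4: prestage memory as [32 zero bytes][32 0xff bytes][32 zero
# bytes]; an MLOAD straddling the boundaries reads any byte-aligned mask.
staging = bytes(32) + bytes([0xFF] * 32) + bytes(32)

def mload(offset: int) -> int:
    # EVM MLOAD: read 32 bytes starting at offset, big-endian.
    return int.from_bytes(staging[offset:offset + 32], "big")

assert mload(20) == (1 << 160) - 1           # right-aligned address mask
assert mload(48) == ((1 << 128) - 1) << 128  # left-aligned uint128 mask
```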

Visualization of the entirety of WETH9. The Solidity compiler decided to duplicate many of the masks. 32.9% bitmasking:

Your thoughts on these? There’s room here for around a 9% improvement in effective contract sizes, as well as much cleaner bytecode. Or should we leave things the way they are?

5 Likes

Solidity sucks and is the problem. It masks way too often.

For example, if you have an immutable address, solidity will write the address into the code as 32 bytes during init and then mask it during runtime. Instead, it should write only 20 bytes.
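A rough sketch of the size difference (the address value here is made up):

```python
# A hypothetical immutable address baked into runtime code: as a
# PUSH32 immediate it costs 33 bytes plus a runtime mask; as a PUSH20
# immediate the payload is already only 20 bytes wide, no mask needed.

ADDRESS_MASK = (1 << 160) - 1
addr = 0xDEADBEEF00000000000000000000000000DEADBEEF & ADDRESS_MASK

push32_payload = addr.to_bytes(32, "big")  # what Solidity writes during init
push20_payload = addr.to_bytes(20, "big")  # what it could write instead

assert len(push32_payload) - len(push20_payload) == 12  # 12 bytes saved per site
assert int.from_bytes(push20_payload, "big") == addr    # value unchanged
```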

There are other things like this but I don’t remember them all. I use assembly when I want small contracts.

If you’re hitting codesize limits, you can workaround with ERC-8167.

3 Likes

Exactly the recent topic @ X

PUSH0 NOT PUSH1 96 SHR AND is cheaper :slight_smile:
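That sequence is indeed cheaper (6 bytes, 15 gas versus the 9-byte, 18-gas SHL/SUB pattern); a quick Python check of the arithmetic, with NOT modeled as 256-bit:

```python
# PUSH0, NOT gives a word of 256 one-bits; PUSH1 96, SHR drops the top
# 96 of them, leaving the 160-bit address mask.

WORD = (1 << 256) - 1  # wrap Python's arbitrary-precision NOT to 256 bits

mask = (~0 & WORD) >> 96
assert mask == (1 << 160) - 1
```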

Anyway Solidity masks too much too often. It is almost better to treat everything as bytes32 and convert at the very last moment.

Not sure whether Vyper has the same issue.

Yet masking is important for security. I appreciate MASKADDRESS, since the other often-masked case, bool, is just PUSH1 1, AND, or might just be a comparison to 0.

2 Likes

Another option is a CODELOAD opcode that can load a word from code onto the stack. The DATALOAD opcode was one of the great things about EOF and it did this, though from a data section in the code.

DATALOAD had the same cost as PUSH so it would only be an improvement in code size. And that’s only so long as code chunking isn’t introduced. This makes me think there’s still a good argument for a masking opcode.

3 Likes

I think this solution is the right one. Adding a dedicated opcode for masks is a very narrow, local optimisation and ends up being another opcode to solve a self-inflicted EVM problem. CODELOAD/DATALOAD is the way to go tbh.

IMO the motivation is stronger than for PUSH0, which optimises PUSH1 0.

Looking at EIP-7480: EOF - Data section access instructions - how would that work?

E.g. like

PUSH1 0x20 // assuming 0x7fff...ff is compiled there
CODELOAD
AND

or with immediate CODELOAD 0x20 ?

1 Like

Ah, right, EOF DATALOAD had an immediate and passing the argument to CODELOAD on the stack makes it much less attractive. It can use an immediate using one of these techniques: EVM Immediates

1 Like

OK, comparing the approaches:

DATALOAD like

  • Has one immediate argument, offset, encoded as a 16-bit unsigned big-endian value.

CODELOAD 0x0020 AND → 4 bytes (the 16-bit offset enabling ca. 2k constants); 6 gas to mask an address; generic for other masking

vs

MASKADDRESS → 1 byte; 3 gas to mask an address; single-purpose, like PUSH0; might become outdated when/if the address size increases (which would have a much larger impact on the whole ecosystem anyway).

Or you can make DUP/DUPN cheaper than PUSH so the compiler is incentivized to reuse the bitmask from the stack instead of pushing new constant each time.

2 Likes