Create a new opcode for counting leading zeros (CLZ)

Vectorized · September 11, 2022, 2:50am

Count leading zeros (CLZ) is a native instruction in many processor architectures
(even in RISC architectures like ARM).

It is a basic building block used in many important fixed point and binary operations:

lnWad
powWad (due to lnWad)
sqrt
log2
most significant bit (msb)
least significant bit (lsb)
(can be easily implemented with CLZ by isolating the smallest bit via x & -x)

Code examples:

For more applications see: Find first set - Wikipedia

Emulating this opcode is very wasteful in terms of runtime gas and bytecode size
(due to the need to store large constants like 0xffffffffffffffffffffffffffffffff or De Bruijn lookups like 0x0009010a0d15021d0b0e10121619031e080c141c0f111807131b17061a05041f).

Given that most processors have this opcode implemented natively, which can be used to efficiently implement the opcode in EVM, it feels even more wasteful not to have it available in the EVM.

Having this opcode in the EVM will enable cheaper runtime gas, cheaper validator computation, and smaller bytecode.

For the implementation of CLZ, I’d propose a gas cost of 3
(to keep in line with the gas cost of shl, shr, not, xor, etc).

This CLZ opcode will take in a single variable on the stack, and output a single variable onto the stack.

For the case of 0 as the input, CLZ returns 256 (the bit width of the input) to stay consistent with how it behaves in most processors, and for lesser bytecode required to handle the output.

CLZ has the following advantages over other bit counting operations:

log2(x) can be computed via 255 - clz(x), which automatically reverts for the case of 0 if using checked arithmetic. This makes it preferred over find first set (FFS).
Count trailing zeros (CTZ) is way cheaper to emulate with current Yul opcodes compared to CLZ. See the LibBit.sol. Inclusion of CLZ will make CTZ easy to compute, but not vice versa.

Basically, CLZ is the most common bit counting operation in processors because other bit counting operations can be easily implemented with CLZ.

axic · November 11, 2022, 5:57pm

Can you leave an example of an efficient ctz implementation using clz?

Well, not automatically, but with code generated by the compiler

Would you be able to rewrite the solady code using the hypothetical clz opcode? It would be a good way to see the complexity reduction.

Zoltan · September 2, 2024, 9:16pm

Here is a lookup table for that.

521 bytes long.

Test it on evm.codes !

You just need to open an instruction’s info, then click on " Reproduce in playground", because if you click on the menu’s playground link it will only accept Solidity code.

Here is the raw table in hex:

0000010002ea0000038bebab0099000004008c00ec36ac0000d09a000078004a050000008dd60000ed003700ad7100dd00e5d1429ba4009100fe796000004b00067f0000004700da8e00d70000000000ee12003d38000000ae0072000000de000025e632d2a043009c1ca5000000920000c7ff007a2061f1005200f84cb600150766800000a90000000048000000db408f5e0000d800003b0000003000000000eff6130000003e00392e000000000000af000000730000b100bc0000dfc1000000002600e7963375d36ea100440000009d001db3a6000000000000be930000000057c800000000e17b0e21c3625af2000000530000cbf9004d00b7000000162808006700810000e9008aaa980000003500cf0077490000d500000070dce441a390fd5f00007e0046d900000000113c00000000000024319f001b000000c6001ff051f7b5146500a8000000003f5d00003a002f0000f50000002d000000000000b0bb00c000000095746d00000000b2000000bd0000560000e00dc259000000ca0000000027000000e889970034ce7600d4006fe3a2fc007d45000010000000239e1a00c51e50b464a700005c000000f4002c000000babf00946c000000000055000c5800c9000000008800cd0000e2fb7c000f002219c44f63005b00f32b00b9006b0000540b00000087cc00fa0000184e00002ab86a000a0086000017002969098500006884008382

Zoltan · September 2, 2024, 10:47pm

Please put this EIP onto the table on Github!

https://github.com/search?q=repo%3Aethereum%2FEIPs%20zeroes&type=code

https://github.com/search?q=repo%3Aethereum%2FEIPs+leading+zeroes&type=code

https://github.com/search?q=repo%3Aethereum%2FEIPs+clz&type=code

Vectorized · September 4, 2024, 1:03pm

Some updates.

As of the latest Solady at the time of writing, solady/src/utils/LibBit.sol at 24b6272d83fba9a4233bafafbcd7f879e3b0948b · Vectorized/solady · GitHub

CLZ: 184 gas
FFS: 136 gas

Having a specialized CLZ opcode can reduce the amount of gas to about 8 gas for CLZ, and about 30 gas for FFS (due to the need to isolate the least significant bit before CLZ).

CLZ will be useful in sqrt, cbrt, logarithms, bitmap lookups.

As of now, the codebase that will benefit the most if CLZ were to be shipped immediately will be Uniswap V4. But since the Uniswap V4 codebase has already been submitted for final audits, and the singleton will be launched soon, the benefits of having this opcode in EVM may be limited going forward.

After considering the amount of EVM fragmentation this may cause, I am hesistant on pushing core devs to implement this opcode.