Create a new opcode for counting leading zeros (CLZ)

Count leading zeros (CLZ) is a native instruction in many processor architectures
(even in RISC architectures like ARM).

It is a basic building block used in many important fixed point and binary operations:

  • lnWad
  • powWad (due to lnWad)
  • sqrt
  • log2
  • most significant bit (msb)
  • least significant bit (lsb)
    (can be easily implemented with CLZ by isolating the smallest bit via x & -x)

Code examples:

For more applications see: Find first set - Wikipedia

Emulating this opcode is very wasteful in terms of runtime gas and bytecode size
(due to the need to store large constants like 0xffffffffffffffffffffffffffffffff or De Bruijn lookups like 0x0009010a0d15021d0b0e10121619031e080c141c0f111807131b17061a05041f).

Given that most processors have this opcode implemented natively, which can be used to efficiently implement the opcode in EVM, it feels even more wasteful not to have it available in the EVM.

Having this opcode in the EVM will enable cheaper runtime gas, cheaper validator computation, and smaller bytecode.

For the implementation of CLZ, I’d propose a gas cost of 3
(to keep in line with the gas cost of shl, shr, not, xor, etc).

This CLZ opcode will take in a single variable on the stack, and output a single variable onto the stack.

For the case of 0 as the input, CLZ returns 256 (the bit width of the input) to stay consistent with how it behaves in most processors, and for lesser bytecode required to handle the output.

CLZ has the following advantages over other bit counting operations:

  • log2(x) can be computed via 255 - clz(x), which automatically reverts for the case of 0 if using checked arithmetic. This makes it preferred over find first set (FFS).

  • Count trailing zeros (CTZ) is way cheaper to emulate with current Yul opcodes compared to CLZ. See the LibBit.sol. Inclusion of CLZ will make CTZ easy to compute, but not vice versa.

Basically, CLZ is the most common bit counting operation in processors because other bit counting operations can be easily implemented with CLZ.


Can you leave an example of an efficient ctz implementation using clz?

Well, not automatically, but with code generated by the compiler :slight_smile:

Would you be able to rewrite the solady code using the hypothetical clz opcode? It would be a good way to see the complexity reduction.