Count leading zeros (CLZ) is a native instruction in many processor architectures
(even in RISC architectures like ARM).
It is a basic building block used in many important fixed point and binary operations:
 lnWad
 powWad (due to lnWad)
 sqrt
 log2
 most significant bit (msb)
 least significant bit (lsb)
(can be easily implemented with CLZ by isolating the smallest bit viax & x
)
Code examples:
 solady/FixedPointMathLib.sol at main · Vectorized/solady · GitHub
 solady/LibBit.sol at main · Vectorized/solady · GitHub
For more applications see: Find first set  Wikipedia
Emulating this opcode is very wasteful in terms of runtime gas and bytecode size
(due to the need to store large constants like 0xffffffffffffffffffffffffffffffff
or De Bruijn lookups like 0x0009010a0d15021d0b0e10121619031e080c141c0f111807131b17061a05041f
).
Given that most processors have this opcode implemented natively, which can be used to efficiently implement the opcode in EVM, it feels even more wasteful not to have it available in the EVM.
Having this opcode in the EVM will enable cheaper runtime gas, cheaper validator computation, and smaller bytecode.
For the implementation of CLZ, I’d propose a gas cost of 3
(to keep in line with the gas cost of shl
, shr
, not
, xor
, etc).
This CLZ opcode will take in a single variable on the stack, and output a single variable onto the stack.
For the case of 0 as the input, CLZ returns 256 (the bit width of the input) to stay consistent with how it behaves in most processors, and for lesser bytecode required to handle the output.
CLZ has the following advantages over other bit counting operations:

log2(x)
can be computed via255  clz(x)
, which automatically reverts for the case of 0 if using checked arithmetic. This makes it preferred over find first set (FFS). 
Count trailing zeros (CTZ) is way cheaper to emulate with current Yul opcodes compared to CLZ. See the LibBit.sol. Inclusion of CLZ will make CTZ easy to compute, but not vice versa.
Basically, CLZ is the most common bit counting operation in processors because other bit counting operations can be easily implemented with CLZ.