Gather for an EIP: Useful Bit manipulation instructions

dominic · June 27, 2022, 9:57am

Hi there,

While optimizing some data structures I’ve realized that the EVM is missing some in-memory bit operations that make it much more gas expensive to do certain calculations compared to x86. I think it would make sense to add them and would love to hear if you’ve come across similar mathematical instructions.

POPCNT - Population Count (Counting all set bits in a uint256)
CLZ - Count Leading Zeros (Getting the highest bit in a uint256)

These two operations need to be simulated in a loop at the moment and become thus pretty expensive. E.g. here is a 700 gas version of getting the highest bit Logarithm math operation in Solidity - Ethereum Stack Exchange

I might turn this into an EIP, so let me know if there are more instructions to add.

chfast · June 28, 2022, 5:44pm

Hi @dominic,

Such instructions are easy to add, but it can take up to 2 years if we start now. The most important part is justification why such instructions are useful. E.g. you would need to show a group of popular contracts what would benefit from new instructions.

chfast · August 24, 2022, 5:19pm

A few words about implementation:

POPCNT is data-parallel at first (perform popcount on 4 words) and finally sum the results. So in theory latency should be better than in ADD. But this is only in case the CPU has native popcount instruction what is not the case for the baseline x86-64.

CLZ seems to have the complexity of SHL, i.e. it is messy. There is a lot of handling of individual words being zero. I have not seen constant-time implementation yet. Having lzcnt instruction available should help a bit.

Native instructions availability for x86-64: Compiler Explorer.

I think you still should proceed with the EIP and try to move it to the “Review” state.

Vectorized · September 11, 2022, 7:35am

@dominic Thanks for making this proposal!

Accidentally made a new thread Create a new opcode for counting leading zeros (CLZ) without finding this first. Thanks @chfast

I think CLZ and POPCNT are best to be in separate EIPs, such that the usefulness of each opcode can be separately evaluated, giving a higher chance that one may pass.

Would you be available to work on an EIP or have been working on one already?
Otherwise, I can take on the effort and include you as an author too.

Imo, CLZ is too useful not to have it as a native opcode. Even 2 years will be worth the wait.

dominic · September 11, 2022, 11:36am

@Vectorized I like that approach let’s push together. CLZ is definitely more important - felt that if going through the process is by itself taking so much time that taking a group of bit manipulation instructions would be more efficient. But definitely better to have CLZ over not having anything.
If you’re around ETHStation or Bogota we can catch up in person, tag team for the EIP draft and the implementation PRs

axic · September 11, 2022, 3:22pm

Just keep in mind, that while from an EVM code perspective they seem simple, due to the EVM operates on 256-bit big endian words and most CPUs operate on 64-bit little endian words, this becomes slightly more complex to implement, and the cost should reflect that. One could argue that not/xor is overpriced compared to shl/shr.

chfast · October 1, 2022, 8:29am

This is not accurate. EVM implementations usually keep the value on the EVM stack in native form (i.e. little endian).

dominic · October 8, 2022, 10:11pm

@chfast @axic @Vectorized and anyone else, if you’re around this week for devcon (I am) I would definitely like to cut out an hour or two to get the EIP draft in place.

axic · November 11, 2022, 5:57pm

Sorry I missed this message, but would be happy to collaborate.

@Vectorized left some comments on the clz topic.