While optimizing some data structures I’ve realized that the EVM is missing some in-memory bit operations that make it much more gas expensive to do certain calculations compared to x86. I think it would make sense to add them and would love to hear if you’ve come across similar mathematical instructions.
POPCNT - Population Count (Counting all set bits in a uint256)
CLZ - Count Leading Zeros (Getting the highest bit in a uint256)
Such instructions are easy to add, but it can take up to 2 years if we start now. The most important part is justification why such instructions are useful. E.g. you would need to show a group of popular contracts what would benefit from new instructions.
POPCNT is data-parallel at first (perform popcount on 4 words) and finally sum the results. So in theory latency should be better than in ADD. But this is only in case the CPU has native popcount instruction what is not the case for the baseline x86-64.
CLZ seems to have the complexity of SHL, i.e. it is messy. There is a lot of handling of individual words being zero. I have not seen constant-time implementation yet. Having lzcnt instruction available should help a bit.
I think CLZ and POPCNT are best to be in separate EIPs, such that the usefulness of each opcode can be separately evaluated, giving a higher chance that one may pass.
Would you be available to work on an EIP or have been working on one already?
Otherwise, I can take on the effort and include you as an author too.
Imo, CLZ is too useful not to have it as a native opcode. Even 2 years will be worth the wait.
@Vectorized I like that approach let’s push together. CLZ is definitely more important - felt that if going through the process is by itself taking so much time that taking a group of bit manipulation instructions would be more efficient. But definitely better to have CLZ over not having anything.
If you’re around ETHStation or Bogota we can catch up in person, tag team for the EIP draft and the implementation PRs
Just keep in mind, that while from an EVM code perspective they seem simple, due to the EVM operates on 256-bit big endian words and most CPUs operate on 64-bit little endian words, this becomes slightly more complex to implement, and the cost should reflect that. One could argue that not/xor is overpriced compared to shl/shr.
@chfast@axic@Vectorized and anyone else, if you’re around this week for devcon (I am) I would definitely like to cut out an hour or two to get the EIP draft in place.