I believe there is some confusion about the actual goal behind the proposal: e.g. whether it’s prover efficiency or faster direct execution of smart contracts.
+9000%
In shiny theory, yes. In practice, support and tooling matter more.
E.g.: how do I catch `abi.decode` errors? This feature request has been open for over 4(!) years: `try`/`catch` for `abi.decode()` or `abi.tryDecode` · Issue #10381 · ethereum/solidity · GitHub
Similar problems affect more than just Ethereum-owned tooling; e.g., Hardhat has failed to implement a VS Code pnpm import resolver for 1.5 years(!) now: imports autocomplete does not work with pnpm · Issue #522 · NomicFoundation/hardhat-vscode · GitHub
I could go on and on. Give me ugly syntax but with improvements on that side, and I would prefer it.
Totally agree with these parts!
As far as I understand, existing implementations do this because it allows them to implement all the needed checks and gas computations. The “EVM implementation in RISC-V” checks that you don’t increase your balance (creating new ETH out of nothing), that gas is properly computed, that you don’t overflow the stack, etc. If we allowed users to submit arbitrary RISC-V code directly, they would simply increase their own balances.
So, as far as I understand, we cannot simply throw away the middleman. It is there for a reason.
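As a toy illustration of the point (hypothetical types and names, not any real client’s code), the conservation invariant such a wrapper enforces looks like this: value can move between accounts, but the sender must actually hold everything that gets debited, so nothing is minted out of nothing:

```rust
// Hypothetical sketch of the invariant an "EVM implementation in RISC-V"
// wrapper enforces: a transfer may move value, but never create it.
#[derive(Debug, PartialEq)]
enum TxError {
    InsufficientBalance,
}

struct Account {
    balance: u128,
}

// Apply a transfer while preserving total supply: the sender is debited
// exactly the transferred value plus gas, and the receiver is credited
// exactly the transferred value (gas is paid out, not minted).
fn apply_transfer(
    sender: &mut Account,
    receiver: &mut Account,
    value: u128,
    gas_cost: u128,
) -> Result<(), TxError> {
    let debit = value
        .checked_add(gas_cost)
        .ok_or(TxError::InsufficientBalance)?;
    if sender.balance < debit {
        return Err(TxError::InsufficientBalance);
    }
    sender.balance -= debit;
    receiver.balance += value;
    Ok(())
}
```

Arbitrary unchecked RISC-V code could skip straight to `receiver.balance += value`, which is exactly why the middleman exists.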
(Note: I know nothing about zk. Everything I wrote above is simply my amateur understanding.)
I think this post belongs here:
Glue and coprocessor architectures (some RISC-V, with an FPGA for the intensive tasks)
Please don’t say that “ZK-EVMs today are written as ZK RISC-V”. The RISC-V-based zk-VM approach is not the standard for zk-EVMs. It is just one of many approaches that have been proposed to address the shortcomings of the traditional zk-EVM circuit.
RISC-V-based zk-VMs have indeed mitigated some of the shortcomings of traditional zk-EVMs, but they also introduce a new one: excessive compiler dependency (interpreting the EVM in RISC-V). The mathematics of ZKP cannot protect against compiler intervention. In other words, the more we rely on the compiler, the smaller the area of security that ZKP covers, and the integrity of the compiler’s work rests entirely on trust.
100% agree here. It looks like there is a (not so) implicit assumption that by going the RISC-V way, we gain both prover efficiency and faster smart-contract execution. If that were so, then replacing the EVM with RISC-V would be the way to go.
However, it caters only to direct smart-contract execution needs (which is IMHO a very reasonable option here).
For the ZKP side, it at least raises multiple questions; I personally believe it’s a sub-optimal choice, especially wrt the 100x prover-efficiency objective.
Doesn’t look really different from EVM+EOF. In practice, Yul’s EVM flavor defines a single u256 type and assumes variables are stack-allocated (e.g. Variable Declarations, Function Declarations).
Due to the EVM stack size limit, this might be difficult to overcome, since one wants predictable behavior here. An imaginary Yul CPU flavor might be more flexible, though.
Unfortunately this is wildly incorrect. (: You’re right that it’s possible, but the details are wrong.
- Implementing safe gas metering isn’t as simple as just assigning a best-case cycle value to each instruction and calling it a day. For example, looking at your list of per-instruction costs, I can see that you’ve assigned a cost of 3 cycles to each memory load instruction. It’s relatively easy to write a program that makes this instruction take orders of magnitude more cycles by deliberately triggering cache misses (and a cache miss can be hundreds of CPU cycles, which is a little more than just 3!). Even if you severely limit the maximum amount of memory the program can access (so that its working set fits in L3 cache), it’s still possible to exploit various microarchitectural corner cases to make memory accesses significantly more expensive. You can use such a simple gas cost model in the average case and it will work, but as soon as someone is motivated enough to take down your chain, they can launch a denial-of-service attack exploiting this.
- You cannot use hardware performance counters to do gas metering, simply because they are not portable across different hardware and they are non-deterministic: even if you run exactly the same program on exactly the same hardware, you will get a different cycle count!
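For concreteness, here is my own minimal sketch (illustrative, not CKB-VM’s actual code) of the flat-cost metering style being discussed; the `Load` charge bakes in a cache-hit assumption, which is exactly the best-case problem described above:

```rust
// Illustrative fixed-cost gas metering for a toy instruction set.
// Charging a flat 3 cycles per load assumes an L1 cache hit; a deliberate
// cache miss can cost hundreds of real cycles while still being billed 3.
enum Insn {
    Add,
    Mul,
    Load,  // flat charge based on best-case (cache-hit) latency
    Store,
}

fn cycle_cost(insn: &Insn) -> u64 {
    match insn {
        Insn::Add => 1,
        Insn::Mul => 3,
        Insn::Load => 3, // real hardware cost: 3 to several hundred cycles
        Insn::Store => 1,
    }
}

// Meter a program, aborting when the gas budget is exhausted.
fn meter(program: &[Insn], mut gas: u64) -> Result<u64, &'static str> {
    for insn in program {
        gas = gas.checked_sub(cycle_cost(insn)).ok_or("out of gas")?;
    }
    Ok(gas)
}
```

The metering itself is trivially deterministic; the dispute is only about whether the numbers in `cycle_cost` should reflect the best case or the worst case.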
It’s true that deoptimizing and getting back the bigger picture is impossible, but in the case of 256-bit EVM arithmetic it does not matter.
They are not implemented with AVX2 or NEON; they are implemented with add-with-carry, which unfortunately does not exist in RISC-V or WASM either.
- You are definitely correct here; memory loads can indeed take a variable amount of time. An L1 cache hit and a total cache miss will certainly have significantly different load times. That being said, I do believe we have to compromise somewhere. I personally doubt any gas metering design can fully capture all the memory-load characteristics of a modern CPU, and even if a gas metering design were put together to model exactly one CPU, what would happen on a different CPU model? Eventually we have to stick to an approximation of a CPU. The bottom line I see here is: such a model inspired by real CPUs (the original cycle charges in CKB-VM actually come from a real CPU: see Section 3.3 of /sifive.cdn.prismic.io/sifive%2F449c97ba-41e6-4b70-b522-8ddde5d3a34e_sifive+u54+manual+v19.08p0p1.pdf ) is already miles better than the current EVM’s gas metering. Maybe it makes more sense to build an MMU model so cycle charges are not always best-case as they are in CKB-VM now, but I would argue this might be the best thing we have for now.
- I think there is some confusion; apologies for not making myself clear. I’m not talking about the existing performance counters in modern CPUs, which are indeed non-deterministic in a way. I’m just picturing a future where we employ a real RISC-V chip as a blockchain VM. In that sense, we could build a deterministic performance counter following a particular blockchain consensus’s cycle metering model, so that we can do cycle metering for blockchains in real hardware as well.
One thing that’s missing in the discussion so far are calling conventions and register handling.
On a physical CPU, you have a limited number of registers, hence when a function calls another function, it needs to save registers and restore them.
The calling convention defines:
- who saves what registers between the caller and the callee
- what registers are used to pass parameters
- what registers are used to pass results
- how the stack space is used (concept of red zone)
People are strongly encouraged to write small functions, meaning that if they are not inlined, you waste a lot of proving time proving data movements.
An ISA optimized for ZK would actually optimize for reducing those data movements. They make sense in the physical world because local memory (in a register) is 15x to 150x faster than remote memory (L1 cache needs ~15 cycles, L2 cache needs ~100 cycles, RAM needs ~1000 cycles), but this is useless for ZK proofs.
A function usually has between 4~6 inputs and outputs, so naively following physical-CPU calling conventions requires 2x4~6 proofs of data movement per function.
See Latency Numbers Every Programmer Should Know · GitHub, Latency numbers every programmer should know · GitHub
A closely related concept is addressing modes. Some architectures only allow operations to work on registers and require LOAD/STORE around them, but what if you could replace:
LOAD RegA <-- [addr 0x0001]
LOAD RegB <-- [addr 0x0002]
ADD RegC <-- RegA, RegB
STORE [addr 0x0003] <-- RegC
by
ADD [addr 0x0003] <-- [addr 0x0001], [addr 0x0002]
We get a 4x smaller trace, and so a 4x faster prover.
However they can be implemented more efficiently with AVX512 or AVX512IFMA
They can’t.
If you have an implementation with AVX512 or AVX512IFMA, feel free to send it over and I will benchmark it against my library, or you can run the benchmark yourself:
And to further clarify myself (apologies for not making it clear earlier):
- With RISC-V as a blockchain VM, we can only have an approximation of real memory-load metrics on a modern CPU. It will never be 100% accurate, but I would personally argue it is close enough, and it might be the best we can have in blockchains, which require determinism. By limiting the memory of a VM (and we should do this even without considering memory loads) and introducing a minimal but reasonable MMU model, I doubt a denial-of-service attack can be performed by abusing memory loads.
- My initial description simply states the fact that implementing cycle metering has never been a problem for CKB-VM, our RISC-V VM (some have doubted this, stating that it is not possible for a JIT/AOT VM). I do admit that right now CKB-VM is slightly over-optimistic in assuming memory loads will hit the cache. We do keep monitoring inconsistencies between true running time and charged cycles in CKB-VM smart contracts, and fix those inconsistencies periodically in hard forks when we can.
- My discussion of hardware performance counters does not refer to the performance counters in modern x64 or aarch64 CPUs. I’m merely talking about taking an existing open-source RISC-V CPU implementation and adding a performance counter following CKB-VM’s (or another RISC-V blockchain VM’s) cycle metering design. The result would be a real RISC-V CPU that can also measure blockchain-style cycles independently of the CPU’s own internal cycles. I’m merely stating that this is a possible thing to do.
For the record, while RISC-V does not have the adc instruction directly, there are ways to achieve a similar effect:
Solution 1: by chaining add and sltu, one can get code sequences similar to add w/ adc: /godbolt.org/z/v165TYKqb
Solution 2: in CKB-VM, we have abused the idea of macro-op fusion (please use an archive to look for s://riscv.org/wp-content/uploads/2016/07/Tue1130celio-fusion-finalV2.pdf ) and introduced adc to CKB-VM: /github.com/nervosnetwork/rfcs/blob/master/rfcs/0033-ckb-vm-version-1/0033-ckb-vm-version-1.md#42-mop
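The add+sltu idiom in Solution 1 translates directly into a sketch in Rust: after a wrapping add, `sum < a` is exactly the carry-out bit (which is what sltu computes), and chaining it gives multi-limb addition without a native adc:

```rust
// Add-with-carry built from the RISC-V add + sltu idiom: the carry-out of
// an unsigned wrapping add `s = a + b` is exactly `(s < a)`.
fn add_with_carry(a: u64, b: u64, carry_in: u64) -> (u64, u64) {
    let s1 = a.wrapping_add(b);
    let c1 = (s1 < a) as u64;      // sltu c1, s1, a
    let s2 = s1.wrapping_add(carry_in);
    let c2 = (s2 < s1) as u64;     // sltu c2, s2, s1
    (s2, c1 | c2)                  // c1 and c2 can never both be set
}

// 256-bit addition as four 64-bit limbs, least significant limb first.
// Returns the sum limbs and the final carry (overflow out of 256 bits).
fn add256(a: [u64; 4], b: [u64; 4]) -> ([u64; 4], u64) {
    let mut out = [0u64; 4];
    let mut carry = 0u64;
    for i in 0..4 {
        let (s, c) = add_with_carry(a[i], b[i], carry);
        out[i] = s;
        carry = c;
    }
    (out, carry)
}
```

Each limb costs two adds and two sltus instead of one adc, which is the trace overhead the macro-op fusion in Solution 2 aims to remove.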
For 256-bit arithmetic, RISC-V has a 2x longer trace than ARMv7: Compiler Explorer (using Clang 13, because afterwards Clang stopped generating code for >128-bit integers with the C ExtInt/BitInt extension).
Solution 2: in CKB-VM, we have abused the idea of macro-op fusion
This is something that can indeed be implemented in an interpreter to reduce VM traces, though the precompile approach is already ubiquitous and more performant for most bottlenecks (cryptography, u256, hashes).
Yes. But I disagree that it’s the best we can do. (: The way to do it is to assume the worst case, and then extend the model to lower those costs in cases where we know it’s safe. If you assume the best/average case, then you’re just opening yourself up to denial-of-service attacks. The fact that you haven’t had any problems so far only means that no one has bothered to attack your chain yet, not that your gas metering is safe.
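A minimal sketch of that worst-case-first approach (illustrative numbers and a hypothetical scratch region, not any real chain’s gas schedule): charge every load the pessimistic full-miss cost by default, and only discount accesses that can be proven safe.

```rust
// Worst-case-first metering for memory loads: the default charge assumes a
// full cache miss. Only loads into a small scratch region, assumed small
// enough to stay cache-resident, get the discounted fast-path cost.
const SCRATCH_SIZE: u64 = 16 * 1024; // hypothetical region assumed to fit in L1

fn load_cost(addr: u64) -> u64 {
    if addr < SCRATCH_SIZE {
        3    // provably cache-resident: charge the fast path
    } else {
        300  // everything else: charge the worst case (full miss)
    }
}
```

Under this model a cache-miss attack gains nothing, because the attacker was already billed for the miss; the cost of the approach is that honest average-case programs overpay until more discounts are carved out.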
And what makes you think that on this hypothetical future RISC-V CPU the performance counters will be deterministic? (:
In fact, I can guarantee you that they will not be deterministic. The non-determinism of the performance counters is not an arbitrary decision that the hardware designers made; it’s a direct consequence of making a high performance, superscalar, pipelined, out-of-order CPU core with multi-level caches.
Yes, you can in theory add such performance counters to a CPU, but that means the CPU will have to be slow to keep the counters deterministic, so what’s the point of running on a slow CPU with deterministic hardware gas metering if we can get faster performance by doing the gas metering in software on a fast CPU?
This comparison is based on the rv32gc ISA. If you switch to rv64gc on the same page, the trace is actually smaller than ARMv7’s. I’m not certain why most RISC-V-based zkVMs prefer rv32 over rv64, but larger registers naturally lead to fewer operations when handling 256-bit values. In the future, I hope to see more zkVMs based on RISC-V with 64-bit registers. RISC-V even has rv128 well defined, which would lead to even fewer operations, although I’m not advocating for it given its limited support in current compilers.
An ISA optimized for ZK would actually optimize for reducing those data movements.
I think this choice represents a fundamental trade-off between performance and tooling maturity.
Optimizing purely for speed would suggest creating a custom ISA tailored to ZK constraints, but this requires significant research, implementation, auditing, and tooling development, ultimately creating higher barriers to adoption and taking more time to develop fully.
Alternatively, using a stable, well-defined ISA like RISC-V leverages years of development and growing adoption. I believe this approach sacrifices some performance but significantly reduces implementation effort through existing tooling.
For a blockchain base layer intended to be open and widely adopted, I believe leveraging RISC-V’s openness and mature tooling ecosystem maximizes adoption. While custom ISAs may have their place in specific projects, for Ethereum specifically, I would prefer the path that offers greater developer freedom and broader adoption - which is the goal of open standard ISAs like RISC-V.
RISC-V is simple, yes, but it is not optimized for zero-knowledge. Custom ZK-friendly ISAs can outperform it significantly. There’s nothing inherently ZK-native about RISC-V other than being “not as bad as the EVM”. It has a smaller opcode count than WASM… but this doesn’t really matter either, since à la carte opcode selection exists in proving schemes.