First of all, this is just my personal opinion on zero knowledge, my colleagues might have different views on ZK.
There are two different use cases for RISC-V related to zero knowledge:
1. The virtual machine / IR used to express a program that will be proved by ZK
2. The underlying platform that ZK verifiers will run on
A simple but imprecise analogy: there is a VM on top of ZK algorithms (bullet 1), and there is a VM beneath ZK algorithms (bullet 2). They should be discussed separately.
With a strictly-no-precompile design, Nervos CKB-VM perfectly fits bullet 2 here: you can compile ZK verifier code down to RISC-V code, and run the ZK verifier on Nervos CKB-VM. In this sense, Nervos CKB will be flexible enough to support arbitrary ZK solutions. In other words, I consider Nervos CKB-VM to be a decent choice as a VM beneath ZK algorithms.
Bullet 1 is a separate use case. I'm not familiar enough with zero knowledge proof internals to weigh in on whether RISC-V is a proper solution there. I suspect that certain properties of ZK algorithms might constrain the choice of VMs on top of ZK algorithms.
I might be wrong, but I have a feeling that @vbuterin might be talking about bullet 2 here, i.e. a proper VM beneath ZK algorithms, so maybe we don't have to discuss whether RISC-V is fit for ZK proving?
RISC-V may look appealing now, but like all languages, it will fall out of fashion (and out of the performance race) before we know it. All languages worth their salt evolve fast initially, then migrate into another, better language.
Why limit ourselves to one language and not have them all? See what Ethereum could look like, where you plug-and-play new languages in minutes: https://youtu.be/dP3QraNv6tI?si=H9Jdi9BwOJeZ6-bu&t=1406. ZK can and will be universal soon, too, and then you will look back and wonder "Why did we pick RISC-V, after all?".
You claim that LLVM is one big bug. You won't believe it: the Rust compiler's backend is LLVM. By saying this, you're essentially stating that all languages with an LLVM backend are full of bugs. That's a very bold statement. Compared to LLVM, Solidity is a helpless ant facing an elephant, and on top of that, it is unsafe by design and aimed at finding the best way to shoot yourself in the head, and not miss.
I find it very concerning that even before our latest major rework of Ethereum's VM has shipped, there are already proposals for replacing it. EOF was even touted as "better for zk proving" when pitched to ACD.
The same will be said for the EVM. We have to maintain backwards compatibility in this way, because we have made a strong guarantee that contracts deployed today will continue working. However, if proving is 100x more expensive and the EVM is charged proportionally, that is a de facto deprecation.
Not that this is bad, and not that I think we shouldn't move to RISC-V; I simply want us to have more confidence in the technical projects we embark on. The userspace is more delicate than system-facing protocol features, and there is substantial downstream tooling being disregarded.
What we do today should align with our long-term goals. We all agree that real-time ZKPs of mainnet blocks are where we want to go. If we think there is a better path to that than EOF, we should immediately pull EOF from Osaka. There is still a lot of work to get EOF ready across clients, compilers, dapps, devs, FV tooling, etc. Going forward with it, knowing we need something different for real-time ZKP, would be a major strategic misstep.
I believe there is confusion about the actual goal behind the proposal: e.g. whether it's prover efficiency or faster direct execution of smart contracts.
As far as I understand, existing implementations do this because it allows them to implement all the needed checks and gas computations. The "EVM implementation in RISC-V" checks that you don't increase your own balance (creating new ETH out of nothing), that gas is properly calculated, that you don't overflow the stack, etc. If we allowed users to submit arbitrary RISC-V code directly, they would simply increase their own balances.
So, as far as I understand, we cannot simply throw away the middleman. It is there for a reason.
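The middleman argument can be sketched in a few lines. This is only a toy illustration (not real EVM or RISC-V semantics, and all names here are hypothetical): an interpreter validates every state transition, while arbitrary code with direct state access can mutate state without any checks.

```python
# Toy illustration of the "middleman" role: the interpreter enforces
# invariants on every state change; raw code submitted directly could
# mutate state without any checks.

def interpreted_transfer(balances, sender, receiver, amount):
    """The 'EVM implementation in RISC-V' role: enforce invariants."""
    if balances.get(sender, 0) < amount:
        raise ValueError("insufficient balance")  # conservation check
    balances[sender] -= amount
    balances[receiver] = balances.get(receiver, 0) + amount

def raw_code(balances):
    """Arbitrary code with direct state access: no checks at all."""
    balances["attacker"] = 10**9  # mints balance out of nothing

balances = {"alice": 5}
interpreted_transfer(balances, "alice", "bob", 3)
assert sum(balances.values()) == 5      # total supply conserved

raw_code(balances)
assert balances["attacker"] == 10**9    # invariant silently broken
```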
(Note: I know nothing about zk. Everything I wrote above is simply my amateur understanding.)
Please don't say that "ZK-EVMs today are written as ZK RISC-V". The RISC-V-based zk-VM approach is not the standard for zk-EVMs. It is just one of many approaches that have been proposed to address the shortcomings of the traditional zk-EVM circuit.
RISC-V-based zk-VMs have indeed mitigated some of the shortcomings of traditional zk-EVMs, but they also introduce a new one: excessive compiler dependency (interpreting the EVM on RISC-V). The mathematics of ZKP is not very effective against compiler intervention. In other words, the more we rely on the compiler, the smaller the area of security that the ZKP covers, and the integrity of the compiler's work is based entirely on trust.
100% agree here. It looks like there is a (not so) implicit assumption that by going the RISC-V way, we gain both prover efficiency and faster smart-contract execution. If that were so, then replacing the EVM with RISC-V would be the way to go.
However, it caters only to the direct smart-contract execution needs (which is IMHO a very reasonable option here).
For the ZKP side, it at least raises multiple questions. I personally believe it's a sub-optimal choice, especially wrt the 100x prover efficiency objective.
This doesn't look really different from EVM+EOF. In practice, Yul's EVM flavor defines a single u256 type, and it assumes variables are stack-allocated (e.g. Variable Declarations, Function Declarations).
Due to the EVM stack size limit, this might be difficult to overcome, since one wants predictable behavior here. An imaginary Yul CPU flavor might be more flexible, though.
Unfortunately this is wildly incorrect. (: You're right that it's possible, but the details are wrong.
Implementing safe gas metering isn't as simple as assigning a best-case cycle value to each instruction and calling it a day. For example, looking at your list of per-instruction costs, I can see that you've assigned a cost of 3 cycles to each memory load instruction. It's relatively easy to write a program that makes this instruction take orders of magnitude more cycles by deliberately triggering cache misses (and a cache miss can cost hundreds of CPU cycles, which is a little more than 3!). Even if you severely limit the maximum amount of memory the program can access (so that its working set fits in L3 cache), it's still possible to exploit various microarchitectural corner cases to make memory accesses significantly more expensive. You can use such a simple gas cost model in the average case and it will work, but as soon as someone is motivated enough to take down your chain, they can launch a denial-of-service attack exploiting this.
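To make the critique concrete, here is a minimal sketch of the naive per-instruction model (the cost table is hypothetical, not CKB-VM's actual one): two traces that are metered identically can have wildly different real-world latency, because a fixed 3-cycle charge for a load cannot distinguish a cache hit from a cache miss.

```python
# Hypothetical best-case cost table in the style being critiqued:
# a memory load is always charged 3 "cycles", even though real latency
# ranges from an L1 hit (a few cycles) to a miss (hundreds of cycles).
COSTS = {"add": 1, "mul": 3, "load": 3, "store": 3}

def meter(trace):
    """Charge a fixed cost per executed instruction."""
    return sum(COSTS[op] for op in trace)

# Two traces with identical gas charges, but very different real cost:
sequential = ["load"] * 1000   # streaming reads: mostly cache hits
adversarial = ["load"] * 1000  # pointer chasing: mostly cache misses

assert meter(sequential) == meter(adversarial) == 3000
```

The metering cannot tell the two apart; only the memory access pattern (invisible at this level) determines the real latency, which is exactly the attack surface described above.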
You cannot use hardware performance counters to do gas metering, simply because they are not portable across different hardware and they are nondeterministic: even if you run exactly the same program on exactly the same hardware, you will get a different cycle count!
You are definitely correct here: a memory load can indeed take a variable amount of time, and an L1 cache hit and a total cache miss will certainly have significantly different latencies. That being said, I do believe we have to compromise somewhere. I personally doubt any gas metering design can fully capture the memory access characteristics of a modern CPU, and even if a gas metering design were put together to model exactly one CPU, what would happen on a different CPU model? Eventually we will have to stick to an approximation of a CPU. The bottom line I see here is: such a model inspired by real CPUs (the original cycle charges in CKB-VM actually come from a real CPU; see Section 3.3 of /sifive.cdn.prismic.io/sifive%2F449c97ba-41e6-4b70-b522-8ddde5d3a34e_sifive+u54+manual+v19.08p0p1.pdf) is already miles better than the current EVM's gas metering. Maybe it makes sense to build an MMU model so that cycle charges are not always best-case as they are in CKB-VM now, but I would argue this might be the best thing we can have for now.
I think there is some confusion; apologies for not making myself clear. I'm not talking about the existing performance counters in modern CPUs, which are indeed nondeterministic. I'm picturing a future where we employ a real RISC-V chip as a blockchain VM; in that setting, we could build a deterministic performance counter following a particular blockchain consensus cycle metering model, so that cycle metering for blockchains could be done in real hardware as well.
One thing that's missing in the discussion so far is calling conventions and register handling.
On a physical CPU, you have a limited number of registers, hence when a function calls another function, it needs to save registers and restore them afterwards.
The calling convention defines:
- which registers the caller saves and which the callee saves
- which registers are used to pass parameters
- which registers are used to pass results
- how stack space is used (e.g. the concept of a red zone)
People are strongly encouraged to write small functions, which means that when those functions are not inlined, you waste a lot of proof time proving data movements.
An ISA optimized for ZK would actually optimize for reducing those data movements. They make sense in the physical world because local memory (a register) is 15x to 150x faster than remote memory (L1 cache needs ~15 cycles, L2 cache ~100 cycles, RAM ~1000 cycles), but this is useless for ZK proving.
A function usually has between 4~6 inputs and outputs, so naively following physical CPU calling conventions requires 2x4~6 proofs of data movements per function call.
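A back-of-the-envelope model of that estimate, under the stated assumption that each value crossing a non-inlined call boundary is moved twice (saved/passed on entry, restored/returned on exit) and each movement must be proved; the function name here is hypothetical:

```python
# Toy model: with a physical-CPU calling convention, each value that
# crosses a non-inlined call boundary is moved twice (once saved or
# passed, once restored or returned), and each movement is one proved
# data movement in the trace.
def proved_movements(values_crossing_call):
    return 2 * values_crossing_call

# A typical function with 4~6 inputs and outputs in total:
assert proved_movements(4) == 8
assert proved_movements(6) == 12
```

A ZK-oriented ISA that avoids register save/restore traffic would shrink this per-call overhead toward zero, which is the point of the argument above.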
A closely related concept is addressing modes. Some architectures only allow operations to work on registers and require a LOAD/STORE beforehand, but what if you could replace: