Long-term L1 execution layer proposal: replace the EVM with RISC-V

xxuejie · April 23, 2025, 4:41am

One more bullet on ZK:

First of all, this is just my personal opinion on zero knowledge, my colleagues might have different views on ZK.

There 2 different use cases for RISC-V related to zero knowledge:

The virtual machine / IR used to express a program that will be proved by ZK
The underlying platform where ZK verifiers will run upon

A simple but inaccurate analogy, would be that there is a VM on top of ZK algorithms(bullet 1), and there is a VM beneath ZK algorithms(bullet 2). They should be discussed separately.

With a stricly-no-precompile design, Nervos CKB-VM perfectly fits bullet 2 here: you can compile a ZK verifier code down to RISC-V code, and run the ZK verifier on Nervos CKB-VM. In this sense, Nervos CKB will be flexible to support any arbitrary ZK solutions. In other words, I consider Nervos CKB-VM to be a decent choice as a VM beneath ZK algorithms.

Bullet 1 will be a separate use case, I’m not so familiar with zero knowledge proof internals to weigh in if RISC-V will be a proper solution. I suspect if certain properties of ZK algorithms, might rule the choices for VMs on top of ZK algorithms.

I might be wrong but I have a feeling that @vbuterin might be talking about bullet 2 here, or a proper VM beneath ZK algorithms, so maybe we don’t have to discussion if RISC-V is fit for ZK-proving?

gcolvin · April 23, 2025, 5:01am

Why would this have any more chance of success than switching to Wasm did?

grosu · April 23, 2025, 3:54pm

RISC-V may look appealing now, but like all languages, it will be out of fashion (and performance) before we know it. All languages worth their salt evolve fast initially, then migrate into another, better language.

Why limit to one language and not have them all? See how Ethereum could look like, where you plug-and-play new languages in minutes: https://youtu.be/dP3QraNv6tI?si=H9Jdi9BwOJeZ6-bu&t=1406. ZK can and will be universal soon, too, and then you will look back and wonder “Why did we pick RISC-V, after all?”.

mrLSD · April 23, 2025, 6:40pm

You claim that LLVM is one big bug. You won’t believe it – the Rust compile backend is LLVM. By saying this, you’re essentially stating that all languages with an LLVM backend are full of bugs. That’s a very bold statement. Compared to LLVM, Solidity is a helpless ant facing an elephant, and on top of that, it is by design unsafe and aimed at finding the best way to shoot yourself in the head – and not miss.

sorpaas · April 23, 2025, 7:48pm

Just as a thought experiment, I’m trying to see what it would look like if we added 64-bit mode to EVM. Here’s what I came up with: Add EIP: 64-bit mode EVM operations by sorpaas · Pull Request #9687 · ethereum/EIPs · GitHub

mrLSD · April 23, 2025, 8:02pm

Thanks for the detailed and relevant comment.

matt · April 23, 2025, 8:34pm

I find it very concerning that before our latest major rework of Ethereum’s VM, there are already proposals for replacing it. EOF was even touted as “better for zk proving” when pitched to ACD.

The same will be said for EVM. We have to maintain backwards compatibility in this way, because we have a made a strong guarantee that contracts today will continue working. However, if proving is 100x more expensive and EVM is charged proportionally, it is a de facto deprecation.

Not that this is bad, and not that I don’t think we shouldn’t move to RISC-V – I simply want us to have more confidence in the technical projects we embark on. The userspace is more delicate than system-facing protocol features. There is substantial downstream tooling that is being disregarded.

What we do today should align with our long term goals. We all agree real time ZKPs of mainnet blocks is where we want to go. If we think there is a better path to that than EOF, we should immediately pull EOF from Osaka. There is still a lot work to get EOF ready from clients, compilers, dapps, devs, FV tooling, etc. Going forward with it, knowing we need some different for real time ZKP is a major strategic misstep.

ericsson49 · April 23, 2025, 9:50pm

I believe there is a confusion about the actual goal behind the proposal: e.g. whether it’s prover efficiency or faster direct execution of smart-contracts.

peersky · April 23, 2025, 11:04pm

+9000%

In shiny theory - yes. In practice support and tooling matters more.

E.g: How do I catch abi.decode errors? This feature request is open for over 4(!) years: `try`/`catch` for `abi.decode()` or `abi.tryDecode` · Issue #10381 · ethereum/solidity · GitHub

Similar problems are facing not just Ethereum owned tooling, e.g. Hardhat fails to implement vscode pnpm import resolver for 1.5 years (!) by now: imports autocomplete does not work with pnpm · Issue #522 · NomicFoundation/hardhat-vscode · GitHub

I could go on and on. If you give me ugly syntax but with improvements on that side, I would prefer it.

u59149403 · April 24, 2025, 2:08am

Totally agree with these parts!

u59149403 · April 24, 2025, 2:22am

As well as I understand existing implementations do this, because it allows to implement all needed checks and gas computations. “EVM implementation in RISC-V” checks that you don’t increase your balance (creating new ETH out of nothing), that gas is properly calculated, that you don’t overflow stack, etc. If we allow users to submit arbitrary RISC-V code directly, the users will simply increase their own balance.

So, as well as I understand, we cannot simply throw away middleman. It is there for a reason.

(Note: I know nothing about zk. Everything I wrote above is simply my amateur understanding.)

MASDXI · April 24, 2025, 2:35am

I think this post should be here
Glue and coprocessor architectures some RISC-V, FPGA for intensive task

JJang · April 24, 2025, 5:53am

Please don’t say that " ZK-EVMs today are written as ZK RISC-V". The RISC-V-based zk-VM approach is not the standard of zk-EVM. It is just one of many approaches that have been proposed to address the shortcomings of the traditional zk-EVM circuit.

The RISC-V-based zk-VMs indeed have mitigated some of the shortcomings of the traditional zk-EVMs, but they also introduce new shortcomings: excessive compiler dependency (interpreting EVM as RISC-V). The mathematics in ZKP is not very effective against compiler intervention. In other words, the more we rely on the compiler, the smaller the area of security that ZKP covers. And the integrity of the compiler’s work is completely based on trust.

ericsson49 · April 24, 2025, 6:12am

100% agree here. It looks like there is a (not so) implicit assumption that by going the RISC-V way, we are gaining both prover efficiency and faster smart-contract execution. If it were so then replacing the EVM with RISC-V would be the way to go.
However, it caters only the direct smart-contract execution needs (which is IMHO a very reasonable option here).
For the ZKP side, it at least raises multiple questions. Though I personally believe it’s a sub-optimal choice, especially wrt the 100x prover efficiency objective.

ericsson49 · April 24, 2025, 6:25am

Doesn’t look like really different compared to EVM+EOF. In practice, the Yul’s EVM flavor defines single u256 type and it assumes variables are stack allocated (e.g. Variable Declarations, Function Declarations).

Due to the EVM stack size limit, it might be difficult to overcome, since one wants a predictable behavior here. Though, an imaginary Yul’s CPU flavor might be more flexible.

koute · April 24, 2025, 7:56am

xxuejie:

Some have misbeliefs that cycle(think of it just like gas but in the CPU world everyone talks about cycles) calculations will be impossible for RISC-V. This is also false. We have implemented proper cycle calculations ( /github.com/nervosnetwork/rfcs/blob/master/rfcs/0014-vm-cycle-limits/0014-vm-cycle-limits.md) across the whole umbrella of Nervos CKB-VM implementations, including interpeter based VMs and AOT based VMs. It has never been a problem for us to keep cycle consumptions at every step, and error out when a particular smart contracts run out of cycles. In fact, even for a hardware based RISC-V core, we don’t believe cycle consumptions will be a problem. Performance counters( /www.intel.com/content/www/us/en/developer/articles/tool/performance-counter-monitor.html) have long existed in modern CPUs, even the cycles calculated for a particular blockchain are quite different from the internal cycles of a particular RISC-V CPU die, one can definitely implement such blockchain cycles as a CPU performance counter, and have a real CPU die emit those cycles matching blockchain consensus.

Unfortunately this is wildly incorrect. (: You’re right that it’s possible, but the details are wrong.

Implementing safe gas metering isn’t as simple as just assigning a best-case cycle value to each instruction and calling it a day. For example, looking at your list of costs for each instruction I can see that you’ve assigned a cost of 3 cycles to each memory load instruction. It’s possible to relatively easily write a program that will make this instruction takes orders of magnitudes more cycles by deliberately triggering cache misses (and a cache miss can be hundreds of CPU cycles, which is a little more than just 3!). Even if you severely limit the maximum amount of memory the program can access (so that its working set fits in L3 cache) its still possible to exploit various microarchitectural corner cases to make memory accesses significantly more expensive. You can use such simple gas cost model in the average case and it will work, but as soon as someone is motivated enough to take down your chain they can launch a denial of service attack exploiting this.
You cannot use hardware performance counters to do gas metering, simply because they are not portable across different hardware and they are non deterministic - even if you run exactly the same program on exactly the same hardware you will get a different cycle count!

mratsim · April 24, 2025, 8:22am

benaadams:

A difficulty of going as low level as a cpu architecture for the VM means it’s difficult to optimize back up

So at the moment the UInt256 operations in the Evm are usually implemented with Avx2 or Avx512 (equivalent on Arm); operating with 256bit or 512bit registers.

If that was decomposed to RISC-V 64bit or worse 32bit instructions; that then becomes a extremely hard problem to recognise the patterns and then recompose back to 256bit operations when the RISC-V code is then run on AMDx64 or ARM64 that most blockbuilders and validators will be running (as there is currently no high performance RISC-V hardware).

C/C++ compilers; which spend a long time compiling, have a hard time doing auto-vectorisation in this way; and they only auto-vectorize simple repetitive structures, they don’t recognise entire algorithms and convert them to a totally different form (which is normally the case with using specific CPU SIMD instructions effectively).

The risk here is that zk-proving may get better, but blockbuiling and execution will deteriorate significantly?

It’s true that deoptimizing and getting back the bigger picture is impossible but in the case of the 256-bit EVM arithmetic it does not matter.

They are not implemented with AVX2 or Neon, they are implemented with add-with-carry which also does not exist in RISC-V or WASM unfortunately.

xxuejie · April 24, 2025, 8:26am

You are definitely correct here, memory load can indeed take variable amount of time. A hit on L1 cache and a total cache miss will certainly have significant different loading time. That being said, I do believe we have to compromise somewhere, I personally doubt any gas metering design can fully capture all the memory loading characteristics of a modern CPU, and even if a gas metering design has been put together to model exactly one CPU, what will happen on a different CPU model? Eventually we will have to stick to an approximation of a CPU. The bottomline I see here is: such a model which is inspired from true CPUs ( actually the original cycle charges in CKB-VM, come exactly from a true CPU: see Section 3.3 of /sifive.cdn.prismic.io/sifive%2F449c97ba-41e6-4b70-b522-8ddde5d3a34e_sifive+u54+manual+v19.08p0p1.pdf ), is already miles better than current EVM’s gas metering. Maybe it makes more sense to build a MMU model so cycle charges are not always the best case as it is in CKB-VM now, but I would argue it might be the best thing we will have now.
I think there is some confusion, apologies for not making myself clear. But I’m not taking about existing performance counters in modern CPUs, they are indeed indeterministic in a way. I’m just picturing a future where we employ a real RISC-V chip as a blockchain VM, in this sense, we can build a deterministic performance counter following a particular blockchain consensus cycle metering model, so in real hardware we can do cycle metering for blockchains as well.

mratsim · April 24, 2025, 8:36am

One thing that’s missing in the discussion so far are calling conventions and register handling.

On a physical CPU, you have a limited amount of registers, hence when a function calls another functions, it needs to save registers and restore them.

The calling convention defines:

who saves what registers between the caller and the callee
what registers are used to pass parameters
what registers are used to pass results
how the stack space is used (concept of red zone)

People are very encouraged to write small functions, meaning if they are not inlined you waste a lot of proof time proving data movements.

An ISA optimized for ZK would actually optimize for reducing those data movements. They make sense in the physical world because local memory (in register) is 15x to 150x faster than remote memory (L1 cache needs 15 cycles, L2 cache needs 100 cycles, RAM needs ~1000 cycles), but it’s useless for ZK proof.

A function has usually between 4~6 inputs and output, so naively following physical CPU calling conventions requires 2x4~6 proofs of data movements per function.

See Latency Numbers Every Programmer Should Know · GitHub, Latency numbers every programmer should know · GitHub

A close but related concept is addressing mode. Some architectures only allow operations to work on registers and require LOAD/STORE before, but what if you could do replace:

LOAD RegA <-- [addr 0x0001]
LOAD RegB <-- [addr 0x0002]
ADD   RegC <-- RegA, RegB
STORE [addr 0x0003] <-- RegC

by

ADD [addr 0x0003] <-- [addr 0x0001], [addr 0x0002]

We get 4x smaller trace and so 4x faster prover.

benaadams · April 24, 2025, 9:00am

However they can be implemented more efficiently with AVX512 or AVX512IFMA