Jello Paper as Canonical EVM Spec



:+1: Yup, agreed! It’s the same people who are able to accept opcodes and consensus changes. Nothing is stopping others from running other versions of the spec, but the major implementations should agree.

:pray::pray::pray: Thanks for the shout out! Yes, we would love to devote more time to this work, but attention requires funding.

:100::100::100: Could not agree more! Yes, I am involved in some of this work, but I would like to see this regardless of who does the work. We need a solid footing to build this extremely important part of the world’s financial system & highly available censorship resistant open service worth tens of billions of dollars.

These quotes are about the test suite (I agree), bit to maybe expand on why the Jello Paper as the top-level canonical spec rather than competing specs…


Specs and implementations are different beasts. You don’t want multiple sources of truth in your spec. A spec is more abstract and need to have as little ambiguity as possible. Redundancy on implementations are a really great practice, but that’s a different layer in this process.


Client Implementations

Yes, TMR is extremely useful for mission critical implementations. Each implementation in the TMR gate is based on the same high level spec: they need to achieve some common goal. We have something like this built into Ethereum itself: the different clients have to reach consensus about state transitions, and are parallel implementations of the same underlying spec.

TMR is great when you’re interacting with empirical data and concrete implementations: chronometers, flight system computers, deployed blockchain clients, and so on. These are also automated systems, with very easy ways to distinguish the degree of disagreement (results are outside of some error threshold).

To be clear: I fully support doing TMR for the actual EVM software. Super great idea, +1, 5/5 stars, would recommend :+1:

System Specification

Specifications should be a priori and universal. (This is part of the push for a formalized spec). If you have multiple specs, have an unspoken, fuzzy, abstract spec somewhere in the collective consciousness of which the others are simulacra. The versions in everyone’s head may or may not agree.

Multiple competing top-level (canonical) specs aren’t just inconvenient, they’re often actively confusing. For instance, if a feature is added to the Blue Paper after going through the full EIP process, but the Red and Purple Paper maintainers haven’t got around to adding it yet — is the feature available for use? What if the Red Paper is no longer being actively maintained? What if they all differ on parts of certain details in their description of the same feature? Which is the “right” one to base your concrete implementation off of? Should you have to read 3 versions of a tech spec and understand them to the point that you can compare every detail between them?

Since client implementers are doing this by hand, it’s a laborious process, and the likelihood of missing a subtlety between m/n specs is very high. With one canonical, formalized spec, you have a single source of truth.

Exactly! :ok_hand:

Yes, multiple test suites would be awesome (1x K-generated + 1x done by hand, + 1x other) :+1:


I’m not really calling for TMR per se, just the principle. I’m trying to reconcile the fact that people disagree (some strongly) as to whether a better Yellow Paper or a brand-new K spec or something else would be best, and there are strong arguments on all sides.

In my suggestion the specs aren’t competing, they are reinforcing, and maintained by the same team.

A feature is available for use as soon as hits the mainnet. It will need to be implemented before that, but currently EIPs aren’t merged with the Yellow Paper until after that, because we are moving too fast and haven’t the resources to do better. Once the Yellow Paper is updated future implementations can refer to it. Until then you have to read the EIP and the Yellow Paper. Tests get written partly before and partly after an EIP hits the mainnet, depending on how quickly we are moving.

Having two top-level notations for a spec is not unusual. For instance, in the WebAssembly spec we find:

The prose and formal rules are equivalent, so that understanding of the formal notation is not required to read this specification. The formalism offers a more concise description in notation that is used widely in programming languages semantics and is readily amenable to mathematical proof.

This discussion goes back decades in our craft. Probably all the way back to Mesopotamia or further in other crafts. Forgive me for a long quotation from Brooks’ classic Mythical Man Month

English, or any other human language, is not naturally a precision instrument for such definitions. Therefore the manual writer must strain himself and his language to achieve the precision needed. An attractive alternative is to use a formal notation for such definitions. After all, precision is the stock in trade, the raison d’etre of formal notations.

Let us examine the merits and weaknesses of formal definitions. As noted, formal definitions are precise. They tend to be complete; gaps show more conspicuously, so they are filled sooner. What they lack is comprehensibility. With English prose one can show structural principles, delineate structure in stages or levels, and give examples. One can readily mark exceptions and emphasize contrasts. Most important, one can explain why. The formal definitions put forward so far have inspired wonder at their elegance and confidence in their precision. But they have demanded prose explanations to make their content easy to learn and teach. For these reasons, I think we will see future specifications to consist of both a formal definition and a prose definition.

An ancient adage warns, “Never go to sea with two chronometers; take one or three.” The same thing clearly applies to prose and formal definitions. If one has both, one must be the standard, and the other must be a derivative description, clearly labeled as such. Either can be the primary standard.

Brooks takes the same lesson from the chronometers as you do @expede. I’m adding tests to get three co-equal chronometers. But this only works if the work is coordinated. Note that not so long ago Yoichi was meeting with Gavin weekly to better understand and edit the Yellow Paper, authored a Lem spec of the EVM, (and specs of other Ethereum components) and was team lead for a group including formalists and testing. I don’t know if the Foundation is still supporting that group.

If we go with Brooks’ take on things I have my preferences, but all this is just opinions until there is a working group. For most people it shouldn’t matter which spec is primary. They will use the one that works for the purpose at hand, on the promise that they are equivalent. It’s the working group’s job to keep that promise.


Yeah, this is what I basically meant by TMR. The three (written spec, K spec, and JSON test suite) work in unison, and if there’s ever a disagreement usually 2 of them would disagree with the third (easier to debug!)

They should absolutely be maintained in the same repo by the same maintainers: each proposed feature or fix (EIP) corresponds to a PR against the 3 items together. Tested together, deployed together, merged together. There should be a branch corresponding to each relevant network (at least mainnet and testnet/staging)… I think we can see the heirarchy.

The most important thing to remember is that unlike a lot of the client libraries, this repo would be vastly referenced, so strong commit message practices and the like should be encouraged.


I hate to disagree with someone who agrees with me, @fubuloubu (at least until @expede convinces me otherwise :wink: ) but single PRs against one repo seem unworkable, given that there are currently three repos, maintained by different teams working on different schedules. I think I sketched out the current process above. At best they can be synced up at some point after a release. We don’t operate on three-year release cycles with a well-compensated documentation team.


That is… Fair!

For the record, this is exactly how a well-organized, well-compensated engineering team would handle documentation for a high-security, complex system that is safety-/mission-critical.


Not sure which this is that. Bryant, but of course we don’t have any such teams. And it’s time to go dance to some reggae. Later.


Good documentation, basically


You have to pay people for good documentation, because no one wants to write it and maintain it.


I like to write good documentation, and I’m not alone. The pros at it love to do it. And lots of people like to get paid for whatever value they create, because they need food, shelter, and medicine.


Obviously I have some interest in seeing the Jello Paper adopted, so take my $.02 with this in mind.

A feature is available for use as soon as hits the mainnet. It will need to be implemented before that, but currently EIPs aren’t merged with the Yellow Paper until after that, because we are moving too fast and haven’t the resources to do better. Once the Yellow Paper is updated future implementations can refer to it. Until then you have to read the EIP and the Yellow Paper. Tests get written partly before and partly after an EIP hits the mainnet, depending on how quickly we are moving.

I think this is emblematic of the problems with the Yellow Paper. Few people understand it, fewer want to touch it, and at the end of the day it trails production because its utility is somewhat limited to being translational. I think the Jello Paper provides a lot more value in the release cycle that makes it worth updating before releasing code to mainnet.

For example, the Constantinople reentrancy probably would have been caught through a Jello Paper update, assuming KEVM had a re-entrancy model (it currently does not but AFAIK one is being actively investigated); a lot of existing contracts in integration tests likely would have failed symbolic re-entrancy safety checks under the new EIP changes.

For most people it shouldn’t matter which spec is primary. They will use the one that works for the purpose at hand, on the promise that they are equivalent. It’s the working group’s job to keep that promise.

I think what the RV and K folks want to see from the Ethereum community is commitment to long-term maintenance of the Jello Paper and use in documentation / communications material. The Jello Paper and KEVM should be independent of RV Inc., UIUC, or any of its other stakeholders such as smart contract auditors who are actively using it.

A commitment to maintaining the spec would be just as useful as funding for R&D if managed appropriately, and could be even more useful for ensuring the independence and antifragility of the document.

I totally agree with the Hydra approach… I don’t think there should necessarily be one official specification, but the Yellow Paper has unfortunately grown to sort of inhabit that space, and some official guidance that this is not the case could be helpful.

It’s also worth noting that K specs can contain e.g. LaTeX, so you can have a branch of the KEVM repository or a flag on that project which also generates either the beige(/yellow?) paper, the Jello paper, or a combination of the two where both notations are presented in-line (to help serve as translational if you need to use one of the specs for some reason but only want to understand/make reference to the other).

At the end of the day, my own personal goals would be:

  • Seeing commitment to maintaining formal specs actively.
  • Updating specs at appropriate points in the release cycle (before mainnet, testnet, etc).
  • Making all specs involved more readable; merging the Yellow and Beige paper could fall here.
  • Ensuring some spec exists in every high level category of specification (purely mathematical/on-paper, executable, test-generating, usable in verification, etc.). If multiple goals can be hit optimally by a single repo as KEVM attempts, that’s great, if not, other specs can be merged in or included.
  • Avoiding excessive redundancy. I think for example KEVM/the Jello Paper provides some clear value over the Yellow Paper in its ability to be used for proofs. The yellow/beige papers could probably be somehow consolidated, though someone of course needs to take ownership for that.


There are people maintaining the Yellow Paper, and that situation is improving since Gavin released the copyright. I think the issue of whether we can get the specs ahead of the code is independent of the form of the specs.


I just want to see long-term commitment to a good specification, and a good process. And I’m seeing lots of good ideas here.


We are on the same page, @gcolvin. Cannot speak for everybody, but I can speak for the RV, UIUC and K teams: we would be glad to take the task of properly documenting the KEVM seriously. You and @expede and anybody else who is interested are very welcome to help, even if only with suggestions for notation and working. I know it, in particular, that you dislike the XML-like notation for semantic cells in K. In fact, K is very flexible wrt syntax; you can literally choose whatever syntax you want for any symbol. Moreover, as Philip pointed out, we can also generate Latex from K definitions, and you can pick whatever Latex macro you wish for any symbol. See, for example, the bubbles in this paper: For example, this is how the program configuration looks like in a C-like language:


Here is how the semantics of variable lookup looks like:


And here is something more complicated, like exceptions in the same imperative language:

The point is that this can be easily changed and we would be very glad to work with you and anybody interested to come up with the most appropriate notation for the KEVM. Not everybody needs to see the textual definition, many can only read the PDF generated out of it, following whatever notation we all agree upon.


I think that this is a major advantage for K (and the Jello Paper). We can generate classic semantics (BMF, judgements, &c), a custom backend for YP-style idioms, and many more. Rather than duplicating a bunch of work N times to support N docs (in one or more repos — I really don’t think that’s the issue), we can have one canonical spec in K that generates whatever notation we want. It still leaves open the ability to write other specs by hand.

I think that this conversation does need to distinguish between concerns of notation and TMR or n/m specs. For example, I strongly agree with the former, and hugely disagree with the later :stuck_out_tongue_winking_eye:


I’m just going to throw something out there having been previously un-involved in this conversation.

To me, it seems obvious that the yellow paper is past its useful lifetime. I’ve read it many times, and I can agree that it is dense. It’s nearly impossible to update as well. (As is evidenced by the fact that it doesn’t keep up with the code). To the extent that the Jello Paper can be more easily updated, and that it can, over time, become the generating document for all sorts of things (Latex specs, test cases, formal verification), it seems to me an obvious choice.

At this point, instead of asking if the Jello Paper should be the formal spec, I think we should all simply say “The Jello Paper is the official spec.” So: “The Jello Paper is the official spec.”

I would go further to propose that the magicians make a concrete statement (or “sense of the fellowship”) at its next meeting to the same effect. The stated purpose of the magicians is to guide the technical direction of the ecosystem. This seems like a clear win.


It’s not that easy in this decentralized community. The current Yellow Paper editors operate independently of the Magicians, so such a statement wouldn’t necessarily carry much weight, even if you could get a consensus behind it. And the Yellow Paper editors operate independently of the EIP process, so the usual course of forming a Ring and getting to consensus on an EIP to improve the Yellow Paper isn’t there. The Yellow Paper being the current “official spec” that creates a bit of an impasse. @jpitts is trying to start up a Specification working group to pull together interested people, so I hope he drops by and explains where that is going.


Why do “the current yellow paper editors operate independently of the Magicians”? Why do “the current yellow paper editors operate independently of the EIP process”? Why don’t the future Jello paper editors work with the Magicians and the EIP process?

Open question to anyone who can answer: are there known issues with the K Framework that allow for the possibility of an ill-defined spec? There is clearly the possibility of an ill-defined spec in the current methodology, so I would think we would be trying to move towards more rigor.


I think we’re just going to start filing issues, specifying new EIPs in K, and doing the work. Not too complicated :wink:

If people want to maintain the Yellow Paper — great! It’s all time and effort.

Tightening the EIP process for Core EIPs to make it clear when the specs need to be done is also a good process. Especially if they can assist in testing process and so on.