Jello Paper as Canonical EVM Spec

@expede I simply found LOC interesting. First because I was surprised that the C++ version was no longer than the others, despite being a low-level language bummed for performance. It was shorter when I started work on it, and I’d have expected Python to be shorter still. And second because the K version was no longer than the rest. Formal specs used to be a lot longer than the code they formalized.

You talk to different people than I do. In that thread of core dev discussion I shared I think only Everett was unconditionally positive about dropping the Yellow Paper.

This is an old discussion, with a chapter on it in The Mythical Man Month. We are in the usual bind. Almost all engineers and mathematicians can read English and ordinary math. They cannot, without prior study, read formal notations like K. And they usually haven’t the time or desire for such study. (One way out is an English spec and a formal appendix. Another is to make the spec a tutorial on the formal notation.)

Regardless, as long as people want to maintain and improve the Yellow Paper there will be one. And if people want to complete a Jello Paper there will be one. And keeping them in sync with each other, the various clients, and the network itself will be a pain. It’s pain with only one spec too.

1 Like

Until last year the copyright prevented anyone but Gav from changing it. Yoichi put a lot of work into it (and into formalizing the EVM), but then decided to stop working on crypto. So only recently are others taking up the task.

Thanks for taking the time to explore this! :tada::tada::tada:


Copyright Issues

Ah, excellent context — thanks! That’s a pretty horrendous situation for a community project to be in :scream: The copyright for such a foundational component of this project absolutely should have been released much earlier, but that’s neither here nor there.

It sounds like there’s a solve now, though, so that’s great :+1:


Unnamed Sources

My original source was the controversial semi(?) closed discussion in Prague, where it was brought up by someone else (I can’t remember who for the life of me). The Twitter Thread has a number of supporters, as well as some of the folks in this thread (including you :smiley: :man_mage: :tada:).

I agree that there is no strong consensus on this change!

I really enjoyed the IETF’s “rough consensus and running code” concept that came up at the EM Council of Prague. In the Jello Paper we have a more complete & working spec, and support from a number of people in the know. The question becomes whether this is an “over my dead body” situation, and if so, why?


The Yellow Paper is Dead, Long Live the Yellow Paper

Wonderful! Let people use whichever notation they like, and let a thousand flowers bloom! But even then, why not make the canonical spec for EIPs the formally verified executable one? People can still translate concepts into other notations, or have them generated directly by the K Framework. This was the intent of creating the Beige Paper by hand (since the Yellow Paper was too obtuse for most people), but the canonical spec was still assumed to be the YP.

One such setup could look like this:

                     Jello Paper
                   (canonical spec)
                         |
                         |
      +------------------+---------------------+
      |                  |                     |
      |                  |                     |
Yellow Paper         Beige Paper       BNF & Proof Trees
(historical)        (accessible)        (mathematicians)

The Comprehensibility of K

I actually strongly disagree on needing formal training to read K :open_mouth:

I’m not sure that I’d call K a “formal notation”. It is very much a programming language, and generates both executable code and documentation (conceptually in whatever notation you would like). It reads not unlike modern JavaScript’s arrow functions with destructuring, and consuming a K-based spec requires no more study of PLT or specialized mathematics than any other programming language does. Presumably you would like the writers of the spec in any notation to have some relevant background, though they may or may not be experts in K (I certainly am not one!)

For reference on how much more familiar K is than operational semantics, see the illustrative examples below.

Example of K (from Jello Paper)

    syntax UnStackOp ::= "POP"
    rule <k> POP W => . ... </k>

    syntax StackOp ::= DUP ( Int ) | SWAP ( Int )
    rule <k> DUP(N)  WS:WordStack => #setStack ((WS [ N -Int 1 ]) : WS)                      ... </k>
    rule <k> SWAP(N) (W0 : WS)    => #setStack ((WS [ N -Int 1 ]) : (WS [ N -Int 1 := W0 ])) ... </k>

    syntax PushOp ::= PUSH ( Int , Int )
    rule <k> PUSH(_, W) => W ~> #push ... </k>
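
For what it’s worth, here is the arrow-functions-with-destructuring reading I mean, sketched very roughly in TypeScript. The names (`Word`, `WordStack`, `dup`, `swap`) are mine, not the spec’s, and this is only an analogy for how the rules read, not a faithful translation:

    // Illustration only: a rough TypeScript rendering of the DUP/SWAP rules above.
    // Names like `Word`, `WordStack`, `dup`, and `swap` are invented here, not part of the spec.
    type Word = bigint;
    type WordStack = Word[];

    // DUP(N): copy the N-th element (1-indexed) onto the top of the stack.
    const dup = (n: number) => (ws: WordStack): WordStack =>
      [ws[n - 1], ...ws];

    // SWAP(N): swap the top of the stack with the (N+1)-th element,
    // mirroring the `(W0 : WS)` pattern match in the K rule.
    const swap = (n: number) => ([w0, ...ws]: WordStack): WordStack =>
      [ws[n - 1], ...ws.slice(0, n - 1), w0, ...ws.slice(n)];

The `(W0 : WS)` head/tail match on the left of the K rule is the same kind of pattern match as the `[w0, ...ws]` parameter destructuring here.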

Formal Notation

This is what a common formal notation looks like:

[image of conventional operational-semantics inference rules omitted]

Call For “Over My Dead Bodies” :skull_and_crossbones:

Correct me if I’m wrong, but you [Greg] are not an “over my dead body”. I would love to hear directly from people that are dead set against this idea, and why that’s the case for them! If you could point the people that you’re talking to at this thread, that would be very helpful!

2 Likes

The questions I keep bumping into are, “Who has the authority to make it canonical?” and, “Do we really need one spec to be canonical?”

I’d say the current consensus is that the Yellow Paper specifies the Ethereum protocol, and that where it doesn’t, that is a defect that needs fixing. No other specification that I know of is complete enough to even attempt to do that job. When there are other specs that can, then we can ask about their relative status.

Ok, but I can follow the VM part of the Yellow Paper well enough to write code, and can just barely follow the KEVM. And yes, the notation you quote is unreadable too (though it’s not the same as the K), but the math in the Yellow Paper is much easier than your example, and could be made even easier.

It looks to me like different specs are useful in different ways to different people (and programs). Different enough that there are people who can use some but cannot use the others.

2 Likes

To continue the semver sub-thread here, @axic pointed out EIP-178 for versioning the EVM:

1 Like

Another is to make the spec a tutorial on the formal notation.

I very much support this idea. Having an executable specification will help with verification efforts and proving invariants. The YP does a good job of being self-contained, being a tutorial in its own notation. I don’t see why the JP couldn’t do the same thing – making itself usable to anyone with a year or so of undergraduate study.

2 Likes

Thank you for all the comments above about K-EVM, both the positive and the negative ones! Of course, as creators and everyday users of KEVM it is hard for us at RV (http://runtimeverification.com) to see how it falls short, which is why feedback like this is so important. Here is another thing that may be useful to consider:

  • To my knowledge, there are three other companies besides RV using KEVM in their smart contract execution or auditing. In particular, we have 2-3 smart contracts under “formal auditing” at any given moment. So there is a huge incentive to keep it updated and current! This may not happen with a paper model/semantics, because that is not directly used. Additionally, there are several formal tools based on it: an interpreter (which, by the way, will soon be much faster, because we transitioned from OCaml to LLVM as the target language we generate from K), a bounded symbolic state-space explorer, and a program verifier. And there are tens of thousands of tests for each of these that we execute in our CI. These tend to find bugs in the spec quickly, for the benefit of all. A single tool, say an execution engine, can only find a particular subset of bugs, not all of them. For example, what if your spec happens to be non-deterministic? You may not find that with the interpreter, but you can find it with the state exploration tool (model checker); see the small sketch below.
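
To make the non-determinism point concrete, here is a tiny sketch (TypeScript, with made-up names, and deliberately not our real tooling) of why a single interpreter run can miss an ambiguous spec while exhaustive state exploration flags it:

    // Illustration only: two rules that both match the same state make a spec
    // non-deterministic. An interpreter picks one path; an explorer sees both.
    type State = string;
    type Rule = { name: string; apply: (s: State) => State | null };

    const rules: Rule[] = [
      { name: "rule-a", apply: (s) => (s === "FOO" ? "1" : null) },
      { name: "rule-b", apply: (s) => (s === "FOO" ? "2" : null) },
    ];

    // An interpreter commits to the first applicable rule: one execution path.
    const interpret = (s: State): State =>
      rules.map((r) => r.apply(s)).find((next) => next !== null) ?? s;

    // A (toy) state-space explorer applies every applicable rule and flags
    // states where more than one rule fires.
    const explore = (s: State) => {
      const next = rules
        .map((r) => r.apply(s))
        .filter((n): n is State => n !== null);
      return { next, deterministic: next.length <= 1 };
    };

    console.log(interpret("FOO")); // "1" -- looks perfectly fine
    console.log(explore("FOO"));   // { next: ["1", "2"], deterministic: false }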

If I understand it well, the main criticism is the lack of readability. I must confess that we put very little to almost no effort into making it readable! Our main focus was to make it precise and have all the tools work with it. This is good in my view, because we can address the readability part. It may only be a matter of having a good technical writer spend 2 months on it and turn it into a book/tutorial in literate programming style. Or two writers, a technical and a non-technical. We could do this if there is interest.

7 Likes

Hello, Dominik here from Gnosis, one of the companies that uses K-EVM internally. I recently completed the K onboarding and thought I’d share my experience:

  1. There is a pretty good K tutorial available. The only prerequisite that I remember is lambda calculus, but looking back, it is totally possible to skip 1_lambda and 3_lambda++ and complete only 2_imp and 4_imp++ to grasp K.
  2. I have read through the entire K-EVM specification. I disagree with @grosu that it is not readable. :slight_smile: If you have any experience with the functional paradigm in Haskell or even Javascript, it is very readable, and even if one doesn’t, it is very easy to get into.
    What makes K readable in my experience is its explicitness - everything is defined in configuration, syntax and rules - as such the language definition is self-contained, which is super powerful. I personally use K-EVM as the canonical specification of the EVM - if I am unsure about the semantics of an opcode/gas calculation/whatever, I refer to that definition, because of its completeness and simplicity.
  3. I have also used K for formal verification. This, imho, is its weak point when it comes to usability. The kprove tool is quite difficult to use, and I found I had to contact RV folks to get help iteratively. In our discussions, this is one thing they are working on, as well as many other things like producing easily verifiable proofs etc. (as they switch from the Java to the Haskell backend), so I have no doubt that this will improve shortly. But the specification of the EVM, rather than of its programs, is the main topic of this discussion, and for that, nothing beats K.

Talking about tutorials/onboarding, I put together a formal verification workshop not long ago. I would be happy to expand this into a full-fledged course (working with RV) if there is demand from the community!

Best,

5 Likes

Maybe we can schedule a workshop for Council of Paris? Or at least, that week, maybe during ETHCC or after? This is the kind of not-quite-council stuff that I am happy to help volunteer around if there is interest.

When you say “I found I had to contact RV folks” – this is the sort of thing where public questions (e.g. github issues or forum threads) would be super helpful. Many times, multiple people have the same question, and either through search or from following along with a discussion, can get answers and learn.

2 Likes

Totally agree. A workshop at Council of Paris would probably be the easiest option; who can I get in touch with? :slight_smile: I could give an introduction to K, K-EVM, and deductive program verification using K-EVM!

2 Likes

An aside: Isn’t the Council of Paris just one day? Seems it might not be worth squeezing in many workshops. Better to spend most of our time on the important issues affecting the Fellowship as a whole.

@dteiml that would be me :wink: PM me / I’ll reach out and we can see about organizing.

I’m thinking a few events post ETHCC with smaller groups.

Boris - good morning from very cold Urbana, IL. Please include myself and @ehildenb in this discussion about a K workshop. Thanks!

1 Like

Since I like arguing with myself: one authoritative group is the core developers, who more than anyone actually have to use the spec.

My own ideal would be in three parts:

  • a cleaned-up, Beige-tinted Yellow Paper,
  • a readable, executable K spec, and
  • an associated test suite.

The whole package would be one canonical specification, each part of which serves as a check on the others. Note that the test suite should not be generated by the K, or else it will not serve as an independent check. “Never go to sea with two chronometers; take one or three.” (Fred Brooks, The Mythical Man Month)

But most of all I’d like to see the Yellow, Jello, and Beige people work together towards what is best. I think @jpitts is working towards that goal. And some money for @expede, @grosu and some others would help ;->

5 Likes

We all do! So let’s do it! We need something that describes the spec in the best fashion possible.

Hear, hear! Triplex redundancy is the strongest architecture you can get for the least amount of complexity.

Without a doubt!


I’d like to underscore that a canonical spec and test suite (that is maintained and updated with every EIP as it hits mainnet) is a great coordinating factor for all those who wish to use Ethereum and the EVM (compiler devs, tool makers, client devs, etc.). It is the one thing I believe led to the success of all those who built clients in the past year (ask @pipermerriam about Trinity and how the test suite helped).
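
To make the coordination point concrete, a client’s CI might consume one of those shared fixtures along these lines. This is only a sketch; the field names and file name are illustrative, not the real test-suite schema:

    // Sketch only: field names are illustrative, not the actual test-suite schema.
    import { readFileSync } from "fs";

    interface Fixture {
      pre: Record<string, { balance: string; nonce: string; code: string }>;
      transaction: { to: string; data: string; gasLimit: string; value: string };
      expectedPostStateRoot: string; // what every conforming client must produce
    }

    // Stand-in for whichever EVM implementation is under test; a real client
    // would execute the transaction against the pre-state here.
    const runTransaction = (_fixture: Fixture): { postStateRoot: string } => {
      return { postStateRoot: "0x00" };
    };

    const fixtures: Record<string, Fixture> = JSON.parse(
      readFileSync("stack_ops.json", "utf8"),
    );

    for (const [name, fixture] of Object.entries(fixtures)) {
      const { postStateRoot } = runTransaction(fixture);
      if (postStateRoot !== fixture.expectedPostStateRoot) {
        console.error(`${name}: client disagrees with the canonical test suite`);
      }
    }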

We REALLY need to keep this documentation coordinated and as up to date as possible. As a piece of core infrastructure it is critical to our success. Funding it is a no brainer!

Also, there should be an Ethereum 2.0 branch for all of this. Tracing the history alone would be worth it.

3 Likes

But, does this really work when we’re talking about software whose goal is to reach consensus?

You either have one chronometer, or else you have differences in consensus, therefore forks, therefore multiple different blockchains.

Code is law at the end of the day. The way I see it, the only choice is whether it’s formal code or informal code. Everything else (papers, docs, whatever) has to play catch up to the existing code up to the point of being bug-compatible.

To me it’s amazing to say the least that informal code has brought us so far…

:+1: Yup, agreed! It’s the same people who are able to accept opcodes and consensus changes. Nothing is stopping others from running other versions of the spec, but the major implementations should agree.

:pray::pray::pray: Thanks for the shout out! Yes, we would love to devote more time to this work, but attention requires funding.

:100::100::100: Could not agree more! Yes, I am involved in some of this work, but I would like to see this regardless of who does the work. We need a solid footing to build this extremely important part of the world’s financial system & highly available censorship resistant open service worth tens of billions of dollars.

These quotes are about the test suite (I agree), but to maybe expand on why the Jello Paper should be the top-level canonical spec rather than competing specs…

TL;DR

Specs and implementations are different beasts. You don’t want multiple sources of truth in your spec. A spec is more abstract and needs to have as little ambiguity as possible. Redundancy in implementations is a really great practice, but that’s a different layer in this process.

Rationale

Client Implementations

Yes, TMR is extremely useful for mission critical implementations. Each implementation in the TMR gate is based on the same high level spec: they need to achieve some common goal. We have something like this built into Ethereum itself: the different clients have to reach consensus about state transitions, and are parallel implementations of the same underlying spec.

TMR is great when you’re interacting with empirical data and concrete implementations: chronometers, flight system computers, deployed blockchain clients, and so on. These are also automated systems, with very easy ways to distinguish the degree of disagreement (results are outside of some error threshold).
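
A minimal sketch of what that comparison looks like for clients (names and results made up, of course):

    // Minimal sketch of the TMR idea applied to clients: run the same state
    // transition on several implementations and spot the odd one out.
    // The client names and state roots here are made up for illustration.
    const results: Record<string, string> = {
      "client-a": "0xabc",
      "client-b": "0xabc",
      "client-c": "0xdef", // disagrees -> the implementation to debug first
    };

    const tally = new Map<string, string[]>();
    for (const [client, root] of Object.entries(results)) {
      tally.set(root, [...(tally.get(root) ?? []), client]);
    }

    const [majority, ...outliers] = [...tally.entries()].sort(
      (a, b) => b[1].length - a[1].length,
    );

    console.log(`majority result ${majority[0]} from ${majority[1].join(", ")}`);
    for (const [root, clients] of outliers) {
      console.log(`outlier(s): ${clients.join(", ")} produced ${root}`);
    }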

To be clear: I fully support doing TMR for the actual EVM software. Super great idea, +1, 5/5 stars, would recommend :+1:

System Specification

Specifications should be a priori and universal. (This is part of the push for a formalized spec.) If you have multiple specs, you have an unspoken, fuzzy, abstract spec somewhere in the collective consciousness, of which the others are simulacra. The versions in everyone’s head may or may not agree.

Multiple competing top-level (canonical) specs aren’t just inconvenient, they’re often actively confusing. For instance, if a feature is added to the Blue Paper after going through the full EIP process, but the Red and Purple Paper maintainers haven’t got around to adding it yet — is the feature available for use? What if the Red Paper is no longer being actively maintained? What if they differ on certain details in their descriptions of the same feature? Which is the “right” one to base your concrete implementation off of? Should you have to read 3 versions of a tech spec and understand them to the point that you can compare every detail between them?

Since client implementers are doing this by hand, it’s a laborious process, and the likelihood of missing a subtlety between m/n specs is very high. With one canonical, formalized spec, you have a single source of truth.

Exactly! :ok_hand:

Yes, multiple test suites would be awesome (1x K-generated + 1x done by hand, + 1x other) :+1:

1 Like

I’m not really calling for TMR per se, just the principle. I’m trying to reconcile the fact that people disagree (some strongly) as to whether a better Yellow Paper or a brand-new K spec or something else would be best, and there are strong arguments on all sides.

In my suggestion the specs aren’t competing, they are reinforcing, and maintained by the same team.

A feature is available for use as soon as it hits the mainnet. It will need to be implemented before that, but currently EIPs aren’t merged into the Yellow Paper until after that, because we are moving too fast and haven’t the resources to do better. Once the Yellow Paper is updated future implementations can refer to it. Until then you have to read the EIP and the Yellow Paper. Tests get written partly before and partly after an EIP hits the mainnet, depending on how quickly we are moving.

Having two top-level notations for a spec is not unusual. For instance, in the WebAssembly spec we find:

The prose and formal rules are equivalent, so that understanding of the formal notation is not required to read this specification. The formalism offers a more concise description in notation that is used widely in programming languages semantics and is readily amenable to mathematical proof.

This discussion goes back decades in our craft. Probably all the way back to Mesopotamia or further in other crafts. Forgive me for a long quotation from Brooks’ classic, The Mythical Man Month:

English, or any other human language, is not naturally a precision instrument for such definitions. Therefore the manual writer must strain himself and his language to achieve the precision needed. An attractive alternative is to use a formal notation for such definitions. After all, precision is the stock in trade, the raison d’etre of formal notations.

Let us examine the merits and weaknesses of formal definitions. As noted, formal definitions are precise. They tend to be complete; gaps show more conspicuously, so they are filled sooner. What they lack is comprehensibility. With English prose one can show structural principles, delineate structure in stages or levels, and give examples. One can readily mark exceptions and emphasize contrasts. Most important, one can explain why. The formal definitions put forward so far have inspired wonder at their elegance and confidence in their precision. But they have demanded prose explanations to make their content easy to learn and teach. For these reasons, I think we will see future specifications to consist of both a formal definition and a prose definition.

An ancient adage warns, “Never go to sea with two chronometers; take one or three.” The same thing clearly applies to prose and formal definitions. If one has both, one must be the standard, and the other must be a derivative description, clearly labeled as such. Either can be the primary standard.

Brooks takes the same lesson from the chronometers as you do, @expede. I’m adding tests to get three co-equal chronometers. But this only works if the work is coordinated. Note that not so long ago Yoichi was meeting with Gavin weekly to better understand and edit the Yellow Paper, authored a Lem spec of the EVM (and specs of other Ethereum components), and was team lead for a group including formalists and testers. I don’t know if the Foundation is still supporting that group.

If we go with Brooks’ take on things I have my preferences, but all this is just opinions until there is a working group. For most people it shouldn’t matter which spec is primary. They will use the one that works for the purpose at hand, on the promise that they are equivalent. It’s the working group’s job to keep that promise.

1 Like

Yeah, this is what I basically meant by TMR. The three (written spec, K spec, and JSON test suite) work in unison, and if there’s ever a disagreement usually 2 of them would disagree with the third (easier to debug!)

They should absolutely be maintained in the same repo by the same maintainers: each proposed feature or fix (EIP) corresponds to a PR against the 3 items together. Tested together, deployed together, merged together. There should be a branch corresponding to each relevant network (at least mainnet and testnet/staging)… I think we can see the hierarchy.
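
Something along these lines is what I’m picturing (layout and names purely illustrative):

    evm-spec/                   one repo, one set of maintainers
        prose/                  the written spec (Yellow/Beige lineage)
        kevm/                   the executable K spec (Jello Paper)
        tests/                  the hand-written JSON test suite

    branches: mainnet, testnet/staging, ...   one branch per relevant network

Each EIP lands as a single PR touching all three parts together.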

The most important thing to remember is that unlike a lot of the client libraries, this repo would be widely referenced, so strong commit-message practices and the like should be encouraged.

1 Like