The Beacon chain is currently specified using a custom format which I’ll refer to as “executable markdown”.
The specification is a combination of high-level descriptions written in markdown and Python functions embedded in markdown code blocks. The executable nature comes from custom scripts that extract the function definitions from the markdown and run them against the test fixtures to verify that the specification is in line with the expected behavior. The Python has been written with a focus on readability and simplicity.
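Here is a minimal sketch of the extraction step described above. It assumes spec functions live in markdown code fences tagged `python`, and that all blocks can be executed in document order in one shared namespace. The names `extract_python_blocks` and `load_spec` are hypothetical and are not the actual Beacon chain tooling.

```python
import re
from types import SimpleNamespace

# Built by repetition so this example does not close its own markdown fence.
FENCE = "`" * 3

# Matches a fence tagged "python" at the start of a line, up to the next
# line that starts with a closing fence. Assumes fences are not indented.
PYTHON_BLOCK = re.compile(
    "^" + FENCE + r"python\n(.*?)^" + FENCE, re.DOTALL | re.MULTILINE
)


def extract_python_blocks(markdown: str) -> list[str]:
    """Return the source of every fenced block tagged python, in document order."""
    return [match.group(1) for match in PYTHON_BLOCK.finditer(markdown)]


def load_spec(markdown: str) -> SimpleNamespace:
    """Execute all extracted blocks in one shared namespace and expose its contents."""
    namespace: dict = {}
    source = "\n".join(extract_python_blocks(markdown))
    exec(compile(source, "<spec>", "exec"), namespace)
    # Hide dunder entries such as __builtins__; functions keep their own globals.
    public = {k: v for k, v in namespace.items() if not k.startswith("__")}
    return SimpleNamespace(**public)
```

Executing everything in a single namespace lets later blocks call functions defined earlier, which mirrors how a reader walks through the document.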
This format has some significant benefits:
- the Python definition of the functionality is tested for compliance with the official test fixtures.
- the executable nature means the spec can be used to generate test fixtures.
- the use of markdown and Python results in a low barrier to entry for contributing to the spec.
In contrast, the official specification for Eth1.x is the yellow paper.
The yellow paper is written in mathematical notation using LaTeX. These two technology choices (LaTeX and mathematical notation) result in a high barrier to entry for contributing to the document. The mathematical notation also makes the document less accessible to those without an academic background.
I would like to formally propose that we attempt to replace the yellow paper with a new specification written in the same style as the Eth2.0 Beacon chain specification: a collection of markdown documents with embedded Python functions, along with the scripting scaffolding necessary to make the specification executable.
The benefits of a successful effort could be significant. A specification that is far more accessible to contributors would lessen the burden currently carried by a small few. This should keep the specification more up to date and attract more small contributions that make things better defined and easier to understand. In addition, moving away from the less accessible mathematical notation toward a descriptive, code-based format should lower the barrier to understanding how Ethereum works, which in turn should make core protocol development more accessible.
Here is my rough sketch of how this could be executed.
I would propose that we first validate the idea by taking a small, self-contained chunk of functionality from the yellow paper and implementing it in this format. Candidates might include:
- the hexary Patricia trie
- the PoW function
- the bloom filter
For whichever is chosen, the markdown specification would need to be written, and the additional scripting scaffolding would need to be created so that the spec can be executed against whatever official test fixtures exist for that functionality. A pragmatic approach would likely be to lift much of the descriptive text directly from the yellow paper with minimal modification, and to use an existing Python implementation of that functionality as a starting point for the embedded Python functions.
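Once spec functions can be loaded, the scaffolding also needs to compare them against fixtures. The sketch below assumes a deliberately simplified, hypothetical fixture layout: a mapping from case names to an input and an expected output. The real fixtures for each piece of functionality have their own formats and would each need a small adapter. `run_fixtures` is a hypothetical helper.

```python
import json
from typing import Any, Callable


def run_fixtures(
    fixtures: dict[str, dict[str, Any]],
    spec_function: Callable[[Any], Any],
    input_key: str,
    expected_key: str,
) -> list[str]:
    """Run every named case through the spec function; return the failing case names."""
    failures = []
    for name, case in sorted(fixtures.items()):
        try:
            actual = spec_function(case[input_key])
        except Exception:
            # A spec function that raises on a case counts as a failure, not a crash.
            failures.append(name)
            continue
        if actual != case[expected_key]:
            failures.append(name)
    return failures


# Hypothetical fixture contents: each case maps an input to the expected result.
example = json.loads(
    '{"doubles_two": {"in": 2, "out": 4}, "wrong_on_purpose": {"in": 3, "out": 7}}'
)
print(run_fixtures(example, lambda x: 2 * x, "in", "out"))  # ['wrong_on_purpose']
```

Returning the names of failing cases instead of stopping at the first mismatch makes it easier to see how much of a fixture suite a partially written spec already satisfies.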
Assuming the validation succeeds, I would propose expanding the spec to cover only the latest hard fork, rather than trying to backfill all historical hard forks. The Py-EVM codebase will likely be a valuable resource, though care will need to be taken to adjust the code to prioritize readability and simplicity over its current focus on clean library architecture.
The specification would not be expected to run any of the fixture tests that deal with fork transitions, only those fully constrained to the chosen fork. At this stage, it should be possible to parallelize work on the spec, since pieces such as individual opcodes could be worked on concurrently by different contributors.
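One way to restrict the suite to a single fork is to filter cases by the fork they target. This sketch assumes each case carries a `network` field naming its fork, and that transition cases use a combined name. The `PetersburgToIstanbulAt5` value below is a hypothetical example. Exact matching on the fork name then drops transition cases automatically. `select_single_fork_cases` is a hypothetical helper.

```python
from typing import Any


def select_single_fork_cases(
    fixtures: dict[str, dict[str, Any]], fork: str
) -> dict[str, dict[str, Any]]:
    """Keep only cases whose (assumed) "network" field is exactly the chosen fork."""
    # Transition cases name two forks, so they never equal a single fork name.
    return {name: case for name, case in fixtures.items() if case.get("network") == fork}


cases = {
    "add_simple": {"network": "Istanbul"},
    "fork_boundary": {"network": "PetersburgToIstanbulAt5"},  # hypothetical name
    "older_rules": {"network": "Petersburg"},
}
print(sorted(select_single_fork_cases(cases, "Istanbul")))  # ['add_simple']
```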
Once the spec has been expanded to cover the full rules for the latest hard fork, we would then need to decide whether to backfill older fork rules and determine how to handle transitions between forks.