EIP-3155: Create EVM Trace Specification

Discussion for EIP-3155 which can be found here: https://github.com/ethereum/EIPs/pull/3155

It tries to formalize the de-facto standard for tracing during state tests and move it to a more visible place s.th. it will be picked up by more implementations

1 Like

I added some in-line comments in the EIP. Generally summarized as

  • The CUT should not be part of the EIP, the standard is useful outside of just fuzz testing (the rpc debug_standardTraceBadBlocks for example)
  • field types need to be specified (what’s a hex string and what’s a json number, and unit for time, for example)
  • de-facto standard -> common format. I think Parity’s trace is more of a de-facto standard as we have many users asking for it. Mostly because of the internal transaction and state handling.
1 Like

Thank you very much for the comments! They are very helpful, I will insert them tomorrow.
Sorry about the misnomer of the common format, didn’t want to step on anyones toes with that.

Some nitpicks:

  • Hex-String data type perhaps could be better described as “hex-encoded byte array”.

  • It’s not clear what exactly returnStack is, please provide an example where it’s not empty.

  • Please provide an example where error is not empty.

  • Clients SHOULD output the fields in the same order as listed in this EIP.

According to the JSON spec “an object is an unordered set of name/value pairs”, so strictly speaking this requires something not supported by JSON.

  • The CUT MUST NOT output a line for the STOP operation if an error occurred: Example:

The example following this does in fact output STOP operation, so it’s contradictory, or an example for something else.


This example of a summary contains successful field not mentioned in the spec.

My experience is limited to some knowledge about tracing implementations inside EVMs and using tracing for debugging convoluted state tests.

The most confusing part of the current tracing is that it reports a kind of “in progress” state of an instruction execution if you consider precondition-checking a part of the execution. I.e. it reports the gas cost of the instruction (hopefully total gas cost but that was not the case in Aleth; does it also report total CREATE and CALL costs?) but not the execution result.

In one of my prototypes I changed that. The tracing there was reporting the state after instruction execution. This was in my opinion much more DevEx friendly.

Moreover, I also focused on limiting the amount data transferred from EVM. Together these provided additional nice options:

  1. Instead of dumping whole EVM stack, you can always dump only the top item. It can be noticed that an instruction pushes at most one value to the stack so the “stack top dump” is also the instruction execution result.

    {"opName":"PUSH1", "stackTop":"0x02"}
    {"opName":"DUP1", "stackTop":"0x02"}
    {"opName":"ADD", "stackTop":"0x04"}
  2. Instead of full memory dump, you can only report the modifiedMemory: the memory area where the instruction has written to. It can be noticed that an instruction may at most modify single continuous memory area. This also can be seen as the instruction execution result. If you report the instruction before execution the “modified memory” has no meaning.

    {"opName":"MSTORE", "modifiedMemoryOffset":"0x20", "modifiedMemory":"0x000000000000000000000000aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
  3. It is enough to just report gasLeft after the execution. The instruction gas cost can be easily computed from the gasLeft of the previous instruction.

  4. An error code can be meaningfully added to an instruction trace. Is this the meaning of the error field? However, this only make sense for the last instruction in a call as other instructions must be successful.

    {"opName":"POP", "error":"stack underflow"}

Lastly, the “new” tracing should provide the same information as the “legacy” tracing. Therefore, the “legacy” tracing format can be emulated by a statefull wrapper. If that is not the case, consider this a bug.

Many of these options have variants and alternatives. At this moment I only want to present an overview. Let me know if this direction is something you would like to explore.


  1. EVMC tracing prototype (and introducing PR) — never fully utilized by any EVM and finally removed from EVMC.
  2. Aleth implementation of the EVMC tracing prototype.

ethereumjs-vm reports I think both, because different use cases required the different versions. Maybe I remember it wrongly, and it was only discussed as an issue and one of the options was not merged. In any case I think getting input from both ethereumjs and the Remix team would be very valuable.

Ping @yann300 and @jochem-brouwer (I could not find any other ethereumjs dev here).

I’d prefer the 5-6 topmost items. Then you don’t have to backtrack up to (potentially) infinity lines to see what the inputs to an op were.


Not sure about that. In a call, the gasCost was the cost of the call. The gasLeft is what you have available in this new execution frame.

Although, in general, your comments makes a presumptive traceviewer (such as my traceview: https://github.com/holiman/goevmlab#traceview ) forced to be come more stateful. In order to provide a memory dump, it needs to iterate through all the ops leading up to the point in question, if we only ever provide snippets.
Same with stack, but I already mentioned that.
So yes, it’ll make the trace(s) smaller, but it’ll also increase the complexity at the parsing/analysis side a whole lot.

Assuming that we mean by “EVM traces” the step event which the VM fires, then we only report the state of the VM before execution of an operation. We do not report the state right after running an operation. I think that @chfast raises a very good point that there are essentially “two” events happening: the first is the state of the VM before the operation runs, and the other is the state after the operation runs.

A very notable situation where this is important is if you invoke any CALL operation. In the stepBefore, we have the gas available before we run the CALL. Then afterStep, we deduct the call gas. But, the beforeStep in our new environment (new address), not only will the operation be different which we evaluate, but also our gas could have changed (since we have the 63/64 max forwarded gas rule).

I think it would make sense to add both these events in cases where it makes sense (it does not make sense to use beforeStep and afterStep just for a PUSH* operation).

Ok, for that we can dump the exact number of the stack items which will be consumed by the to-be-executed instruction.

Any idea how to resolved some of the alternatives to single specification?

But wait, what if it’s an invalid instruction? Or, what if it’s something that does not consume, anything – we still want to be able to see what the last op pushed on the stack.

So I’d say display at minimum the top item on the stack, even if it consumes zero.

1 Like

New set of proposed changes

  1. Wrap calls with “call start” and “call summary” entries. This will extend “tx summary” and allow easier internal call identification. With the change the “depth” can be moved to the “call start” - no need to repeat it on every instruction. Similarly “error” can be moved to “call summary”.
  2. I’m not able to compute “gasCost” before execution without big additional effort because gas calculation is mixed with instruction execution. Can this field be optional?
  3. I propose to change the type of “gas”, “gasUsed” and “gasCost” to Number. These values are at most 64-bit and even 52-bits are unlikely. Using Number is much easier to read.
  4. I’m not able to compute “stateRoot”.

I have implemented the trimmed stack output with the rules that at most the instruction number of arguments in dumped but not less than 1. If trimming happens the additional "..." indicator is added.


We can also add "stackSize" field.


The "..." seems to confuse traceview.
Besides, the trimming works nice for instructions like DUP or SWAP. For other instruction this seems to introduce some additional mess - it is difficult to track how values are “moved” on the stack when some variadic number of items is presented, especially for cases with single top value. The minimum presented number of values should be higher, maybe 3.


Initial implementation in evmone with comments to the spec and set of proposed changes.

Hi there, some points which are not mentioned in the EIP, which I want to note based upon the current Geth implementation.

They are either not mentioned, or they are not consistent, so I have some suggestions. But before opening a PR to the EIP to fix these, I’d rather discuss them first.

  1. If current gas limit is less than the base fee of an opcode, or the memory expansion overflows, then the reported gas is the base fee (is the base fee of each operation defined unambiguously?). Note the defer function which handles these errors.
  2. If an invalid opcode is reached, the previous gas cost is reported. (cost variable is not updated)
  3. If any calculation of the dynamic gas part of an opcode runs into an error (stack underflow, out of gas, etc.) then the dynamic gas part will return 0, and thus the reported gas for that step is still the base fee.
  4. For *CALL* opcodes, the gas cost is the dynamic gas cost + base fee of the call, but this is added to the gas sent to the next call frame (so the calculated gas limit of the call)
  5. However, for CREATE* opcodes, the gas supplied to the next frame is not added to the base fee + dynamic cost of the operation.


  1. Should be added to the EIP
  2. This is inconsistent. Since invalid operations consume all gas, the cost of this operation is thus the current gas left.
  3. I am not sure here, since these errors consume all gas, is this inconsistent? I think it would make more sense to also report “consume all gas” here as cost, instead of just reporting the base fee.
  4. This should be added to the EIP. This makes sense; this gas is sent to the next call frame and could thus all be consumed. At this point it is unknown how much it will cost in the end, but at this point we don’t care.
  5. This is not very consistent with (4), I’d say this should also report the gas limit that is sent to the next frame (so 63/64 of current gas left in almost all forks, or the 100% of the gas limit in historical forks)