ERC-8301: AI Agent Execution


Hi everyone,

Wanted to share a gap that has come up repeatedly while working across ERC-8274, ERC-8183, and ERC-8004, and see if others have run into the same friction.

The on-chain AI agent ecosystem has made real progress across several layers: agent identity (ERC-8004), inference proof verification (ERC-8274), proof anchoring (ERC-8263), agentic commerce (ERC-8183). But there is one foundational primitive still missing: a standard for how a smart contract invokes an AI agent and receives its output.

Today, every dApp that wants to call an AI agent defines its own task format, and every agent must implement a separate adapter per dApp:

dApp A defines its own task format  →  agent X adapts
dApp B defines a different format   →  agent X adapts again
dApp C defines yet another format   →  agent X adapts again
...
agent Y must do the same for A, B, C  →  N×M integration complexity

This is the same fragmentation ERC-8274 solved at the verification layer. The task layer has the same problem one level up.


Motivation

The on-chain AI agent stack maps naturally to six base-layer primitives, analogous to blockchain’s foundational properties:

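| Primitive  | Blockchain Analogue                      | ERC                 |
|------------|------------------------------------------|---------------------|
| Identity   | Address                                  | ERC-8004            |
| Execution  | Smart Contract (definition + invocation) | missing             |
| Verify     | Consensus                                | ERC-8274            |
| Anchor     | On-chain State                           | ERC-8263            |
| Settlement | Value Transfer                           | ERC-8275            |
| Prove      | Logs / Audit Trail                       | ERC-8281 + ERC-8299 |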

There are also two layers built on top of this base:

  ┌── Ecosystem Layer ──────────────────────────────────────────────────┐
  │                                                                     │
  │  ┌───────────────┐   ┌───────────────┐   ┌───────────────────────┐  │
  │  │   ERC-8183    │   │   ERC-8273    │   │       ERC-8257        │  │
  │  │ Labor Market  │   │Authorization  │   │    Skill Registry     │  │
  │  └───────────────┘   └───────────────┘   └───────────────────────┘  │
  │                                                                     │
  └───────────────────────────────┬─────────────────────────────────────┘
                                  │ all depend on
  ┌── Base Layer ─────────────────▼─────────────────────────────────────┐
  │                                                                     │
  │  Identity →  Execution →  Verify → Anchor → Settlement →  Prove     │
  │  ERC-8004    [This ERC]   ERC-8274  ERC-8263  ERC-8275  8281+8299   │
  │                                                                     │
  └─────────────────────────────────────────────────────────────────────┘

Execution is the last missing brick in the base layer. Without it, every ecosystem ERC is forced to define its own task format rather than composing on a shared primitive: ERC-8183 uses a free-form description string, ERC-8001 uses opaque executionData bytes, and ERC-8004 delegates entirely to off-chain A2A/MCP.


Proposal

The proposal defines three minimal components. The design explicitly excludes agent routing, task state management, labor markets, and semantic task definitions; this is a protocol layer, not an application layer.

AgentTask: Task Definition

struct AgentTask {
    bytes32 taskId;             // Unique task identifier; caller-supplied
    bytes32 agentWorkflowHash;  // keccak256 of the workflow definition (CID or inline JSON)
    bytes   agentWorkflow;      // Workflow definition plaintext; OPTIONAL (hash is authoritative)
    address handler;            // Contract implementing IAgentHandler
    uint256 deadline;           // Task expiry as Unix timestamp
}

IAgentCaller: Task Dispatch

interface IAgentCaller {
    event CallAgent(
        bytes32 indexed taskId,
        address indexed requester,
        bytes32         agentWorkflowHash,
        bytes           agentWorkflow,
        bytes32         inputHash,
        bytes           input,
        address         handler,
        uint256         deadline
    );

    function callAgent(
        AgentTask calldata task,
        bytes32           inputHash,
        bytes    calldata input
    ) external returns (bytes32 taskId);
}

IAgentHandler: Workflow Callbacks

interface IAgentHandler {
    function onAgentStep(
        bytes32        taskId,
        bytes32        inputHash,
        bytes32        outputHash,
        bytes calldata output,
        uint256        agentId,
        address        agent,
        uint8          stage,
        bool           isFinal,
        bytes calldata data
    ) external;

    function onAgentProve(
        bytes32        taskId,
        bytes32        inputHash,
        bytes32        outputHash,
        bytes calldata proof,
        address        verifier,
        uint8          stage
    ) external;
}

Every step is paired with a prove: onAgentStep commits the result, onAgentProve verifies it. The minimum path is callAgent() → onAgentStep(stage=0, isFinal=true) → onAgentProve(stage=0).

The workflow model supports two resolution strategies per stage:

  • Async (default): steps can chain without waiting for prove. The invariant: before the final prove is accepted, every preceding stage must have a valid prove on record.
  • Sync: each step’s prove must pass before the next step is accepted; a rejected prove gates the workflow from advancing.
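
The two resolution strategies can be modelled off-chain as a small state machine. This is a minimal Python sketch under stated assumptions, not the reference handler: the class name `WorkflowState`, its fields, and the choice to raise an exception on a violation are hypothetical. Only the step/prove pairing and the sync and async rules are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class WorkflowState:
    """Illustrative per-task tracker for step/prove pairs (hypothetical model)."""
    sync_stages: Set[int] = field(default_factory=set)     # stages declared sync
    stepped: List[int] = field(default_factory=list)       # stages in arrival order
    proved: Dict[int, bool] = field(default_factory=dict)  # stage -> prove accepted
    final_stage: Optional[int] = None

    def on_agent_step(self, stage: int, is_final: bool) -> None:
        # Sync rule: a sync stage needs a passing prove before any later step.
        for prev in self.stepped:
            if prev in self.sync_stages and not self.proved.get(prev, False):
                raise RuntimeError(f"sync stage {prev} has no accepted prove")
        self.stepped.append(stage)
        if is_final:
            self.final_stage = stage

    def on_agent_prove(self, stage: int, valid: bool) -> None:
        if stage not in self.stepped:
            raise RuntimeError(f"no step recorded for stage {stage}")
        if valid and stage == self.final_stage:
            # Async invariant: every preceding stage must have a valid prove
            # on record before the final prove is accepted.
            missing = [s for s in self.stepped
                       if s != stage and not self.proved.get(s, False)]
            if missing:
                raise RuntimeError(f"stages {missing} lack a valid prove")
        self.proved[stage] = valid
```

In this model the minimum path is one `on_agent_step(0, True)` followed by one `on_agent_prove(0, True)`. Declaring a stage in `sync_stages` turns its prove into a gate, while the default leaves steps free to chain.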

The core idea: use the workflow model to orchestrate decentralised agent nodes so that every step in the pipeline produces a verifiable result that can be trusted on-chain. The mechanism is the step–prove pair: each scenario below is expressed through a combination of onAgentStep and onAgentProve calls, governed by the workflow document. This lets callers compose their own execution pipeline without hardcoding any particular agent, proof system, or coordination pattern into the interface.

Four scenarios the workflow model can express:

  1. Input Provenance: WYRIWE attestation, OCP temporal anchoring, pre-action gating, TEE channel establishment
  2. Decentralised agent node selection: which independently operated gateway (e.g. a ccip-router instance) executes the task: bid, random, reputation-weighted, or stake-based
  3. Multi-agent orchestration: sequential chains, parallel fanout, conditional routing, inter-agent messaging
  4. Output consensus: commit-reveal (the ERC-8275 pattern), BFT-style voting, optimistic finalization with challenge window

The spec is up at PR #1815. Discussion and feedback welcome, especially on the workflow scenarios and the sync/async proof model.

6 Likes

The base layer table is the clearest statement of the stack I’ve seen. Execution as the missing brick between Identity and Verify is exactly right, and the N×M framing maps cleanly to what ERC-8274 solved at the verification layer. The scope is right too: keeping it at the base layer without task state, routing, or escrow is the correct call; those belong higher up.

One connection worth making explicit in the rationale: systemPromptHash and inputHash are natural OCP (ERC-8281) commitment targets at the task layer. If a caller anchors inputHash via ERC-8281 before execution (or even just systemPromptHash at task definition time), you get a timestamped, independently verifiable record that the exact input was committed before the output was known. This closes the pre-commitment question at the Execution layer the same way ERC-8263 closes it at the Anchor layer. The composition is vertical: OCP proves when the task was issued, ERC-8263 proves when the proof was anchored, ERC-8274 proves the output is valid. Three layers, one audit trail.

On the three open questions:

taskId generation: caller-supplied is the right default. Off-chain coordination before on-chain registration matters for any multi-agent fanout scenario where agents need to agree on a taskId before the transaction lands. Contract-generated can be offered as an optional override.

modelId semantics: keep it opaque bytes32 resolved off-chain. An on-chain model registry adds governance overhead for something that changes faster than standards move. The ERC-8274 proofSystem() string already handles proof-system identification at the verification layer; modelId can stay abstract at the execution layer.

onAgentProve placement: keep it in IAgentHandler. Splitting into a separate interface adds an interface detection layer for what is fundamentally one contract’s concern. Handlers that only care about results can implement onAgentProve as a no-op.

2 Likes

Strong +1 on the framing. “How to call an agent” is a real N×M gap, and putting it at the base layer (not the commerce layer) is the right call. The inputHash = keccak256(systemPromptHash, userPromptHash) commitment is clean: it gives the verifier a tamper-proof handle without the Task layer having to understand verification. From the 8274 reference-implementation side, a few notes on the verifier seam:

  1. Pre-action gating vs post-hoc proving: worth deciding explicitly.
    The current flow is callAgent → onAgentReply(output) → onAgentProve(proof): verification is a proof about an output that already exists. That’s exactly right for an inference proof. But there’s a second, distinct verifier shape: a judgment verifier that returns a verdict before the handler acts on the output, a gate rather than an after-the-fact attestation (our /review is this: an approve/reject verdict committed before the action commits). The two grade and compose differently. Suggestion: let the verifier seam declare which it is. An IAgentVerifiable can signal whether its verdict is pre-action (gating) or post-hoc (proving), so a handler knows whether to wait on the verifier before consuming onAgentReply, or to accept the reply and reconcile the proof later.

This composes vertically with the input-commitment point Damon raised: anchoring promptHash via OCP / ERC-8281 before execution gives a timestamped record that the input was committed before the output was known. That’s commit-before-outcome at the input layer; the pre-action verifier is the same move at the judgment layer. The two stack: input-commitment (when the task was issued) → verdict-commitment (the verdict issued before the action) → settlement (8275). The Task layer can carry all three without managing any of them; it just has to leave the seams typed.

  2. Keep onAgentProve decoupled from onAgentReply (open Q3): yes, decouple.
    From running this live: the proof/outcome settles on a different clock than the reply. A verdict can be published and verifiable immediately, while the outcome it’s later graded against settles minutes-to-hours later (for us, on-chain). The reply and the proof genuinely arrive at different times, so the nonce + isFinal design that separates them is correct; don’t fold proof back into the reply path.

  3. modelId (open Q2): opaque is fine to start, but a registry would need to model revocation.
    A modelId can become uncallable through vendor deprecation or withdrawal by directive. A registry mapping modelId → {proof system, version, capabilities} should treat “no longer callable” as a first-class state, so a task pinned to a withdrawn model fails closed rather than dangling against deadline.
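
To make "fails closed rather than dangling" concrete, here is a minimal Python sketch of a registry that treats revocation as a first-class state. Everything in it is hypothetical: the thread discusses no concrete registry API, so `ModelRegistry`, `ModelStatus`, `dispatch`, and the outcome strings are illustrative names only.

```python
from enum import Enum
from typing import Dict


class ModelStatus(Enum):
    ACTIVE = "active"
    REVOKED = "revoked"  # vendor deprecation or withdrawal by directive


class ModelRegistry:
    """Hypothetical registry; the names here are illustrative assumptions."""

    def __init__(self) -> None:
        self._status: Dict[bytes, ModelStatus] = {}

    def register(self, model_id: bytes) -> None:
        self._status[model_id] = ModelStatus.ACTIVE

    def revoke(self, model_id: bytes) -> None:
        if model_id not in self._status:
            raise KeyError("unknown modelId")
        self._status[model_id] = ModelStatus.REVOKED

    def is_callable(self, model_id: bytes) -> bool:
        # Unknown and revoked ids are both treated as not callable.
        return self._status.get(model_id) is ModelStatus.ACTIVE


def dispatch(registry: ModelRegistry, model_id: bytes, now: int, deadline: int) -> str:
    """Return the dispatch outcome for a task pinned to model_id."""
    if now > deadline:
        return "expired"
    if not registry.is_callable(model_id):
        return "rejected"  # fail closed at dispatch, not at the deadline
    return "dispatched"
```

The design point is the ordering of the checks: a revoked model is rejected at dispatch time, so the caller learns immediately instead of waiting for the deadline to lapse.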

Happy to provide a concrete IAgentVerifiable reference for the verifier slot. We run a live one (api.babyblueviper.com/ledger) with pre-action verdicts published before outcomes settle, so the verifier path has a deployed example to test against, same as we’re doing for 8274.

One line for the rationale, since the layers compose cleanly: WYRIWE proves what the model received, OCP proves when the task was issued, and the verifier proves whether it should proceed.

3 Likes

The pre-action vs post-hoc verifier distinction is the sharpest addition here and belongs in the interface. A handler that doesn’t know whether to wait on a verdict before consuming onAgentReply or reconcile the proof afterward is making an architectural assumption the spec should make explicit. IAgentVerifiable signaling verifier shape (gating vs proving) is the right place for that.

The stacking you describe is exactly the composition the base layer should enable without managing: input-commitment (OCP, when the task was issued) → verdict-commitment (pre-action verifier, the verdict before the action) → settlement (ERC-8275). Three typed seams, each independently checkable, none absorbing the others.

On modelId revocation: agreed. A registry that doesn’t model “no longer callable” as a first-class state will produce tasks that dangle against deadline rather than fail closed. If a registry gets added, revocation has to be in the spec from the start, not retrofitted.

And the rationale line at the bottom is the one: WYRIWE proves what the model received, OCP proves when the task was issued, and the verifier proves whether it should proceed. That’s the sentence that belongs in Section 1 of the ERC.

2 Likes

@JimmyShi22, base-layer placement is right, and the inputHash = keccak256(systemPromptHash, userPromptHash) pivot is the clever part: it’s the same field 8274’s verify keys on, so the whole proof stack hangs off one on-chain commitment without the task layer managing any of it.

Picking up the “what the model received” leg @Damonzwicker and @babyblueviper1 have been pointing at, that’s the WYRIWE (8299) seam, so let me make it concrete.

Your inputHash commits what was requested, the declared system + user prompt hashes. But what the model actually processes can differ: RAG injection, templating, tool-context assembly all sit between the committed prompt and the tokens the model sees. WYRIWE is exactly the standard for that gap, it proves the model received either the committed input verbatim (sentinel: inputHash == rawInputHash) or a declared transformation of it (non-sentinel: inputHash = keccak256(abi.encode(rawInputHash, pipelineHash)), pipeline pinned and auditable).

So the layers all stack on your one inputHash, none of them managed by the task:

  • OCP / 8281: anchor the inputHash before execution → proves when it was committed (Damon’s point)

  • WYRIWE / 8299: proves the model received that committed input, or an auditable transformation of it → what it processed

  • Verifier / 8274: pre-action verdict (gating) or post-hoc proof (attesting), per Fede’s distinction → whether it proceeds / the output holds

  • 8275: settles

The task layer just exposes the commitment; the three prove when / what / whether against it. That’s the Section-1 rationale Damon’s after, with the WYRIWE leg made real.

No new field is needed: WYRIWE composes through the same inputHash your verifier slot already references. But one line in the rationale is worth it: that the inputHash itself is attestable, not just assumed. Otherwise the verifier proves an output against an input that’s only as trustworthy as the off-chain assembly, and the “tamper-proof link” stops one layer short of the model.

Happy to stand up the live withWyriwe() from ccip-router as a deployed input-provenance reference, parallel to @babyblueviper1’s IAgentVerifiable for the verifier slot. That gives you a running example for the “what” leg to test against, on the same inputHash.

1 Like

This is the right closure of the “what” leg: WYRIWE (8299) is exactly the standard for the gap between requested and processed. The sentinel / non-sentinel split is the correct shape: inputHash == rawInputHash when the model gets the committed input verbatim, and inputHash = keccak256(abi.encode(rawInputHash, pipelineHash)) when a declared, pinned transformation (RAG, templating, tool-context assembly) sits in between. The pipeline is auditable, with no hand-waving over the assembly step.

The sharp consequence for the verifier slot, and the line worth making explicit in the rationale: 8274’s verify must key on the WYRIWE-attested inputHash, not the raw requested one. Otherwise the verdict is sound over an input the model may never have seen, and the tamper-proof link stops one layer short exactly as you say. With WYRIWE in the path the chain is end-to-end: the verifier attests an output against the input the model provably processed. That’s what makes “the inputHash itself is attestable, not just assumed” load-bearing rather than nice-to-have: a pre-action gate is only as good as the input it gated on.

So the four legs all hang off the one inputHash, none of them managed by the task layer:

  • 8281 / OCP: when the inputHash was committed
  • 8299 / WYRIWE: what the model processed against it (verbatim, or a declared transform)
  • 8274 / verifier: whether it proceeds (pre-action gate) or the output holds (post-hoc), keyed on the WYRIWE-attested input
  • 8275: settles

+1 on standing up withWyriwe() as the input-provenance reference parallel to our IAgentVerifiable. Two of the four legs then have a running, checkable example on the same inputHash, which is the fastest way to pressure-test that the seams actually compose end-to-end rather than just on paper.

1 Like

This is the closure. Four legs, one inputHash, none managed by the task layer:

8281/OCP: when the inputHash was committed
8299/WYRIWE: what the model processed against it (verbatim, or a declared transform)
8274/verifier: whether it proceeds or the output holds, keyed on the WYRIWE-attested input
8275: settles

The sharp consequence Fede names is the one that makes WYRIWE load-bearing rather than nice-to-have: 8274’s verify must key on the WYRIWE-attested inputHash, not the raw requested one. A pre-action gate is only as good as the input it gated on. If the model processed a transformation the verifier didn’t attest, the tamper-proof link stops one layer short, and the gate means nothing.

With Tiago’s withWyriwe() and Fede’s IAgentVerifiable both live on the same inputHash, two of the four legs have running checkable examples. That’s the fastest way to confirm the seams compose end-to-end rather than just on paper. Section 1 of this ERC should name all four legs explicitly; the stack is now fully documented across three threads and two deployed implementations.

1 Like

Exactly, @Damonzwicker, and that’s the line that makes the stack load-bearing instead of decorative. Worth pinning concretely for the draft, since it lands on the one on-chain field.

@JimmyShi22 CallAgent emits inputHash = keccak256(systemPromptHash, userPromptHash), the declared input: what was requested. That’s the commitment: OCP-anchorable, tamper-evident from the call. But the verifier can’t key on it, exactly as you say. What the model received is either that verbatim (WYRIWE sentinel: attested == declared) or a declared transform of it (non-sentinel: attested = keccak256(abi.encode(declared, pipelineHash))). The WYRIWE attestation is the link between the two.

So the provable chain is:

declared inputHash (CallAgent) → WYRIWE attestation → attested inputHash → verify keys on attested → outputHash

Key on the declared hash instead and you’re gating over something the model may never have seen: the link stops one layer short, which is your point exactly.

Concrete hook, @JimmyShi22 : verify(proof, inputHash, outputHash) should take the WYRIWE-attested inputHash, with the attestation carried in the proof bytes; and the rationale should note that the CallAgent inputHash is the commitment, while the verifier keys on its attested derivation. In the no-transform case the two are identical and nothing changes, so it costs the simple path nothing, and closes the gap the moment any pipeline sits between the prompt and the model.

withWyriwe() does exactly this declared→attested split today, if you want a reference for the attested side.

Agreed on the concrete hook. One small addition worth naming in the draft: the no-transform case should be explicit in the spec language, not just implied. When attested == declared, the WYRIWE sentinel path collapses to identity and the verifier keys on the same hash as the commitment, so the simple path costs nothing and existing integrations don’t change. But that equality needs to be a stated invariant, not an assumption a reader has to derive. If it’s not written down, someone will treat the attested inputHash as optional in the sentinel case and reintroduce the gap.

The four-leg framing belongs in Section 1. The stack is documented; the draft should reflect it.

2 Likes

Agreed, and since it’s the WYRIWE leg, let me hand over the exact invariant text rather than leave it to be derived. The no-transform case shouldn’t be an implied collapse; it should be a stated equality:

Sentinel invariant. When no sanitization transform is applied (attested input == declared input), inputHash MUST equal rawInputHash. In this sentinel case the commitment’s inputHash and the verifier’s keyed hash are the same value; an implementation MUST NOT treat the attested inputHash as optional or recomputed. Equality with rawInputHash is the conformance condition for the no-transform path. When a transform is applied, inputHash = keccak256(abi.encode(rawInputHash, sanitizationPipelineHash)) and the sentinel does not hold.
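
The invariant can be written out as a small conformance check. This is a hedged Python sketch, not the WYRIWE reference: the function names are hypothetical, and the standard library's SHA3-256 is used only as a stand-in for keccak256 (the two differ in padding and give different digests), so the values will not match anything computed on-chain.

```python
import hashlib
from typing import Optional


def stand_in_hash(data: bytes) -> bytes:
    # ASSUMPTION: SHA3-256 stands in for Ethereum's keccak256. The digests
    # differ; swap in a real keccak256 before comparing with on-chain values.
    return hashlib.sha3_256(data).digest()


def attested_input_hash(raw_input_hash: bytes,
                        pipeline_hash: Optional[bytes]) -> bytes:
    """Derive the hash the verifier keys on from the declared (raw) hash."""
    if len(raw_input_hash) != 32:
        raise ValueError("rawInputHash must be 32 bytes")
    if pipeline_hash is None:
        # Sentinel path (no transform): inputHash MUST equal rawInputHash.
        return raw_input_hash
    if len(pipeline_hash) != 32:
        raise ValueError("pipeline hash must be 32 bytes")
    # abi.encode of two bytes32 values is their 64-byte concatenation.
    return stand_in_hash(raw_input_hash + pipeline_hash)


def conforms(input_hash: bytes, raw_input_hash: bytes,
             pipeline_hash: Optional[bytes]) -> bool:
    """True when input_hash matches the derivation for the declared path."""
    return input_hash == attested_input_hash(raw_input_hash, pipeline_hash)
```

Passing `None` for the pipeline hash selects the sentinel path, so a caller cannot claim a transform-free run while presenting a hash that differs from the declared one.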

Writing it as an invariant rather than an assumption is what stops the gap you named: someone reading the simple path treating the attested inputHash as skippable in the sentinel case and reintroducing exactly the substitution WYRIWE exists to foreclose. The cost story is intact: when attested == declared the path collapses to identity and existing integrations don’t change a line, but only because the equality is enforced, not assumed.

And yes, the four-leg framing (input integrity → execution proof → anchor → settlement) belongs in Section 1, not derived downstream. The stack’s documented; the draft should open with it. Happy to PR the invariant text straight in if that’s easier than transcribing.

2 Likes

Interesting proposal.

One framing that may help is positioning this not as a competing coordination or commerce protocol, but as a shared execution primitive that other agent ERCs can compose with.

For example, ERC-8001 already defines multi-party agent coordination, but intentionally treats execution as opaque executionData bytes. ERC-8183 defines jobs and escrow, but leaves task execution semantics to implementations. ERC-8004 focuses on identity, reputation, and validation.

What seems to be missing is a common execution envelope that all of these standards can reference.

In that model:

  • ERC-8004 answers: “Who is the agent?”

  • ERC-8001 answers: “Which agents have agreed to act?”

  • This ERC answers: “How is work invoked and how are results returned?”

  • ERC-8274 answers: “How is the result verified?”

  • ERC-8263 answers: “How is the proof anchored?”

  • ERC-8183 answers: “How is the work paid for and settled?”

Viewed this way, the proposal becomes a foundational interoperability layer rather than another agent protocol.

One possible path would be to define the execution envelope here and allow standards such as ERC-8001 and ERC-8183 to optionally adopt it as a canonical encoding for their execution payloads.

That preserves backwards compatibility while reducing the N×M integration problem between dApps and agents.

The strongest argument for the proposal is not that it introduces a new workflow, but that it creates a common invocation interface that existing workflows can share.

1 Like

Thank you all. The discussion here has been one of the most technically productive threads in this space. Every point raised ended up shaping the v0.1 draft, which is now posted as a PR:

ethereum/ERCs#1815

A quick summary of what the draft defines, then a note on each contribution below.


What the v0.1 draft specifies

Three minimal interfaces: AgentTask (task struct), IAgentCaller (dispatch + CallAgent event), and IAgentHandler (onAgentStep + onAgentProve callbacks).

The central addition is agentWorkflowHash, a keccak256 commitment to a workflow definition document that covers behavioral instructions, model/agent type, and stage definitions. This replaces the earlier systemPromptHash concept: rather than committing only to a prompt, the entire execution protocol is committed at dispatch time.

onAgentStep(stage, isFinal) is a single unified callback. Any workflow position, from a simple one-shot result to multi-round steps, maps onto (stage uint8, isFinal bool) without changing the interface.

The minimum conforming path is callAgent() → onAgentStep(stage=0, isFinal=true). No additional infrastructure is required for the simplest case.

Stage state management (which stages are required, which transitions are valid) is explicitly delegated to AgentVerifier (ERC-8274) rather than defined here, preserving interface neutrality across ZK, TEE, optimistic, and BFT verification regimes.


On each contribution:

@Damonzwicker: the architectural grounding from #2, #4, #7, and #9 was central throughout. taskId is caller-supplied; onAgentProve is a first-class callback in IAgentHandler. modelId moved into the workflow definition document (committed via agentWorkflowHash), kept off-chain to avoid on-chain registry governance. The four-leg provenance stack (ERC-8281 / ERC-8299 / ERC-8274 / ERC-8275) appears in the Motivation. On the no-transform sentinel from #9: well-taken. The draft elevates it to an explicit conformance condition enforced by the verifier, not a derived assumption. An implementation that skips WYRIWE attestation on the grounds that “nothing changed” provides no proof that nothing changed.

@TMerlini: the WYRIWE composition model from #5 and #8 is incorporated in full. The Rationale includes an “Input Verification and WYRIWE Integration” subsection walking through the declared → sanitized → attested chain of custody. Verifiers MUST key on the WYRIWE-attested inputHash, not the declared one from CallAgent. The agentWorkflowHash pre-commits the authorized sanitization pipeline CID, so the gateway cannot substitute an unauthorized transform after seeing the input. The no-transform sentinel language from #10 is reflected verbatim: when no transform is applied, attested inputHash MUST equal declared inputHash.

@babyblueviper1: the pre-action vs. post-hoc distinction from #3 and #6 is resolved through onAgentStep(stage, isFinal) rather than separate interfaces. Pre-action gating maps to onAgentStep(stage=1, isFinal=false) (verdict before inference); post-hoc maps to onAgentStep(stage=0, isFinal=true). Both shapes are expressible without changing the base interface; the choice belongs to the workflow definition and the AgentVerifier policy. On modelId revocation as first-class state: the v0.1 draft defers this to deadline expiry, treating lifecycle management as an AgentVerifier-layer concern rather than a dispatch-interface concern. Worth revisiting if there’s a use case that deadline-gating doesn’t cover.


One open question: workflow document format. The spec intentionally does not prescribe a format for the workflow definition document. Candidates include OpenAI-style function-call JSON, W3C PROV-O, or a minimal domain-specific schema. Would welcome input from anyone with experience deploying workflow descriptions in other on-chain contexts, particularly on what properties matter most (human-readable, machine-executable, content-addressable, upgradeable without re-deployment, etc.).

All feedback welcome, on the draft or on the open format question.

4 Likes

The onAgentStep(stage, isFinal) unification is the right call, and it does fold the pre-action and post-hoc shapes into one interface cleanly. One semantic needs to be explicit in the spec, though, or the pre-action shape quietly loses its point.

Pre-action has to be gating, not just an earlier callback.

Mapping pre-action to onAgentStep(stage=1, isFinal=false) gives it a slot, but a slot is not a gate. The load-bearing property of a pre-action verdict is that a reject HALTS progression to the gated step. That is the whole difference between “verify before you act” and “observe before you act,” and only the first is worth standardizing, because the second is just logging.

So I would make this a conformance condition rather than an implementation choice: when a stage is declared gating in the workflow definition, a non-approving result on that onAgentStep MUST prevent the caller from advancing to the dependent stage, with the AgentVerifier policy as the thing that enforces the stop. Without that MUST, a conforming implementation can fire the pre-action callback and proceed anyway, and a downstream consumer cannot tell a gated workflow from an ungated one from the interface alone. It is also what makes committing the verdict before the outcome (the 8263 leg) meaningful: a verdict that could not have blocked is not evidence of a decision, only of a comment.

On the open question (workflow document format):

The property that matters most is that it is independently recomputable. Whatever the encoding, a third party should be able to take the public workflow document, hash it, and confirm it equals the agentWorkflowHash committed at dispatch, with no trusted issuer in the path. That argues for content-addressable (a CID) over a registry, and for a machine-checkable schema over prose, because the commitment is only as useful as the ability to re-derive it. Human-readability is secondary: if the document is content-addressed and the schema is fixed, readable renderings can be generated from it without being the source of truth. This is the same derived-not-stored discipline the verification legs already rely on, so keeping the execution layer recompute-grade keeps the whole stack checkable end to end.
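
Independent recomputability can be illustrated with a short Python sketch. It rests on two loudly stated assumptions: the draft does not prescribe a document encoding, so canonical JSON (sorted keys, no insignificant whitespace) is only one possible choice, and SHA3-256 from the standard library stands in for keccak256, which produces different digests. The function names and the example document fields are hypothetical.

```python
import hashlib
import json


def workflow_hash(document: dict) -> bytes:
    # ASSUMPTION 1: canonical JSON as the encoding; the draft prescribes none.
    # ASSUMPTION 2: SHA3-256 as a stand-in for keccak256 (digests differ).
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha3_256(canonical.encode("utf-8")).digest()


def matches_commitment(document: dict, committed_hash: bytes) -> bool:
    """Re-derive the hash from the public document and compare it with the
    value committed at dispatch; no trusted issuer is consulted."""
    return workflow_hash(document) == committed_hash


# Usage: two parties holding the same document in a different key order
# arrive at the same hash, and any edit to the document breaks the match.
published = {"model": "example-model", "stages": [{"id": 0, "resolution": "async"}]}
committed = workflow_hash(published)
reordered = {"stages": [{"resolution": "async", "id": 0}], "model": "example-model"}
print(matches_commitment(reordered, committed))
```

A fixed canonical form is what makes the check recomputable by a third party; without it, two semantically identical documents could hash differently and the commitment would stop being checkable.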

We run the pre-action path in production today (every entry on api.babyblueviper.com/ledger is a verdict committed before its outcome settled, wins and losses both), so if a worked example of the gating-stage shape against onAgentStep would help the draft, I can post one with real field values.

3 Likes

@KBryan
Really appreciate this take. “Shared execution primitive” is exactly the right framing, much better than how it read in the draft. The point about 8001 and 8183 optionally adopting a common envelope is spot on; that’s the kind of composition the base layer is meant to enable. If the execution format is standardised, the layers above don’t have to reinvent it every time.

On the workflow side: the idea with onAgentStep and onAgentProve is that they’re decoupled on purpose. A handler can receive step callbacks for multi-stage execution (stage 0 = executor selection, stage 1 = input prep, stage 1+ = orchestration, stage 2+ = consensus) while proofs arrive separately via onAgentProve, potentially from a third party. The minimum path stays simple (a single onAgentStep(stage=0, isFinal=true)), but the model stretches to cover distributed inference with commit-reveal rounds and BFT-style voting. The PR (#1815) has the full spec if you want to take a closer look.

2 Likes

@babyblueviper1 Thanks for the thoughtful feedback; the pre-action gating point especially got me thinking.

On the gating point: I have been thinking about how to express this cleanly in the spec without adding a separate concept. The draft now has a sync vs async proof resolution model:

  • Async (default): steps can chain without waiting for prove. The hard invariant is that before the final prove is accepted, every preceding stage must have a valid prove on record.
  • Sync: each stage's prove must pass before the next step is accepted. A rejected prove blocks the workflow from advancing.
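A rough sketch of the two resolution modes as acceptance rules, in case it helps the discussion. This is a toy in-memory model, not the draft's interface: the `Workflow` class, its method names, and the per-stage `sync_stages` set are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """Toy model of step/prove acceptance; all names are illustrative."""

    sync_stages: set = field(default_factory=set)  # stages declared sync
    steps: list = field(default_factory=list)      # accepted stages, in order
    proves: dict = field(default_factory=dict)     # stage -> prove accepted?

    def on_step(self, stage: int) -> bool:
        # Sync rule: an earlier sync stage without an accepted prove
        # blocks every later step.
        for prior in self.steps:
            if prior in self.sync_stages and not self.proves.get(prior, False):
                return False
        self.steps.append(stage)
        return True

    def on_prove(self, stage: int, valid: bool, is_final: bool = False) -> bool:
        if stage not in self.steps:
            return False  # nothing to prove for an unknown stage
        if is_final and valid:
            # Async invariant: the final prove is only accepted once every
            # other stage already has a valid prove on record.
            others = [s for s in self.steps if s != stage]
            if not all(self.proves.get(s, False) for s in others):
                return False
        self.proves[stage] = valid
        return valid


wf = Workflow(sync_stages={1})
wf.on_step(0)                  # async stage, accepted
wf.on_step(1)                  # sync stage, accepted
blocked = wf.on_step(2)        # refused: stage 1 has no accepted prove yet
wf.on_prove(1, True)           # the gate opens
advanced = wf.on_step(2)       # now accepted
print(blocked, advanced)
```

In this model a gate is nothing more than a stage listed in `sync_stages`, which is the claim being made: no separate mechanism, only a resolution mode applied to one stage.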

Does the sync mode cover what you had in mind for a gate? The idea is that the gate isn't a separate mechanism; it's just sync resolution applied to a specific stage. If stage=1 is declared sync, the prove serves as the gate: pass = open, reject = stop.

Curious whether this maps cleanly to the consumer-side distinction you described, where downstream consumers need to know, from the interface alone, whether a stage is advisory or mandatory before execution proceeds.

On the workflow document format: fully agree. Independent recomputability is the property, content-addressed over registry, machine-checkable over prose. The draft already points that way with the IPFS CID recommendation, and the framing of keeping the execution layer recompute-grade so the whole stack stays checkable end to end is exactly right.

A worked example from the ledger would be great. Seeing the pre-action path against onAgentStep with real field values would ground the gating scenario in something concrete.

3 Likes

@TMerlini @Damonzwicker @babyblueviper1

Tiago – Damon – babyblueviper1

Some more explanation of the workflow idea:

This really crystallised out of the conversations you all started: babyblueviper1's pre-action gating question, and Tiago and Damon's WYRIWE integration work. Each of those landed on the same insight from a different angle: these concerns need a standard slot in the execution pipeline, but none of them should be mandatory for every use case. That insight is what the workflow idea is built around. Here's how it turned out.

The idea is simple: a single request-response callback is enough for the simplest case, but real deployments hit recurring problems. Rather than hardcoding solutions into the interface, the workflow model lets the caller compose their own pipeline from the stage definitions below: pick the stages you need, declare them in the workflow document, and the handler enforces them.

This also maps naturally to how workflows already work in the AI space: chain-of-thought, multi-agent debate, tool-use loops. What the spec adds is the ability to orchestrate those workflows across decentralised agent nodes, so that each step in the pipeline produces a verifiable result that can be trusted on-chain. The scenarios below show what the workflow can do. Open to feedback on all of it.

Scenario 1: Input Provenance
The input the caller declared and the input the model actually received may differ: sanitization, template injection, and RAG context assembly all happen between them. This scenario provides a slot for:

  • WYRIWE: input provenance attestation, proving the chain from declared → sanitized → attested input
  • OCP: temporal anchoring via ERC-8281, recording when the input was committed
  • Gating: pre-action verdict, where the verifier approves or rejects the input before inference begins

These cannot be retrofitted after inference; they must be committed before it.

Scenario 2: Decentralised agent node selection
Which node in the mesh executes this task? A decentralised agent node is any independently operated gateway that implements the execution interface, for example a ccip-router instance syncing records and serving proofs across the mesh. The workflow model is agnostic to which specific node is chosen; the selection strategy is what varies:

  • Bid: competitive price-based assignment. Agents bid for the right to execute, and the lowest price wins
  • Random: VRF-based selection. A random draw picks the node, cheap and fair
  • Reputation: weighted assignment via ERC-8004. Nodes with better track records get more work
  • Stake: claim with timeout rotation. A node stakes a bond to claim the task and loses it if it times out

Scenario 3: Multi-agent orchestration

Tasks that need multiple agents coordinating:

  • Chain: sequential A → B → C, each output feeding the next as input, like a pipeline: classifier → summariser → publisher
  • Fanout: parallel execution with result aggregation. Send the same task to N agents and combine their results
  • Route: conditional branching based on intermediate results. If the classifier says "legal", route to a compliance agent; if "fraud", escalate
  • Message: inter-agent communication with an auditable on-chain trace. Agents can pass context to each other, with every handoff recorded

Scenario 4: Output consensus

Distributed inference where multiple nodes independently execute the same task and must agree:

  • Commit: each node commits a hash, then reveals. This is the same commit-reveal pattern as ERC-8275, protecting against last-mover advantage: each node locks in its answer before seeing anyone else's
  • Vote: BFT-style multi-round consensus, analogous to Tendermint pre-vote / pre-commit rounds. Nodes go through multiple voting rounds until they converge on one output
  • Challenge: optimistic finalization with a dispute window. Results are accepted by default, but anyone can dispute within a time window with a fraud proof
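For the Commit option, a minimal off-chain sketch of the commit-reveal round may help. It assumes SHA-256 over a fixed-length 32-byte salt followed by the output bytes, and a simple quorum count over valid reveals; the function names and the encoding are illustrative, not taken from the draft or from ERC-8275.

```python
import hashlib
import hmac
import secrets
from collections import Counter
from typing import Optional

SALT_LEN = 32  # fixed length keeps the salt/output boundary unambiguous


def commit(output: bytes, salt: bytes) -> str:
    """Commitment a node publishes before seeing anyone else's answer."""
    if len(salt) != SALT_LEN:
        raise ValueError("salt must be exactly 32 bytes")
    return hashlib.sha256(salt + output).hexdigest()


def verify_reveal(commitment: str, output: bytes, salt: bytes) -> bool:
    """Check that a revealed (output, salt) pair opens the commitment."""
    return hmac.compare_digest(commitment, commit(output, salt))


def agreed_output(commitments: dict, reveals: dict, quorum: int) -> Optional[bytes]:
    """Return the output backed by at least `quorum` valid reveals, if any."""
    valid = [
        output
        for node, (output, salt) in reveals.items()
        if node in commitments and verify_reveal(commitments[node], output, salt)
    ]
    if not valid:
        return None
    output, count = Counter(valid).most_common(1)[0]
    return output if count >= quorum else None


# Three hypothetical nodes: two agree, one diverges.
answers = {"node-a": b"approve", "node-b": b"approve", "node-c": b"reject"}
salts = {node: secrets.token_bytes(SALT_LEN) for node in answers}

# Phase 1: every node commits.
commitments = {node: commit(answers[node], salts[node]) for node in answers}

# Phase 2: every node reveals.
reveals = {node: (answers[node], salts[node]) for node in answers}

result = agreed_output(commitments, reveals, quorum=2)
print(result)
```

The salt is what stops a late node from brute-forcing a small answer space out of the published commitments; without it, "approve" and "reject" would hash to two guessable values.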

In short: the workflow model orchestrates how decentralised agent nodes behave so that their results can be trusted on-chain. The mechanism is the step-prove pair: each scenario above is expressed through a combination of onAgentStep and onAgentProve calls, governed by the workflow document. The minimum path stays simple, callAgent() → onAgentStep(stage=0, isFinal=true) → onAgentProve(stage=0), so none of the complexity above is mandatory. The workflow model expands to meet the use case, not the other way around.

Would love everyone to dig into these scenarios. There are definitely more patterns worth capturing, and the best ones will come from the group kicking the tyres on what's here. Fire away.

3 Likes

The workflow model takes the WYRIWE leg without bending it: input provenance is stage=1 (input prep), and your sync mode is the gate babyblueviper1 was after. Declare the stage sync and the prove is pass=open / reject=stop. So gating doesn't need a separate concept; sync resolution on a declared stage is enough. Agreed there.

The one thing to pin at that gating stage, or it gates the wrong thing: the prove has to bind the input, not just carry a verdict. WYRIWE's stage=1 prove commits inputHash (8299) anchored to the 8281 input commit, so "reject = stop" means "this verdict was about the input the model actually received, committed before execution", not "a verdict failed." Without that binding you can gate on a verdict for a different input than the one that runs. The sentinel invariant from #10 is the conformance condition for the no-transform case, and the deployed WyriweProofVerifier already enforces the binding on-chain, so the stage=1 prove can key straight off it.

agentWorkflowHash is the same move one level up: committing the pipeline at dispatch is what makes "which stages are sync, which ones gate" non-repudiable.

3 Likes

Jimmy, picking up your fire away. And Tiago, that is exactly the pin: at the gating stage the prove has to bind the input, not just carry a verdict. Here is the worked shape with real field values from a verdict we ran in production (api.babyblueviper.com/ledger/23), built to show that binding. It maps onto onAgentStep(stage, isFinal) directly.

One gated pre-action stage, then the dependent execution step:

  1. Dispatch commits the input. rawProposalHash = 0xc80173123893e39226f60d574e0e155fa5cf6bf981ca29f6203545b9f0e16240 (the proposal the agent intended to act on).

  2. Gating stage, onAgentStep(stage=1, isFinal=false). The AgentVerifier emits a verdict bound to the input it judged:
    verdictHash = keccak256(verdict_event_id || rawProposalHash) = 0x9b5222b798c3b6ea620160bf0259e649b391f22d177f868accdda86fe10e4e27
    committed at verdict_timestamp 1781278345, anchored on-chain (ERC-8263 leg) at 1781347451. judgment_type = outcome_verifiable.
    The binding is the whole point: verdictHash commits to rawProposalHash, so a verifier can confirm the verdict judged THIS input and not a substituted one. This is the gate: a non-approving verdict here MUST stop the caller from reaching step 3.

  3. Dependent step, the executed action. executedActionHash = 0x1378cd7a56ab1552806f82bb5cd00b03acf988c538aedcce3509c2ed8c4d353a at executed_timestamp 1781278363.

The checkable properties, end to end: verdictHash binds the input (no input substitution between commitment and the verdict), verdict_timestamp 1781278345 < executed_timestamp 1781278363 (the gate preceded the action), and executedActionHash lets a third party confirm the action that ran is the one the approved verdict covered (no action substitution after approval). That closes the substitution gap on both sides, which is what Tiago is asking for, and it is what makes gating distinguishable from observation: the verdict could have blocked, it was bound to the input, and the timestamps prove it preceded the action it gated.
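For anyone who wants to script those checks, here is a hedged sketch of the two that can be expressed without the on-chain anchor. The field values are the ones quoted above; everything else is an assumption: the `GateTrace` type, the `approved` flag, reading `||` as raw byte concatenation, and the stand-in hash. Python's standard library has no keccak256 (`hashlib.sha3_256` is a different function), and verdict_event_id is not given in this thread, so the binding check is only exercised on a synthetic trace.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Callable

HashFn = Callable[[bytes], bytes]


@dataclass(frozen=True)
class GateTrace:
    """Hypothetical container for one gated pre-action trace."""

    raw_proposal_hash: str
    verdict_hash: str
    verdict_timestamp: int
    executed_action_hash: str
    executed_timestamp: int
    approved: bool


def binds_input(trace: GateTrace, verdict_event_id: bytes, hash_fn: HashFn) -> bool:
    """Recompute hash(verdict_event_id || rawProposalHash) and compare.

    Assumption: `||` means raw byte concatenation.
    """
    preimage = verdict_event_id + bytes.fromhex(trace.raw_proposal_hash[2:])
    return trace.verdict_hash == "0x" + hash_fn(preimage).hex()


def gate_preceded_action(trace: GateTrace) -> bool:
    """An action is only legitimate after an approving verdict."""
    return trace.approved and trace.verdict_timestamp < trace.executed_timestamp


# Values quoted in the post; `approved=True` is inferred, not quoted.
ledger_23 = GateTrace(
    raw_proposal_hash="0xc80173123893e39226f60d574e0e155fa5cf6bf981ca29f6203545b9f0e16240",
    verdict_hash="0x9b5222b798c3b6ea620160bf0259e649b391f22d177f868accdda86fe10e4e27",
    verdict_timestamp=1781278345,
    executed_action_hash="0x1378cd7a56ab1552806f82bb5cd00b03acf988c538aedcce3509c2ed8c4d353a",
    executed_timestamp=1781278363,
    approved=True,
)


def standin_hash(data: bytes) -> bytes:
    """Stand-in only: SHA3-256 is NOT keccak256. Swap in a real keccak."""
    return hashlib.sha3_256(data).digest()


# Synthetic trace, built so the binding check can run end to end.
event_id = b"hypothetical-event-id"
synthetic = replace(
    ledger_23,
    verdict_hash="0x"
    + standin_hash(event_id + bytes.fromhex(ledger_23.raw_proposal_hash[2:])).hex(),
)

print(gate_preceded_action(ledger_23))
print(binds_input(synthetic, event_id, standin_hash))
```

Checking the real verdictHash would need the actual verdict_event_id and a keccak256 implementation; the verification path on the ledger entry is the authoritative route for that.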

Without a normative MUST on halt-for-non-approve plus the input binding, the same three calls can be emitted by a workflow that never blocks or that swaps the input under the verdict, and the trace is indistinguishable from one where the verifier was just a logger. Those two are the lines I would pin in the spec.

Full entry and verification path: api.babyblueviper.com/ledger/23.

3 Likes

One area I think may deserve additional consideration is how this model extends beyond single-agent workflows.

The examples discussed so far assume a relatively straightforward path of input β†’ verification β†’ execution. In practice, many production systems are likely to involve multiple specialised agents or modules participating in the decision process (e.g. planning, policy evaluation, safety checks, execution).

In those scenarios, the interesting question becomes: what exactly is the unit of accountability?

  • Is the proof model intended to attest only to the final execution decision?
  • Should intermediate agent decisions also be represented as first-class workflow stages?
  • If multiple agents contribute to the final action, how should responsibility and traceability be preserved across the chain of delegation?

My concern is less about a specific implementation and more about interoperability. If different implementations choose different boundaries for what constitutes an "agent step", two systems could both claim compliance while providing very different audit guarantees.

It may be worth clarifying whether the standard aims to define only execution correctness, or whether it should also establish guidance around provenance across multi-agent workflows.

Overall, I think the emphasis on input binding and enforceable gating is the right direction. The remaining challenge is ensuring that these guarantees remain meaningful as agent architectures become more distributed and composable.

3 Likes

Ankita –

Thanks for raising this. The question of accountability granularity in multi-agent workflows is exactly the kind of thing the design needs to get right, and it's genuinely appreciated that you took the time to flag it.

To make sure the response addresses the right concern: could you share a concrete example of the interoperability failure you have in mind? Specifically, what does the "different step boundary" look like in practice? Is it a case where one implementation records each agent's intermediate output as a distinct onAgentStep, while another collapses the whole pipeline into a single final step? And what breaks when a verifier designed for one encounters a proof trace from the other?

The current draft delegates step granularity to the workflow definition document (committed via agentWorkflowHash), on the premise that what counts as an "accountable unit" varies by use case: a single-model inference has a different natural boundary than a BFT consensus round. But if there's a cross-implementation interoperability risk that this leaves open, a concrete example would help identify whether it belongs in the spec or in the verifier layer.

The latest version of the draft (updated since the PR was posted) is available here if it's helpful context: ERC for AI Agent Execution · GitHub. Feedback on whether the current shape addresses the concern, or where it falls short, would be very welcome.

1 Like