Draft ERC: AI Agent Execution

Hi everyone :waving_hand:,

Wanted to share a gap that has come up repeatedly while working across ERC-8274, ERC-8183, and ERC-8004, and to see whether others have run into the same friction.

The on-chain AI agent ecosystem has made real progress across several layers: agent identity (ERC-8004), inference proof verification (ERC-8274), proof anchoring (ERC-8263), and agentic commerce (ERC-8183). But one foundational primitive is still missing: a standard for how a smart contract invokes an AI agent and receives its output.

Today, every dApp that wants to call an AI agent defines its own task format, and every agent must implement a separate adapter per dApp:

dApp A defines its own task format    →  agent X adapts
dApp B defines a different format     →  agent X adapts again
dApp C defines yet another format     →  agent X adapts again
...
agent Y must do the same for A, B, C  →  N×M integration complexity

This is the same fragmentation ERC-8274 solved at the verification layer. The task layer has the same problem one level up.


Motivation

The on-chain AI agent stack maps naturally to six base-layer primitives, analogous to blockchain’s foundational properties:

Primitive  | Blockchain Analogue                      | ERC
---------- | ---------------------------------------- | -------------------
Identity   | Address                                  | ERC-8004
Execution  | Smart Contract (definition + invocation) | missing
Verify     | Consensus                                | ERC-8274
Anchor     | On-chain State                           | ERC-8263
Settlement | Value Transfer                           | ERC-8275
Prove      | Logs / Audit Trail                       | ERC-8281 + ERC-8299

The stack has two layers, with an ecosystem layer built on top of this base:

  ┌── Ecosystem Layer ──────────────────────────────────────────────────┐
  │                                                                     │
  │  ┌───────────────┐   ┌───────────────┐   ┌───────────────────────┐  │
  │  │   ERC-8183    │   │   ERC-8273    │   │       ERC-8257        │  │
  │  │ Labor Market  │   │ Authorization │   │    Skill Registry     │  │
  │  └───────────────┘   └───────────────┘   └───────────────────────┘  │
  │                                                                     │
  └───────────────────────────────┬─────────────────────────────────────┘
                                  │ all depend on
  ┌── Base Layer ─────────────────▼─────────────────────────────────────┐
  │                                                                     │
  │  Identity →  Execution →  Verify → Anchor → Settlement →  Prove     │
  │  ERC-8004    [This ERC]   ERC-8274  ERC-8263  ERC-8275  8281+8299   │
  │                                                                     │
  └─────────────────────────────────────────────────────────────────────┘

Execution is the last missing brick in the base layer. Without it, every ecosystem ERC is forced to define its own task format rather than composing on a shared primitive: ERC-8183 uses a free-form description string, ERC-8001 uses opaque executionData bytes, and ERC-8004 delegates entirely to off-chain A2A/MCP.


Proposal

The proposal defines three minimal components. The design explicitly excludes agent routing, task state management, labor markets, and semantic task definitions; this is a protocol layer, not an application layer.

1. AgentTask: Task Definition

struct AgentTask {
    bytes32 taskId;            // unique identifier
    bytes32 systemPromptHash;  // keccak256(systemPrompt)
    bytes   systemPrompt;      // optional: full text (omit to save gas)
    bytes32 modelId;           // target model/agent type (abstract bytes32)
    address handler;           // contract implementing IAgentHandler
    address verifier;          // optional: IAgentVerifiable (ERC-8274); address(0) = no proof required
    uint256 deadline;          // expiry timestamp
}

systemPromptHash commits to the static task context. The dynamic user prompt is supplied at call time, so the same AgentTask definition can be reused across multiple invocations.
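To make the reuse property concrete, here is a minimal Python sketch of an off-chain mirror of AgentTask plus the sanity checks a caller might run before dispatch. The validate_task helper and ZERO_ADDRESS constant are hypothetical names, not part of the proposal, and hashlib.sha3_256 is only a stand-in for keccak256: the two use different padding, so these digests will not match on-chain values.

```python
import hashlib
from dataclasses import dataclass

ZERO_ADDRESS = "0x" + "00" * 20  # hypothetical constant mirroring address(0)


def h(data: bytes) -> bytes:
    # Stand-in for keccak256. NIST SHA3-256 pads differently, so these
    # digests will NOT equal Solidity keccak256 outputs.
    return hashlib.sha3_256(data).digest()


@dataclass(frozen=True)
class AgentTask:
    task_id: bytes
    system_prompt_hash: bytes
    system_prompt: bytes  # b"" when the full text is omitted to save gas
    model_id: bytes
    handler: str
    verifier: str  # ZERO_ADDRESS means no proof required
    deadline: int  # expiry as a unix timestamp

    @property
    def proof_required(self) -> bool:
        return self.verifier != ZERO_ADDRESS


def validate_task(task: AgentTask, now: int) -> None:
    """Hypothetical pre-dispatch checks; raises ValueError on failure."""
    if len(task.system_prompt_hash) != 32:
        raise ValueError("systemPromptHash must be 32 bytes")
    # Only check the commitment when the optional full text is supplied.
    if task.system_prompt and h(task.system_prompt) != task.system_prompt_hash:
        raise ValueError("systemPrompt does not match systemPromptHash")
    if task.handler == ZERO_ADDRESS:
        raise ValueError("handler must be set")
    if task.deadline <= now:
        raise ValueError("task has expired")
```

Because only the static system prompt is committed in the task, the same validated AgentTask can be passed to callAgent repeatedly, each time with a different user prompt.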

2. IAgentCaller: Task Dispatch

interface IAgentCaller {

    event CallAgent(
        bytes32 indexed taskId,
        address indexed requester,
        bytes32         systemPromptHash,
        bytes           systemPrompt,      // optional
        bytes32         userPromptHash,
        bytes           userPrompt,        // optional
        bytes32         inputHash,         // keccak256(systemPromptHash ‖ userPromptHash), computed on-chain
        bytes32         modelId,
        address         handler,
        address         verifier,          // IAgentVerifiable; address(0) = no proof required
        uint256         deadline
    );

    function callAgent(
        AgentTask calldata task,
        bytes32           userPromptHash,
        bytes    calldata userPrompt
    ) external returns (bytes32 taskId);
}

Off-chain agents subscribe to CallAgent events. inputHash is computed on-chain as keccak256(systemPromptHash ‖ userPromptHash). It is the same field ERC-8274 verifies against, providing a tamper-proof link between task invocation and proof verification.
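As a cross-check for off-chain agents, the sketch below recomputes inputHash in pure Python so it can be compared with the value emitted in CallAgent. It assumes ‖ means plain 64-byte concatenation of the two bytes32 values, which is what both abi.encodePacked and abi.encode produce for a pair of bytes32 in Solidity. The compact Keccak-256 uses the original Keccak padding byte (0x01), not the NIST SHA3 one (0x06), and is written for clarity rather than speed; input_hash is a hypothetical helper name.

```python
_MASK = (1 << 64) - 1
_RC = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]
# Rotation offsets indexed [x][y].
_ROT = [[0, 36, 3, 41, 18], [1, 44, 10, 45, 2], [62, 6, 43, 15, 61],
        [28, 55, 25, 21, 56], [27, 20, 39, 8, 14]]


def _rotl(v: int, n: int) -> int:
    return ((v << n) | (v >> (64 - n))) & _MASK


def _keccak_f(a):
    for rc in _RC:
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rotl(c[(x + 1) % 5], 1) for x in range(5)]
        a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]  # theta
        b = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                b[y][(2 * x + 3 * y) % 5] = _rotl(a[x][y], _ROT[x][y])  # rho + pi
        a = [[b[x][y] ^ (~b[(x + 1) % 5][y] & b[(x + 2) % 5][y]) for y in range(5)]
             for x in range(5)]  # chi
        a[0][0] ^= rc  # iota
    return a


def keccak256(data: bytes) -> bytes:
    rate = 136  # 1088-bit rate for a 256-bit output
    msg = bytearray(data)
    msg.append(0x01)  # Keccak (pre-NIST) domain padding
    while len(msg) % rate:
        msg.append(0)
    msg[-1] |= 0x80
    a = [[0] * 5 for _ in range(5)]
    for off in range(0, len(msg), rate):
        block = msg[off:off + rate]
        for i in range(rate // 8):
            a[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        a = _keccak_f(a)
    return b"".join(a[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def input_hash(system_prompt_hash: bytes, user_prompt_hash: bytes) -> bytes:
    # Assumed encoding: 32-byte systemPromptHash followed by 32-byte userPromptHash.
    if len(system_prompt_hash) != 32 or len(user_prompt_hash) != 32:
        raise ValueError("both inputs must be bytes32")
    return keccak256(system_prompt_hash + user_prompt_hash)
```

Reusing one systemPromptHash with two different user prompts yields two distinct inputHash values, which is what lets a single AgentTask definition serve many invocations while each proof stays bound to its own call.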

3. IAgentHandler: Reply & Proof Callbacks

A single callAgent may trigger multiple onAgentReply calls, supporting streaming output, multi-step reasoning, and multi-agent fanout. The nonce field (0-indexed, monotonically increasing) distinguishes each reply; isFinal signals the last one.

onAgentProve is deliberately decoupled from onAgentReply: result submission and proof submission happen independently, so an optimistic handler can act on the reply immediately while proof follows asynchronously.

interface IAgentHandler {

    // Called by the agent; may be invoked multiple times per task
    function onAgentReply(
        bytes32        taskId,
        bytes32        inputHash,
        bytes32        outputHash,   // keccak256(output)
        bytes calldata output,       // optional: raw output
        uint256        agentId,      // ERC-8004 agent identifier
        address        agent,
        uint256        nonce,        // starts at 0, increments per reply
        bool           isFinal       // true = last reply for this task
    ) external;

    // Called by proof provider (agent or third party)
    function onAgentProve(
        bytes32        taskId,
        bytes32        inputHash,
        bytes32        outputHash,   // must match onAgentReply(nonce=N).outputHash
        bytes calldata proof,        // raw proof bytes
        address        verifier,     // IAgentVerifiable address (ERC-8274)
        uint256        nonce         // which onAgentReply this proof covers
    ) external;
}
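The nonce/isFinal bookkeeping and the rule that a proof must match the outputHash of reply N can be illustrated with a small in-memory model of a handler. This Python sketch works under stated assumptions rather than the proposal's reference behavior: it assumes nonces arrive strictly in order (0, 1, 2, ...) from a single reply stream per task, that a proof for a nonce not yet replied to is rejected rather than buffered, and that the verifier call can be abstracted as a callable. ReplyTracker and TaskState are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class TaskState:
    input_hash: bytes
    output_hashes: List[bytes] = field(default_factory=list)  # index == nonce
    final: bool = False
    proven: Set[int] = field(default_factory=set)


class ReplyTracker:
    """Hypothetical in-memory model of IAgentHandler bookkeeping."""

    def __init__(self) -> None:
        self.tasks: Dict[bytes, TaskState] = {}

    def open_task(self, task_id: bytes, input_hash: bytes) -> None:
        self.tasks[task_id] = TaskState(input_hash)

    def on_agent_reply(self, task_id: bytes, input_hash: bytes, output_hash: bytes,
                       nonce: int, is_final: bool) -> None:
        t = self.tasks[task_id]
        if input_hash != t.input_hash:
            raise ValueError("inputHash does not match the dispatched task")
        if t.final:
            raise ValueError("task already received its final reply")
        if nonce != len(t.output_hashes):  # assumed: strictly sequential nonces
            raise ValueError("unexpected nonce")
        t.output_hashes.append(output_hash)
        t.final = is_final

    def on_agent_prove(self, task_id: bytes, input_hash: bytes, output_hash: bytes,
                       proof: bytes, nonce: int,
                       verify: Callable[[bytes, bytes, bytes], bool]) -> bool:
        t = self.tasks[task_id]
        if input_hash != t.input_hash:
            raise ValueError("inputHash does not match the dispatched task")
        if nonce >= len(t.output_hashes):
            raise ValueError("no reply with this nonce yet")
        if output_hash != t.output_hashes[nonce]:
            raise ValueError("outputHash does not match onAgentReply(nonce)")
        ok = verify(proof, input_hash, output_hash)  # stands in for the verifier chain
        if ok:
            t.proven.add(nonce)
        return ok
```

Nothing in this model requires proofs to arrive in nonce order or before the final reply, which matches the intent that proof submission runs on its own clock.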

The handler resolves the ERC-8274 verifier chain internally: IAgentVerifiable(verifier).getAgentVerifier() → IAgentVerifier.verify(proof, inputHash, outputHash).
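The two-hop resolution can be modeled in a few lines of Python. The contracts dictionary stands in for on-chain address lookup, ToyVerifier and StaticVerifiable are hypothetical mocks, and the sketch assumes verify returns a bool; the actual return shape is whatever ERC-8274 specifies.

```python
from typing import Any, Dict, Protocol


class IAgentVerifier(Protocol):
    def verify(self, proof: bytes, input_hash: bytes, output_hash: bytes) -> bool: ...


class IAgentVerifiable(Protocol):
    def get_agent_verifier(self) -> str: ...  # address of the IAgentVerifier


def resolve_and_verify(contracts: Dict[str, Any], verifiable_addr: str,
                       proof: bytes, input_hash: bytes, output_hash: bytes) -> bool:
    verifiable = contracts[verifiable_addr]                 # hop 1: IAgentVerifiable
    verifier = contracts[verifiable.get_agent_verifier()]   # hop 2: IAgentVerifier
    return verifier.verify(proof, input_hash, output_hash)


class ToyVerifier:
    # Toy rule for illustration only: the "proof" is input_hash + output_hash.
    def verify(self, proof: bytes, input_hash: bytes, output_hash: bytes) -> bool:
        return proof == input_hash + output_hash


class StaticVerifiable:
    def __init__(self, verifier_addr: str) -> None:
        self.verifier_addr = verifier_addr

    def get_agent_verifier(self) -> str:
        return self.verifier_addr
```

Keeping the indirection means the task only pins the IAgentVerifiable address, while the concrete verifier behind it can be swapped without touching task definitions.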


Open Questions

  1. taskId generation: should it be caller-supplied (enabling off-chain coordination before on-chain registration) or contract-generated (keccak256(requester, nonce), preventing collisions)? Both have valid use cases.

  2. modelId semantics: should it remain an opaque bytes32 resolved entirely off-chain, or is a lightweight on-chain model registry (mapping modelId → proof system, version, capabilities) worth the added complexity?

  3. onAgentProve placement: should it live in IAgentHandler (current design, unified interface) or a separate IAgentProveHandler interface, so handlers that only care about results stay simpler?


All feedback is very welcome. A v0.1 draft is in progress. The open questions above are the areas where community perspective would be most valuable before the spec solidifies. Happy to discuss any aspect of the design here.


The base layer table is the clearest statement of the stack I’ve seen. Execution as the missing brick between Identity and Verify is exactly right, and the N×M framing maps cleanly to what ERC-8274 solved at the verification layer. The scope is right too: keeping it at the base layer without task state, routing, or escrow is the correct call; those belong higher up.

One connection worth making explicit in the rationale: systemPromptHash and inputHash are natural OCP (ERC-8281) commitment targets at the task layer. If a caller anchors inputHash via ERC-8281 before execution (or even just systemPromptHash at task definition time), you get a timestamped, independently verifiable record that the exact input was committed before the output was known. This closes the pre-commitment question at the Execution layer the same way ERC-8263 closes it at the Anchor layer. The composition is vertical: OCP proves when the task was issued, ERC-8263 proves when the proof was anchored, and ERC-8274 proves the output is valid. Three layers, one audit trail.
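The ordering claim can be stated as a simple check over the timestamps each layer records. This Python sketch is illustrative only: AuditTrail and its field names are hypothetical, the timestamps would come from whatever ERC-8281, ERC-8263, and the handler actually expose, and "before" is assumed to mean strictly earlier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AuditTrail:
    input_committed_at: Optional[int]  # OCP / ERC-8281 anchor time of inputHash
    reply_at: int                      # time of the final onAgentReply
    proof_anchored_at: Optional[int]   # ERC-8263 anchor time of the proof
    proof_valid: bool                  # ERC-8274 verification result

    def problems(self) -> List[str]:
        issues = []
        if self.input_committed_at is None:
            issues.append("inputHash was never committed")
        elif self.input_committed_at >= self.reply_at:
            # Pre-commitment only means something if the input was fixed
            # before the output existed.
            issues.append("inputHash not committed before the reply")
        if self.proof_anchored_at is None:
            issues.append("proof was not anchored")
        if not self.proof_valid:
            issues.append("proof did not verify")
        return issues
```

An empty list means all three layers line up for that task: input committed first, proof anchored, and output verified.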

On the three open questions:

taskId generation: caller-supplied is the right default. Off-chain coordination before on-chain registration matters for any multi-agent fanout scenario where agents need to agree on a taskId before the transaction lands. Contract-generated can be offered as an optional override.
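A caller-supplied default with a contract-generated fallback could look like the Python model below. TaskIdRegistry is a hypothetical name; the fallback assumes an abi.encode(requester, nonce) layout (address left-padded to 32 bytes, then a 32-byte big-endian uint256), and hashlib.sha3_256 again stands in for keccak256, so the derived IDs will not match on-chain values.

```python
import hashlib
from typing import Dict, Optional, Set


def h(data: bytes) -> bytes:
    # Stand-in for keccak256 (NIST SHA3-256 pads differently).
    return hashlib.sha3_256(data).digest()


class TaskIdRegistry:
    """Hypothetical model: caller-supplied taskId by default, derived on request."""

    def __init__(self) -> None:
        self.used: Set[bytes] = set()
        self.nonces: Dict[str, int] = {}

    def _derive(self, requester: str) -> bytes:
        nonce = self.nonces.get(requester, 0)
        self.nonces[requester] = nonce + 1
        addr = bytes.fromhex(requester[2:])
        # Assumed abi.encode(address, uint256): 12 zero bytes + address, then nonce.
        return h(addr.rjust(32, b"\x00") + nonce.to_bytes(32, "big"))

    def register(self, requester: str, task_id: Optional[bytes] = None) -> bytes:
        tid = task_id if task_id is not None else self._derive(requester)
        if tid in self.used:
            raise ValueError("taskId already registered")
        self.used.add(tid)
        return tid
```

The caller-supplied path lets agents in a fanout agree on an ID off-chain before the transaction lands; the registry's only job is to reject duplicates.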

modelId semantics: keep it opaque bytes32 resolved off-chain. An on-chain model registry adds governance overhead for something that changes faster than standards move. The ERC-8274 proofSystem() string already handles proof-system identification at the verification layer; modelId can stay abstract at the execution layer.

onAgentProve placement: keep it in IAgentHandler. Splitting it into a separate interface adds an interface-detection layer for what is fundamentally one contract’s concern. Handlers that only care about results can implement onAgentProve as a no-op.


Strong +1 on the framing. “How to call an agent” is a real N×M gap, and putting it at the base layer (not the commerce layer) is the right call. The inputHash = keccak256(systemPromptHash, userPromptHash) commitment is clean: it gives the verifier a tamper-proof handle without the Task layer having to understand verification. From the 8274 reference-implementation side, a few notes on the verifier seam:

  1. Pre-action gating vs post-hoc proving: worth deciding explicitly.
    The current flow is callAgent → onAgentReply(output) → onAgentProve(proof): verification is a proof about an output that already exists. That’s exactly right for an inference proof. But there’s a second, distinct verifier shape: a judgment verifier that returns a verdict before the handler acts on the output, a gate rather than an after-the-fact attestation (our /review is this: an approve/reject verdict committed before the action commits). The two grade and compose differently. Suggestion: let the verifier seam declare which it is. An IAgentVerifiable can signal whether its verdict is pre-action (gating) or post-hoc (proving), so a handler knows whether to wait on the verifier before consuming onAgentReply, or to accept the reply and reconcile the proof later.

This composes vertically with the input-commitment point Damon raised: anchoring inputHash via OCP / ERC-8281 before execution gives a timestamped record that the input was committed before the output was known. That’s commit-before-outcome at the input layer; the pre-action verifier is the same move at the judgment layer. They stack: input-commitment (when the task was issued) → verdict-commitment (the verdict issued before the action) → settlement (8275). The Task layer can carry all three without managing any of them; it just has to leave the seams typed.
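One way a handler could branch on the declared verifier shape is sketched below in Python. The VerifierMode enum, the ModeAwareHandler class, and the idea of reading the mode from the verifier are all hypothetical: the suggestion is that IAgentVerifiable could signal gating vs proving, but no such function exists in the proposal yet.

```python
from enum import Enum
from typing import Dict, List, Tuple

Key = Tuple[bytes, int]  # (taskId, nonce)


class VerifierMode(Enum):
    GATING = "pre-action"  # verdict must arrive before the handler acts
    PROVING = "post-hoc"   # handler acts now, reconciles the proof later


class ModeAwareHandler:
    def __init__(self, mode: VerifierMode) -> None:
        self.mode = mode
        self.awaiting: Dict[Key, bytes] = {}  # replies waiting on a verdict or proof
        self.acted: List[Key] = []

    def on_reply(self, task_id: bytes, nonce: int, output_hash: bytes) -> None:
        key = (task_id, nonce)
        self.awaiting[key] = output_hash
        if self.mode is VerifierMode.PROVING:
            self.acted.append(key)  # optimistic: act immediately

    def on_verdict(self, task_id: bytes, nonce: int, output_hash: bytes, ok: bool) -> str:
        key = (task_id, nonce)
        if self.awaiting.get(key) != output_hash:
            raise ValueError("verdict covers a different output")
        del self.awaiting[key]
        if self.mode is VerifierMode.GATING:
            if ok:
                self.acted.append(key)
            return "acted" if ok else "rejected"
        return "confirmed" if ok else "dispute"  # action was already taken
```

In the gating branch nothing happens until the verdict; in the proving branch a failed proof can only be handled after the fact, which is why the handler needs to know the mode up front.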

  2. Keep onAgentProve decoupled from onAgentReply (open Q3): yes, decouple.
    From running this live: the proof/outcome settles on a different clock than the reply. A verdict can be published and verifiable immediately, while the outcome it’s later graded against settles minutes to hours later (for us, on-chain). The reply and the proof genuinely arrive at different times, so the nonce + isFinal design that separates them is correct; don’t fold proof back into the reply path.

  3. modelId (open Q2): opaque is fine to start, but a registry would need to model revocation.
    A modelId can become uncallable through vendor deprecation or withdrawal by directive. A registry mapping modelId → {proof system, version, capabilities} should treat “no longer callable” as a first-class state, so a task pinned to a withdrawn model fails closed rather than dangling until its deadline.
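A registry that treats revocation as first-class might look like this Python model. ModelRegistry, ModelStatus, and require_callable are hypothetical, and capabilities are omitted for brevity; the point is only that dispatch checks status and fails closed for revoked or unknown models instead of emitting a task that waits out its deadline.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class ModelStatus(Enum):
    ACTIVE = 1
    REVOKED = 2  # deprecated by the vendor or withdrawn by directive


@dataclass
class ModelEntry:
    proof_system: str
    version: str
    status: ModelStatus = ModelStatus.ACTIVE


class ModelRegistry:
    def __init__(self) -> None:
        self.models: Dict[bytes, ModelEntry] = {}

    def register(self, model_id: bytes, proof_system: str, version: str) -> None:
        self.models[model_id] = ModelEntry(proof_system, version)

    def revoke(self, model_id: bytes) -> None:
        self.models[model_id].status = ModelStatus.REVOKED

    def require_callable(self, model_id: bytes) -> ModelEntry:
        entry = self.models.get(model_id)
        # Fail closed: unknown and revoked models are both rejected at dispatch.
        if entry is None or entry.status is not ModelStatus.ACTIVE:
            raise LookupError("model is not callable")
        return entry
```

Calling require_callable inside callAgent (or its off-chain equivalent) turns a withdrawn model into an immediate, visible failure rather than a silent timeout.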

Happy to provide a concrete IAgentVerifiable reference for the verifier slot. We run a live one (api.babyblueviper.com/ledger) with pre-action verdicts published before outcomes settle, so the verifier path has a deployed example to test against, same as we’re doing for 8274.

One line for the rationale, since the layers compose cleanly: WYRIWE proves what the model received, OCP proves when the task was issued, and the verifier proves whether it should proceed.


The pre-action vs post-hoc verifier distinction is the sharpest addition here and belongs in the interface. A handler that doesn’t know whether to wait on a verdict before consuming onAgentReply or reconcile the proof afterward is making an architectural assumption the spec should make explicit. IAgentVerifiable signaling its verifier shape (gating vs proving) is the right place for that.

The stacking you describe is exactly the composition the base layer should enable without managing: input-commitment (OCP, when the task was issued) → verdict-commitment (pre-action verifier, the verdict before the action) → settlement (ERC-8275). Three typed seams, each independently checkable, none absorbing the others.

On modelId revocation, agreed. A registry that doesn’t model “no longer callable” as a first-class state will produce tasks that dangle until their deadline rather than fail closed. If a registry gets added, revocation has to be in the spec from the start, not retrofitted.

And the rationale line at the bottom is the one: WYRIWE proves what the model received, OCP proves when the task was issued, and the verifier proves whether it should proceed. That’s the sentence that belongs in Section 1 of the ERC.


@JimmyShi22, base-layer placement is right, and the inputHash = keccak256(systemPromptHash, userPromptHash) pivot is the clever part: it’s the same field 8274’s verify keys on, so the whole proof stack hangs off one on-chain commitment without the task layer managing any of it.

Picking up the “what the model received” leg @Damonzwicker and @babyblueviper1 have been pointing at: that’s the WYRIWE (8299) seam, so let me make it concrete.

Your inputHash commits what was requested: the declared system + user prompt hashes. But what the model actually processes can differ: RAG injection, templating, and tool-context assembly all sit between the committed prompt and the tokens the model sees. WYRIWE is exactly the standard for that gap. It proves the model received either the committed input verbatim (sentinel: inputHash == rawInputHash) or a declared transformation of it (non-sentinel: inputHash = keccak256(abi.encode(rawInputHash, pipelineHash)), pipeline pinned and auditable).
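The sentinel / non-sentinel rule can be expressed as a small Python check. attested_input_hash and matches_attestation are hypothetical helper names; abi.encode of two bytes32 values is modeled as their 64-byte concatenation, and hashlib.sha3_256 stands in for keccak256, so real values will differ.

```python
import hashlib
from typing import Optional


def h(data: bytes) -> bytes:
    # Stand-in for keccak256 (NIST SHA3-256 pads differently).
    return hashlib.sha3_256(data).digest()


def attested_input_hash(raw_input_hash: bytes, pipeline_hash: Optional[bytes]) -> bytes:
    if pipeline_hash is None:
        return raw_input_hash  # sentinel: the committed input was received verbatim
    # Non-sentinel: keccak256(abi.encode(rawInputHash, pipelineHash)).
    return h(raw_input_hash + pipeline_hash)


def matches_attestation(input_hash: bytes, raw_input_hash: bytes,
                        pipeline_hash: Optional[bytes]) -> bool:
    return input_hash == attested_input_hash(raw_input_hash, pipeline_hash)
```

A verifier keyed on the value returned by attested_input_hash, rather than on the raw requested hash, is what extends the tamper-proof link through the assembly step to the model itself.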

So the layers all stack on your one inputHash, none of them managed by the task:

  • OCP / 8281: anchor the inputHash before execution → proves when it was committed (Damon’s point)

  • WYRIWE / 8299: proves the model received that committed input, or an auditable transformation of it → what it processed

  • Verifier / 8274: pre-action verdict (gating) or post-hoc proof (attesting), per Fede’s distinction → whether it proceeds / the output holds

  • 8275: settles

The task layer just exposes the commitment; the three prove when / what / whether against it. That’s the Section-1 rationale Damon’s after, with the WYRIWE leg made real.

No new field is needed: WYRIWE composes through the same inputHash your verifier slot already references. But one line in the rationale is worth it: that the inputHash itself is attestable, not just assumed. Otherwise the verifier proves an output against an input that’s only as trustworthy as the off-chain assembly, and the “tamper-proof link” stops one layer short of the model.

Happy to stand up the live withWyriwe() from ccip-router as a deployed input-provenance reference, parallel to @babyblueviper1’s IAgentVerifiable for the verifier slot. That gives you a running example for the “what” leg to test against, on the same inputHash.


This is the right closure of the “what” leg: WYRIWE (8299) is exactly the standard for the gap between requested and processed. The sentinel / non-sentinel split is the correct shape: inputHash == rawInputHash when the model gets the committed input verbatim, and inputHash = keccak256(abi.encode(rawInputHash, pipelineHash)) when a declared, pinned transformation (RAG, templating, tool-context assembly) sits in between, with the pipeline auditable and no hand-waving over the assembly step.

The sharp consequence for the verifier slot, and the line worth making explicit in the rationale: 8274’s verify must key on the WYRIWE-attested inputHash, not the raw requested one. Otherwise the verdict is sound over an input the model may never have seen, and the tamper-proof link stops one layer short, exactly as you say. With WYRIWE in the path the chain is end-to-end: the verifier attests an output against the input the model provably processed. That’s what makes “the inputHash itself is attestable, not just assumed” load-bearing rather than nice-to-have: a pre-action gate is only as good as the input it gated on.

So the four legs all hang off the one inputHash, none of them managed by the task layer:

  • 8281 / OCP: when the inputHash was committed
  • 8299 / WYRIWE: what the model processed against it (verbatim, or a declared transform)
  • 8274 / verifier: whether it proceeds (pre-action gate) or the output holds (post-hoc), keyed on the WYRIWE-attested input
  • 8275: settles

+1 on standing up withWyriwe() as the input-provenance reference parallel to our IAgentVerifiable. Two of the four legs then have a running, checkable example on the same inputHash, which is the fastest way to pressure-test that the seams actually compose end-to-end rather than just on paper.