ERC-8183: Agentic Commerce

Phase 2 deployed on Base Sepolia

All previous jobs are closed, thanks again to both of you for running the full cycle. We’ve redeployed with two upgrades :

  • ReentrancyGuardTransient (EIP-1153) : transient storage reentrancy guard

  • evaluationFee : fixed fee per evaluation with 2-day governance timelock, replacing the proportional model for small-budget jobs

New addresses are in EVALUATOR.md in the repo.

No staking required for now, we’re moving toward an external audit and want to keep the testnet clean during that window. That said, happy to hear any feedback on the evaluationFee design or the middleware layering (AHM + PLV) before we lock the interface for audit.

Will keep you posted as we get closer to mainnet onboarding.

@Bakugo32 - congrats on the Phase 2 deploy. I’ll pull the new addresses from EVALUATOR.md and come back with thoughts on the evaluationFee design and the middleware layering question before audit lock. Reasonable window to expect feedback in?


Congrats on the Phase 2 deploy — clean redeployment and the ReentrancyGuardTransient + fixed evaluationFee changes both read sensibly.

On the middleware layering question (AHM + PLV): the Case 4 worked example just landed in the envelope-in-action library — paired-jobs on cycle 4, mirrored evaluator roles, both directions of the composition exercised end-to-end. The pattern that emerged is that the binary settlement primitive carries richer evaluator evidence cleanly when each evaluator’s product remains native (AHM’s AHS attestation, ThoughtProof’s PoT/RV epistemic block), and the consumer composes at the boundary rather than the protocol aggregating.

Happy to write up the middleware-layering implications more concretely if useful for the audit-prep window — the short version is that the protocol staying binary, with no aggregation interface, is what makes the composition work. Will defer to whatever timeframe you need.

Re evaluationFee: the fixed-fee + governance-timelock model handles the small-budget edge case the proportional model couldn’t (where 0.5% of a 1 USDC job rounds to a meaningless evaluator share). Worth confirming on-chain: is the fee distribution still 80/20 split on the fixed-fee path, or does the split logic change?

Reasonable window question echoed — happy to commit to a more concrete timeline once you’ve named one.


Has the working group considered extending 8183 to cover self-directed agents with no external counterparty?

We’ve been working on verifiable capability proofs for autonomous trading agents — intent → execution → realized P&L, anchored to a persistent agent identity via ERC-8170. The motivation: backtests are cherry-pickable, but a signed on-chain trail of decisions made before their outcomes were known is a real capability proof. Arguably the only credible one.

8183’s Job lifecycle is a strong foundation, and we’ve been thinking about how it could stretch to cover this case. A few properties that would need to flex:

  • Counterparty assumption. The client / provider / evaluator triad assumes someone scopes and funds the Job. For a self-directed trading agent there’s no such party — LPs are passive, the vault is the execution surface. Could the triad collapse into a single agent identity acting as all three?
  • Intent visibility. For service jobs the intent IS the deliverable, so exposure is fine. For trading, the intent IS the alpha — revealing pre-execution invites MEV and burns the strategy. Could task optionally accept a commitment hash with sealed payload?
  • What’s being judged. 8183 judges submitted work against requested intent. Trading agents would benefit from judging realized P&L of a privately-held decision after the fact.

The pattern we think fits is commit / execute / reveal-later:

  1. Commit — agent writes a commitment hash (decision, or a policy + model-state-hash) to its own ERC-8170 memory pointer. Hash public, payload private, signed by the agent’s TBA.
  2. Execute — normal on-chain trade through the agent’s vault.
  3. Reveal (optional, agent-controlled) — agent reveals payload + salt. Verifier checks commit predates execution and matches it, writes attestation into ERC-8004. Partial reveals work too (direction without size, policy without parameters).
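A minimal sketch of the commit and reveal checks above, assuming SHA-256 over a fixed-length 32-byte salt followed by the payload. The thread does not fix a hash function, payload encoding, or timestamp source, so `make_commitment`, `verify_reveal`, and the integer timestamps are all hypothetical names and choices for illustration.

```python
import hashlib
import secrets


def make_commitment(payload: bytes, salt: bytes) -> str:
    """Hash the salt and payload together; only this digest is published."""
    return hashlib.sha256(salt + payload).hexdigest()


def verify_reveal(commitment: str, payload: bytes, salt: bytes,
                  commit_time: int, execution_time: int) -> bool:
    """Check that the commit predates execution and that the reveal matches it."""
    if commit_time >= execution_time:
        return False
    return make_commitment(payload, salt) == commitment


# Commit: the agent keeps payload and salt private and publishes the digest.
salt = secrets.token_bytes(32)
payload = b'{"direction": "long", "size": "1.5"}'
commitment = make_commitment(payload, salt)

# Reveal: a verifier recomputes the digest from the revealed payload and salt.
ok = verify_reveal(commitment, payload, salt, commit_time=100, execution_time=200)
tampered = verify_reveal(commitment, b'{"direction": "short"}', salt, 100, 200)
late = verify_reveal(commitment, payload, salt, commit_time=300, execution_time=200)
```

A partial reveal (direction without size) would need one commitment per field, or a Merkle tree over the fields, rather than the single digest used here.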

ERC-8001 is also close, though the Privacy module’s multi-party acceptance still leaks intent to the solver — so the sovereignty story doesn’t fully land there either.

Three questions for the group:

  1. Would 8183 consider a self-directed Job mode where client, provider, and evaluator collapse into one agent identity?
  2. Could task accept a commitment hash with sealed payload as an opt-in?
  3. Or is this better as a sibling spec layered on ERC-8170’s memory model, with 8183 staying focused on counterparty-mediated commerce?

Either way, our broader motivation is agent sovereignty. Counterparty-mediated and coordination-mediated standards both ground an agent’s legitimacy in someone else witnessing its work. A truly autonomous agent managing real capital often has no such witness, the chain is the only one available, and the only credible proof is “I committed to this before I knew the outcome, here it is now.” ERC-8170 gives us the identity and memory substrate; the attestation pattern is the missing piece, and we’d love to see it fit inside 8183 if the shape works.

Would love to discuss a prototyped design pattern and working solution for agent memory using on-chain writes of agent proofs, with end-to-end privacy options: hash provenance + deferred attestation. We’ve been exploring this against ERC-8170's canonical reference deployment on their website as the substrate (owner-only encrypted writes, on-chain verifiable per document, TBA-bound, so the agent owns its own wallet), which gives the commit step a real home: the agent writes its sealed proof (of its intent) as part of its own memory, anchored on-chain from day one (timestamped on the blockchain), and chooses if and when to reveal the proofs.

@pablocactus 8 weeks, target 2026-07-14. Main areas of interest : evaluationFee edge cases from the AHM routing perspective, and whether complete()/reject() as the sole settlement interface holds as the middleware layers mature. Drop thoughts on the thread whenever ready.

@ThoughtProof thanks for the detailed read-through and the Case 4 writeup is well-timed.

The binary primitive staying clean is intentional, the protocol has no business aggregating evaluator signals it can’t verify. The pattern you’re describing (each evaluator’s product stays native, composition happens at the consumer boundary) is exactly the architecture we want to stay compatible with at audit lock. A concrete writeup of the middleware-layering implications would be genuinely useful for scoping the audit — if you’re willing, targeting 2026-07-14 gives you the same window as pablocactus.

On evaluationFee and the 80/20 split : yes, same split on the fixed-fee path. _computeFee() changes how the gross fee is calculated (min(evaluationFee, budget) vs budget × feeRate / 10_000), but the split logic downstream is unchanged, evaluatorFee = fee × 8000 / 10_000, treasuryFee = fee - evaluatorFee. The 80/20 constant is protocol-level, not mode-dependent. You can verify against the FeeDistributed event on the Phase 2 contracts.
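The arithmetic above can be sketched off-chain with integer math. This is an illustration, not the contract code: `compute_fee` and `split_fee` are hypothetical names, and selecting the fixed path whenever `evaluation_fee` is non-zero is an assumption about how the mode is chosen. Amounts are in token base units (1 USDC = 1_000_000).

```python
BPS_DENOMINATOR = 10_000
EVALUATOR_SHARE_BPS = 8_000  # the 80/20 evaluator/treasury split


def compute_fee(budget: int, evaluation_fee: int = 0, fee_rate_bps: int = 0) -> int:
    """Gross fee: fixed path if evaluation_fee is set (assumed), else proportional."""
    if evaluation_fee > 0:
        return min(evaluation_fee, budget)
    return budget * fee_rate_bps // BPS_DENOMINATOR


def split_fee(fee: int) -> tuple[int, int]:
    """Return (evaluator_fee, treasury_fee); the treasury takes the rounding remainder."""
    evaluator_fee = fee * EVALUATOR_SHARE_BPS // BPS_DENOMINATOR
    treasury_fee = fee - evaluator_fee
    return evaluator_fee, treasury_fee


# Proportional path: 0.5% (50 bps) of a 1 USDC job.
proportional = split_fee(compute_fee(1_000_000, fee_rate_bps=50))

# Fixed path with a hypothetical 0.10 USDC fee, capped by a smaller budget.
capped = split_fee(compute_fee(50_000, evaluation_fee=100_000))
```

Because the treasury share is computed by subtraction, the two parts always sum to the gross fee even when the 80% share rounds down.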

Agent Settlement Protocol - Phase 2 live on Base Sepolia

New contracts deployed 2026-05-15, governance fully wired (2026-05-19) :

Contract             Address
AgentJobManager      0x27E64c0180b1c9D860561C423479492f25ff7bE3
EvaluatorRegistry    0xE9DDe24E5Fa3182Ab231099493D7179A0A232e6d
ProtocolToken (VRT)  0x7CfC2F33ba75e485d27D38d38b5AcA4B5e286138
Treasury             0x6F56631C0071852FA3E1004e9d1CC5E555Fd2850
ReputationBridge     0x8ce1F1a3DD4cd31bb17b20886f302940aF9Dd556
MockUSDC             0x7DC5C4048feECaDDC33A3cF11E2eFA0842a2D16D

What changed :

  • ReentrancyGuardTransient (EIP-1153) replacing the persistent-storage variant in both AgentJobManager and EvaluatorRegistry, ~2000 gas saved per fund-moving call, requires EVM Cancun

  • evaluationFee, fixed fee per evaluation, governance-controlled via proposeEvaluationFee() / executeEvaluationFee() with 2-day timelock. Resolves the proportional fee edge case on small-budget jobs. 80/20 evaluator/treasury split unchanged.
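A rough off-chain model of the propose/execute flow with a 2-day timelock. Only the two function names and the delay come from the post; the state layout, the single pending slot, and the error conditions below are assumptions, and the deployed contract may differ.

```python
from dataclasses import dataclass
from typing import Optional

TIMELOCK_SECONDS = 2 * 24 * 60 * 60  # the 2-day delay described above


@dataclass
class EvaluationFeeGovernance:
    evaluation_fee: int = 0
    pending_fee: Optional[int] = None
    pending_since: Optional[int] = None

    def propose_evaluation_fee(self, new_fee: int, now: int) -> None:
        """Queue a new fee; it cannot take effect until the timelock elapses."""
        self.pending_fee = new_fee
        self.pending_since = now

    def execute_evaluation_fee(self, now: int) -> None:
        """Apply the queued fee once the timelock has elapsed."""
        if self.pending_fee is None or self.pending_since is None:
            raise RuntimeError("no pending proposal")
        if now < self.pending_since + TIMELOCK_SECONDS:
            raise RuntimeError("timelock not elapsed")
        self.evaluation_fee = self.pending_fee
        self.pending_fee = None
        self.pending_since = None


gov = EvaluationFeeGovernance()
gov.propose_evaluation_fee(100_000, now=0)
try:
    gov.execute_evaluation_fee(now=TIMELOCK_SECONDS - 1)
    early_ok = True
except RuntimeError:
    early_ok = False  # one second too early
gov.execute_evaluation_fee(now=TIMELOCK_SECONDS)
```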

Full integration guide : EVALUATOR.md in the repo. Moving toward external audit, scope to be finalized by 2026-07-14.

@nftprof this is a well-specified problem and the commit/execute/reveal-later pattern maps cleanly to what you’re describing.

On your three questions :

1. Collapsing the triad. We don’t think this fits inside ERC-8183. The independence of the evaluator is load-bearing, it’s what makes the stake meaningful and slashing credible. An agent evaluating its own execution is self-attestation, not evaluation. The triad exists precisely because the evaluator has no positional interest in the outcome. Collapsing it would need a different trust model.

2. Sealed task payload. The deliverable in ERC-8183 is already a commitment hash, full content stays off-chain. Extending task to accept a sealed commitment is technically viable as an opt-in Hook, but it shifts the evaluator’s role significantly : how does an evaluator judge intent they can’t read ? The pattern that makes sense is reveal-before-evaluate, which brings you back to an oracle-for-P&L model, where realized P&L is on-chain and verifiable without requiring intent disclosure at all.

3. Sibling spec on ERC-8170. Yes this is probably the right shape. ERC-8183 is counterparty-mediated commerce. What you’re describing is agent sovereignty : an autonomous agent proving to the chain (not to a counterparty) that its decisions preceded their outcomes. ERC-8170 as the commit substrate + ERC-8004 as the attestation destination seems like the right foundation. The missing piece, the attestation spec for commit/execute/reveal, is genuinely a gap, and the right place to define it is probably its own spec rather than stretching 8183’s scope.

On ERC-8001 : the observation about the Privacy module leaking intent to the solver is accurate — multi-party acceptance surfaces the intent before execution settles the commitment. Your model avoids this by making the reveal agent-controlled and post-execution. That’s a meaningful distinction worth preserving in whatever spec shape this takes.

One connection worth noting : ThoughtProof (@ThoughtProof) is working on PLV (Plan-Level Verification) which touches the same commit-before-outcome property for reasoning traces — different domain (language model epistemic integrity vs. trading strategy), same underlying proof structure. Might be a useful cross-thread.

If you’re prototyping the commit/execute/reveal pattern on ERC-8170, worth opening a dedicated thread, the design questions around partial reveals and the attestation format deserve their own space. Would be glad to engage there.

@Bakugo32 confirmed on the middleware-layering writeup, targeting July 14. Will scope it around how evaluator products compose at the consumer boundary without touching the binary primitive — the Case 4 work with pablocactus gives us concrete material for that.

Noted the Phase 2 redeployment and the new contract set. We’ll update our tooling against the new addresses.

3 things from production that speak directly to the open questions in this thread.


1. Human-as-evaluator via PreparedTx — an async pattern that fits ERC-8183

@miratisu's suggestion - evaluator runtime listening to events, async
processing within expiry - is exactly the pattern we’re running, but with a human as
the evaluator for high-stakes decisions.

When an agent proposes an on-chain transaction, our gateway creates a pending approval
and enters a poll loop with a TTL. The human evaluates off-chain and resolves via a
wallet-signed action. The resolution is then recorded. This maps cleanly onto ERC-8183’s
job lifecycle:

ERC-8183 job funded
  → agent executes (ERC-8004 identity, WYRIWE input integrity)
  → proposes tx (ERC-8265 PreparedTx envelope)
  → human receives notification, evaluates off-chain
  → signs resolution (approve / decline + reason)
  → ERC-8183 evaluator calls complete() or reject()
  → funds released or returned


The PreparedTx FSM we shipped (ERC-8265,
PR #1753 open) handles the handoff between agent and evaluator without requiring the
evaluator to be on-chain or synchronous. status: "approved" | "declined" | "expired"
maps directly to ERC-8183’s terminal states. intent: "retry" | "abandon" gives the
agent producer-side guidance when evaluation doesn’t go through.

This means ERC-8183’s evaluator role can accommodate a spectrum from fully automated
(ThoughtProof’s multi-model consensus) to fully human (wallet signature via PreparedTx)
without changing the spec — the evaluator is just an address that calls complete() or
reject() within the expiry window.
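One way to picture the status mapping is a small decision function in the evaluator runtime. This is a sketch under assumptions: `evaluator_action` is a hypothetical helper, and treating "expired" as "make no call and let the job run out its expiry" is one reading of the mapping, not something the thread states.

```python
from typing import Literal

PreparedTxStatus = Literal["approved", "declined", "expired"]


def evaluator_action(status: PreparedTxStatus, now: int, job_expiry: int) -> str:
    """Return the ERC-8183 call the evaluator address should make, if any."""
    if status == "expired" or now >= job_expiry:
        # Assumption: nothing is signed and the job's own expiry path applies.
        return "none"
    if status == "approved":
        return "complete"
    if status == "declined":
        return "reject"
    raise ValueError(f"unknown status: {status}")


approved = evaluator_action("approved", now=10, job_expiry=100)
declined = evaluator_action("declined", now=10, job_expiry=100)
expired = evaluator_action("expired", now=10, job_expiry=100)
too_late = evaluator_action("approved", now=100, job_expiry=100)
```

The same function serves an automated evaluator or a human one; only the source of `status` changes.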


2. The submission artifact problem — what does the evaluator actually verify?

ThoughtProof raised the evaluator-provider disagreement question (post #14). The deeper
issue is that ERC-8183’s submission is currently a blob — the evaluator has no
standard way to verify what the agent actually computed versus what the agent claims
it computed.

In production, we attach a hash commitment to every inference run (three input-provenance hashes, plus output and manifest hashes):

raw_input_hash              SHA-256(input before sanitization)
sanitization_pipeline_hash  SHA-256(pipeline spec) or identity sentinel
input_hash                  SHA-256(input entering the model)
output_hash                 SHA-256(agent reply)
manifest_hash               SHA-256(model + provider + inputSources + trustScope)


The output_hash is the verifiable handle on the deliverable. An evaluator contract
could require that the submission field in ERC-8183 includes this hash and that it
resolves against the attestation trail at the agent’s gateway endpoint. Disagreement
between evaluator and provider then has a concrete focal point: does the submission hash
match the attested output? If it does, the dispute is about task quality, not about
what the agent actually produced.
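A sketch of how the hashes could be computed and how an evaluator might compare a submission against the attested output. The field names follow the list above; the UTF-8 encoding, the sorted-key JSON serialization of the manifest, and the helper names are assumptions, and the identity sentinel for the pipeline hash is not modeled.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_commitment(raw_input: str, sanitized_input: str, pipeline_spec: str,
                     reply: str, manifest: dict) -> dict:
    """Compute the hashes listed above for one inference run."""
    return {
        "raw_input_hash": sha256_hex(raw_input.encode()),
        "sanitization_pipeline_hash": sha256_hex(pipeline_spec.encode()),
        "input_hash": sha256_hex(sanitized_input.encode()),
        "output_hash": sha256_hex(reply.encode()),
        # Assumption: the manifest is hashed as sorted-key JSON.
        "manifest_hash": sha256_hex(json.dumps(manifest, sort_keys=True).encode()),
    }


def submission_matches(submission_hash: str, attested: dict) -> bool:
    """True when the submitted hash equals the attested output hash."""
    return submission_hash.lower() == attested["output_hash"]


attested = build_commitment(
    "  hello ", "hello", "trim-v1", "hi there",
    {"model": "m", "provider": "p", "inputSources": [], "trustScope": "s"},
)
match = submission_matches(sha256_hex(b"hi there"), attested)
mismatch = submission_matches(sha256_hex(b"something else"), attested)
```

When `match` is true, any remaining disagreement is about task quality rather than about what the agent produced.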

We exposed this as a public endpoint — GET /verify/input-provenance — that checks the
three-hash commitment without requiring auth. An ERC-8183 evaluator contract could call
this (or an on-chain anchored version of it) as part of its evaluation logic.

Live example:

GET https://gateway.ensub.org/verify/input-provenance
  ?rawInputHash=<hex>
  &sanitizationPipelineHash=<hex|sentinel>
  &inputHash=<hex>

→ { valid: bool, transformation: "identity"|"sanitized", reason: string }
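For anyone scripting against the endpoint, the request URL can be assembled as below. The parameter names and response fields are taken from the example above; the helper names are hypothetical, and the sketch deliberately builds the URL without sending a request.

```python
from urllib.parse import urlencode

BASE_URL = "https://gateway.ensub.org/verify/input-provenance"


def provenance_url(raw_input_hash: str, pipeline_hash: str, input_hash: str) -> str:
    """Build the query URL for the three input-provenance hashes."""
    query = urlencode({
        "rawInputHash": raw_input_hash,
        "sanitizationPipelineHash": pipeline_hash,
        "inputHash": input_hash,
    })
    return f"{BASE_URL}?{query}"


def is_identity_run(response: dict) -> bool:
    """True when the gateway reports valid input with no sanitization applied."""
    return bool(response.get("valid")) and response.get("transformation") == "identity"


url = provenance_url("aa", "bb", "cc")
identity = is_identity_run({"valid": True, "transformation": "identity", "reason": ""})
sanitized = is_identity_run({"valid": True, "transformation": "sanitized", "reason": ""})
```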



3. Credit abstraction — a two-tier model that’s running in production

@clawplaza's point about not every task justifying an on-chain transaction is
exactly the tradeoff we faced. Our current model:

  • Registry credits — scoped to an AgentIdentityRegistry collection. Granted as a
    community pool on first collection deployment. Deducted per inference call. No gas, no
    transaction per micro-task.

  • Wallet credits — scoped to an EOA. For wallets not tied to a specific registry,
    or for individual top-up billing.

On-chain settlement only happens for higher-value actions that go through the PreparedTx
approval gate. Everything below that threshold runs against the off-chain credit pool
with the gateway as the trust root.

The open question @clawplaza raises — whether the spec should address this — probably has
the same answer @mlegls's post gives for the broader spec: ERC-8183 should define
the minimal on-chain interface (fund, submit, evaluate, release) and leave the
micro-payment abstraction layer to implementations. Mandating a specific credit system
in the spec would constrain the diverse implementations that are already running.


On writing to ERC-8004 reputation registries

@miratisu asked whether evaluators should write directly to ERC-8004 on completion. Our factory (verified on mainnet) is the only production ERC-8004 deployment we’re aware of. The registry has a setMetadata(agentId, key, value) function that could carry an attestation from an ERC-8183 evaluator - job completion, score, timestamp - without needing a separate reputation registry contract.

The loop miratisu describes would be:

ERC-8183 complete() fires
  → evaluator writes to AgentIdentityRegistry.setMetadata(agentId, "erc8183.job", attestation)
  → reputation is on-chain, scoped to the agent NFT, queryable by any future client


This works today with no changes to either spec. Whether it’s the right home for
reputation data versus a dedicated registry (as ERC-8004 originally envisioned) is the
design question worth discussing here.



@TMerlini’s mapping into the ERC-8183 job lifecycle is the composition I’d want. As the ERC-8265 author, one boundary worth stating precisely for anyone reading from the ERC-8183 side:

The forward handoff - agent prepares a transaction, wraps it, the evaluator’s wallet consumes it - is the normative v0.1 envelope. validity.notAfter bounds the evaluation window: the envelope itself says how long a preparation may be acted on, and §5.4 has a consumer near notAfter re-prepare rather than sign stale state.

The status / intent outcome shape is the reverse path, evaluator back to agent, and that is deliberately not normative in v0.1. The envelope is one-directional; the outcome message is reserved non-normatively in Appendix B (§11), field names not fixed. Tiago’s gate ships that reverse path in production; ERC-8265 reserves the design space rather than fixing it in v0.1.

So ERC-8183 can rely on the normative envelope for the agent→evaluator handoff today; the outcome shape is a reserved seam rather than a v0.1 guarantee. Happy to align it as both drafts firm up.


@Bakugo32 Three quick:

Adoption. Appreciated. One footnote worth flagging so it doesn’t trip your reviewers: the 4-EOA separation only holds if the funding pattern doesn’t visually re-correlate them. We send DEPLOYER → REGISTRAR / ATTESTOR / EVALUATOR as three separate ~0.02 ETH txs and call out in the doc that funding-source linkage is not a role-authority correlation. Cheap to misread otherwise. Ownable + in-contract 48h propose/execute is a clean alternative path; the choice is really driven by whether your upgrade surface is hot (ours is — V1→V2→V3 in three months) or frozen.

Spec text in #1732. Folding directly. Reworking the PR description as a sub-bullet under the uint width clarification — single review surface, no sibling PR.
Will tag @dcrapis when pushed.

Role-grant operational pattern. Separate blocks, explicit read-back between each. Two layers:

  • EOA role assignments (REGISTRAR_ROLE → registrar EOA, ATTESTOR_ROLE → attestor EOA, EVALUATOR_ROLE → evaluator EOA): baked into the proxy’s initialize() args, so each role lands atomically with the deployment tx of its respective contract. One tx per contract, hasRole() read-back after receipt confirms.
  • Cross-contract authority wiring (Reputation.grantRole(ATTESTOR_ROLE, Jobs) then Jobs.setReputationAttestor(Reputation)): two distinct post-deploy txs, sequential, each waitForTransactionReceipt + on-chain read-back assertion before the next fires. DRY_RUN mode prints calldata for human review before EXECUTE=1 actually sends.

Reasoning is blast radius. If a grant lands to the wrong instance (typo, stale env var, wrong-network artifact that got recompiled mid-session), the read-back catches it before the next call locks more authority into a half-broken graph. Extra block cost is ~2s on Base; cost of unwinding a chained-write mistake is much higher. Applies harder to renounceRole(ADMIN_ROLE, deployer) post-multisig-migration — point of no return, unambiguously its own audit-trail entry.
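The send, wait, read-back sequencing can be sketched as a small runner. `FakeChain` is an in-memory stand-in invented for this example, not a real RPC client, and `run_steps` is a hypothetical helper; a real script would send the transaction and wait for its receipt where the comment indicates.

```python
from typing import Callable


class FakeChain:
    """In-memory stand-in for a chain client; not a real RPC library."""

    def __init__(self) -> None:
        self.roles: set[tuple[str, str, str]] = set()
        self.sent: list[str] = []

    def grant_role(self, contract: str, role: str, account: str) -> None:
        self.roles.add((contract, role, account))
        self.sent.append(f"{contract}.grantRole({role}, {account})")

    def has_role(self, contract: str, role: str, account: str) -> bool:
        return (contract, role, account) in self.roles


Step = tuple[str, Callable[[], None], Callable[[], bool]]


def run_steps(steps: list[Step], execute: bool) -> list[str]:
    """Send each step on its own, asserting its read-back before the next fires."""
    log = []
    for label, send, read_back in steps:
        if not execute:
            log.append(f"DRY_RUN {label}")  # print for human review, send nothing
            continue
        send()  # in a real script: send the tx, then wait for the receipt
        if not read_back():
            raise RuntimeError(f"read-back failed after: {label}")
        log.append(f"OK {label}")
    return log


chain = FakeChain()
steps = [
    ("Reputation.grantRole(ATTESTOR_ROLE, Jobs)",
     lambda: chain.grant_role("Reputation", "ATTESTOR_ROLE", "Jobs"),
     lambda: chain.has_role("Reputation", "ATTESTOR_ROLE", "Jobs")),
]
dry = run_steps(steps, execute=False)
sent_after_dry = len(chain.sent)
live = run_steps(steps, execute=True)
```

A failed read-back raises before the next step runs, which is the blast-radius property described above.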

One edge case where I’d consider bundling: the Safe migration itself, grantRole(ADMIN, safe) + renounceRole(ADMIN, deployer) atomic, to avoid a window where neither side holds admin. We’re not doing that — accepting the window in exchange for keeping each tx individually inspectable, with the 48h timelock as the actual safety. Curious whether your in-contract propose/execute changes that tradeoff: does the queue itself give you enough on-chain auditability to bundle the proposals more aggressively?

— Nicholas (CardZero / mrocker)