ERC-8183: Agentic Commerce

Interesting proposal. The SettlementDefault coverage type caught my eye.

I’ve been working on ERC-8203 (agent off-chain conditional settlement), which handles the dispute/settlement layer. Your AAP could sit on top nicely. The claims resolution flow maps well to 8203’s COMPOSITE conditions, where you combine collateral lock + oracle attestation as proof.

One thought: if agents batch settlements via Merkle roots (which 8203 supports), the assurance layer needs to handle partial claims on batched settlements. Worth thinking about early, imo.

thread here if you want context: ERC-8203: Agent Off-Chain Conditional Settlement Extension Interface

Really like this framing. The ERC-8183 job session → ERC-8203 composite/dispute path maps very cleanly to how we think about ThoughtProof.

We see ThoughtProof as a verification layer before settlement: structured outputs that can feed downstream attestation, claims, or dispute-resolution logic without becoming a settlement rail itself.

The proofType point makes sense, and the partial-claim / batched settlement issue feels like one of the key design questions to solve early. If disputes happen against batched commitments or Merkle roots, you need a precise way to trace them back to the exact underlying claim/job/proof.
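A minimal sketch of that tracing step. It assumes each batched settlement commits to a Merkle root over per-claim leaves, so a partial claim is just one leaf plus an inclusion proof. The leaf format (`jobId:amount`), SHA-256, and sorted-pair hashing are illustrative assumptions, not taken from ERC-8203. An on-chain scheme would more likely use keccak256.

```typescript
import { createHash } from "node:crypto";

// SHA-256 is used only because it ships with Node; keccak256 is the usual on-chain choice.
function sha256(data: Buffer): Buffer {
  return createHash("sha256").update(data).digest();
}

// Hypothetical leaf encoding: one leaf per claim, identified by jobId and amount.
function leafHash(jobId: bigint, amount: bigint): Buffer {
  return sha256(Buffer.from(`${jobId}:${amount}`, "utf8"));
}

// Sorted-pair hashing, so proofs need no left/right flags.
function hashPair(a: Buffer, b: Buffer): Buffer {
  return Buffer.compare(a, b) <= 0 ? sha256(Buffer.concat([a, b])) : sha256(Buffer.concat([b, a]));
}

// Builds every layer of the tree; an odd node at the end of a layer is promoted unchanged.
function buildRoot(leaves: Buffer[]): { root: Buffer; layers: Buffer[][] } {
  const layers: Buffer[][] = [leaves];
  while (layers[layers.length - 1].length > 1) {
    const prev = layers[layers.length - 1];
    const next: Buffer[] = [];
    for (let i = 0; i < prev.length; i += 2) {
      next.push(i + 1 < prev.length ? hashPair(prev[i], prev[i + 1]) : prev[i]);
    }
    layers.push(next);
  }
  return { root: layers[layers.length - 1][0], layers };
}

// Collects the sibling at each level; promoted nodes have no sibling and contribute nothing.
function proofFor(layers: Buffer[][], index: number): Buffer[] {
  const proof: Buffer[] = [];
  for (let level = 0; level < layers.length - 1; level++) {
    const sibling = index ^ 1;
    if (sibling < layers[level].length) proof.push(layers[level][sibling]);
    index = Math.floor(index / 2);
  }
  return proof;
}

// A disputed claim is traceable iff its leaf plus proof reproduces the committed root.
function verify(leaf: Buffer, proof: Buffer[], root: Buffer): boolean {
  return proof.reduce((acc, sibling) => hashPair(acc, sibling), leaf).equals(root);
}
```

With this shape, a dispute against a batch can name exactly one jobId and amount, and anyone can check it against the root without the rest of the batch.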

One additional angle that may matter here: we’re also exploring ar.io / Arweave-backed storage for full Epistemic Blocks, so a dispute wouldn’t only reference a verdict, but could trace back via TX ID to the complete verification record / audit trail behind it.

Happy to read the draft once it’s up.

The self-funded model is a strong design choice — it avoids the pooled risk mutualization problem where one bad agent drains the pool and takes everyone down. Two questions on the mechanics:

Reserve depletion: What happens when an agent’s AssuranceAccount drops below the coverage amount mid-job? Say an agent has 2 ETH reserved, takes on a job with 1.5 ETH JobAssurance, then a previous claim drains 1 ETH from the reserve. The active JobAssurance is now undercollateralized. Does the protocol enforce a minimum reserve ratio, or is this left to the client to check before accepting?

Chained workflows: In multi-agent pipelines (Agent A’s output feeds Agent B’s input), a single job failure can cascade. If Agent B fails because Agent A delivered garbage that passed evaluation, the client has a valid JobFailure claim against B but the root cause is A. Does the coverage model account for upstream dependencies, or is each job treated as fully independent? This feels like it matters a lot for real-world agentic workflows where tasks are decomposed across multiple providers.

On your question about read-only vs tighter coupling — I think read-only is the right starting point. Tighter coupling would mean ERC-8183 contracts need to know about AAP, which breaks the composability story. The one exception might be emitting an event when a Job transitions to a state that could trigger a claim, so AAP doesn’t need to poll.

Good questions, let me address both.

Reserve depletion: This scenario actually can’t happen under the current design. When commitToJob() is called, committedAmount is synchronously deducted from availableAmount and credited to lockedAmount. Each JobAssurance’s collateral is independently reserved at commitment time, so two active JobAssurances never compete for the same funds. If the Agent doesn’t have enough availableAmount to cover a new commitment, commitToJob() simply reverts. The only way lockedAmount could become insufficient at payout time is if an optional extension (like external slashing) consumes it, which is why the spec includes a defensive insolvency check in payout() and requires such extensions to provide a fallback funding mechanism.
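The accounting described above can be modeled in a few lines. This is a hypothetical in-memory sketch: the field names availableAmount, lockedAmount, and committedAmount and the function names commitToJob() and payout() follow the discussion, while the per-job map and the error strings are assumptions for illustration.

```typescript
// Hypothetical model of an AssuranceAccount; not the ERC-8210 reference implementation.
class AssuranceAccount {
  availableAmount: bigint;
  lockedAmount = 0n;
  private commitments = new Map<bigint, bigint>(); // jobId -> remaining committed collateral

  constructor(deposit: bigint) {
    this.availableAmount = deposit;
  }

  // Collateral moves from available to locked in one step, at commitment time.
  commitToJob(jobId: bigint, committedAmount: bigint): void {
    if (this.commitments.has(jobId)) throw new Error("DuplicateCommitment");
    if (committedAmount > this.availableAmount) throw new Error("InsufficientAvailableAmount");
    this.availableAmount -= committedAmount;
    this.lockedAmount += committedAmount;
    this.commitments.set(jobId, committedAmount);
  }

  // A claim is paid only from the collateral reserved for that job.
  payout(jobId: bigint, amount: bigint): void {
    const committed = this.commitments.get(jobId);
    if (committed === undefined || amount > committed) throw new Error("InvalidPayout");
    // Defensive insolvency check: only reachable if something outside commitToJob
    // (e.g. an external slashing extension) has drained lockedAmount.
    if (amount > this.lockedAmount) throw new Error("Insolvent");
    this.lockedAmount -= amount;
    this.commitments.set(jobId, committed - amount);
  }
}
```

Replaying the 2 ETH / 1.5 ETH scenario: after the first commitment only 0.5 ETH is available, so a second 1 ETH commitment reverts instead of silently undercollateralizing the active assurance.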

Chained workflows: Each JobAssurance is 1:1 with an ERC-8183 Job, and each Job is treated as fully independent. This is a deliberate scope boundary. Tracing upstream causality (A delivered garbage that passed evaluation, which caused B to fail) requires modeling inter-Job dependency graphs, which would significantly expand the core spec surface. For now, the client’s protection is to hold separate JobAssurances against each provider in the pipeline. If B fails, the client claims against B’s collateral. If the root cause is A’s output quality, that’s an evaluator quality problem on A’s Job, potentially covered by an EvaluatorDispute claim against A’s evaluator. Not a perfect solution for deep pipelines, but it keeps the core composable. Cross-Job dependency tracking could be a future extension.

Read-only integration: Agreed, and good news: ERC-8183 already emits events on every Job state transition, so AAP implementations can subscribe to those events without polling. No changes to 8183 needed.

I’ve gone into more detail on these design decisions in the AAP spec and forum thread. Would be great to continue this discussion there, especially the chained workflows question. It’s exactly the kind of feedback that helps shape the extension roadmap.
Thread: ERC-8210: Agent Assurance
Spec: “Add ERC: Agent Assurance” by wangbin9953, Pull Request #1632 in ethereum/ERCs on GitHub

Thanks for the detailed breakdown, both answers land well.

On reserve depletion: the synchronous committedAmount → lockedAmount transition at commitToJob() is clean. I hadn’t fully internalized that the lock happens atomically at commitment, not at evaluation time. That eliminates the race condition I was worried about. The insolvency check in payout() as a defensive layer for optional extensions makes sense too: belt and suspenders for a system where external slashing hooks could introduce unpredictable drains.

One follow-up there: if availableAmount is insufficient and commitToJob() reverts, is there a standard error code or event that clients/orchestrators can react to programmatically? Knowing why a commitment failed (insufficient reserves vs. other revert reasons) matters for automated pipeline orchestration — you don’t want a scheduler retrying a commitment that will never succeed because the agent is fully allocated.

On chained workflows — I agree this is the right scope boundary for the core spec. Modeling inter-Job dependency graphs at the protocol level would be a complexity explosion. The pattern you describe (separate JobAssurances per provider + EvaluatorDispute against A’s evaluator if A’s output was the root cause) is workable for shallow pipelines. Where it gets uncomfortable is deep pipelines (A → B → C → D) where the evaluator on A’s Job passed the output but the failure only manifests three hops downstream. By then the causal link is hard to establish, and the client is stuck claiming against D’s collateral when D arguably did nothing wrong.

Not saying the core spec should solve this — but it might be worth documenting the pattern as a known limitation with a recommended mitigation. Something like: “For multi-hop workflows, clients SHOULD maintain independent JobAssurances at each stage and MAY implement application-level dependency tracking to attribute root cause failures.” That way implementers aren’t surprised when the 1:1 model doesn’t cleanly map to their pipeline topology.
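To make the “application-level dependency tracking” idea concrete, here is a hypothetical sketch of what a client might keep alongside its per-stage JobAssurances. Nothing here is part of ERC-8183 or ERC-8210: the Stage record, the outputFlagged field (set by the client’s own post-hoc checks), and attributeRootCause() are all assumptions.

```typescript
// Hypothetical client-side record of one pipeline stage.
interface Stage {
  jobId: string;
  upstreamJobId?: string; // the Job whose output fed this stage, if any
  outputFlagged: boolean; // set by the client's own post-hoc checks on this stage's output
}

// Walks upstream from the failing stage and returns the earliest stage whose output
// was flagged; if nothing upstream is flagged, the failing stage itself is blamed.
function attributeRootCause(stages: Map<string, Stage>, failingJobId: string): string {
  let rootCause = failingJobId;
  let current = stages.get(failingJobId);
  const seen = new Set<string>(); // guards against accidental cycles in the records
  while (current !== undefined && !seen.has(current.jobId)) {
    seen.add(current.jobId);
    const upstreamId = current.upstreamJobId;
    if (upstreamId === undefined) break;
    const upstream = stages.get(upstreamId);
    if (upstream === undefined) break;
    if (upstream.outputFlagged) rootCause = upstream.jobId;
    current = upstream;
  }
  return rootCause;
}
```

The result only tells the client which stage’s JobAssurance (or which evaluator) to pursue; each claim still goes through its own 1:1 Job.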

On the events — perfect, that’s exactly what I was hoping. ERC-8183 event subscriptions give AAP everything it needs without coupling.

I’ll take this over to the ERC-8210 thread. The chained workflows question in particular feels like it deserves its own section in the spec’s “Design Rationale” or a companion EIP for cross-Job dependency extensions.

Good catch on the error code question. The current spec doesn’t define custom errors for commitToJob() revert scenarios. For automated orchestration, distinguishing “insufficient availableAmount” from “Job already has an Active assurance” from “coverage condition already qualifies” would be useful. Worth adding typed custom errors (e.g. InsufficientAvailableAmount, DuplicateCommitment, AdverseSelectionBlocked) in the next revision.
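With typed errors in place, an orchestrator can branch on the error name instead of retrying blindly. The three names below are the ones proposed here and are not yet in the spec; the Action values, the hasAlternativeProvider flag, and the policy itself are hypothetical.

```typescript
// Hypothetical scheduler reaction to a commitToJob() revert, keyed on the decoded error name.
type Action = "retry-later" | "route-elsewhere" | "skip" | "escalate";

function onCommitFailure(errorName: string, hasAlternativeProvider: boolean): Action {
  switch (errorName) {
    case "InsufficientAvailableAmount":
      // The agent is fully allocated; reserves only free up as its other Jobs settle.
      return hasAlternativeProvider ? "route-elsewhere" : "retry-later";
    case "DuplicateCommitment":
      // The Job already has an Active assurance; retrying can never succeed.
      return "skip";
    case "AdverseSelectionBlocked":
      // The coverage condition already qualifies; committing now is blocked by design.
      return "skip";
    default:
      // Unknown revert reasons go to a human or a higher-level policy.
      return "escalate";
  }
}
```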

On documenting the chained workflows limitation: agreed. A short note in the Rationale section along the lines you suggest would help implementers. Will add that to the next spec update.

See you in the 8210 thread.

Hi all — joining this thread with an open-source implementation and a few concrete things to contribute to the ongoing design discussions.

Repo available on GitHub: agent-settlement-protocol by Demsys
Base Sepolia + live Swagger UI: agent-settlement-protocol-production.up.railway.app/docs/


Responding to @ThoughtProof (#21): “Happy to test against a live ACP job contract on either Base Mainnet or Sepolia”

We have a live contract on Base Sepolia: AgentJobManager at 0xef8b87A6236e7DB4E0967Ed068C8893fD5a5D57f. If you want to point your evaluator at it, the TypeScript SDK (@asp-sdk/sdk on npm) handles the full lifecycle: you call submitWork(jobId, deliverable) as the provider, and your evaluator contract calls complete(jobId, attestationHash). Happy to coordinate.


On evaluator complexity (clawplaza #13, ThoughtProof #14)

Both of you identified evaluator complexity as the real bottleneck. We approached it from a different angle: an on-chain EvaluatorRegistry with stake-weighted pseudo-random selection and slashing.

When createJob() receives address(0) as the evaluator, the contract calls assignEvaluator(jobId), which selects from the eligible pool weighted by staked protocol tokens. Design rationale:

  • Stake = skin in the game, not just identity

  • Slashing for provably wrong calls creates direct incentive alignment

  • Warmup period (1 day minimum, up to 30 days) prevents Sybils from spamming the registry
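As an off-chain model of the selection rule described above, the sketch below filters by minimum stake and warmup, then picks an evaluator with probability proportional to stake. It is hypothetical: the Evaluator shape and MIN_STAKE are assumptions (100 VRT is the minimum stake mentioned later in the thread), and the random input stands in for whatever pseudo-random source the deployed EvaluatorRegistry actually uses.

```typescript
// Hypothetical model; the on-chain registry's randomness and eligibility rules may differ.
interface Evaluator {
  address: string;
  stake: bigint; // staked protocol tokens, 18-decimal units
  stakedAt: number; // unix seconds when the stake was made
  warmupSeconds: number; // 1 day minimum, up to 30 days
}

const MIN_STAKE = 100n * 10n ** 18n; // 100 VRT

function isEligible(e: Evaluator, now: number): boolean {
  return e.stake >= MIN_STAKE && now >= e.stakedAt + e.warmupSeconds;
}

// Maps `random` onto the cumulative stake of eligible evaluators: each one owns a
// contiguous slice of [0, total) whose width equals its stake.
function selectEvaluator(pool: Evaluator[], now: number, random: bigint): string | undefined {
  const eligible = pool.filter((e) => isEligible(e, now));
  const total = eligible.reduce((sum, e) => sum + e.stake, 0n);
  if (total === 0n) return undefined; // nobody eligible: caller falls back (e.g. deployer)
  let ticket = random % total;
  for (const e of eligible) {
    if (ticket < e.stake) return e.address;
    ticket -= e.stake;
  }
  return undefined; // unreachable, since ticket < total
}
```

Staking does not change the odds until warmup has elapsed, which is what makes spinning up many fresh identities unattractive.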

This doesn’t answer who should evaluate (your multi-model consensus approach, clawplaza’s AI coordinator, a ZK proof) — it answers how to select trustlessly from a pool of competing evaluators. Composable: ThoughtProof’s evaluator contract could itself be a staked registry participant.

Currently there are 2 evaluators on testnet, both controlled by us; let’s be honest about that. The mechanism is live and working, not theoretical.


On the ERC-8004 reputation bridge (ThoughtProof #5, Sentinel #27)

ThoughtProof asked about the complete() → ERC-8004 update pattern. We implemented ReputationBridge.sol that sits between the two contracts.

Key constraint we ran into: the bridge must never revert if the reputation registry fails, otherwise a broken registry blocks fund release. We solved it with a gas-capped try/catch (200k gas limit), so settlement never fails due to a reputation update failure. It currently runs in no-op mode since there’s no canonical ERC-8004 registry deployed; one setReputationRegistry(addr) call activates it.

Signal model we used: complete() → positive signal for both provider and evaluator. reject() → negative signal for the provider only (the evaluator did its job correctly by rejecting bad work). Open to challenge on this logic.
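The signal model and the “settlement never fails” constraint can be expressed together. This is a hypothetical TypeScript model, not ReputationBridge.sol: the Signal and ReputationRegistry shapes and settle() are assumptions, the try/catch stands in for the on-chain gas-capped try/catch (gas capping itself has no equivalent here), and a null registry models the current no-op mode.

```typescript
// Hypothetical model of the bridge behaviour; not the deployed contract.
type Outcome = "complete" | "reject";
interface Signal { subject: string; positive: boolean }
interface ReputationRegistry { record(signal: Signal): void }

function signalsFor(outcome: Outcome, provider: string, evaluator: string): Signal[] {
  if (outcome === "complete") {
    return [{ subject: provider, positive: true }, { subject: evaluator, positive: true }];
  }
  // On reject only the provider is penalized: the evaluator did its job.
  return [{ subject: provider, positive: false }];
}

// Funds move first and unconditionally; reputation updates are best-effort.
function settle(
  outcome: Outcome,
  provider: string,
  evaluator: string,
  releaseFunds: () => void,
  registry: ReputationRegistry | null,
): { reputationErrors: number } {
  releaseFunds();
  let reputationErrors = 0;
  if (registry !== null) {
    for (const s of signalsFor(outcome, provider, evaluator)) {
      try {
        registry.record(s);
      } catch {
        reputationErrors++; // swallowed: a broken registry must not block settlement
      }
    }
  }
  return { reputationErrors };
}
```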


On agenttech’s AAP / ERC-8210 (#67)

The read-only integration design is the right call. One observation specific to stake-weighted evaluator selection: the EvaluatorDispute coverage type gains a natural on-chain resolution path. If an evaluator is slashed by the registry for a demonstrably wrong call, that slash event is verifiable proof of evaluator failure, so the Claims Resolver could check slash history directly rather than relitigate the original evaluation. Worth considering whether AAP could treat an on-chain slash as automatic claim eligibility for EvaluatorDispute.


On wjmelements (#10): “benefit from implementing a reference implementation”

For developers wanting to build on ERC-8183 without writing Solidity: @asp-sdk/sdk (npm, TypeScript) with a Google A2A adapter, and asp-sdk (PyPI, Python) with adapters for CrewAI, LangGraph, and AutoGen. MIT license. The SDK abstracts the full job lifecycle behind a few method calls.


Testnet only, MockUSDC, no audit yet (we’re working on it), so appropriately experimental. Happy to open issues on the base-contracts repo for anything spec-level.

Repo on GitHub: Demsys/agent-settlement-protocol

@Bakugo32 This is brilliant. The stake-weighted EvaluatorRegistry perfectly separates how to select an evaluator from how to evaluate.

ThoughtProof acting as a staked participant in your registry is the exact composition we want to test. We’ll grab the @asp-sdk/sdk and point our backend at your Base Sepolia AgentJobManager (0xef8b…5D57f).

Give us a day or two to wire up a test script that catches a job, runs the multi-model reasoning consensus via our API, and calls complete(jobId, attestationHash) with the resulting verification CID. I’ll follow up here once we push the first successful evaluation on-chain!

@ThoughtProof, heads up on a few things before you wire up, including a redeployment that happened yesterday.

Updated contract addresses (redeployed 2026-04-05 with the grace period fix from PR#13):

  • AgentJobManager: 0xfb4D4F517798efAc603d0d6472a11E48447dE7D7

  • EvaluatorRegistry: 0x01a60505E55032F8F8A4a092d845b4446EFa56ec

  • ProtocolToken (VRT): 0xA35d7c260ee4455D7f5da8C786286f5e6A2179Da

Staking flow:

  1. Share your wallet address — I’ll mint you 100 VRT (the minimum stake)

  2. ProtocolToken.approve(EvaluatorRegistry, 100e18)

  3. EvaluatorRegistry.stake(100e18)

Timeline: governance wiring executes April 7 (2-day timelock). Stake before April 6 with the 1-day warmup → eligible April 7.

complete() is a direct contract call: msg.sender must equal job.evaluator. The REST API’s /complete endpoint only handles our deployer wallet, so call the contract directly:

AgentJobManager.complete(uint256 jobId, bytes32 reason)

reason is bytes32 — pass keccak256(abi.encodePacked(cid)) or the raw SHA2-256 digest of your CID.
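For the second option, the raw SHA2-256 digest can be read straight out of a CIDv0, because a CIDv0 string (“Qm…”) is just a base58btc-encoded sha2-256 multihash: the bytes 0x12 (sha2-256) and 0x20 (32-byte length) followed by the 32-byte digest. The sketch below assumes a CIDv0 input; CIDv1 strings use a different encoding and are not handled. The keccak256 option needs a keccak library and is not shown.

```typescript
// Minimal sketch: extract the 32-byte sha2-256 digest from a CIDv0 as a bytes32 hex string.
const ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

function base58Decode(s: string): Uint8Array {
  let n = 0n;
  for (const ch of s) {
    const v = ALPHABET.indexOf(ch);
    if (v < 0) throw new Error(`invalid base58 character: ${ch}`);
    n = n * 58n + BigInt(v);
  }
  const bytes: number[] = [];
  while (n > 0n) {
    bytes.unshift(Number(n % 256n));
    n /= 256n;
  }
  // Each leading '1' encodes one leading zero byte.
  for (const ch of s) {
    if (ch !== "1") break;
    bytes.unshift(0);
  }
  return Uint8Array.from(bytes);
}

function cidV0ToBytes32(cid: string): string {
  const mh = base58Decode(cid);
  if (mh.length !== 34 || mh[0] !== 0x12 || mh[1] !== 0x20) {
    throw new Error("not a CIDv0 sha2-256 multihash");
  }
  return "0x" + Array.from(mh.subarray(2), (b) => b.toString(16).padStart(2, "0")).join("");
}
```

The returned string can be passed as the bytes32 reason argument of complete(); anyone holding the CID can recompute and compare it.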

How to detect assigned jobs — we just shipped a public evaluator endpoint in v0.2.0:

GET /v1/evaluator/{yourAddress}/jobs

Returns all jobs where evaluatorAddress matches. No auth needed. The deliverable plaintext is now included in the response once the provider submits. The SDK exposes this as:

const watcher = client.watchForAssignments('0xYourAddress')
watcher.on('submitted', (job) => {
  // job.deliverable contains the plaintext — evaluate and call complete() on-chain
})

One honest caveat: job auto-assignment via EvaluatorRegistry only activates on April 7. Until then, the deployer is the fallback evaluator. Once the governance executes, new funded jobs will be assigned stake-weighted from the registry pool, which is exactly where ThoughtProof comes in.

Thanks for your response!

Interesting implementation, especially the stake-weighted evaluator selection with slashing.

On the EvaluatorDispute point: using an on-chain slash event as automatic claim eligibility is a compelling pattern. A slash is arguably the strongest form of on-chain attestation for evaluator failure, since the evaluator’s own staking protocol has already adjudicated the error and penalized it.

In the current AAP spec, this would work without any core changes. The fileClaim() evidence parameter can reference the slash event (transaction hash or attestation CID), and the Claims Resolver can verify the slash against the evaluator registry contract. Whether a slash constitutes automatic eligibility or just strong evidence is an implementation decision for the Resolver’s resolution criteria.

What I’d avoid is hardcoding “slash = auto-approve” at the protocol level, since different staking registries may have different slashing thresholds and reasons, not all of which map cleanly to EvaluatorDispute eligibility.
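One way a Resolver could keep that decision in its own resolution criteria is an explicit policy table. Everything below is hypothetical: the SlashEvent shape assumes a slash event that carries jobId, and the recognized registries and reason codes are values a Resolver operator would choose, not anything ERC-8210 defines.

```typescript
// Hypothetical Resolver-side policy; registry keys are stored lowercase.
interface SlashEvent {
  registry: string;
  evaluator: string;
  jobId: bigint;
  reason: string; // bytes32 as hex
}

interface ResolverPolicy {
  // Recognized registries, mapped to the reason codes treated as EvaluatorDispute eligibility.
  autoEligibleReasons: Map<string, Set<string>>;
}

type Verdict = "eligible" | "evidence-only" | "unrelated";

function assessSlash(p: ResolverPolicy, claim: { jobId: bigint; evaluator: string }, slash: SlashEvent): Verdict {
  // The slash must concern the same Job and the same evaluator as the claim.
  if (slash.jobId !== claim.jobId || slash.evaluator.toLowerCase() !== claim.evaluator.toLowerCase()) {
    return "unrelated";
  }
  // Only whitelisted registry/reason pairs count as automatic eligibility;
  // any other matching slash is still strong evidence for normal adjudication.
  const reasons = p.autoEligibleReasons.get(slash.registry.toLowerCase());
  return reasons?.has(slash.reason) ? "eligible" : "evidence-only";
}
```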

Worth documenting this pattern in the ERC-8210 reference scenarios though. @cmayorga this might be a good case to include in the multi-hop workflows documentation you mentioned.

Forum thread for deeper AAP discussion: ERC-8210: Agent Assurance

@Bakugo32 Thanks for the detailed heads-up and the updated addresses! We’re already wiring up the ThoughtProof evaluator bot on our end. To get ahead of the April 7 governance timelock, we’d love to stake our evaluator wallet right away so we’re eligible for the stake-weighted job pool from day one.

Could you mint the 100 VRT to our Base Sepolia evaluator wallet?

Address: 0xB4B9Cb85A2642719ba919b0C0F25d2df570eB9C0

@JackyWang @Bakugo32 Also, this cross-protocol composition happening here is beautiful to watch. From ThoughtProof’s perspective as an evaluator, this is the holy grail: our stake-weighted evaluations settle the commerce layer (ERC-8183), and our on-chain slashing conditions map perfectly into the evidence parameters for the assurance layer (ERC-8210).

A completely closed-loop trust model for agents without requiring a monolithic protocol. Love it.

Test vectors for the 14 spec scenarios are now up as a PR against the repo: PR #1647. Foundry tests against a minimal IAAP mock with ERC-8183 job state simulation, 14/14 passing. No changes to the core spec — purely third-party implementability validation.

One thing worth flagging from the ERC-8183 thread: @Bakugo32 from Demsys has deployed on Base Sepolia. They made an interesting observation about how this composes with AAP: if an evaluator is slashed by the registry for a demonstrably wrong call, that slash event is verifiable on-chain proof of evaluator failure. A Claims Resolver could check slash history directly rather than relitigating the original evaluation.

This maps cleanly to the EvaluatorDispute coverage type. Worth considering whether AAP could treat an on-chain slash from a recognized EvaluatorRegistry as automatic claim eligibility — no Resolver adjudication needed, the proof is already on-chain. This wouldn’t require changes to the core spec, just a documented pattern: “if your evaluator is registered in a slashing-capable registry, the slash event itself satisfies the EvaluatorDispute eligibility condition.”

This also connects to the upstream verification field we discussed earlier — the slash event CID could be included as evidence in fileClaim(), giving the Resolver a machine-verifiable proof chain.

@ThoughtProof done: 100 VRT minted to 0xB4B9Cb85A2642719ba919b0C0F25d2df570eB9C0 on Base Sepolia.
tx: 0x86da5921b635ff04dbf7b04c64846375fb2e3a634826d25df61928bfe2f16c08

Staking steps:

  1. ProtocolToken (0xA35d7c260ee4455D7f5da8C786286f5e6A2179Da).approve(0x01a60505E55032F8F8A4a092d845b4446EFa56ec, 100e18)

  2. EvaluatorRegistry (0x01a60505E55032F8F8A4a092d845b4446EFa56ec).stake(100e18)

Governance wiring executes April 7 - stake before then and you’ll be in the eligible pool from day one.

@cmayorga thanks for the precise framing and for citing the deployment. The “slash as direct on-chain proof, no re-adjudication needed” angle is stronger than how we articulated it. One honest gap on our side is that our current EvaluatorSlashed event doesn’t include jobId, so correlating a specific slash to a specific job requires cross-referencing with EvaluatorAssigned. If this pattern makes it into the AAP reference scenarios, we’d align our event shape to whatever ends up specified.
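A sketch of that cross-referencing workaround, to show why it is awkward: without jobId in the slash event, a consumer can only narrow a slash down to the Jobs previously assigned to that evaluator. The event shapes here are simplified assumptions (block numbers stand in for ordering), not the deployed ABI.

```typescript
// Hypothetical, simplified decoded events.
interface Assigned { jobId: bigint; evaluator: string; block: number }
interface Slashed { evaluator: string; block: number; slashedAmount: bigint }

// Candidate Jobs a slash could refer to: those assigned to the same evaluator
// at or before the slash. More than one candidate means the slash is ambiguous.
function candidateJobs(slash: Slashed, assignments: Assigned[]): bigint[] {
  return assignments
    .filter((a) => a.evaluator.toLowerCase() === slash.evaluator.toLowerCase() && a.block <= slash.block)
    .map((a) => a.jobId);
}
```

As soon as an evaluator has handled more than one Job, the join returns several candidates; with jobId carried in the event itself, the lookup becomes a single equality check.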

Good catch on the jobId gap — that’s exactly the kind of implementation detail that matters when you try to compose two specs in practice.

The simplest fix would be including jobId directly in EvaluatorSlashed:

event EvaluatorSlashed(
    address indexed evaluator,
    uint256 indexed jobId,
    uint256 slashedAmount,
    bytes32 reason
);

With jobId indexed, an AAP implementation can set up a single event filter to catch slashes relevant to any active EvaluatorDispute JobAssurance. No cross-referencing needed — one event, one lookup, one claim eligibility check.

The reason field maps directly to the evidence parameter in AAP’s fileClaim(). The Beneficiary would pass the slash event transaction hash or an IPFS CID referencing the slash details, and the Resolver (or an automated resolution path) can verify on-chain that the slash actually happened for that specific Job.

I can include this as one of the reference scenarios in the multi-hop workflows follow-up I’m planning for ERC-8210 — “Evaluator slash as automatic EvaluatorDispute claim eligibility.” That gives both specs a concrete composition example. Would that be useful?

Yes, definitely include it. A documented composition example between the two specs is more valuable than any amount of abstract discussion. Your event shape works for us:

event EvaluatorSlashed(
    address indexed evaluator,
    uint256 indexed jobId,
    uint256 slashedAmount,
    bytes32 reason
);

We’ll update EvaluatorSlashed in our next contract deployment to match. The reason field is already consistent with how we handle complete(jobId, bytes32 reason), same encoding, same semantics.

We’ve been building an evaluator implementation on Arc testnet and have a design question: our setup uses off-chain scoring (wallet behavioural history, prior patterns) to inform the on-chain complete()/reject() attestation - hybrid rather than purely on-chain logic. Is this pattern something the spec intends to accommodate, or is the evaluator role envisaged as fully on-chain?

Thanks @Bakugo32! Just confirming that the 100 VRT are successfully staked on Base Sepolia.

Our evaluator bot is active and the wallet (0xB4B9Cb85A2642719ba919b0C0F25d2df570eB9C0) is fully locked in.

Stake Tx: 0x01f129b6470e99218f1d95317f1328875dd192efdd2db0408c5fb5e772dc1b1d

Looking forward to the governance wiring tomorrow!

Interesting to see another evaluator implementation with staking. We’re approaching this differently on Arc testnet (no staking requirement; scoring is a purely off-chain signal informing the on-chain attestation), so I’m curious how the governance wiring works for you.

Our setup: off-chain AHS scoring (wallet behavioural history, D1/D2/D3 dimensions) → score informs complete()/reject() decision → on-chain attestation via ERC-8183. The evaluator wallet holds the key, scoring logic runs off-chain.

The open question from our side is whether the spec intends evaluators to be fully on-chain (smart contract logic only) or whether hybrid patterns like ours are within scope. The EvaluatorSlashed event thread above suggests there’s appetite for reputation/staking mechanisms layered on top — which would complement rather than replace hybrid scoring.

What’s your staking mechanism protecting against specifically: evaluator collusion, or something else?