ERC-8183: Agentic Commerce

@ThoughtProof @pablocactus you’re ready to stake. We’ve minted 100 VRT to both of you on the new ProtocolToken (0x4c4468567eE753d1b27Cf02b5896b4af71c40719). Your Base Sepolia ETH balance is sufficient to cover the two transactions (approve + stake).

Staking guide: https://github.com/Demsys/agent-settlement-protocol/blob/main/docs/EVALUATOR.md

Auto-assignment goes live ~2026-05-06 once the governance delay clears.

Restaked :white_check_mark: — 100 VRT on the new EvaluatorRegistry (0x4F4a…9BD1), warmup clears ~May 5 20:40 UTC. Ready for auto-assignment once governance delay passes. Treasury fee split (80/20) and MIN_BUDGET bump look clean. :+1:


Restaked. 100 VRT on the new EvaluatorRegistry (0x4F4a…9BD1), warmup clears ~May 5 21:27 UTC. Ready for auto-assignment once governance delay passes. 80/20 split on both verdicts and the standalone Treasury read clean.


@Bakugo32 - quick scheduling note for the next test job. I’m available today and Friday May 8 onward. Unavailable tomorrow (Thu May 7) for work travel. Happy whenever fits your cycle - flagging in case it helps timing.

Sharing notes from a production ERC-8183 deployment on Base mainnet (live since 2026-04). Repo with addresses, tests, and migrations: mrocker/CardZero on GitHub (“The first universal payment wallet for AI Agents. USDC on Base.”)

We deployed our own implementation of the spec (`CardZeroJobs`, `0xb28a0cca5ac28466f3d175f35b97aa104d4c4ba8`) wired into our own ERC-8004 IdentityRegistry / ReputationRegistry pair. 133 contract tests, 4-EOA role separation (deployer / registrar / attestor / evaluator), real mainnet E2E (5 txs, splits enforced 93%/5%/2%).

Quick takes on the four open questions @miratisu raised:

### 1. Async off-chain computation in the evaluator

**Yes, this is the real product.** Three eval rule types in production:

- `manual` — human attests via UI (off-chain queue + UI button + on-chain `evaluatorComplete`)

- `json_schema` — submitted deliverable URI is fetched, validated against client-declared JSON schema, complete/reject decision made off-chain, on-chain attestation

- `http_check` — provider exposes a callback URL, evaluator pings it with a per-job nonce, expects 200 + body shape match

All three run in a 2-min cron loop. The evaluator EOA has `EVALUATOR_ROLE` and signs the on-chain transition. Async is not optional — if you want anything beyond “trust the provider” you need it.
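A minimal sketch of one pass of such a loop, to make the three rule types concrete. The `Job` fields, the `ping` transport, the key-presence check standing in for JSON Schema validation, and the nonce echo used as the "body shape" are all assumptions for illustration, not CardZero's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Job:
    job_id: int
    rule_type: str                       # "manual", "json_schema" or "http_check"
    deliverable: dict = field(default_factory=dict)
    required_keys: Tuple[str, ...] = ()  # simplified stand-in for a JSON schema
    nonce: str = ""                      # per-job nonce used by http_check

Verdict = Optional[bool]  # True = complete, False = reject, None = still pending

def check_manual(job: Job, human_queue: Dict[int, bool]) -> Verdict:
    # A human decision may not exist yet; None keeps the job pending.
    return human_queue.get(job.job_id)

def check_json_schema(job: Job) -> Verdict:
    # Simplified: only checks that the required top-level keys are present.
    return all(key in job.deliverable for key in job.required_keys)

def check_http(job: Job, ping: Callable[[str], Tuple[int, dict]]) -> Verdict:
    # `ping` is a hypothetical transport: nonce in, (status, body) out.
    status, body = ping(job.nonce)
    return status == 200 and body.get("nonce") == job.nonce

def run_cycle(jobs: List[Job], human_queue: Dict[int, bool],
              ping: Callable[[str], Tuple[int, dict]]) -> List[Tuple[int, str]]:
    """One pass of the loop: returns the on-chain calls the evaluator would sign."""
    actions = []
    for job in jobs:
        if job.rule_type == "manual":
            verdict = check_manual(job, human_queue)
        elif job.rule_type == "json_schema":
            verdict = check_json_schema(job)
        elif job.rule_type == "http_check":
            verdict = check_http(job, ping)
        else:
            verdict = None  # unknown rule type: leave it pending
        if verdict is not None:
            actions.append((job.job_id, "complete" if verdict else "reject"))
    return actions
```

Whatever the rule type, the output is the same pair of on-chain outcomes, which is why the mechanism can stay out of the spec.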

**What I’d want from the spec:** stay agnostic. Don’t try to formalize the eval mechanism — it’s properly application-layer. But please **keep the `evaluator` field as `address`** (not `bytes32` / DID), so a smart-contract evaluator can substitute later (e.g., a TEE-attested oracle, or a multisig of human reviewers). We chose a single EOA for v1; the address field gives us the upgrade path.

### 2. Should the evaluator write to ERC-8004 ReputationRegistry on completion?

**We do, and we’d argue yes — but as a separate role grant, not implicit.**

Our `CardZeroJobs.finalizeJob()` calls `ReputationRegistry.attest(provider, jobValue, scoringRulesHash)` after the funds split. The `Jobs` contract itself holds `ATTESTOR_ROLE` on the ReputationRegistry. The grant is a one-time admin op, not implicit in the spec.

Pros:

- Job-completion → reputation is the primary causal link in this market. Decoupling them produces stale reputation data.

- The on-chain link is auditable. `cardzero.ai/.well-known/agent/{address}` becomes meaningful.

Cons:

- It couples Jobs to a specific ReputationRegistry. We solved this with `setReputationAttestor()` (mutable, admin-gated).

- It introduces re-entrancy surface. We call `attest` after the `transfer` calls, which strictly speaking violates the CEI (checks-effects-interactions) pattern, so be careful. Consider a `nonReentrant` modifier or strict ordering.

**What I’d want from the spec:** make the link **optional but standardized**, e.g. `IJobs.setReputationRegistry(address)`, where `Jobs` MUST call `attest()` after a successful `complete` whenever a registry address is set. Implementations that don’t want it can leave the address as `0x0`.

### 3. “Agentic” vs “Escrow”

**Both. The protocol is escrow; the audience is agents.** Naming-wise, “Agentic Commerce” puts off non-agent users (we’d never use it for human-to-human even though the contract supports it), but “Escrow” loses the discoverability of the agent market.

Pragmatic suggestion: **make the spec name “Service Delivery Escrow” and call out “designed for autonomous agents but fully usable by EOAs/multisigs”** in the abstract. We get Google rank for both.

### 4. Linked / two-phase jobs

**We don’t have this yet.** Looked at it for parent-child task decomposition (e.g., a research agent commissions sub-jobs).

Honestly: **don’t make it base-spec.** Two reasons:

1. Linked-jobs has dozens of compositional variants (DAG, fan-out/fan-in, conditional, partial-completion). Picking one closes off the others.

2. You can build it as an **adapter contract** that holds child Job IDs and orchestrates funding/completion. The base ERC-8183 doesn’t need to know about it.

We’ll likely do this in v2 of `CardZeroJobs` as a separate `CardZeroLinkedJobs` contract. Keep the base spec minimal.

### One thing I think should be tightened in v2

`expiry` field. We caught a real bug going to production: a `uint64` vs `uint256` mismatch in our chain.ts ABI binding (function selector mismatch → silent revert). The contract uses `uint256` but our SDK assumed `uint64`. Once the SDK was corrected to `uint256`, calls went through.

**Suggestion for the spec:** be explicit that `expiry` is `uint256 secondsSinceEpoch`, and add a SHOULD that wallets refuse to fund a job with `expiry < now + 86400` (1 day) — short expiries lead to client-rejected workflows that look like provider failures in reputation.
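A minimal sketch of the suggested wallet-side check, assuming `expiry` is an absolute Unix timestamp in seconds. The function name `should_fund` is hypothetical.

```python
import time
from typing import Optional

UINT256_MAX = 2**256 - 1
MIN_EXPIRY_WINDOW = 86_400  # 1 day in seconds, per the suggested SHOULD

def should_fund(expiry: int, now: Optional[int] = None) -> bool:
    """Return True if a wallet following the suggested rule would fund the job."""
    if not 0 <= expiry <= UINT256_MAX:
        raise ValueError("expiry must fit in uint256")
    if now is None:
        now = int(time.time())
    # Refuse when expiry < now + 86400.
    return expiry >= now + MIN_EXPIRY_WINDOW
```

For example, `should_fund(now + 7 * 86_400, now)` passes, while a 2-hour expiry is refused.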

Happy to share contract source / tests / migration scripts if useful. We’re MIT-licensed, and the deployment ceremony doc is in the repo (`docs/deployment-ceremony-sprint8-9.md`).

One caveat on the production claims: we are still in beta and recommend per-wallet balances under \$100 USDC until the external audit completes (planned via Code4rena).

— Nicholas (CardZero / mrocker)

@mrocker Welcome, and this is significant: first production ERC-8183 deployment we’re aware of. A few notes from our side.

Async evaluator : full agreement. Our testnet has pablocactus running AHM (Agent Health Monitor) and ThoughtProof running PLV (Plan-Level Verification) — both off-chain, both attesting on-chain via complete()/reject(). The spec shouldn’t touch the mechanism.

address evaluator field : fully agree. We made the same choice for the same reason — smart-contract evaluators, multisig reviewers, TEE oracles are all upgrade paths that a bytes32/DID field would close off.

ReputationBridge : our implementation follows the same pattern — AgentJobManager holds an ATTESTOR-equivalent role on ReputationBridge, wired via setReputationBridge(). One note on your CEI observation: we call the bridge after the token transfers too. The mitigation we use is nonReentrant on complete()/reject() — worth adding if you don’t have it.

uint64 deadline : our contract and SDK are consistent on uint64, so no mismatch on our side. Your suggestion of uint256 in the spec is worth formalizing — the uint64 packing optimization is an implementation detail that the spec shouldn’t impose.

Linked jobs : agreed. Keep base spec minimal. An adapter contract for DAG orchestration is the right abstraction boundary.

Would be glad to compare notes on the deployment ceremony — our docs/ structure sounds similar to yours.

Thanks for the read. Confirmations + reciprocal notes:

nonReentrant: all five mutating paths are guarded — fund / declineJob / complete / reject / claimRefund. We’re on OZ’s ReentrancyGuardTransient (EIP-1153 transient storage) so the per-call cost is ~100 gas vs ~2.1k for the storage version. Ordering is state → transfers → bridge, so CEI does most of the work and the guard is belt-and-suspenders. Agree the spec shouldn’t mandate the ordering — but recommending it as informational guidance would probably save implementers a class of mistakes.
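A toy Python model of the state → transfers → bridge ordering with a reentrancy flag, as a sketch of why the guard is a second line of defence. It is not the Solidity implementation: the class, the `bridge` callback, and the log entries are hypothetical, and unlike the EVM nothing here rolls back state when an exception is raised.

```python
class ReentrancyError(RuntimeError):
    pass

class JobsModel:
    def __init__(self, bridge):
        self._entered = False
        self.state = "SUBMITTED"
        self.log = []
        self.bridge = bridge  # hypothetical callback standing in for the reputation bridge

    def complete(self):
        # Guard: refuse any call that arrives while another call is in progress.
        if self._entered:
            raise ReentrancyError("reentrant call")
        self._entered = True
        try:
            if self.state != "SUBMITTED":
                raise ValueError("job is not awaiting evaluation")
            self.state = "COMPLETED"       # 1. state
            self.log.append("transfers")   # 2. transfers
            self.bridge(self)              # 3. bridge (external call, last)
            self.log.append("bridge")
        finally:
            self._entered = False
```

If the guard were removed, a bridge that called back into `complete()` would still fail the `SUBMITTED` check because the state was updated first, which is the sense in which the ordering does most of the work.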

Async evaluators: our service-side equivalent has three rule types — manual (pure off-chain attest), json_schema (deliverable structural validation), http_check (provider posts artifact URL, evaluator fetches + asserts). All three terminate at on-chain complete() / reject(), so the spec sees one mechanism. AHM / PLV is exactly the kind of plurality we hoped the design would unlock.

uint256 deadline: I’ll write the proposed change up on the EIP repo as a separate item so it’s actionable. The uint64 packing optimization is a contract-level call, not a wire-format one — agree.

Linked jobs: agreed. DAG orchestration belongs in a JobOrchestrator adapter that calls createJob / fund on the base — keeps reasoning local and lets the base spec stay reviewable.

Ceremony: ours is 11 steps. Three pieces that probably overlap with yours:

  • 4-EOA role isolation (deployer / registrar / attestor / evaluator), with a pre-deploy self-check that verifies no key holds more than its assigned role
  • SCORING_RULES_HASH committed at construct time + a public mirror at /SCORING-RULES.md, so the off-chain rules doc stays auditable against the keccak commitment forever
  • Sepolia full-lifecycle rehearsal (createJob → fund → submit → complete) with a hard manual gate before mainnet
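The role-isolation self-check in the first bullet could look roughly like this. It is a sketch under assumptions: the input shapes (role name to address, and lower-cased address to the set of roles it currently holds) are hypothetical, and how the held roles are read from chain is left out.

```python
from typing import Dict, List, Set

def check_role_isolation(assigned: Dict[str, str],
                         held: Dict[str, Set[str]]) -> List[str]:
    """Return a list of violations; an empty list means the check passes.

    assigned: role name -> address that should hold it
    held:     lower-cased address -> roles that address currently holds
    """
    problems = []
    seen = {}
    for role, address in assigned.items():
        addr = address.lower()
        # Each role must live on its own key.
        if addr in seen:
            problems.append(f"{role} and {seen[addr]} share address {addr}")
        else:
            seen[addr] = role
        # A key may hold its assigned role and nothing else.
        extra = held.get(addr, set()) - {role}
        if extra:
            problems.append(f"{role} key also holds {sorted(extra)}")
    return problems
```

Running it before deploy and again after each role grant gives a cheap regression check on the separation.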

If you’ve published yours I’d read it — happy to compare in this thread or async.

@mrocker

@mrocker ReentrancyGuardTransient noted, we’re on the storage version, switching is on the list for next deploy. The ~2k gas saving matters at scale.

The SCORING_RULES_HASH pattern is interesting. We solved the same auditability problem differently : EvaluatorRegistry.setMetadata() lets each evaluator commit their methodology URL/IPFS CID individually, so the rules are per-evaluator rather than protocol-wide. Trade-off is flexibility vs uniformity. Your approach is cleaner for a single-evaluator deployment.

Our ceremony is in scripts/deploy.ts + a 2-day governance timelock for wiring (proposeJobManager / executeJobManager). No dedicated ceremony doc yet. Yours sounds more structured. The Sepolia full-lifecycle gate before mainnet is identical to our process.
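For readers unfamiliar with the propose/execute shape, a minimal Python model of a two-step setter with a 2-day delay follows. The class and field names are hypothetical and the real contracts are Solidity; this only illustrates the timing rule.

```python
from dataclasses import dataclass
from typing import Optional

DELAY = 2 * 24 * 60 * 60  # 2 days in seconds

@dataclass
class TimelockedSetter:
    current: str
    pending: Optional[str] = None
    ready_at: Optional[int] = None

    def propose(self, new_value: str, now: int) -> None:
        # Record the proposal and the earliest time it may take effect.
        self.pending = new_value
        self.ready_at = now + DELAY

    def execute(self, now: int) -> None:
        if self.pending is None or self.ready_at is None:
            raise RuntimeError("nothing proposed")
        if now < self.ready_at:
            raise RuntimeError("delay has not elapsed")
        self.current, self.pending, self.ready_at = self.pending, None, None
```

The delay only gives observers time to react; it does not by itself say who is allowed to propose.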

Thanks for the detailed reply.

The SCORING_RULES split is the interesting one. Re-reading your EvaluatorRegistry.setMetadata(url/CID) design against ours: I think these are not really competing — they collapse into the same shape at the boundary case. Per-evaluator metadata is the strictly more general pattern; a single-evaluator deployment is just the degenerate case where the registry has one entry.

What we gain from the protocol-wide hash today is uniformity of attestation semantics — every reputation event ties back to one frozen rules document, so an aggregator (e.g. an A2A discovery layer) can compare scores across jobs without fetching per-evaluator metadata first. What we give up is exactly the multi-evaluator flexibility you’re describing. If/when we open up to third-party evaluators, your pattern is what we’d reach for: a setMetadata per evaluator + the protocol stays agnostic to methodology. Worth noting in the spec discussion that both patterns coexist cleanly — the rule-hash is just an evaluator’s metadata commitment with a stricter shape.

The 2-day governance timelock (proposeJobManager / executeJobManager) is a good shape. Adding it to our pre-audit hardening list — currently UUPS with ADMIN_ROLE = deployer EOA, which is fine for beta but won’t survive Code4rena review unmodified. Propose/execute with a ~48h window is the obvious upgrade.

Re: ceremony — short writeup is queued. Will link in a follow-up post once it’s in the public docs (rather than inline a 60-line procedure here). Key shape is: 4-EOA role separation pre-funded → Sepolia full lifecycle gate → mainnet deploy via deterministic salt → role grants done as separate post-deploy txs (not bundled), each independently verifiable. The Sepolia gate matches what you described.

— Nicholas (CardZero / mrocker)

@mrocker the convergence point you’ve identified is clean : “rule-hash as evaluator metadata with a stricter shape”. That framing deserves a line in the spec discussion, it resolves the apparent tension and gives implementers a clear decision tree (single evaluator + auditability priority → protocol-wide hash; multi-evaluator + methodology diversity → per-evaluator metadata).

On UUPS + ADMIN_ROLE = deployer EOA: the 2-day propose/execute window fixes the timing risk, but the EOA being sole admin is a single point of failure that Code4rena will flag. We went with Ownable + timelocked setters (no upgradeability for now), simpler threat model, but we defer the upgrade path question entirely. Worth thinking about whether UUPS is load-bearing for you before audit.

Looking forward to the ceremony writeup, the deterministic salt step is the one we don’t have yet.

@ThoughtProof @pablocactus, two jobs are live on the new contracts, ready for you :

  • Job #1 — ThoughtProof (provider) / pablocactus (evaluator) — 5 USDC

  • Job #2 — pablocactus (provider) / ThoughtProof (evaluator) — 5 USDC

AgentJobManager: 0xC07CE789206CBEEC3A41D5CedBdA93B1024aaDdd

First real run of the 80/20 fee split and FeeDistributed event on the new contracts. Provider calls submit(), evaluator calls complete() or reject().
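As a reading aid, the happy path and the reject path can be sketched as a small state machine. This covers only the transitions mentioned in this thread; `OPEN` and `FUNDED` are assumed state names, the client being the caller of `fund` is an assumption, and expiry and refund paths are omitted.

```python
# (state, action) -> (next state, role allowed to call)
TRANSITIONS = {
    ("OPEN", "fund"): ("FUNDED", "client"),
    ("FUNDED", "submit"): ("SUBMITTED", "provider"),
    ("SUBMITTED", "complete"): ("COMPLETED", "evaluator"),
    ("SUBMITTED", "reject"): ("REJECTED", "evaluator"),
}

def step(state: str, action: str, caller: str) -> str:
    """Apply one action, enforcing both the current state and the caller's role."""
    if (state, action) not in TRANSITIONS:
        raise ValueError(f"{action} is not valid in state {state}")
    next_state, allowed = TRANSITIONS[(state, action)]
    if caller != allowed:
        raise PermissionError(f"{action} must be called by the {allowed}")
    return next_state
```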

@Bakugo32 - quick check on the deadlines for jobs #1 and #2. The deadline field on both decodes to ~14:53/14:54 BST today, which has already passed. Either (a) something’s off with the deadline value, (b) it’s a relative duration not an absolute timestamp, or (c) the jobs are intended to expire and we should observe that behaviour. Want to confirm before I act on either job. Happy to process now if (a) or (c) needs a re-deploy.
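A small sketch of the kind of check described here: decode the field as absolute Unix seconds and see whether it has passed, plus a rough heuristic for spotting a relative duration. Both helper names and the 10-year threshold are assumptions for illustration.

```python
from datetime import datetime, timezone

def describe_deadline(deadline: int, now: int) -> str:
    """Interpret `deadline` as absolute Unix seconds and report whether it has passed."""
    when = datetime.fromtimestamp(deadline, tz=timezone.utc).isoformat()
    if deadline <= now:
        return f"expired at {when}"
    return f"expires at {when} (in {deadline - now} s)"

def looks_like_duration(deadline: int) -> bool:
    # Heuristic only: values below ~10 years of seconds decode to dates in the
    # 1970s, which suggests a relative duration rather than a timestamp.
    return deadline < 10 * 365 * 24 * 60 * 60
```

A value such as `7200` would trip the heuristic, while a value that decodes to a plausible date earlier the same day points at case (a) or (c).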

@pablocactus that was (a). The script had a 2h deadline hardcoded, suited for automated E2E tests, not manual evaluation. Fixed to 7 days.

Two new jobs are live:

  • Job #3 : ThoughtProof (provider) / pablocactus (evaluator) — 5 USDC

  • Job #4 : pablocactus (provider) / ThoughtProof (evaluator) — 5 USDC

Jobs #1 and #2 are expired, USDC recovered on our end via claimExpired() + claimRefund(), no action needed from you.


Hey Bakugo — could you create one more job on the new contract (0xC07CE…)?

Job #5:

- Provider: 0x118B1E5A47658D20046bC874cB34E469d472c0C2 (ThoughtProof)

- Evaluator: 0x35eeDdcbE5E1AE01396Cb93Fc8606cE4C713d7BC (pablocactus)

- Token: MockUSDC

- Budget: 5 USDC

- Deadline: 7 days

Same setup as Job #3 — we’re re-running it with a proper deliverable workflow. Job #3 had an operational gap (content wasn’t retrievable for the evaluator), so pablocactus will reject that one and we do it right on #5.

Job #5 is live on 0xC07CE789206CBEEC3A41D5CedBdA93B1024aaDdd, provider ThoughtProof (0x118B…), evaluator pablocactus (0x35ee…), 5 USDC, 7-day deadline. Basescan


Three things, in order of weight:

Admin-key SPoF. Fair flag, and you’re right that the 2-day window only addresses timing. We’re going Safe (2-of-3) + TimelockController (48h) on the four UUPS contracts (CardZeroJobs / IdentityRegistry / ReputationRegistry / CardZeroWalletV3 implementation upgrades) before audit. Sequence is grantRole(ADMIN_ROLE, timelock) → verify on Sepolia copy → renounceRole(ADMIN_ROLE, deployer) as a separate post-deploy tx, each independently checkable on Basescan. That preserves upgrade path without leaving an EOA on the critical path. The Ownable-without-upgradeability route was on the table; we kept UUPS because the V1→V2→V3 wallet history (two real upgrades in three months) tells us we’ll need it again before the threat model is mature enough to freeze.

Spec-text line. Agreed it belongs in the spec, not just the thread. The cleanest form I can think of:

Implementations MAY commit a single protocol-wide scoringRulesHash (single-evaluator/auditability-priority deployments) or expose per-evaluator metadata via an EvaluatorRegistry (multi-evaluator/methodology-diverse deployments). The former is the degenerate case of the latter with one entry.

If that shape is acceptable to you, I can fold it into ethereum/ERCs#1732 as a second paragraph or open a sibling PR — your call on which is less noisy for editors.

Ceremony writeup. Public draft is up: cardzero.ai/docs/reference/deployment-ceremony. Centerpiece is the deterministic CREATE2 salt step you flagged as missing — finalSalt = keccak256(abi.encodePacked(owner, agentSalt)), predicted via Factory.getWalletAddress() pre-deploy, deployment is idempotent against a re-call (skipped if wallet.code.length > 0). 4-EOA pre-funded role separation + SCORING_RULES_HASH public mirror + Sepolia full-lifecycle gate + separated role-grant txs are all there. Open to nits.
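A sketch of the two mechanical pieces of that step: the byte layout fed to the hash, and the idempotent deploy guard. Assumptions are flagged loudly: `agentSalt` is taken to be a `bytes32`, the Keccak-256 function is passed in as a parameter because Python's standard library does not provide one (`hashlib.sha3_256` uses different padding and gives different digests), and `get_code` / `deploy` are hypothetical callables standing in for RPC calls.

```python
from typing import Callable

def packed_salt_input(owner: str, agent_salt: bytes) -> bytes:
    """Bytes that abi.encodePacked(address, bytes32) produces: 20 + 32, no padding."""
    owner_bytes = bytes.fromhex(owner[2:] if owner.startswith("0x") else owner)
    if len(owner_bytes) != 20:
        raise ValueError("owner must be a 20-byte address")
    if len(agent_salt) != 32:
        raise ValueError("agent_salt must be 32 bytes (assumed bytes32)")
    return owner_bytes + agent_salt

def final_salt(owner: str, agent_salt: bytes,
               keccak256: Callable[[bytes], bytes]) -> bytes:
    # The caller supplies a real Keccak-256 implementation.
    return keccak256(packed_salt_input(owner, agent_salt))

def deploy_if_absent(predicted: str,
                     get_code: Callable[[str], bytes],
                     deploy: Callable[[], None]) -> bool:
    """Deploy only when no code exists at the predicted address. Returns True if deployed."""
    if len(get_code(predicted)) > 0:
        return False  # already deployed: a re-call is a no-op
    deploy()
    return True
```

Because the address is a pure function of the inputs, re-running the script after a partial failure lands on the same wallet instead of creating a second one.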

— Nicholas (CardZero / mrocker)

@mrocker, appreciated on all three.

Admin-key SPoF. Safe + TimelockController on the UUPS path is the right call given your V1→V2→V3 track record, that history makes the upgrade surface real, not theoretical. Our setup differs : AgentJobManager, EvaluatorRegistry, and Treasury are non-proxied Ownable contracts. The governance path is a two-step propose/execute with a 48h delay baked into the contract itself, not a separate TimelockController deployment. Full upgrade = redeployment + governance vote, which we’re comfortable with at this scale. That said, the 4-EOA ceremony maps cleanly onto our deployment regardless of proxy pattern, and we’re adopting it verbatim for mainnet.

Spec text. That formulation is clean, "the degenerate case of the latter with one entry" resolves the whole debate in one sentence. Fold it into #1732 directly; a sibling PR adds noise without additional signal.

Ceremony writeup. Read the draft. The CREATE2 approach (finalSalt = keccak256(owner, agentSalt), idempotent re-call guard on code.length > 0) is exactly what we were missing from our ceremony script. We’re taking the 4-EOA separation + SCORING_RULES_HASH public mirror + Sepolia full-lifecycle gate as our mainnet checklist. One question: on the separated role-grant txs, do you do all four roles in a single ceremony session or across separate blocks with explicit Basescan verification between each?

- Bakugo (Demsys)

Job #4 on the current ASP deployment just completed the full evaluator cycle.

Setup: @pablocactus (AHM) as provider, ThoughtProof as evaluator. AHM submitted an Agent Health Score attestation for ACP #2624 — composite scoring across behavioral and transaction-pattern dimensions, full reasoning chain included in the deliverable.

ThoughtProof evaluated via PoT/RV (Proof of Thought / Reasoning Verification) — a multi-perspective epistemic pipeline that checks whether the provider’s reasoning is internally consistent, properly scoped, and evidence-supported. Four generator perspectives, adversarial red-team, then synthesis. Result: complete() with the full verification block permanently archived and hash-linked on-chain.

Settlement TX: 0x4ab25466f2e790bd134ca68dd5c1a483b3e81171ed6066e9f45f3f50983a4c88

What this demonstrates for the protocol:

- Two independent evaluators (AHM: behavioral/operational layer, ThoughtProof: reasoning/process layer) can compose through the same binary settlement primitive without sharing an internal model

- The evaluator evidence surface extends far beyond the complete()/reject() binary — structured reasoning artifacts sit around the settlement path as an auditability layer

- Evaluator middleware (confidence routing, dissent preservation, insufficiency handling) works cleanly within the existing ERC-8183 lifecycle — no protocol changes needed

This is the fourth job in the paired sequence and the first where both provider deliverable and evaluator verification are archived with full reasoning chains. Writing this up as Case 4 for the envelope-in-action case library.


Thanks for the update @ThoughtProof, confirmed on our end. Both jobs #4 and #5 are showing as COMPLETED on-chain.

Fee distribution worked as expected: each evaluator received their 80% share in USDC, and the treasury collected its 20% cut on both jobs. The settlement mechanism is functioning correctly end-to-end.
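The 80/20 arithmetic, sketched in integer base units. The fee amount in the example is hypothetical (the thread does not state the evaluation fee), and sending the rounding remainder to the treasury is an assumption rather than confirmed contract behaviour.

```python
from typing import Tuple

def split_fee(fee: int, evaluator_bps: int = 8_000) -> Tuple[int, int]:
    """Split `fee` (token base units) between evaluator and treasury by basis points."""
    if fee < 0 or not 0 <= evaluator_bps <= 10_000:
        raise ValueError("invalid fee or basis points")
    evaluator_share = fee * evaluator_bps // 10_000
    # The treasury takes the remainder, so the two shares always sum to `fee`.
    treasury_share = fee - evaluator_share
    return evaluator_share, treasury_share
```

With a hypothetical fee of 50,000 base units (0.05 USDC at 6 decimals), `split_fee(50_000)` gives 40,000 to the evaluator and 10,000 to the treasury.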

Job #3 is still in SUBMITTED state; pablocactus has the reject() call pending there. Once that closes, we’ll move forward with the next testnet redeployment (Phase 2: ReentrancyGuardTransient + fixed evaluationFee).

Appreciate you running both sides of the job cycle, this was a clean first live test of the full provider→submit→complete flow with real evaluator coordination.

@Bakugo32 - reject() landed for Job #3 on the cycle 4 contract. TX 0xb0667fbf4d6132d26b5e7d7e493a33df463e958c108bd2fd2077643fa3cab463, block 41555482. State should now read REJECTED, with refund issued to the original client and the 80/20 evaluator-fee distribution applied. Reason hash 0xe70a2cb43da055fd1d9231c96dc3569db4c433aa8eef6c0f46a6eddb27ff76dc (operational_gap - content not retrievable, re-ran as Job #5).

Ready for Phase 2 testnet redeployment whenever you are.

- Pablo | AHM
