ERC-8183: Agentic Commerce

pablocactus · April 21, 2026, 7:12pm

@ThoughtProof - resubmitted on Base Sepolia. You’re clear to call complete().

TX hash: 0xe57c07ee6e3245e93a664febae4d2db3345f1ac72c3e697eba13335946deea2c

Block: 40515168 on Base Sepolia (84532)

DeliverableHash: 0x697f9f52293c14446f710d14eda3b17dfc965fd5b169370ae89e718f55d342d7

Apologies for the initial misdirection.. the Arc testnet submission went to an older contract. Correct one this time.

ThoughtProof · April 21, 2026, 7:31pm

Job #2 completed ✓

TX: 0x4d4d25c6edf0655b99d0e7f4500ed0f5f5e9b8612aae7814ee948dd93fe2202b

Block: 40515680

Verdict: APPROVE (matching your assessment). Reason hash captures our agreement that AHS 58/100 is an appropriate baseline read for a fresh wallet — low confidence reflects 0-tx history, not adverse signals.

First clean round trip on the new contracts

pablocactus · April 21, 2026, 7:47pm

@ThoughtProof - clean round trip, thanks for seeing it through. Good to have the zero-history / low confidence framing validated on-chain.. useful precedent for Job #3.

Nicopat · April 24, 2026, 7:16am

Hi,

The hook system in ERC-8183 is the right design: it keeps the Job primitive lean while allowing domain-specific logic at each state transition. Two concrete use cases where hooks connect to open problems:

- Evaluator attestation format. The Evaluator in 8183 calls complete() or reject(), but the format of what the evaluator attested is not standardized, so two evaluators assessing the same job type produce incomparable attestation records. We’ve been working on a structured attestation interface (factual, signed, machine-readable statements with a score from 0 to 1000, a confidence metric, and decay semantics) that could serve as the attestation layer the Evaluator writes into. A pre-completion hook that verifies the evaluator’s attestation conforms to a standard format would make Job outcomes comparable across platforms.
- Risk-gated job acceptance. Right now, a Client funds a Job without knowing whether the broader system is under stress. During the Aave/rsETH exploit on April 18, $6.2B left Aave in 48 hours, and agents executing jobs against affected assets had no standard way to check system risk before committing funds to escrow. A pre-funding hook that reads a cross-chain risk signal interface (isCrisis()) would let agents skip job creation during adverse regime conditions, rather than discovering the risk after funds are locked.

Both of these are hook-compatible: they don’t require changes to the Job primitive. They need standardized interfaces that hooks can call. We’re drafting an ERC for this trust infrastructure layer (five interfaces: attestation, decision trail, accountability, risk signals, asset passports) that’s designed to sit underneath 8183’s evaluator and hook system.
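For readers who want something concrete, the two hook checks described above could be prototyped off-chain roughly like this. It is only a sketch under stated assumptions: the 0 to 1000 score range and the `isCrisis()` name come from the post, while the field names, the 0 to 1 confidence range, and the expiry window standing in for "decay semantics" are hypothetical and not taken from the draft ERC.

```typescript
// Hypothetical attestation shape. Only the 0..1000 score range is from the post;
// the field names and the 0..1 confidence range are assumptions for illustration.
interface Attestation {
  score: number;           // integer in 0..1000
  confidence: number;      // assumed to be a fraction in 0..1
  issuedAt: number;        // unix seconds
  validForSeconds: number; // simple expiry window, a stand-in for "decay semantics"
}

// Hypothetical risk-signal reader; isCrisis() is the name used in the post.
interface RiskSignal {
  isCrisis(): boolean;
}

// What a pre-completion hook might verify before accepting an evaluator's attestation.
function conformsToFormat(a: Attestation, nowSeconds: number): boolean {
  const scoreOk = Math.floor(a.score) === a.score && a.score >= 0 && a.score <= 1000;
  const confidenceOk = a.confidence >= 0 && a.confidence <= 1;
  const fresh = nowSeconds >= a.issuedAt && nowSeconds - a.issuedAt <= a.validForSeconds;
  return scoreOk && confidenceOk && fresh;
}

// What a pre-funding hook might verify before funds are escrowed.
function mayFundJob(risk: RiskSignal): boolean {
  return !risk.isCrisis();
}
```

An on-chain hook would express the same checks in Solidity; the point here is only that both conditions reduce to small, standardizable predicates.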

Thread with the draft: https://ethereum-magicians.org/t/erc-xxxx-trust-infrastructure-for-autonomous-agents-and-tokenized-assets/28322

Question for @davidecrapis.eth : Is there a plan for standardizing what the Evaluator attests (beyond the binary complete/reject), or is that intentionally left to the hook layer?

Pat

ThoughtProof · April 24, 2026, 11:35am

For readers following 8183: a small case library for the spec went live this week at envelope-in-action.

The repository collects contributed composition cases and format-fit notes: how independent issuer attestations actually compose, at real surfaces, against real failure modes, by real issuers.

**Three cases published:**

- **Case 1: Agent-to-MCP-server invocation** (RNWY × ThoughtProof). Action-time gating. Reasoning integrity composes with server-side quality at the point an agent is about to act.
  envelope-in-action/cases/01-mcp-invocation.md at main · erc-8183-cases/envelope-in-action · GitHub

- **Case 2: Wallet-bound aggregation at pre-transaction trust** (SkyeMeta / SkyeProfile). Ten signed dimensions from nine issuers consulted in one call; relying party chooses thresholds at consumption time.
  envelope-in-action/cases/02-wallet-bound-aggregation.md at main · erc-8183-cases/envelope-in-action · GitHub

- **Case 3: Pre-commit composition at transaction-composition time** (InsumerAPI × ThoughtProof). Action-time gating at the moment between decision and signing; on-chain executability composes with reasoning integrity, both bound to the same candidate transaction.
  envelope-in-action/cases/03-pre-commit.md at main · erc-8183-cases/envelope-in-action · GitHub

Two topologies (action-time gating, wallet-bound aggregation), two action-time surfaces, one consistent format across all three. The four-section case structure (surface, failure modes, why composition not redundancy, what it demonstrates) is documented in FORMAT.md.

**For contributors.** The “Cases in waiting” section lists surfaces and topologies the library is open to: compliance-first wallet screens, lending pre-qualification, agent-platform session scoping, any-dimension × permanent-attestation-storage, three-dimensional compositions as natural follow-ups. Any issuer of an 8183 envelope dimension can contribute a case, a format-fit reframe, or a correction. Workflow in CONTRIBUTING.md.

Charter is explicit: this is a case library, not spec governance. New issuers welcome.

Maintainers: Raul Jaeger (ThoughtProof), Douglas Borthwick (InsumerAPI & SkyeMeta), Pablo A. Lopez (RNWY).

Trishir · April 24, 2026, 4:15pm

@pablocactus Thanks, appreciate the feedback. The composability angle is exactly what we’re going for. Financial enforcement and diagnostic scoring solve different trust problems that reinforce each other when combined.

Your off-chain scoring with on-chain anchoring pattern sounds smart. We’re doing something similar with our reputation engine where heavy computation happens off-chain through our API and only the final score writes on-chain.

One integration pattern we’ve been thinking about: your AHS behavioral scores could inform dynamic bond pricing on our side. An agent with a strong AHS score qualifies for a lower bond percentage. Weak AHS score means a higher bond. Dynamic risk-adjusted collateral, similar to how credit scores affect loan terms.

We’re planning the Base migration soon. Would love to compare notes on the deployment pattern. DM me if you want to sync.

RNWY · April 24, 2026, 4:23pm

Thank you, Raul. Case 1 has been a good stress test of what “composition, not redundancy” actually means in practice; looking forward to Cases 4+ as more issuers pick up the format.

For anyone reading who issues an 8183 envelope dimension: the contribution surface is genuinely open. Four-section format, minimal overhead, real failure modes. The charter point matters — case library, not spec governance.

pablocactus · April 25, 2026, 1:23pm

@Trishir - the dynamic bond pricing angle is exactly the kind of integration we’ve been thinking about. AHS behavioural scores as a collateral efficiency signal makes a lot of sense.. an agent with consistent D2 patterns and clean D1 history is genuinely lower risk than one with erratic behaviour, and that should be reflected in the bond requirement.

The off-chain scoring with on-chain anchoring pattern maps well to how AHM works.. heavy computation happens off-chain, the score and grade write on-chain. Will DM you to compare notes on the Base deployment pattern when you’re ready to migrate.

pablocactus · April 25, 2026, 1:28pm

@Trishir - just noticed your profile is set to private so can’t DM directly. Feel free to reach out at pablo@agenthealthmonitor.xyz when you’re ready to compare notes on the Base deployment.

Edit: my profile permissions, not yours - apologies for the confusion. DM incoming.

Bakugo32 · April 26, 2026, 11:16am

Hello, agreed, the on-chain record from #1 and #2 gives #3 a meaningful baseline. Posting job #3 shortly.

Bakugo32 · April 26, 2026, 11:25am

Evaluator integration guide + starter kit — now in repo

Following the questions from @pablocactus on daemon setup and the getLogs indexing issue we debugged together, we’ve added a proper evaluator integration guide to the repo.

docs/EVALUATOR.md covers:

- Staking flow (100 VRT minimum, 24h warmup on testnet)
- Where to watch EvaluatorAssigned: AgentJobManager exclusively, not EvaluatorRegistry (which is now view-only)
- eth_getLogs pagination: Base Sepolia enforces a 9,000-block max per request
- eth_getTransactionReceipt fallback for the RPC indexing lag on freshly deployed contracts
- complete() / reject() flow
- EvaluatorStakeUpdated for solvency tracking without extra eth_call
- Full events and errors reference with selectors
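As an illustration of the pagination point, here is a minimal sketch of how a watcher might split a scan into windows before calling eth_getLogs. It is not the code from the starter kit. The function name is hypothetical, and it conservatively assumes the 9,000-block limit applies to the inclusive span of each request.

```typescript
// Inclusive block range, matching eth_getLogs' fromBlock/toBlock semantics.
interface BlockRange {
  fromBlock: number;
  toBlock: number;
}

// Split [fromBlock, toBlock] into consecutive windows of at most maxSpan blocks each.
// The default of 9000 reflects the Base Sepolia limit mentioned in the guide.
function paginateBlockRange(fromBlock: number, toBlock: number, maxSpan: number = 9000): BlockRange[] {
  if (maxSpan < 1) {
    throw new Error("maxSpan must be at least 1");
  }
  const ranges: BlockRange[] = [];
  for (let start = fromBlock; start <= toBlock; start += maxSpan) {
    // Each window ends maxSpan - 1 blocks after its start, or at toBlock if that comes first.
    ranges.push({ fromBlock: start, toBlock: Math.min(start + maxSpan - 1, toBlock) });
  }
  return ranges;
}
```

Each returned range can then be passed as the fromBlock/toBlock pair of one eth_getLogs request, so a long backfill never exceeds the provider's per-request limit.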

docs/evaluator-starter-kit.ts: updated with the 2026-04-13 contract addresses and an on-chain watcher replacing the old AssignmentWatcher dependency.

As a reference point: jobs #1 and #2 resolved on testnet (#1 rejected, #2 completed). Job #3 is now live: @ThoughtProof as provider, @pablocactus assigned as evaluator, deadline 2026-05-01.

pablocactus · April 26, 2026, 11:37am

Brilliant, thanks @Bakugo32 - the AgentJobManager event source clarification and the eth_getLogs pagination notes are exactly the gotchas I hit setting up the daemon for Job #2. Will integrate the starter kit patterns before Job #3’s deadline and confirm here once the watcher is live.

Nicopat · April 26, 2026, 3:19pm

@ThoughtProof

The envelope-in-action case library is exactly what we needed as validation: three concrete cases showing how independent attestation dimensions compose at real surfaces. The on-chain infrastructure we’ve been building (AttestationRegistry with multi-evaluator storage, consensus engine, lifecycle state machine) is designed to consume exactly this kind of structured multi-issuer output.

Looking forward to seeing more cases contributed.

Thanks

pablocactus · April 26, 2026, 6:46pm

Quick follow-up: confirmed the daemon is healthy on Base Sepolia, polling cleanly with the post-redeploy contract addresses, and tracking the active job. Hasn’t picked up an EvaluatorAssigned for Job #3 yet.. assume it’ll fire when the on-chain assignment lands. Standing by.

Bakugo32 · April 26, 2026, 9:52pm

@pablocactus assignment already landed. Job #3 was funded and you’re the assigned evaluator. The EvaluatorAssigned event is in the fund() receipt:

0x653088b71462ee8d5ec7623babf4c2a5074f3489ffe88f48c9530eb2c0edc713

This is exactly the getLogs indexing lag case from the guide — the event is in the receipt before the RPC index catches up. You can confirm via eth_getTransactionReceipt with that hash and filter on topics[2] for your address. getLogs will catch up shortly.
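A rough sketch of that receipt check, for anyone wiring the fallback into a daemon. It operates on the JSON returned by eth_getTransactionReceipt and makes no network calls itself. The type and function names are hypothetical, and the event's topic0 is passed in as a parameter because the selector lives in the guide rather than in this thread.

```typescript
// Minimal subset of an eth_getTransactionReceipt result needed for the check.
interface ReceiptLog {
  address: string;
  topics: string[];
}
interface TxReceiptLike {
  logs: ReceiptLog[];
}

// Left-pad a 20-byte address to the 32-byte form used for indexed address topics.
function addressToTopic(address: string): string {
  const hex = address.toLowerCase().replace(/^0x/, "");
  if (hex.length !== 40) {
    throw new Error("expected a 20-byte hex address");
  }
  const padding = new Array(24 + 1).join("0"); // 24 zero characters = 12 zero bytes
  return "0x" + padding + hex;
}

// True if the receipt contains a log from `emitter` (assumed: AgentJobManager)
// whose topic0 matches the event and whose topics[2] is the evaluator's address.
function receiptAssignsEvaluator(
  receipt: TxReceiptLike,
  emitter: string,
  eventTopic0: string,
  evaluator: string
): boolean {
  const wanted = addressToTopic(evaluator);
  return receipt.logs.some(
    (log) =>
      log.address.toLowerCase() === emitter.toLowerCase() &&
      log.topics.length > 2 &&
      log.topics[0].toLowerCase() === eventTopic0.toLowerCase() &&
      log.topics[2].toLowerCase() === wanted
  );
}
```

A daemon could run this against the receipt of any fund() transaction it learns about and fall back to getLogs polling for everything else.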

pablocactus · April 27, 2026, 12:07pm

Got it, thanks. Will check the receipt directly using that tx hash and confirm my address is in topics[2]. Sounds like exactly the indexing lag fallback your new guide flagged.. clearly the daemon needs the eth_getTransactionReceipt pattern wired in, not just relying on getLogs polling. Will integrate before May 1 and confirm here once the watcher picks up Job #3 cleanly.

ThoughtProof · April 27, 2026, 6:58pm

Job #3 submitted. TX: 0x60ab143227d85b531bfd24ac0fbe7a24523e698f92631473b0017a524f258297 — deliverable hash is in the receipt data. @pablocactus ready for evaluation whenever your watcher picks it up.

pablocactus · April 27, 2026, 8:43pm

@ThoughtProof @Bakugo32 - Job #3 evaluation: complete().

AHM’s verdict came back 58/D with INSUFFICIENT confidence. The low grade is due to zero transaction history, not adverse signals, and the INSUFFICIENT flag means the system is explicitly saying “not enough data to trust this verdict.”

Rejecting on a verdict our own system flags as untrustworthy isn’t defensible. The correct response to INSUFFICIENT confidence is not reject.. it’s ‘complete’ with a note that the methodology gap needs fixing.

We already shipped configurable routing (PR #112) in response to Bakugo’s previous threshold feedback. The next refinement is confidence-based routing: INSUFFICIENT confidence should result in ‘escrow/HOLD’ rather than reject. That’s the build that comes out of this test.

Thanks to ThoughtProof for running a clean job. This is exactly what the test cycle is for.

Tx: https://sepolia.basescan.org/tx/0x2a33b40e4dccba3bef4bd9223e59fa54b9fd77ccd160f7c5f0fc123fcb708200

Verdict hash: 0xbe9c3ba2eca135824a330c89b78889dbe0588a365d217d966a929ed59bf50915 - verifiable against verdict-job3.json in the AHM repo.

pablocactus · April 27, 2026, 8:52pm

Quick clarification on “configurable routing” for anyone following - PR #112 lets integrators set their own trust routing thresholds via API. So rather than AHM’s default (A/B = instant settle, C = escrow, D/F = reject) applying universally, each integrator can configure custom grade mappings, disable escrow entirely, or allowlist known trusted addresses.

To be clear - AHM’s scoring is never configurable. The AHS grade and confidence flag are always objective on-chain measurements. What PR #112 makes configurable is how integrators act on those scores in their own context. The score itself doesn’t change.

The confidence-based routing fix would extend this further.. letting integrators define behaviour specifically for INSUFFICIENT confidence verdicts, rather than falling back to the default grade-based routing.
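To make the routing description concrete, here is a small sketch of how grade-based defaults, integrator overrides, and the proposed confidence override could fit together. It is not AHM's actual API: the type names, the `insufficientConfidenceRoute` field, and the "SUFFICIENT" label are hypothetical, and only the default mapping (A/B settle, C escrow, D/F reject) and the INSUFFICIENT flag come from the posts above.

```typescript
type Grade = "A" | "B" | "C" | "D" | "F";
type Confidence = "INSUFFICIENT" | "SUFFICIENT"; // only INSUFFICIENT is named in the thread
type Route = "settle" | "escrow" | "reject";

interface RoutingConfig {
  gradeRoutes: Record<Grade, Route>;
  insufficientConfidenceRoute?: Route; // hypothetical field for the proposed extension
  allowlist?: string[];                // addresses that always settle
}

// Default mapping as described in the thread: A/B settle, C escrow, D/F reject.
const DEFAULT_ROUTING: RoutingConfig = {
  gradeRoutes: { A: "settle", B: "settle", C: "escrow", D: "reject", F: "reject" },
};

// The score inputs (grade, confidence) are never modified; only the action taken on them varies.
function routeVerdict(
  grade: Grade,
  confidence: Confidence,
  address: string,
  config: RoutingConfig = DEFAULT_ROUTING
): Route {
  const allowlist = config.allowlist || [];
  if (allowlist.some((entry) => entry.toLowerCase() === address.toLowerCase())) {
    return "settle";
  }
  if (confidence === "INSUFFICIENT" && config.insufficientConfidenceRoute) {
    return config.insufficientConfidenceRoute;
  }
  return config.gradeRoutes[grade];
}
```

With the defaults, Job #3's 58/D verdict routes to reject; an integrator that sets `insufficientConfidenceRoute` to "escrow" would get a hold instead, which is the behaviour the proposed fix is aiming for.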

Bakugo32 · April 27, 2026, 10:59pm

Treasury + 80/20 fee split — interface proposal before implementation

Back after a week away. Jobs #1, #2, and #3 resolved cleanly during that time — the test cycle ran autonomously and surfaced useful design insights, including @pablocactus’s INSUFFICIENT confidence case on job #3 which directly informed the fee structure below.

Next protocol milestone: formalising the evaluator fee share and introducing Treasury.sol. Posting here before touching Solidity per our process.

What changes in AgentJobManager

New constant:

uint256 public constant EVALUATOR_SHARE_BPS = 8_000; // 80%

The gross fee (budget * feeRate / 10_000, with feeRate in basis points) is split whenever an evaluation took place, i.e. on COMPLETED and REJECTED. EXPIRED jobs pay no fee:

| State | Provider | Evaluator | Treasury | Client |
|---|---|---|---|---|
| `COMPLETED` | `budget - fee` | `fee × 80%` | `fee × 20%` | - |
| `REJECTED` | - | `fee × 80%` | `fee × 20%` | `budget - fee` |
| `EXPIRED` | - | - | - | `budget` (full refund) |

Why fee on reject(): The evaluator performs the same work regardless of verdict. Tying the fee to complete() only would create a financial incentive to always complete, which directly undermines verdict independence. The fee is the cost of accessing the protocol’s evaluation infrastructure, not a success fee. On EXPIRED, no evaluation occurred, so no fee is deducted and the client is made whole.
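The split can be checked with a few lines of arithmetic. The sketch below mirrors the payout table off-chain and is not the contract code: amounts are plain numbers in USDC base units (6 decimals) rather than uint256, Math.floor stands in for Solidity's truncating division, and giving the rounding remainder to the treasury is an assumption.

```typescript
type TerminalState = "COMPLETED" | "REJECTED" | "EXPIRED";

interface Payout {
  provider: number;
  evaluator: number;
  treasury: number;
  client: number;
}

const EVALUATOR_SHARE_BPS = 8000; // 80%, as in the proposed constant
const BPS_DENOMINATOR = 10000;

// budget is in USDC base units (6 decimals); feeRateBps is the fee rate in basis points.
function splitBudget(state: TerminalState, budget: number, feeRateBps: number): Payout {
  if (state === "EXPIRED") {
    // No evaluation occurred, so no fee is taken and the client is refunded in full.
    return { provider: 0, evaluator: 0, treasury: 0, client: budget };
  }
  const fee = Math.floor((budget * feeRateBps) / BPS_DENOMINATOR);
  const evaluator = Math.floor((fee * EVALUATOR_SHARE_BPS) / BPS_DENOMINATOR);
  const treasury = fee - evaluator; // assumption: treasury absorbs any rounding remainder
  const remainder = budget - fee;
  return state === "COMPLETED"
    ? { provider: remainder, evaluator, treasury, client: 0 }
    : { provider: 0, evaluator, treasury, client: remainder };
}
```

For a 5 USDC job at a 0.5% feeRate (50 bps), this gives a 0.025 USDC fee, split into 0.02 USDC for the evaluator and 0.005 USDC for the treasury on both COMPLETED and REJECTED.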

New event emitted on complete() and reject() when fee > 0:

event FeeDistributed(
    uint256 indexed jobId,
    address indexed evaluator,
    uint256 evaluatorFee,
    uint256 treasuryFee
);

feeRecipient renamed treasury. Associated governance functions and events renamed accordingly.

Why 80/20

Evaluators bear real operational costs: gas per evaluation, infrastructure for running a daemon, off-chain computation. 80% ensures the fee meaningfully compensates that work and the stake they put at risk. 20% funds protocol sustainability via buyback-burn. Both ratios are governance-adjustable post-launch.

On gas coverage: at the current feeRate (0.5%), a 5 USDC job yields a 0.025 USDC fee, of which the evaluator receives 0.02 USDC. That covers gas at the lower end of the ~$0.01–0.05 per-transaction range on Base, though not the upper end. The current MIN_BUDGET (0.01 USDC) will be raised to ensure gas coverage at any valid feeRate. The proportional model has a floor; a fixed fee per evaluation is the pre-mainnet target to fully resolve this.
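To see where that floor sits, the budget at which the evaluator's share equals a given gas cost can be worked out directly: budget = gasCost / (feeRate × evaluatorShare). The helper below is a hypothetical back-of-the-envelope calculation, not part of the proposal. It works in USDC base units (6 decimals), treats 1 USDC as roughly $1, and ignores the truncation that on-chain integer division would add.

```typescript
// Smallest budget (in USDC base units) at which the evaluator's share of the
// proportional fee covers gasCostUnits. Rates are in basis points.
function minBudgetForGasCoverage(
  gasCostUnits: number,
  feeRateBps: number,
  evaluatorShareBps: number = 8000
): number {
  if (feeRateBps <= 0 || evaluatorShareBps <= 0) {
    throw new Error("rates must be positive");
  }
  // evaluatorFee = budget * feeRateBps / 10000 * evaluatorShareBps / 10000 >= gasCostUnits
  return Math.ceil((gasCostUnits * 10000 * 10000) / (feeRateBps * evaluatorShareBps));
}
```

At 50 bps and an 80% share the evaluator earns 0.4% of the budget, so covering $0.01 of gas needs a 2.5 USDC job, $0.02 needs 5 USDC, and $0.05 needs 12.5 USDC. That gap is what a raised MIN_BUDGET or a fixed per-evaluation fee would close.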

Treasury.sol

New contract. Receives the 20% protocol share on every resolved job. Owned by the deployer initially; transferable to the protocol DAO via governance post-launch. buybackAndBurn() is a governance-controlled stub on testnet (emits BuybackQueued). Mainnet will integrate Aerodrome on Base for USDC → VRT swap + burn. Mainnet event signature reserved:

event BuybackExecuted(address indexed token, uint256 tokenSpent, uint256 vrtBurned);

Full spec in INTERFACES.md. Open to feedback before we implement.