Accidentally built ERC-8183: lessons from running an AI agent job market in production

clawplaza · March 12, 2026, 11:24am

I’ve been running an AI agent job market in production for a few months. Here are some design decisions that weren’t obvious going in:

1. The evaluator is the whole problem

The contract is not the hard part. createJob(), fund(), submit(), complete() — a compliant ACPCore can be written in a weekend. The evaluator is where months went. Who evaluates? What’s the rubric? How do subjective tasks work where “correct” is in the eye of the client? Three distinct evaluator patterns ended up in the repo (rule-based, AI coordinator, and a multisig fallback) because there’s no single right answer. That’s where the real design work lives.
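One way to picture how the three patterns can coexist is a chain in which each evaluator either returns a verdict or abstains. This is only a sketch under assumptions, not the code in the repo: the class names, the `score_fn` and `collect_votes` callables, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Verdict:
    approved: bool
    reason: str


class RuleBasedEvaluator:
    """Objective checks only; abstains when the job defines no rules."""

    def evaluate(self, job_spec: dict, deliverable: str) -> Optional[Verdict]:
        rules = job_spec.get("rules")
        if not rules:
            return None  # nothing machine-checkable, let the next evaluator decide
        failed = [name for name, check in rules.items() if not check(deliverable)]
        if failed:
            return Verdict(False, "failed rules: " + ", ".join(failed))
        return Verdict(True, "all rules passed")


class ScoredEvaluator:
    """Stand-in for an AI coordinator. score_fn is a hypothetical callable
    that returns a quality score between 0 and 1."""

    def __init__(self, score_fn: Callable[[dict, str], float],
                 approve_at: float = 0.8, reject_at: float = 0.3):
        self.score_fn = score_fn
        self.approve_at = approve_at
        self.reject_at = reject_at

    def evaluate(self, job_spec: dict, deliverable: str) -> Optional[Verdict]:
        score = self.score_fn(job_spec, deliverable)
        if score >= self.approve_at:
            return Verdict(True, f"score {score:.2f}")
        if score <= self.reject_at:
            return Verdict(False, f"score {score:.2f}")
        return None  # uncertain band, escalate


class MultisigFallback:
    """Last resort: approve when at least `threshold` signers vote yes."""

    def __init__(self, collect_votes: Callable[[dict, str], List[bool]], threshold: int):
        self.collect_votes = collect_votes
        self.threshold = threshold

    def evaluate(self, job_spec: dict, deliverable: str) -> Optional[Verdict]:
        votes = self.collect_votes(job_spec, deliverable)
        yes = sum(1 for vote in votes if vote)
        return Verdict(yes >= self.threshold, f"{yes}/{len(votes)} signers approved")


def evaluate_job(evaluators, job_spec: dict, deliverable: str) -> Verdict:
    """Return the first non-abstaining verdict in the chain."""
    for evaluator in evaluators:
        verdict = evaluator.evaluate(job_spec, deliverable)
        if verdict is not None:
            return verdict
    raise RuntimeError("no evaluator reached a verdict")
```

The ordering is a design choice: cheap objective checks first, the model-based score only where rules do not apply, and humans only for the cases the score cannot settle.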

2. expiredAt is underrated

Easy to treat as a boring parameter. Too short and providers don’t have time to do quality work. Too long and client funds are locked while nothing happens. Tiered defaults based on task complexity helped, but the more interesting finding was how framing affects UX: “your funds are at risk after X hours” versus “the job has an X-hour deadline” leads clients to pick very different values, even for the same task.
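A tiered default can be as small as a lookup table. The tier names and durations below are invented for illustration; the values used in production are not stated here.

```python
import time
from typing import Optional

# Hypothetical tiers and durations, in hours.
DEFAULT_HOURS = {"simple": 2, "standard": 24, "complex": 72}


def default_expired_at(tier: str, now: Optional[int] = None) -> int:
    """Return a Unix timestamp to use as expiredAt for the given complexity tier."""
    if tier not in DEFAULT_HOURS:
        raise ValueError(f"unknown tier: {tier}")
    start = int(time.time()) if now is None else now
    return start + DEFAULT_HOURS[tier] * 3600
```

For example, `default_expired_at("standard", now=1_000_000)` returns `1_086_400`, i.e. 24 hours after the supplied start time.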

3. Reputation gating in beforeAction, not afterAction

Putting the reputation check in beforeAction(fund) — before a provider is assigned — rather than penalizing after the fact made a significant difference in output quality. An open market with no entry filter fills fast with low-effort submissions, and once that pattern is established it’s hard to unwind. ERC-8004’s Reputation Registry gives this a standard interface now, but the design principle holds regardless of implementation.
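The gating idea, modeled off-chain in Python rather than as contract code, looks roughly like this. The `get_score` callable and the threshold are assumptions; this is not the ERC-8004 Reputation Registry interface, only a placeholder for whatever lookup a deployment uses.

```python
from typing import Callable


class ReputationGate:
    """Reject low-reputation providers before funds are committed."""

    def __init__(self, get_score: Callable[[str], int], min_score: int):
        self.get_score = get_score  # hypothetical reputation lookup
        self.min_score = min_score

    def before_action(self, action: str, provider: str) -> None:
        # Only the fund step is gated; every other action passes through.
        if action != "fund":
            return
        score = self.get_score(provider)
        if score < self.min_score:
            raise PermissionError(
                f"provider {provider} has reputation {score}, needs {self.min_score}"
            )
```

Because the check raises before funding, a rejected provider never holds a job, so there is nothing to claw back or penalize afterwards.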

4. Deliverables are hashes, not content

submit(jobId, deliverable) — that parameter should be a content hash, not the content itself. Obvious in hindsight, but some early implementations passed raw text. Content on IPFS or Arweave, hash on-chain: gas efficiency, immutability, verifiable integrity. The reason parameter in complete() is also a good place for a structured evaluation attestation hash.
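A minimal sketch of the hash-on-chain, content-off-chain split, assuming a plain SHA-256 digest as the 32-byte value. That choice is an assumption for illustration: a real deployment might use keccak256 or derive the value from an IPFS CID instead, and the function names here are hypothetical.

```python
import hashlib
import hmac


def deliverable_digest(content: bytes) -> bytes:
    """Return the 32-byte digest that would be passed to submit() in place of the content."""
    return hashlib.sha256(content).digest()


def verify_deliverable(content: bytes, onchain_digest: bytes) -> bool:
    """Check that content fetched from off-chain storage matches the recorded digest."""
    return hmac.compare_digest(deliverable_digest(content), onchain_digest)
```

The evaluator fetches the content from storage, recomputes the digest, and refuses to evaluate if it does not match what was submitted.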

5. Sybil resistance needs to be designed in from the start

Any ERC-8183 deployment with reputation-gated access or economic rewards will attract coordinated fake provider activity sooner than expected. Specific defenses vary by context — proof-of-work, social activity signals, stake requirements — but retrofitting this after launch is painful. It needs to be part of the initial design.

6. Off-chain ledger + on-chain escrow hybrid is more realistic than going full on-chain immediately

Gas UX is a real barrier for sub-dollar tasks. A credit abstraction layer for small tasks — sponsor gas, settle in batches — with full ERC-8183 escrow for higher-value work is what actually makes the economics work. The contract doesn’t need to change; the abstraction layer is just a proxy service. A pattern for this is included in the repo.
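Roughly, the credit layer is an off-chain ledger that routes by task value and nets small payments into one payout per provider. The sketch below is not the pattern from the repo; the class, the threshold, and the integer credit units are assumptions.

```python
from collections import defaultdict
from typing import Dict


class CreditLedger:
    """Off-chain credits for small tasks; larger tasks go to on-chain escrow."""

    def __init__(self, escrow_threshold: int):
        self.escrow_threshold = escrow_threshold  # hypothetical cutoff, in credit units
        self.balances: Dict[str, int] = defaultdict(int)  # client -> prepaid credits
        self.pending: Dict[str, int] = defaultdict(int)   # provider -> unsettled earnings

    def deposit(self, client: str, amount: int) -> None:
        self.balances[client] += amount

    def route(self, amount: int) -> str:
        """Decide where a task of this value should be handled."""
        return "escrow" if amount >= self.escrow_threshold else "ledger"

    def pay(self, client: str, provider: str, amount: int) -> None:
        if self.route(amount) != "ledger":
            raise ValueError("amount is large enough to require on-chain escrow")
        if self.balances[client] < amount:
            raise ValueError("insufficient credits")
        self.balances[client] -= amount
        self.pending[provider] += amount

    def settle_batch(self) -> Dict[str, int]:
        """Return one net payout per provider and clear the pending queue."""
        batch = dict(self.pending)
        self.pending.clear()
        return batch
```

Each call to `settle_batch()` would correspond to a single sponsored on-chain transfer per provider, instead of one transaction per sub-dollar task.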

Interested in how others have approached evaluator design for subjective tasks and Sybil resistance at the reputation layer.

gilbertsahumada · March 13, 2026, 7:30am

Totally agree. I’m working on something related, prediction markets for agents, and I’ve spent hours, even days, designing and thinking through different mechanisms to keep the risk to a minimum.