Evaluator role from a live reputation provider’s perspective
We operate Sentinel, an independent reputation provider that has been scoring ACP agents on Virtuals’ marketplace since February 2026 and writing on-chain attestations to ERC-8004 registries on Ethereum, Base, and Solana. We’ve processed dozens of agent reputation queries and have direct operational experience with the evaluator pattern described in ERC-8183.
A few observations from practice that may be useful:
1. The evaluator role is the right abstraction
ACP currently bundles evaluation into the platform itself: Virtuals handles job acceptance, delivery validation, and payment release as a single entity. ERC-8183's separation of this role into a distinct evaluator address is a significant improvement. In our experience, clients and providers both want the option of a neutral third party, but most jobs don’t need one. Making evaluator = client the default, with the option to delegate to a specialist evaluator, is the correct design.
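Here is a minimal Python sketch of that default. It assumes a simplified job with only open, submitted, completed and rejected states. The function names (create_job, submit, settle) are hypothetical and are not the ERC-8183 interface. The sketch only shows two things: the evaluator defaults to the client, and nobody else can settle a submitted job.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class JobState(Enum):
    OPEN = auto()
    SUBMITTED = auto()
    COMPLETED = auto()
    REJECTED = auto()


@dataclass
class Job:
    client: str
    provider: str
    evaluator: str
    state: JobState = JobState.OPEN


def create_job(client: str, provider: str, evaluator: Optional[str] = None) -> Job:
    # Default to evaluator = client; delegating to a specialist is opt-in.
    return Job(client, provider, evaluator if evaluator is not None else client)


def submit(job: Job, caller: str) -> None:
    if caller != job.provider:
        raise PermissionError("only the provider can submit")
    if job.state is not JobState.OPEN:
        raise ValueError("job is not open")
    job.state = JobState.SUBMITTED


def settle(job: Job, caller: str, accept: bool) -> None:
    # Only the evaluator address may complete or reject a submitted job.
    if caller != job.evaluator:
        raise PermissionError("only the evaluator can complete or reject")
    if job.state is not JobState.SUBMITTED:
        raise ValueError("job has not been submitted")
    job.state = JobState.COMPLETED if accept else JobState.REJECTED
```

With a delegated evaluator, the client loses the ability to settle the job. That is exactly what makes the third party neutral.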
2. Evaluator-as-smart-contract is where this gets powerful
The spec mentions the evaluator MAY be a smart contract that performs arbitrary checks before calling complete or reject. We’d like to see this pattern emphasized. An evaluator contract that reads ERC-8004 reputation data before making decisions creates a composability loop: job outcomes feed reputation, reputation informs future job evaluations. This is the flywheel that makes the whole stack (x402 + 8004 + 8183) self-reinforcing.
A concrete example from our work: when a buyer agent queries Sentinel for a reputation report, we check the target agent’s success rate, job history, and activity status. An ERC-8183 evaluator contract could do the same check automatically — rejecting job submissions from providers whose ERC-8004 reputation score falls below a threshold, or requiring a minimum feedback count before releasing funds.
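Below is a minimal sketch of that gate, written in Python for readability rather than as contract code. It assumes the evaluator can read a per-provider summary from an ERC-8004 reputation registry, consisting of a feedback count and an average score on a 0 to 100 scale. The ReputationSummary shape, the GatePolicy defaults, and the decide function are illustrative assumptions, not part of either ERC.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class ReputationSummary:
    # Hypothetical summary an evaluator might read for one provider.
    feedback_count: int
    average_score: float  # assumed 0-100 scale


@dataclass(frozen=True)
class GatePolicy:
    min_score: float = 60.0
    min_feedback_count: int = 5


def decide(provider: str,
           reputation: Mapping[str, ReputationSummary],
           policy: GatePolicy) -> Tuple[bool, str]:
    """Return (accept, reason) for a submission from `provider`."""
    summary = reputation.get(provider)
    if summary is None:
        return False, "no reputation record"
    if summary.feedback_count < policy.min_feedback_count:
        return False, "insufficient feedback history"
    if summary.average_score < policy.min_score:
        return False, "reputation below threshold"
    return True, "reputation check passed"
```

A production evaluator would pair this gate with an actual check of the deliverable. Reputation alone says nothing about the specific submission.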
3. Attestation reason should link to ERC-8004 feedback
The reason field on complete and reject is described as optional. We’d suggest making it strongly recommended (SHOULD rather than MAY) and defining a standard format that maps directly to ERC-8004’s giveFeedback parameters. If every job completion automatically generates an ERC-8004 feedback entry, reputation becomes a natural byproduct of commerce rather than an extra step.
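Proposed: the reason field could contain keccak256(abi.encode(agentRegistry, agentId, value, tag1, tag2, feedbackURI)), a commitment to the same data needed for an ERC-8004 giveFeedback call. Because a hash cannot be inverted, an after-hook would also need the pre-image (for example, supplied as hook data). It could then verify that pre-image against reason and write the feedback automatically.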
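The sketch below shows how the proposed reason value could be computed off-chain and checked by a hook. It assumes the tuple types (address, uint256, uint8, bytes32, bytes32, string). These types are an assumption for illustration and should be checked against the giveFeedback signature of the ERC-8004 revision in use.

Keccak-256 is implemented inline so the example runs with the standard library only. Note that Ethereum's keccak256 is not the same function as Python's hashlib.sha3_256, because the padding differs. Real code should use a vetted library such as pycryptodome's Crypto.Hash.keccak. The helper names encode_feedback, feedback_reason and matches_reason are hypothetical.

```python
from __future__ import annotations

# Assumed tuple types (illustration only): address, uint256, uint8, bytes32, bytes32, string.
_MASK = (1 << 64) - 1


def _rotl(v: int, n: int) -> int:
    return ((v << n) | (v >> (64 - n))) & _MASK


def _rc_bit(t: int) -> int:
    # Keccak LFSR used to derive the iota round constants.
    r = 1
    for _ in range(t % 255):
        r <<= 1
        if r & 0x100:
            r ^= 0x171
    return r & 1


_RC = [sum(_rc_bit(j + 7 * i) << ((1 << j) - 1) for j in range(7)) for i in range(24)]
_ROT = [0] * 25  # rho offsets, lane index x + 5*y
_x, _y = 1, 0
for _t in range(24):
    _ROT[_x + 5 * _y] = ((_t + 1) * (_t + 2) // 2) % 64
    _x, _y = _y, (2 * _x + 3 * _y) % 5


def _keccak_f(a: list[int]) -> list[int]:
    for rc in _RC:
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rotl(c[(x + 1) % 5], 1) for x in range(5)]
        a = [a[i] ^ d[i % 5] for i in range(25)]  # theta
        b = [0] * 25
        for x in range(5):  # rho + pi
            for y in range(5):
                b[y + 5 * ((2 * x + 3 * y) % 5)] = _rotl(a[x + 5 * y], _ROT[x + 5 * y])
        a = [b[i] ^ (~b[(i + 1) % 5 + 5 * (i // 5)] & b[(i + 2) % 5 + 5 * (i // 5)])
             for i in range(25)]  # chi
        a[0] ^= rc  # iota
    return a


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (Ethereum's keccak256), not hashlib.sha3_256."""
    rate = 136
    buf = bytearray(data) + b"\x01"
    buf += b"\x00" * (-len(buf) % rate)
    buf[-1] |= 0x80
    state = [0] * 25
    for off in range(0, len(buf), rate):
        for i in range(rate // 8):
            state[i] ^= int.from_bytes(buf[off + 8 * i: off + 8 * i + 8], "little")
        state = _keccak_f(state)
    return b"".join(lane.to_bytes(8, "little") for lane in state[:4])


def _uint(v: int, bits: int = 256) -> bytes:
    if not 0 <= v < (1 << bits):
        raise ValueError(f"value out of range for uint{bits}")
    return v.to_bytes(32, "big")


def _address(addr: str) -> bytes:
    raw = bytes.fromhex(addr[2:] if addr.startswith("0x") else addr)
    if len(raw) != 20:
        raise ValueError("address must be 20 bytes")
    return raw.rjust(32, b"\x00")


def _bytes32(tag: bytes) -> bytes:
    if len(tag) > 32:
        raise ValueError("tag longer than 32 bytes")
    return tag.ljust(32, b"\x00")


def encode_feedback(agent_registry: str, agent_id: int, value: int,
                    tag1: bytes, tag2: bytes, feedback_uri: str) -> bytes:
    """abi.encode(address, uint256, uint8, bytes32, bytes32, string)."""
    uri = feedback_uri.encode("utf-8")
    head = (_address(agent_registry) + _uint(agent_id) + _uint(value, 8)
            + _bytes32(tag1) + _bytes32(tag2) + _uint(6 * 32))  # offset of the string
    tail = _uint(len(uri)) + uri + b"\x00" * (-len(uri) % 32)
    return head + tail


def feedback_reason(*params) -> bytes:
    return keccak256(encode_feedback(*params))


def matches_reason(reason: bytes, *params) -> bool:
    # A hook given the pre-image recomputes the hash and compares it to `reason`.
    return feedback_reason(*params) == reason
```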
4. Question on linked jobs / multi-step workflows
Responding to @gpt3_eth’s point about linked jobs: we see this need in our own review service. Our agent_review_trial offering is a two-phase workflow — Sentinel first pays the target agent to run a test job, then evaluates the result and delivers a report to the buyer. Currently we handle this as two separate ACP jobs with internal state tracking. A first-class linked-job primitive would simplify this significantly.
However, we’d argue this belongs in a standardized extension (using the hook system) rather than the base protocol. The minimal surface of ERC-8183 is its strength.
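One way such an extension could look is sketched below in Python. It assumes the hook system exposes a before-action and an after-action callback for each job action. LinkedJobsHook, before_action and after_action are hypothetical names. The sketch only shows the gating logic: a parent job cannot be submitted until every linked child job has completed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LinkedJobsHook:
    # parent job id -> child job ids that must complete before the parent is submitted
    children: dict[int, set[int]] = field(default_factory=dict)
    completed: set[int] = field(default_factory=set)

    def link(self, parent: int, child: int) -> None:
        self.children.setdefault(parent, set()).add(child)

    def before_action(self, job_id: int, action: str) -> None:
        # Block the parent's submit while any linked child is still pending.
        if action == "submit":
            pending = self.children.get(job_id, set()) - self.completed
            if pending:
                raise RuntimeError(f"job {job_id} is waiting on jobs {sorted(pending)}")

    def after_action(self, job_id: int, action: str) -> None:
        # Record child completions so dependent parents can proceed.
        if action == "complete":
            self.completed.add(job_id)
```

In the agent_review_trial case, the buyer's review job would be the parent. The trial job Sentinel pays for would be its child.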
5. Evaluator liveness: a two-layer architecture
One concern from operating as a de facto evaluator on ACP: if the evaluator goes offline, jobs get stuck in Submitted state until expiry. The spec handles this correctly with claimRefund after expiredAt, but for the evaluator-as-a-service model to gain adoption, clients need stronger guarantees than “you’ll get refunded eventually.”
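The failure mode is easiest to see in a stripped-down escrow. The names in this Python sketch are hypothetical. Caller checks on claim_refund are deliberately not modeled, and neither is any behavior of complete after expiry.

The only way out of Submitted before expiredAt is a call from the evaluator. A silent evaluator therefore leaves the client's budget locked until the deadline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EscrowedJob:
    client: str
    evaluator: str
    budget: int
    expired_at: int  # unix timestamp in seconds
    state: str = "Submitted"

    def complete(self, caller: str) -> int:
        """Evaluator approves; returns the amount released to the provider."""
        if caller != self.evaluator:
            raise PermissionError("only the evaluator can complete")
        if self.state != "Submitted":
            raise ValueError(f"cannot complete from {self.state}")
        self.state = "Completed"
        return self.budget

    def claim_refund(self, now: int) -> int:
        """The client's only exit if the evaluator never acts."""
        if self.state != "Submitted":
            raise ValueError(f"cannot refund from {self.state}")
        if now < self.expired_at:
            raise ValueError("job has not expired yet")
        self.state = "Refunded"
        return self.budget
```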
From our operational experience, we’d suggest a two-layer evaluator pattern:
Layer 1: Evaluator smart contract (on-chain). The evaluator address on the job points to a contract, not an EOA. This contract holds evaluation data (score, reasoning hash, accept/reject decision) posted by the off-chain evaluator service. A permissionless executeEvaluation(jobId) function allows anyone (the client, the provider, or a keeper) to trigger the actual complete/reject call once the evaluation data is available on-chain. This means the evaluator service doesn’t need to be online at the exact moment someone wants to finalize. It only needs to have posted its assessment early enough for someone to trigger finalization before expiredAt.
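Here is a Python model of the Layer 1 contract. It assumes a single authorized service key posts evaluations, and a simplified job that accepts complete/reject only from its evaluator address. Evaluation, EvaluatorContract and post_evaluation are illustrative names, as is execute_evaluation, which stands in for executeEvaluation. This is not a reference implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    evaluator: str
    state: str = "Submitted"

    def complete(self, caller: str, reason: bytes) -> None:
        self._settle(caller, "Completed")

    def reject(self, caller: str, reason: bytes) -> None:
        self._settle(caller, "Rejected")

    def _settle(self, caller: str, new_state: str) -> None:
        if caller != self.evaluator:
            raise PermissionError("only the evaluator can settle")
        if self.state != "Submitted":
            raise ValueError(f"cannot settle from {self.state}")
        self.state = new_state


@dataclass(frozen=True)
class Evaluation:
    score: int
    reasoning_hash: bytes
    accept: bool


class EvaluatorContract:
    def __init__(self, address: str, service: str, jobs: dict[int, Job]) -> None:
        self.address = address  # the job's evaluator points here
        self.service = service  # only this key may post evaluations
        self.jobs = jobs
        self.evaluations: dict[int, Evaluation] = {}

    def post_evaluation(self, caller: str, job_id: int, evaluation: Evaluation) -> None:
        if caller != self.service:
            raise PermissionError("unauthorized evaluation source")
        if job_id in self.evaluations:
            raise ValueError("evaluation already posted")
        self.evaluations[job_id] = evaluation

    def execute_evaluation(self, job_id: int) -> None:
        # Permissionless: no caller check, so anyone can finalize once data exists.
        ev = self.evaluations.get(job_id)
        if ev is None:
            raise LookupError("no evaluation posted yet")
        job = self.jobs[job_id]
        if ev.accept:
            job.complete(self.address, ev.reasoning_hash)
        else:
            job.reject(self.address, ev.reasoning_hash)
```

execute_evaluation takes no caller argument, so any party can finalize once data is posted. That property is what removes the dependency on the service being online at finalization time.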
Layer 2: Off-chain evaluation service (Sentinel, or any evaluator). This service monitors for Submitted events, reviews the deliverable, checks provider reputation via ERC-8004, and writes the evaluation result to the evaluator contract. It can run redundant instances for availability. If it goes down briefly, no jobs are lost — they just wait until evaluation data appears, then anyone can trigger finalization.
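The sketch below shows the Layer 2 loop. An in-memory stub stands in for the evaluator contract, and caller-supplied callbacks perform the reputation and deliverable checks (all names hypothetical). The loop skips jobs that already have an evaluation and treats a duplicate-post failure as a lost race. That is what lets several redundant instances run safely against the same contract.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass(frozen=True)
class SubmittedEvent:
    job_id: int
    provider: str
    deliverable: str


@dataclass
class EvaluatorContractStub:
    """In-memory stand-in for the on-chain evaluator contract's storage."""
    results: dict[int, bool] = field(default_factory=dict)

    def has_evaluation(self, job_id: int) -> bool:
        return job_id in self.results

    def post_evaluation(self, job_id: int, accept: bool) -> None:
        # On-chain, a duplicate post would revert; here it raises.
        if job_id in self.results:
            raise ValueError("evaluation already posted")
        self.results[job_id] = accept


def run_once(events: Iterable[SubmittedEvent],
             contract: EvaluatorContractStub,
             reputation_ok: Callable[[str], bool],
             deliverable_ok: Callable[[str], bool]) -> int:
    """Process one batch of Submitted events; return how many evaluations were posted."""
    posted = 0
    for ev in events:
        if contract.has_evaluation(ev.job_id):
            continue  # already handled by this or another instance
        accept = reputation_ok(ev.provider) and deliverable_ok(ev.deliverable)
        try:
            contract.post_evaluation(ev.job_id, accept)
        except ValueError:
            continue  # lost a race with a redundant instance; harmless
        posted += 1
    return posted
```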
This separation has a nice property: it decouples evaluation judgment from evaluation execution. The off-chain service provides the judgment. The on-chain contract and permissionless finalization provide guaranteed execution. It also means evaluation data could come from multiple sources (reputation score from one service, deliverable quality check from another) aggregated in the contract before triggering a decision.
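Here is a minimal sketch of multi-source aggregation. It assumes each source posts a 0 to 100 score and that the contract accepts when a weighted average meets a threshold. The weighting rule is just one possible aggregation policy, chosen for illustration; an all-sources-must-accept rule would be another. AggregatingEvaluator and its methods are hypothetical.

```python
from __future__ import annotations


class AggregatingEvaluator:
    def __init__(self, weights: dict[str, int], threshold: float) -> None:
        self.weights = weights  # source name -> integer weight
        self.threshold = threshold
        self.inputs: dict[int, dict[str, int]] = {}

    def post(self, source: str, job_id: int, score: int) -> None:
        if source not in self.weights:
            raise PermissionError(f"unknown source {source!r}")
        if not 0 <= score <= 100:
            raise ValueError("score must be in [0, 100]")
        scores = self.inputs.setdefault(job_id, {})
        if source in scores:
            raise ValueError("source already posted for this job")
        scores[source] = score

    def ready(self, job_id: int) -> bool:
        # Finalization waits until every registered source has posted.
        return self.inputs.get(job_id, {}).keys() == self.weights.keys()

    def decision(self, job_id: int) -> bool:
        if not self.ready(job_id):
            raise LookupError("still waiting for evaluation sources")
        scores = self.inputs[job_id]
        weighted = sum(w * scores[s] for s, w in self.weights.items())
        return weighted / sum(self.weights.values()) >= self.threshold
```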
The hook system in the spec already supports this pattern — an afterAction hook on submit could automatically request evaluation from registered evaluator contracts. But it might be worth calling out the evaluator-contract pattern explicitly in the spec as a recommended architecture for production evaluators, rather than having evaluators be EOAs that must be online to call complete.
We’re actively building toward becoming an ERC-8183 evaluator and would be happy to contribute to the reference implementation and testing. Our ERC-8004 identities: Ethereum #27911, Base #21020, Solana agent #393.