Three things from production that speak directly to the open questions in this thread.
1. Human-as-evaluator via PreparedTx — an async pattern that fits ERC-8183
@miratisu’s suggestion (an evaluator runtime that listens to events and processes them
asynchronously within expiry) is exactly the pattern we’re running, but with a human as
the evaluator for high-stakes decisions.
When an agent proposes an on-chain transaction, our gateway creates a pending approval
and enters a poll loop with a TTL. The human evaluates off-chain and resolves via a
wallet-signed action. The resolution is then recorded. This maps cleanly onto ERC-8183’s
job lifecycle:
ERC-8183 job funded
→ agent executes (ERC-8004 identity, WYRIWE input integrity)
→ proposes tx (ERC-8265 PreparedTx envelope)
→ human receives notification, evaluates off-chain
→ signs resolution (approve / decline + reason)
→ ERC-8183 evaluator calls complete() or reject()
→ funds released or returned
The PreparedTx FSM we shipped (ERC-8265,
PR #1753 open) handles the handoff between agent and evaluator without requiring the
evaluator to be on-chain or synchronous. status: "approved" | "declined" | "expired"
maps directly to ERC-8183’s terminal states. intent: "retry" | "abandon" gives the
agent producer-side guidance when evaluation doesn’t go through.
This means ERC-8183’s evaluator role can accommodate a spectrum from fully automated
(ThoughtProof’s multi-model consensus) to fully human (wallet signature via PreparedTx)
without changing the spec — the evaluator is just an address that calls complete() or
reject() within the expiry window.
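A minimal sketch of the gateway-side flow described above: poll a pending approval until the TTL runs out, then map the resulting status to the evaluator call. The function names, the "pending" value, and the injectable clock are illustrative assumptions, not the production gateway code. Treating "expired" as "make no call and let the job lapse" is also an assumption.

```python
import time
from typing import Callable, Optional

# Statuses come from the PreparedTx FSM; the action names are the ERC-8183 calls.
# Assumption: "expired" triggers no call and the job is left to lapse on its own.
ACTION_FOR_STATUS = {"approved": "complete", "declined": "reject", "expired": None}


def await_resolution(
    fetch_status: Callable[[], str],
    ttl_seconds: float,
    poll_interval: float = 1.0,
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the human resolves the approval or the TTL runs out."""
    deadline = now() + ttl_seconds
    while now() < deadline:
        status = fetch_status()  # hypothetical lookup of the pending approval
        if status in ("approved", "declined"):
            return status
        sleep(poll_interval)
    return "expired"


def evaluator_action(status: str) -> Optional[str]:
    """Name of the ERC-8183 call the evaluator address should make, if any."""
    return ACTION_FOR_STATUS[status]
```

The clock and sleep functions are parameters only so the loop can be exercised without waiting in real time. A real evaluator would also need to submit the call before the job's on-chain expiry, not just before the gateway TTL.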
2. The submission artifact problem — what does the evaluator actually verify?
ThoughtProof raised the evaluator-provider disagreement question (post #14). The deeper
issue is that ERC-8183’s submission is currently a blob — the evaluator has no
standard way to verify what the agent actually computed versus what the agent claims
it computed.
In production, we attach a three-hash input commitment, plus an output hash and a
manifest hash, to every inference run:
raw_input_hash = SHA-256(input before sanitization)
sanitization_pipeline_hash = SHA-256(pipeline spec) or identity sentinel
input_hash = SHA-256(input entering the model)
output_hash = SHA-256(agent reply)
manifest_hash = SHA-256(model + provider + inputSources + trustScope)
The output_hash is the verifiable handle on the deliverable. An evaluator contract
could require that the submission field in ERC-8183 includes this hash and that it
resolves against the attestation trail at the agent’s gateway endpoint. Disagreement
between evaluator and provider then has a concrete focal point: does the submission hash
match the attested output? If it does, the dispute is about task quality, not about
what the agent actually produced.
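As a rough sketch of how the hashes listed above could be computed and how an evaluator might compare a submission against the attested output: the sentinel string, the UTF-8 encoding, and the canonical-JSON serialisation of the manifest are all assumptions made for illustration, and the real gateway may encode these differently.

```python
import hashlib
import json
from typing import Optional

IDENTITY_SENTINEL = "identity"  # hypothetical placeholder for the real sentinel value


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_commitment(
    raw_input: str,
    model_input: str,
    reply: str,
    manifest: dict,
    pipeline_spec: Optional[str] = None,
) -> dict:
    """Compute the per-run hashes; pipeline_spec=None means no sanitization ran."""
    # Assumption: the manifest is serialised as canonical JSON (sorted keys) before hashing.
    manifest_bytes = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {
        "raw_input_hash": sha256_hex(raw_input.encode("utf-8")),
        "sanitization_pipeline_hash": (
            sha256_hex(pipeline_spec.encode("utf-8"))
            if pipeline_spec is not None
            else IDENTITY_SENTINEL
        ),
        "input_hash": sha256_hex(model_input.encode("utf-8")),
        "output_hash": sha256_hex(reply.encode("utf-8")),
        "manifest_hash": sha256_hex(manifest_bytes),
    }


def submission_matches(submission_hash: str, commitment: dict) -> bool:
    """True when the hash in the ERC-8183 submission equals the attested output hash."""
    return submission_hash.lower() == commitment["output_hash"]
```

If `submission_matches` returns true, any remaining disagreement is about the quality of the deliverable and not about which bytes the agent produced.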
We exposed this as a public endpoint — GET /verify/input-provenance — that checks the
three-hash commitment without requiring auth. An ERC-8183 evaluator contract could call
this (or an on-chain anchored version of it) as part of its evaluation logic.
Live example:
GET https://gateway.ensub.org/verify/input-provenance
?rawInputHash=<hex>
&sanitizationPipelineHash=<hex|sentinel>
&inputHash=<hex>
→ { valid: bool, transformation: "identity"|"sanitized", reason: string }
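For an off-chain evaluator runtime, calling the endpoint above could look roughly like this. The query parameter names and the response fields are taken from the request shown; the helper names and the timeout are assumptions, and the sketch does no retry or error handling.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://gateway.ensub.org/verify/input-provenance"


def provenance_url(raw_input_hash: str, sanitization_pipeline_hash: str, input_hash: str) -> str:
    """Build the GET URL for the three-hash provenance check."""
    query = urlencode(
        {
            "rawInputHash": raw_input_hash,
            "sanitizationPipelineHash": sanitization_pipeline_hash,
            "inputHash": input_hash,
        }
    )
    return f"{BASE_URL}?{query}"


def check_provenance(
    raw_input_hash: str,
    sanitization_pipeline_hash: str,
    input_hash: str,
    timeout: float = 10.0,
) -> dict:
    """Fetch and parse the response; expected keys are valid, transformation, reason."""
    url = provenance_url(raw_input_hash, sanitization_pipeline_hash, input_hash)
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

An evaluator would typically refuse to call complete() unless the returned `valid` field is true, and could log `reason` alongside its own decision.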
3. Credit abstraction — a two-tier model that’s running in production
@clawplaza’s point about not every task justifying an on-chain transaction is
exactly the tradeoff we faced. Our current model:
- Registry credits: scoped to an AgentIdentityRegistry collection. Granted as a
community pool on first collection deployment. Deducted per inference call. No gas, no
transaction per micro-task.
- Wallet credits: scoped to an EOA. For wallets not tied to a specific registry,
or for individual top-up billing.
On-chain settlement only happens for higher-value actions that go through the PreparedTx
approval gate. Everything below that threshold runs against the off-chain credit pool
with the gateway as the trust root.
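The two tiers and the settlement threshold can be sketched as a small in-memory ledger. This is illustrative only: the class and function names are hypothetical, and the order in which the pools are tried (registry first, then wallet) and the integer threshold comparison are assumptions, not a description of the actual billing logic.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CreditLedger:
    registry_credits: Dict[str, int] = field(default_factory=dict)  # keyed by registry address
    wallet_credits: Dict[str, int] = field(default_factory=dict)    # keyed by EOA

    def charge(self, registry: str, wallet: str, cost: int) -> str:
        """Deduct one inference call off-chain and return which pool paid."""
        if self.registry_credits.get(registry, 0) >= cost:
            self.registry_credits[registry] -= cost
            return "registry"
        if self.wallet_credits.get(wallet, 0) >= cost:
            self.wallet_credits[wallet] -= cost
            return "wallet"
        raise ValueError("insufficient credits")


def settlement_path(action_value: int, threshold: int) -> str:
    """Route higher-value actions to the PreparedTx gate, the rest to the credit pool."""
    return "prepared_tx" if action_value >= threshold else "credit_pool"
```

Nothing here touches the chain; only actions routed to `prepared_tx` would result in an on-chain transaction after approval.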
The open question @clawplaza raises, whether the spec should address this, probably has
the same answer @mlegls gives for the broader spec: ERC-8183 should define
the minimal on-chain interface (fund, submit, evaluate, release) and leave the
micro-payment abstraction layer to implementations. Mandating a specific credit system
in the spec would constrain the diverse implementations that are already running.
On writing to ERC-8004 reputation registries
@miratisu asked whether evaluators should write directly to ERC-8004 on completion.
Our factory (verified on mainnet) is the only production ERC-8004 deployment we’re
aware of. The registry has a setMetadata(agentId, key, value) function that could carry
an attestation from an ERC-8183 evaluator (job completion, score, timestamp) without
needing a separate reputation registry contract.
The loop @miratisu describes would be:
ERC-8183 complete() fires
→ evaluator writes to AgentIdentityRegistry.setMetadata(agentId, "erc8183.job", attestation)
→ reputation is on-chain, scoped to the agent NFT, queryable by any future client
This works today with no changes to either spec. Whether it’s the right home for
reputation data versus a dedicated registry (as ERC-8004 originally envisioned) is the
design question worth discussing here.
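To make the setMetadata write more concrete, here is one way the attestation value could be encoded off-chain before the call. The field names, the JSON encoding, and the assumption that the value is passed as bytes are all illustrative; check them against the deployed registry ABI before relying on any of this.

```python
import json

METADATA_KEY = "erc8183.job"  # key used in the loop described above


def build_attestation(job_id: int, score: int, completed_at: int) -> bytes:
    """Encode a job attestation as canonical JSON bytes (field names are assumptions)."""
    payload = {"jobId": job_id, "score": score, "completedAt": completed_at}
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def set_metadata_args(agent_id: int, job_id: int, score: int, completed_at: int) -> tuple:
    """Arguments an evaluator would pass to setMetadata(agentId, key, value)."""
    return (agent_id, METADATA_KEY, build_attestation(job_id, score, completed_at))
```

Because a single key holds a single value, writing one attestation per job under the same key would overwrite the previous one. A per-job key suffix or an aggregated value would be needed to keep history, which is part of the design question above.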