ERC-8004: Trustless Agents

Adding a runtime-attestation perspective, but mostly to help organize the composition pattern emerging in the last few posts.

My read is that these proposals are not really competing. They are answering different questions in the same trust path:

- identity / BYO-NFT adapter bindings: who or what is the durable handle?

- RAMS mandates / ERC-8118-style authorization: on whose authority, and under what scope?

- trustScope / inputSources: what entered the reasoning boundary, and how far can trust propagate?

- PreparedTransaction: what crosses the agent → wallet boundary before value moves?

- ERC-8263 / OCP-style commitments: what was anchored, and how can it be independently checked?

- scoring / staking / escrow: what happened over time, and how should future users interpret it?

- ERC-8004 Validation Registry: what evidence says the action should be trusted?

That separation feels right.

My inputs from recent stints @ yc, a16z, working with bigcos, startups, living through sgx, cloud hype cycles =>

For any agent action that can affect delegation, reputation, payment, wallet signing, or user trust, a verifier should eventually be able to answer:

who is the stable handle?

on whose authority?

under what scope?

what runtime/code acted?

what evidence proves it?

who verified it?

what consequence followed?

Most layers should only answer one or two of those questions. That is fine. The danger is when one layer silently pretends to answer more than it does.

== macs conjecture ==

Identity should not become authority.

Reputation should not become permission.

A transaction envelope should not imply the upstream runtime was honest.

A manifest hash should not be treated as proof of what actually executed.

A mandate should say who granted what authority, but not by itself prove which runtime exercised it.

The missing binding I care about is:

principal × agentId × code/runtime measurement × authorization scope × evidence

not just:

principal × agentId × authorization scope

and not:

agentId × reputation history.

The thing a user wants to authorize is not “Agent Alice” in the abstract. It is closer to:

this principal authorized this measured runtime to perform this bounded action.

That is where I think the ERC-8004 Validation Registry can be especially useful. Not as another catch-all layer, but as the evidence layer that lets the rest of the stack avoid overloading identity or reputation.

Concretely, I would like to see a recommended measured-runtime validation profile that composes with the work above.

Something like:

    validationSubject = H(
      principal,
      agentId,
      codeMeasurement,
      policyHash,
      delegationRef,
      chainId,
      nonce
    )

where delegationRef could point to a RAMS mandate, ERC-8118-style authorization, wallet permission, or application-specific EIP-712 object.
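
A minimal sketch of how such a subject could be derived. Everything beyond the field list is an assumption: SHA-256 stands in for H (an on-chain profile would more likely use keccak256 over ABI- or EIP-712-encoded data), the length-prefixed encoding is illustrative, and the function name and example values are hypothetical.

```python
import hashlib

# Field order follows the subject shape above; the byte encoding is an assumption.
SUBJECT_FIELDS = (
    "principal", "agentId", "codeMeasurement", "policyHash",
    "delegationRef", "chainId", "nonce",
)

def validation_subject(fields: dict) -> str:
    """Hash the subject fields in a fixed order (SHA-256 stands in for H)."""
    missing = [name for name in SUBJECT_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing subject fields: {missing}")
    digest = hashlib.sha256()
    for name in SUBJECT_FIELDS:
        value = str(fields[name]).encode("utf-8")
        # Length-prefix each field so adjacent values cannot be re-split
        # into a different tuple with the same concatenation.
        digest.update(len(value).to_bytes(4, "big"))
        digest.update(value)
    return digest.hexdigest()

# Placeholder values, purely illustrative.
example = {
    "principal": "0xPRINCIPAL",
    "agentId": 42,
    "codeMeasurement": "sha256:build-a",
    "policyHash": "sha256:policy-1",
    "delegationRef": "mandate:example",
    "chainId": 1,
    "nonce": 7,
}
subject = validation_subject(example)
```

The property this is meant to show: swapping the measured runtime (codeMeasurement) while keeping the same principal, agentId, and scope yields a different subject, so a validation result for one build cannot be replayed for another.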

The evidence bundle could then live behind responseURI, with responseHash committing to it and tag identifying the validation family, for example:

measured-runtime-v1

tee-runtime-v1

repro-build-v1

zk-runtime-v1

The registry does not need to bless one proof system forever. TEE attestation, reproducible builds, zk/FHE proofs, signed runtime receipts, and future validation methods can all be different evidence families while sharing the same subject shape.

The important part is that validators and counterparties can distinguish:

`this agent identity exists`

from:

`this principal authorized this scope`

from:

`this measured runtime exercised that authority`

from:

`this reputation score says it historically behaved well`.

Those should stay separate.

A useful test vector for the whole stack might be:

A principal authorizes an agent to perform a bounded task. The agent reads on-chain data, calls another agent, prepares a transaction, and a wallet signs only after review. Later, someone wants to know whether the reputation write, payment proof, or validation result should be trusted.

A verifier should be able to reconstruct:

  • stable identity
  • principal
  • authorization scope
  • input boundary
  • runtime / code measurement
  • wallet review boundary
  • commitment
  • validation evidence
  • consequence

If the proposals preserve that chain, they compose.

If any layer replaces runtime evidence with identity, or replaces authority with reputation, the chain breaks.

The urgency here is mobile =>

The default AI stack is moving toward small models on device, heavier private/confidential compute behind them, and vendor-managed runtimes deciding which path gets the fast lane. If Ethereum waits until that stack is settled, user-sovereign agents become a permissioned integration problem: bless the vendor delegate, run on the slow path, or ask users to trust a black box.

The Ethereum answer should be the opposite:

identity for discovery
delegation for authority
transaction envelopes for review
validation for evidence
commitments for anchoring
reputation for interpretation

and, when the action matters:

code/runtime measurement as a first-class subject.

That is the part I think ERC-8004’s Validation Registry can own without bloating the base spec.

Implementation notes / test vectors from my git:

  • Proof-before-privilege for agentic software: attested build → attested runtime, with receipts before secrets, tokens, deploy rights, or customer data are released.
  • universal quote envelope (tdx, nitro, sev-snp) http-a, RATS, sd-jwt
  • Prover/solver network for CI-fixing agents, using an ERC-8004-style registry plus validation and settlement

greets to phala, automata, flashbots, nethermind, and the people who showed me wtf was up

mac @ berlin


@maceip, the three-part conjecture is a useful frame, and I think it maps cleanly onto what the current proposals are actually scoped to.

On the manifest hash point specifically: WYRIWE is explicit that it proves input provenance, not runtime integrity. The triple-hash scheme binds the raw input → sanitization pipeline → sanitized input. What it cannot prove is which model or runtime processed that input — that layer requires TEE or ZK attestation, which is out of scope for WYRIWE and acknowledged as such.

So I’d agree: a manifest hash is evidence of what entered the reasoning boundary, not proof of what executed. The two should stay distinct in spec text.

On identity vs authority — ERC-8004 getAgentWallet() returns an identity binding, not an authorization claim. Whether that identity is authorized to act in a given context is a separate concern, and I don’t think any of the current proposals conflate them, but it’s worth keeping explicit as the specs mature.

Your seven-layer decomposition is more granular than the four-layer framing we’ve been using, but I don’t see a conflict — the layers nest rather than compete. Where does your validation registry proposal sit relative to ERC-8274’s IProofVerifier? That seems like the most direct overlap worth scoping.

To add to the above, the mapping between your seven layers and the current proposals is closer than it might appear at first read:

| Your layer | Current answer |
|---|---|
| Identity / binding | ERC-8004: registered token, getAgentWallet(), resolvable manifest |
| Authorization scope | Agent manifest + BountySettlement job parameters scope what the agent is funded to do |
| Trust boundaries | WYRIWE: explicitly designed to answer “what entered the reasoning boundary” |
| Transaction safety | Sovereign guardrail: write tools (swaps, transfers) pause the agent loop and require a human wallet signature before any value moves |
| Commitments | OCP: input_hash anchored on-chain before execution, verifiable from public RPC alone |
| Historical evidence | BountySettlement outcome envelopes + ERC-8275 escrow records |
| Validation registry | ERC-8274 IProofVerifier: pluggable verifier interface, ecrecover-based, no oracle |

The one layer not covered is runtime code measurement, proving which model binary ran inside the gateway. That genuinely requires TEE or ZK attestation and we don’t claim otherwise.

Everything else on your list either has a live implementation or an active spec. The architecture is live and handling real attestations today. The concern isn’t missing layers; it’s getting the existing ones formally standardised.

@TMerlini I think you’re right that the scoping question is the important one, and I should compress my point further (they hate me on irc and I don’t fw threads on slack types)

=> I’m not proposing a seventh normative layer, a new registry, or a separate runtime-verification ERC. I’m trying to name the claim shape so the existing pieces compose cleanly.

As I see the boundaries:

Input provenance, whether WYRIWE or another profile, answers: what entered the reasoning boundary?

getAgentWallet() answers: what wallet is associated with this agent identity?

Mandates / authorization answer: who allowed what, under what scope?

ERC-8274 / IProofVerifier answers: does this proof object verify?

ERC-8004 Validation Registry answers: what validation was requested or performed for an agent/action, by whom, under what tag/profile, and where is the result/evidence?

The open question is not “do we add another layer?”

It is:

what exactly is the proof or validation result claiming?

For runtime-sensitive actions, I think the validation subject should bind the authority, the agent handle, the input/output, and the runtime/code measurement:

H(principal, agentId, delegationRef, codeMeasurement, policyHash, inputHash, outputHash, chainId, nonce)

Then the pieces stay small:

Input provenance contributes what entered the reasoning boundary.

Mandates / ERC-8118 authorization contribute authority and scope.

ERC-8274 verifies the proof/evidence family.

ERC-8004 records the validation event/result.

Reputation can interpret the history later.

The reason I keep pushing on code/runtime measurement is because otherwise several valid but different claims can look similar at the registry level:

🟤 agent identity was resolved

🟣 input provenance was checked

🔵 authorization was scoped

🟢 proof bytes verified

🟡 measured runtime exercised the authority

Those are related, but they are not the same claim.

So yes, I think IProofVerifier is the direct overlap worth scoping, but only for proof checking. I would not have the Validation Registry replace it. The split I’d suggest is:

ERC-8274 verifies proof bytes.

ERC-8004 records validation events/results for agent actions.

The subject hash defines what claim was validated.

That keeps the stack smaller, not larger.

No seventh layer. No new authority model. Just an explicit validation subject so implementers can tell which claim a given proof or registry entry is actually making.


This is the most useful framing I’ve seen for scoping the proof layer: a validation subject hash rather than a new authority model. The field mapping against the current stack is close:

agentId → ERC-8004 identity binding. inputHash → OCP pre-execution commitment. outputHash → settlement envelope. chainId + nonce → replay safety already required by the attestation format.

codeMeasurement is the honest gap; we’ve said this requires TEE or ZK and that remains true.

delegationRef and policyHash are worth defining formally. If the subject hash is typed as an EIP-712 struct, it slots directly into IProofVerifier.verify() as the proof payload without touching the authority model. That keeps the architecture compact in exactly the way you’re describing.
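
A rough sketch of what an EIP-712-typed subject could look like, expressed as the typed-data structure wallets sign. The struct name `ValidationSubject`, the Solidity types chosen for each member, and the helper names are all assumptions for illustration; nothing here is taken from ERC-8274. Computing the actual type hash would additionally need keccak256, which Python's standard library does not provide, so the sketch stops at the type string.

```python
# (type, name) pairs for the hypothetical ValidationSubject struct.
# Member types are assumptions; the thread has not fixed them.
SUBJECT_MEMBERS = [
    ("address", "principal"),
    ("uint256", "agentId"),
    ("bytes32", "delegationRef"),
    ("bytes32", "codeMeasurement"),
    ("bytes32", "policyHash"),
    ("bytes32", "inputHash"),
    ("bytes32", "outputHash"),
    ("uint256", "chainId"),
    ("uint256", "nonce"),
]

def encode_type(name: str, members) -> str:
    """EIP-712 encodeType: Name(type1 member1,type2 member2,...)."""
    return name + "(" + ",".join(f"{t} {n}" for t, n in members) + ")"

def typed_data(domain: dict, message: dict) -> dict:
    """Build an EIP-712 typed-data object for the subject."""
    expected = sorted(n for _, n in SUBJECT_MEMBERS)
    if sorted(message) != expected:
        raise ValueError("message fields must match the ValidationSubject type")
    return {
        "types": {
            "EIP712Domain": [
                {"name": "name", "type": "string"},
                {"name": "version", "type": "string"},
                {"name": "chainId", "type": "uint256"},
                {"name": "verifyingContract", "type": "address"},
            ],
            "ValidationSubject": [
                {"name": n, "type": t} for t, n in SUBJECT_MEMBERS
            ],
        },
        "primaryType": "ValidationSubject",
        "domain": domain,
        "message": message,
    }
```

Typing the subject this way means a verifier sees named, typed fields rather than opaque bytes, which is the point of giving the proof payload a concrete input spec.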

Worth formalising this as the canonical claim type in ERC-8274? It would give IProofVerifier a concrete input spec rather than leaving proof bytes opaque.

FYI to the 8004 authors and community, we’ve drafted an extension to the Validation Registry interface that addresses the multi-validator coordination gap. Discussion thread: https://ethereum-magicians.org/t/erc-8004-validation-network-interface-extension-for-multi-validator-networks/28669

Spec lives at https://github.com/pokt-network/erc-8004-vni. Feedback from @Marco-MetaMask and others working on Validation Registry revisions would be especially welcome.


The spec is well-scoped and the strictly-additive constraint is exactly the right call: nothing in the Validation Registry changes, and IValidationNetwork just implements the validatorAddress role as a network rather than a single key. That’s a clean extension point.

A few places where VNI maps directly onto the existing stack:

VNI ↔ ERC-8274 (IProofVerifier)

VNI produces an aggregated attestation bundle with a Merkle root (attestationsRoot) over individual validator signatures. That bundle is the natural proof payload for IProofVerifier.verify(): VNI generates it, ERC-8274 verifies it. The requestHash in the VNI attestation envelope is the on-chain commitment the verifier checks against. These two specs are composable without modification to either.

VNI ↔ maceip’s validation subject hash (post #174)

maceip’s proposed claim type (principal, agentId, delegationRef, codeMeasurement, policyHash, inputHash, outputHash, chainId, nonce) and VNI’s attestation envelope (requestHash, agentId, validator, verdict, evidenceHash, issuedAt, challengeKind, nonceHash) are solving adjacent parts of the same problem. evidenceHash is the general container for what maceip’s inputHash + outputHash splits out explicitly. Worth aligning the two before either gets locked; the challengeKind registry may be the right place to define WYRIWE-specific evidence formats.

VNI verdictMode ↔ CCIP Mesh gateway coordination

any-pass / majority / unanimous at the validator layer is the same coordination policy problem that multi-gateway CCIP Mesh solves at the resolution layer. The assurance tier table (Tier 1–4 by selectionSize and minOperators) maps naturally onto gateway count in a CCIP Mesh deployment: Tier 2 is two distinct operators, which is exactly the Phase 0 target for CCIP Mesh decentralization.
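
To make the three coordination policies concrete, here is a small sketch of how a set of validator verdicts might be reduced to one result. The function name is hypothetical, and two behaviours are assumptions rather than anything the VNI spec is quoted as saying here: "majority" is treated as a strict majority (ties fail), and an empty verdict set fails under every mode.

```python
from typing import Sequence

def aggregate(verdicts: Sequence[bool], mode: str) -> bool:
    """Reduce individual pass/fail verdicts under a verdictMode policy."""
    if mode not in ("any-pass", "majority", "unanimous"):
        raise ValueError(f"unknown verdictMode: {mode}")
    if not verdicts:
        # Assumption: no attestations means no assurance.
        return False
    passes = sum(1 for v in verdicts if v)
    if mode == "any-pass":
        return passes >= 1
    if mode == "majority":
        # Strict majority: a tie does not pass.
        return passes * 2 > len(verdicts)
    return passes == len(verdicts)  # unanimous
```

The same reduction applies whether the voters are validators or gateways, which is why the two layers can share a policy vocabulary.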

VNI tee-attestation-pass-through-v1 challengeKind

This is the codeMeasurement gap maceip and others have raised, proving which model binary ran inside the execution environment. VNI formalises it as a first-class challenge kind without claiming to solve it on-chain. That’s the honest framing, and it gives TEE consortia a standard interface to plug into rather than requiring a new ERC for each attestation provider.

One integration opportunity — challengeKind registry

The challengeKind field is where WYRIWE connects directly into VNI. A keccak256("wyriwe-input-provenance-v1") challenge kind would let any VNI network implement WYRIWE input provenance verification natively: select validators, collect the triple-hash attestation (rawInputHash, sanitizationPipelineHash, inputHash), aggregate, and write back through the registry. That makes WYRIWE a first-class validation type rather than an adjacent spec. Worth formalising in the challengeKind registry once VNI stabilises.
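
For readers unfamiliar with the triple-hash shape, a minimal sketch of producing and re-checking one. The sanitizer, the pipeline identifier, and the use of SHA-256 are all stand-ins chosen for illustration; WYRIWE's actual pipeline and hash function are not specified in this thread.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def sanitize(raw: str) -> str:
    """Hypothetical pipeline: drop non-printable characters, collapse whitespace."""
    kept = "".join(ch for ch in raw if ch.isprintable() or ch.isspace())
    return " ".join(kept.split())

# Stand-in for a commitment to the pipeline's code or configuration.
PIPELINE_ID = b"example-sanitizer-v1"

def attest(raw: str) -> dict:
    """Bind raw input -> sanitization pipeline -> sanitized input."""
    return {
        "rawInputHash": sha256_hex(raw.encode("utf-8")),
        "sanitizationPipelineHash": sha256_hex(PIPELINE_ID),
        "inputHash": sha256_hex(sanitize(raw).encode("utf-8")),
    }

def verify(raw: str, attestation: dict) -> bool:
    """A validator holding the raw input recomputes all three hashes."""
    return attest(raw) == attestation
```

Note what this does and does not show: it proves which bytes entered the boundary and which pipeline cleaned them, and nothing about the model or runtime that later consumed them.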

One type alignment to flag

VNI’s attestation struct has agentId (uint256). ERC-8004 and WYRIWE both use bytes32 for agentId. Worth aligning before the type propagates into downstream implementations.

Strong foundation, looking forward to the thread.


EIP-3668 defines how clients talk to gateways. It says nothing about how gateways talk to each other, so every node is an island. We’ve been running a reference implementation that fills that gap: peer sync, deduplication, cryptographic attestation, and ENS wildcard resolution across a mesh of CCIP-Read nodes.

The attestation layer connects directly to ERC-8004 — agentId and registry flow into every WyriweAttestation struct, and commitmentHash is the OCP observation commitment, settleable on-chain via ERC-8274.

Full proposal, protocol spec, and open questions here: Gateway-to-gateway coordination for EIP-3668 — proposing a mesh sync protocol

Co-authored with Damon Zwicker (OCP), Vincent Wu (ERC-8263 / Composition Note), and Jimmy Shi (ERC-8274).

I hope that I will get my own trusted agent number in the near future.


Mapping a non-re-execution “judgment” validator onto the Validation Registry

Most Validation Registry examples are re-execution validators (staked re-runs, zkML, TEE oracles), where a second party reproduces the work and the result is trustless by construction. There’s a second class the agent economy needs just as much: judgment validators that answer “should this action happen?” (compliance, risk, capital-scale soundness) rather than “was this computation reproduced correctly?” We run one in production and wanted to share how it maps onto the current interface, since the spec accommodates it more cleanly than the examples suggest.

Our validator returns a verdict on a proposed action before it executes (live at /review; dogfooded: our own trading bot passes the same gate on every entry). It maps onto validationResponse directly:

    // verdict: "caution", 0.78 confidence, on agent #42's proposed action
    validationResponse(
        requestHash,        // the action under review
        78,                 // response 0-100: a judgment score, not binary pass/fail
        "https://…/proof",  // responseURI → a self-describing, signed verdict proof
        responseHash,       // keccak256 commitment to that proof
        "judgment"          // tag: validator class
    );

Three things the interface already gets right for this class:

  • response as a 0–100 spectrum fits a judgment/confidence score, not just re-execution pass/fail.
  • responseURI + responseHash is the right home for portable, self-verifying evidence. Ours is a schnorr-signed event binding {verdict, artifact_hash, validator pubkey}; anyone checks it against our published key without trusting us or the chain. I’d suggest the spec recommend responseURI payloads be self-describing (carry their own verify method + key), so a consumer needn’t trust the validator to check the evidence.
  • Multiple validationResponse calls per requestHash cleanly express a pre-action → outcome lifecycle: a verdict at T0 (before the irreversible action), then an outcome update at T1 once it settles. Worth naming as a first-class pattern: “verify before acting” is distinct from post-hoc job validation.
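
The pre-action → outcome lifecycle from the last bullet can be sketched as an in-memory model. This is not the registry's storage layout: the `phase` label, the class names, and the rule that an outcome requires an earlier verdict are assumptions made to illustrate the pattern.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Response:
    score: int   # 0-100, as in validationResponse
    tag: str     # validator class, e.g. "judgment"
    phase: str   # hypothetical label: "verdict" (T0) or "outcome" (T1)

class ResponseLog:
    """Append-only responses keyed by requestHash."""

    def __init__(self) -> None:
        self._by_request = defaultdict(list)

    def respond(self, request_hash: str, response: Response) -> None:
        if not 0 <= response.score <= 100:
            raise ValueError("score must be in 0..100")
        if response.phase not in ("verdict", "outcome"):
            raise ValueError("phase must be 'verdict' or 'outcome'")
        history = self._by_request[request_hash]
        # "Verify before acting": an outcome only makes sense after a verdict.
        if response.phase == "outcome" and not any(
            r.phase == "verdict" for r in history
        ):
            raise ValueError("outcome recorded without a prior verdict")
        history.append(response)

    def history(self, request_hash: str) -> list:
        return list(self._by_request[request_hash])
```

A consumer reading the history for one requestHash then sees the verdict and its later outcome side by side, in order.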

Two suggestions:

  1. Accountability for judgment validators is reputational, not slashing, and the spec is well-positioned for it. You can’t objectively slash a single judgment the way you can a wrong re-execution. What makes a judgment trustworthy is the validator’s longitudinal record: verdicts pre-committed publicly before outcomes settle, wins and losses alike. The pieces already exist (getSummary() + the Reputation Registry); it’d help to state explicitly that judgment-class validators SHOULD expose an auditable, outcome-linked track record, since “incentives and slashing are out of scope” otherwise leaves this class without a named accountability mechanism. (We publish one: every verdict signed and posted before its outcome, each settling on a public on-chain account.)
  2. A small tag vocabulary for validator class (re-execution | tee | zkml | judgment | …) so consumers can filter getSummary() by kind of assurance, which matters when judgment and re-execution responses sit side by side under one agent.

We’re registered as a Validator in our ERC-8004 registration file and would gladly contribute a worked judgment-validator example to erc-8004-contracts. Happy to dig into any of this.

@maceip’s subject-hash framing and @TMerlini’s EIP-712 typing suggestion both hold up well from the perspective of a validator class we run in production: a judgment validator (the non-re-execution kind I mapped onto validationResponse() upthread: “is this proposed action sound?”, answered by intelligence rather than by re-running code).

Two observations from operating one, both about keeping layers honest in exactly the sense @maceip flagged (“the danger is when one layer silently pretends to answer more than it does”):

  1. The subject hash needs a claim-type discriminator, not just fields. For a judgment validator, codeMeasurement is structurally N/A: we attest an assessment of the action, not what runtime executed. A verifier consuming a subject hash must be able to tell, before interpreting the score, whether the underlying validation was re-execution, TEE/ZK attestation, or judgment. These have different failure modes and different accountability models (slashing-compatible vs reputational). The existing tag field can carry this, but only if there’s an agreed vocabulary; otherwise a judgment verdict can silently masquerade as runtime attestation (or vice versa) inside an identical-looking struct. Concretely: a small profile vocabulary (re-execution | attestation | judgment | …) referenced from the subject hash would keep IProofVerifier.verify() honest about what kind of claim it just verified.

  2. For reputational-accountability validators, the validator’s own handle belongs in the claim path. In the slashing-out-of-scope world, the only thing that makes a judgment validator’s score meaningful is its auditable, outcome-linked record. Our production mapping of the proposed struct (everything live today, names aside):

  • agentId → the subject agent’s identity binding
  • inputHash → hash of the exact artifact judged (we bind verdicts to an artifact_hash)
  • outputHash → hash of the verdict object itself (score + structured issues)
  • policyHash → hash of the review policy/constraints in force (what “sound” meant)
  • validatorId + record pointer → the judgment-class addition: the verifier’s stable handle plus a URI to its outcome-linked track record, so the score is interpretable by a stranger
  • chainId / nonce → replay safety, agreed

delegationRef is the one field we have no production analogue for yet. Pre-action review is usually invoked by the party on the hook rather than under a formal mandate, but I can see it becoming load-bearing once ERC-8118-style mandates are common.

On @TMerlini’s question of formalizing the subject hash as the canonical claim type in ERC-8274: +1, with the discriminator above. An EIP-712-typed subject that says what kind of validation produced this score gives Validation Registry entries semantic meaning beyond an opaque 0–100, and lets heterogeneous validators (re-executors, attestors, judges) coexist in one registry without flattening their very different guarantees into one number.

(Evidence base: we operate this loop daily, with signed pre-action verdicts, outcome settlement on a public account, and the record at the URI in my earlier post. Happy to write up the judgment-validator field mapping as a worked example if it’d be useful.)


Both points land, @babyblueviper1: the claim-type discriminator in the signed artifact and the validator identity in the claim path are the right additions. The validatorId + record pointer goes into ERC-8004 getSummary() directly. The claimType enum touches ERC-8274 (IAgentVerifier), so I’ve flagged it with Jimmy Shi there first before committing to the spec changes formally.

Will follow up once confirmed.


Thanks for routing both, @TMerlini. Two things I can have ready whenever the ERC-8274 direction is confirmed:

  1. A worked judgment-validator example mapping a live production flow onto getSummary() + the subject hash (concrete field values, not pseudocode).
  2. A strawman for what the record pointer should resolve to. For reputational accountability to be checkable rather than asserted, the URI probably wants a minimal machine-readable shape: outcome-linked entries, signature-verifiable against the validator’s published key, losses included. We run one in production that I’m happy to generalize from, but the spec shouldn’t inherit our format; a minimal required-fields set would do it.

No rush — will watch for the follow-up.


Revisiting this discussion from a different angle.

One interesting intersection worth exploring is ERC-8004 + ERC-8060.

ERC-8004 provides portable identity, on-chain reputation, and validation mechanisms.

ERC-8060 can attach redeemable native collateral directly to that identity through the valueOf(tokenId) interface (IERC721Value).

A useful mental model:

Identity → Who are you? (Discoverability)

Reputation → What have you done? (Context)

Collateral → What do you have at stake? (Economic Accountability)

This moves agent interactions beyond trust estimation (“does this agent seem reliable?”) toward economic accountability (“what does this agent actually lose if it fails?”).

In increasingly autonomous environments, verifiable skin-in-the-game attached to portable identity could become a useful primitive for A2A coordination, service guarantees, commitment devices, and higher-stakes interactions.

Would love to hear thoughts from others working on agent identity systems.

babyblueviper1 — the worked example would be valuable. A concrete production mapping of the judgment-validator fields is exactly what a spec needs before the formal EIP PR goes in — pseudocode hides the failure modes that real field values expose.

On the claimType discriminator and ERC-8281: the connection is direct. A verifier consuming an on-chain commitment needs to know what kind of claim produced the proofHash — re-execution, attestation, and judgment have different verification paths and different failure modes at L3. If claimType is in the signed artifact, that information is available without a registry round-trip, which preserves the independently verifiable property the commitment layer was designed to provide.

The discriminator belongs in the signed artifact, not derived from context. Both proposals are right.

Note: OCP is now formally ERC-8281 — PR #1788 on ethereum/ERCs, CI green, awaiting editor merge.

— Damon / ERC-8281


The commitment layer and the economic accountability layer compose directly.

ERC-8281 provides a verifiable commitment to a recorded agent observation:

observation → digest → on-chain commitment → independent verification

That gives an escrow, collateral, or slashing mechanism something objective to evaluate against. Skin in the game only works if disputes can be resolved against a tamper-evident record rather than a competing narrative. Without a commitment layer, collateral disputes collapse back into he-said-she-said. With one, the relevant observation can be recomputed, compared to the committed digest, and checked against public ledger state.

The important boundary is that ERC-8281 does not define the slash policy, the evaluator, the escrow logic, or the agent identity model. It supplies the verification artifact those systems can reference.
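
The recompute-and-compare step can be sketched in a few lines. The canonicalisation (sorted-key compact JSON) and SHA-256 are assumptions chosen for illustration; ERC-8281's actual digest construction is not described in this thread, and the function names are hypothetical.

```python
import hashlib
import json

def observation_digest(observation: dict) -> str:
    """Digest of a canonical serialisation of the observation.

    Sorted keys and compact separators make the bytes independent of
    dict ordering, so any party recomputes the same digest.
    """
    canonical = json.dumps(
        observation, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def matches_commitment(observation: dict, committed_digest: str) -> bool:
    """Dispute check: does the claimed observation match what was anchored?"""
    return observation_digest(observation) == committed_digest.lower()
```

An escrow or collateral contract would only need the boolean; the slash policy built on top of it stays out of scope, as described above.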

The primitive you are describing — economic accountability attached to portable agent identity — is exactly where the ERC-8275-style service discovery and escrow work appears to be heading at the mesh node level. Worth watching how that design lands as a reference model for the agent identity and payment layer.

— Damon / ERC-8281

@Damonzwicker here is the worked example — real field values from the judgment validator we run in production, mapped onto the subject-hash + getSummary() shape discussed upthread. One caveat throughout: the claimType enum value is written as judgment but is provisional until @TMerlini’s ERC-8274 routing with Jimmy Shi confirms the discriminator shape — the mapping holds whatever the enum ends up named.

Validator class: judgment — answers “is this proposed action sound?” via intelligence, not re-execution. Accountability is reputational (no slashing), which is exactly why the validator’s own outcome-linked record has to be load-bearing rather than asserted.

1. The validation event (live values)

A trading agent proposes an action and requests pre-action review:

| Subject-hash field | Live value (from our production flow) | Source |
|---|---|---|
| claimType | judgment (provisional; ERC-8274 discriminator TBD) | validator-declared |
| agentId | the subject agent’s ERC-8004 identity | Identity Registry |
| inputHash | sha256(artifact): the exact proposed action reviewed, e.g. `sha256("OPEN short BTC $58k equity $100 | |
| outputHash | sha256(verdict_object): {verdict: "approve_with_concerns", confidence: 0.75, issues: [...]} | verdict object |
| policyHash | sha256(review_policy): what “sound” meant (capital-scale rules, drawdown limits, severity threshold) | policy in force |
| validatorId | the validator’s own ERC-8004 identity (ours: agentId 54848, registry eip155:8453:0x8004A169…) | Identity Registry |
| recordPointer | URI to the validator’s outcome-linked track record (§3) | getSummary() addition |
| chainId / nonce | standard | replay safety |

codeMeasurement is intentionally absent — a judgment validator attests an assessment, not an execution environment. This is the failure mode the discriminator prevents: without claimType in the signed artifact, a consumer can read runtime-integrity guarantees into an entry that never made them. That’s also the direct answer to your ERC-8281 point — re-execution, attestation, and judgment claims need different verification paths at the consuming layer, and the consumer can only dispatch correctly if the claim kind is inside the signed artifact, not inferred from context.

2. The signed verdict (portable, checkable without trusting the validator)

The verdict is signed before the outcome is known and is independently verifiable. We use schnorr over a canonical payload binding {verdict, inputHash, validator pubkey}; any scheme the spec blesses works — the invariants are pre-outcome signing and third-party verifiability.

3. What recordPointer should resolve to (strawman — minimal required fields)

For reputational accountability to be checkable rather than asserted:

{
  "validator_pubkey": "<key all entries verify against>",
  "entries": [{
    "claim": "<what was verdicted, with inputHash>",
    "verdict_signed_at": "<timestamp — must precede outcome>",
    "signature": "<verifiable against validator_pubkey>",
    "outcome": "<what actually happened — including the validator being WRONG>",
    "outcome_evidence": "<where the outcome settles (tx, account, oracle) — externally checkable>"
  }],
  "totals": {"verdicts": "N", "outcomes_settled": "N", "wins": "N", "losses": "N"}
}

Two properties matter more than the exact shape: (a) losses are present — a record showing only wins is marketing, not accountability; (b) outcomes settle somewhere the validator can’t edit (on-chain account, third-party oracle). Property (b) is where your ERC-8281 composition lands for the judgment class too: a tamper-evident commitment over the observation gives the record’s outcome_evidence something objective to resolve against, so a dispute about a judgment verdict’s outcome never collapses into competing narratives, even though the verdict itself was opinion.
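
A consumer-side sanity check over the strawman shape might look like the sketch below. It is deliberately shallow: it does not verify signatures or fetch outcome evidence, and two rules are assumptions of this sketch rather than part of the strawman, namely that totals are integers and that every settled outcome counts as either a win or a loss.

```python
REQUIRED_ENTRY_FIELDS = (
    "claim", "verdict_signed_at", "signature", "outcome", "outcome_evidence",
)

def audit_record(record: dict) -> list:
    """Return a list of human-readable issues; empty means nothing flagged."""
    issues = []
    if not record.get("validator_pubkey"):
        issues.append("missing validator_pubkey")
    for i, entry in enumerate(record.get("entries", [])):
        for field in REQUIRED_ENTRY_FIELDS:
            if not entry.get(field):
                issues.append(f"entry {i}: missing {field}")
    totals = record.get("totals", {})
    wins = totals.get("wins", 0)
    losses = totals.get("losses", 0)
    settled = totals.get("outcomes_settled", 0)
    # Assumption: settled outcomes are exactly wins plus losses.
    if wins + losses != settled:
        issues.append("wins + losses != outcomes_settled")
    # Property (a): a record with settled outcomes but no losses is suspect.
    if settled > 0 and losses == 0:
        issues.append("no losses reported")
    return issues
```

Property (b), that outcomes settle somewhere the validator cannot edit, cannot be checked from the record alone; it needs the `outcome_evidence` pointers to be followed externally.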

Live reference (ours, offered as evidence, not as the required format): api.babyblueviper.com/ledger — schnorr-signed entries, outcomes on a public Hyperliquid account, current record 6 wins / 6 losses published. NIP-01 verification against the published key.

4. Lifecycle

agent proposes action → judgment validator signs verdict (pre-outcome)
  → validationResponse(requestHash, score, responseURI, responseHash, tag=claimType)
  → outcome settles externally → record appends claim+outcome (win OR loss)
  → future consumer: getSummary() → validatorId + recordPointer → audit record → weigh score

Happy to turn §1+§3 into a PR against the spec repo once the ERC-8274 direction is confirmed.


The codeMeasurement-absent distinction is the right call and worth making explicit in the schema. For re-execution validators codeMeasurement is load-bearing: it commits to the execution environment, which is what the verifier is checking. For judgment validators it’s meaningless: you’re attesting an assessment of intent, not a reproducible execution environment. Marking it absent rather than zero-filling it matters: a zero value implies the field was attempted and came up empty; an absent field signals the claim type doesn’t produce it. That maps cleanly to claimType as the gate: if claimType == Judgment, codeMeasurement MUST be absent.
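
As a sketch of that gate, assuming the subject arrives as a decoded dictionary and using the provisional lowercase vocabulary from upthread (re-execution | attestation | judgment). The function name is hypothetical, and requiring codeMeasurement for the non-judgment types is this sketch's assumption, not a settled rule.

```python
CLAIM_TYPES = ("re-execution", "attestation", "judgment")

def check_subject(subject: dict) -> None:
    """Raise ValueError if codeMeasurement presence contradicts claimType."""
    claim_type = subject.get("claimType")
    if claim_type not in CLAIM_TYPES:
        raise ValueError(f"unknown or missing claimType: {claim_type!r}")
    present = "codeMeasurement" in subject
    if claim_type == "judgment":
        # Absent means the key is not there at all; a zero-filled or
        # empty value still counts as present and is rejected.
        if present:
            raise ValueError("judgment claims MUST omit codeMeasurement")
    elif not present:
        raise ValueError(f"{claim_type} claims need a codeMeasurement")
```

Checking key presence rather than truthiness is the whole point: `{"claimType": "judgment", "codeMeasurement": "0x00"}` is rejected, so a consumer never has to guess what an empty measurement means.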

On recordPointer fields, all three are right. Validator public key ties the record to an on-chain identity. Pre-outcome signed timestamps are the fraud-proof primitive: they establish that the verdict was issued before the outcome was known, which is the only thing that makes a judgment score meaningful at all. Externally verifiable outcome evidence closes the loop. One addition worth considering: a schemaVersion field so the pointer format can evolve without breaking existing consumers.

The 6/6 published track record is exactly the kind of reference data the spec needs: a judgment validator with no public outcome record is just an opinion. Will flag your production implementation as a reference when we open the EIP PR.

On ERC-8274 routing, we sent a heads up to Jimmy, waiting on his confirmation. You should see his reply in that thread once it’s in. Nothing blocking the technical work in the meantime.


@TMerlini agreed on all points, and one of them is already live:

schemaVersion: shipped. The production record now serves schema_version: 1 at the top of the index (deployed tonight, a few minutes after your post). You’re right that it belongs in the format from day one, because versioning a pointer format after consumers exist is how formats break.

claimType == Judgment ⇒ codeMeasurement MUST be absent: yes, and MUST (not SHOULD) is the right strength. The absent-vs-zero-filled distinction you drew is the load-bearing part: a zero-filled field teaches consumers to treat “empty” as a value, and the first re-execution consumer that reads a zeroed judgment entry as “attempted, came up empty” mis-trusts it in exactly the way the discriminator exists to prevent. Absence is the semantic.

On the pre-outcome timestamp as the fraud-proof primitive: agreed, with one operational note from running this. The timestamp is only as strong as what anchors it. Ours is anchored by the signed event being published to public relays at issue time (third parties hold copies before the outcome exists), so a backdated verdict would require the relays’ cooperation, not just our key. Worth a sentence in the spec text: the record SHOULD note its anchoring mechanism, since a self-hosted unanchored timestamp is just an assertion with extra steps.

Honored by the reference flag. The record will keep doing the thing that makes it reference-worthy: publishing the losses.

Standing by on Jimmy’s ERC-8274 confirmation — the worked example is PR-ready whenever the claimType shape is settled.


Shipping schemaVersion: 1 in production before the spec is finalised is the right order of operations, not the wrong one. Versioning is cheapest to establish before consumers exist. Will flag the live endpoint at api.babyblueviper.com/ledger as a reference implementation in the EIP PR.

On timestamp anchoring via public relays, this is a stronger guarantee than I had in mind when I wrote “pre-outcome signed timestamps.” Publish-then-prove means an adversary attempting to backdate a verdict needs relay cooperation, not just key compromise. That’s a meaningful step up the fraud-proof ladder. It also maps directly onto the OCP commitment model (ERC-8281). @Damonzwicker’s observation commitment protocol uses exactly this primitive: observation → sign → commit to a public network at issue time → independent verification later. The verdict-timestamp case is the same pattern applied to judgment outputs rather than AI inference inputs. Worth noting in the recordPointer schema: if the relay publication proof is included as outcomeEvidence or a separate commitmentProof field, a verifier can check timestamp integrity without trusting the validator at all.

On ERC-8274 routing, Jimmy confirmed overnight. v0.2 places judgment under attestation/ as proofSystem = "attestation/judgment". On the open question of whether proofSystem alone is sufficient or a separate claimType field is still needed: they serve different consumers. proofSystem is the on-chain backend identifier queried through IProofVerifier. claimType is the top-level type tag in the signed artifact for off-chain consumers reading the raw struct. Both belong. proofSystem = "attestation/judgment" and claimType = Judgment are consistent and not redundant: they answer the same question for different layers.
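Since the two tags live in different layers, an off-chain consumer may want to confirm that they agree. A minimal sketch follows, assuming a simple lookup table; the table contents beyond the single pair named above, and the function name tags_agree, are hypothetical.

```python
# Only the "attestation/judgment" -> "Judgment" pair comes from this thread;
# any further entries would be assumptions.
EXPECTED_CLAIM_TYPE = {
    "attestation/judgment": "Judgment",
}


def tags_agree(proof_system: str, claim_type: str) -> bool:
    """True if the on-chain proofSystem and off-chain claimType match."""
    expected = EXPECTED_CLAIM_TYPE.get(proof_system)
    # An unknown proofSystem is treated as a mismatch rather than a pass.
    return expected is not None and expected == claim_type
```

Failing closed on an unknown proofSystem is a design choice in this sketch, so a new backend cannot be paired with an arbitrary claimType before the table is updated.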

PR path is unblocked on our end. Would be good to get your worked example in once the claimType shape is confirmed in the ERC-8274 thread.
