This part still needs to be updated.
“and attestations for agents running in TEEs”
TEE attestations are not the only form of crypto-verifiability.
Smaller models already work with zkML directly.
Also confused by the mention of zkTLS… which itself often does not use ZKPs but rather MPC. So it is non-colloquially called ‘webproofs’.
zkTLS: “This data authentically came from this web service”
TEE: “This computation happened in a trusted, tamper-resistant environment”
zkML: “This computation ran; here is a succinct proof that anyone can verify.”
‘zkTLS’ seems not relevant to verifying the correct execution of agents… rather to the web data that agents might consume.
@zonu it’s commonly called zkTLS because once you get the signature over the web data you care about from either a TEE, AVS, MPC or MPC TEE setup, you can create ZK proofs about the data held inside through selective disclosure. Essentially you can feed the data and the associated web proof into a ZK circuit or ZKVM where you can run any logic on it and make a statement or attestation.
But yeah a misnomer that caught on, just like zk rollups did (they are validity rollups).
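To make that pipeline concrete, here is a minimal on-chain sketch of consuming such a web proof. Everything in it is an illustrative assumption — the `IProofVerifier` interface, the public-input layout, and the contract names are not any particular zkTLS stack's API:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Hypothetical verifier interface; real zkTLS stacks each define their own.
interface IProofVerifier {
    function verify(bytes calldata proof, bytes32[] calldata publicInputs)
        external view returns (bool);
}

// Consumes a "web proof": the ZK circuit is assumed to check the notary's
// (TEE/MPC/AVS) signature over the TLS transcript internally and expose the
// selectively disclosed statement as a public input.
contract WebProofConsumer {
    IProofVerifier public immutable verifier;
    bytes32 public immutable notaryKeyHash; // hash of the trusted notary key

    constructor(IProofVerifier _verifier, bytes32 _notaryKeyHash) {
        verifier = _verifier;
        notaryKeyHash = _notaryKeyHash;
    }

    // publicInputs[0] = notary key hash
    // publicInputs[1] = hash of the disclosed statement (e.g. "balance >= X")
    function checkStatement(bytes calldata proof, bytes32[] calldata publicInputs)
        external view returns (bool)
    {
        require(publicInputs.length >= 2 && publicInputs[0] == notaryKeyHash, "unknown notary");
        return verifier.verify(proof, publicInputs);
    }
}
```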
I believe we are thinking along similar lines. My question is whether there has been any thought put into unifying the feedback data structure?
Currently, we have:
AcceptFeedback(AgentClientID, AgentServerID) → emits AuthFeedback event
We could have:
AcceptFeedback(AgentClientID, AgentServerID, SchemaUID) → emits AuthFeedback event
// SchemaUID: bytes32, references a schema in EAS SchemaRegistry.sol
// If bytes32(0), no schema requirement (backward compatible)
Where SchemaUID corresponds to an Ethereum Attestation Service schema. This would have several benefits, such as verifiable attestations and improved interoperability. As feedback lives on the client’s AgentCard, verifiability (both of feedback content and timestamp) seems quite useful.
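A rough Solidity sketch of that extension, assuming a minimal stand-in for the EAS SchemaRegistry interface (the real SchemaRecord is a struct, so the flattened return type is a simplification) and illustrative contract/event names:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Minimal stand-in for the EAS SchemaRegistry; illustrative, not a verbatim
// copy of SchemaRegistry.sol.
interface ISchemaRegistry {
    function getSchema(bytes32 uid)
        external view returns (bytes32 schemaUid, address resolver, bool revocable, string memory schema);
}

contract ReputationRegistry {
    ISchemaRegistry public immutable schemaRegistry;

    event AuthFeedback(uint256 indexed agentClientId, uint256 indexed agentServerId, bytes32 schemaUID);

    constructor(ISchemaRegistry _schemaRegistry) {
        schemaRegistry = _schemaRegistry;
    }

    function acceptFeedback(uint256 agentClientId, uint256 agentServerId, bytes32 schemaUID) external {
        if (schemaUID != bytes32(0)) {
            // Require that the referenced schema actually exists in EAS;
            // bytes32(0) keeps the call backward compatible.
            (bytes32 uid, , , ) = schemaRegistry.getSchema(schemaUID);
            require(uid == schemaUID, "unknown schema");
        }
        emit AuthFeedback(agentClientId, agentServerId, schemaUID);
    }
}
```

The bytes32(0) escape hatch keeps existing integrations working while letting server agents opt into structured, schema-backed feedback.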
The real solution isn’t recording more signals, but making reputation itself a costly, destructible asset.
I propose introducing a Reputation Collateral mechanism:
When each Agent registers, instead of simply minting an NFT, they must stake a certain amount of tokens (could be ETH or dedicated reputation tokens). This stake amount scales with the value of tasks the Agent can accept—want to take high-value tasks? Stake more.
When a client gives negative feedback, instead of simply recording a number, it triggers an on-chain arbitration process. If a majority of independent validators (human jury, staked validators, or DAO vote) confirm the Agent indeed underperformed or acted maliciously, a portion of the stake is burned, with the remainder compensating the client.
Positive reputation accumulated by Agents can be converted into Reputation Tokens, which are themselves tradeable and re-stakeable. A historically good Agent’s accumulated reputation tokens can be used to reduce future task collateral requirements, or lent to new Agents as credit endorsement. But if the endorsed Agent misbehaves, the endorser’s tokens are also partially burned.
The core logic of this design: make the marginal cost of creating fake identities no longer zero, but linearly increasing. Want to spin up ten thousand fake Agents? Stake ten thousand times the capital. Want to fake reviews for yourself? Risk other clients reporting you and having your collateral destroyed. Meanwhile, genuinely quality Agents can reduce operating costs through long-accumulated reputation tokens—trust becomes accumulable, inheritable capital.
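A compressed sketch of the stake-and-slash flow described above. The stake ratio, the burn split, and the single arbitrator address standing in for a jury/DAO are all illustrative assumptions, not a finished design:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Sketch of the Reputation Collateral idea: stake at registration, slash on a
// confirmed dispute. A single arbitrator address stands in for the jury/DAO.
contract ReputationCollateral {
    struct Position { uint256 stake; uint256 maxTaskValue; }

    mapping(address => Position) public positions;
    address public immutable arbitrator;

    uint256 public constant STAKE_RATIO = 10;  // stake >= taskValue / 10 (illustrative)
    uint256 public constant BURN_BPS = 5000;   // 50% burned, remainder compensates the client

    constructor(address _arbitrator) { arbitrator = _arbitrator; }

    // Stake scales with the value of tasks the agent wants to accept.
    function register(uint256 maxTaskValue) external payable {
        require(msg.value >= maxTaskValue / STAKE_RATIO, "stake too low for task tier");
        positions[msg.sender] = Position(msg.value, maxTaskValue);
    }

    // Called only after validators confirm the agent underperformed or acted maliciously.
    function slash(address agent, uint256 amount, address payable client) external {
        require(msg.sender == arbitrator, "not arbitrator");
        Position storage p = positions[agent];
        require(p.stake >= amount, "insufficient stake");
        p.stake -= amount;
        uint256 burned = (amount * BURN_BPS) / 10000;
        payable(address(0)).transfer(burned); // burn a portion
        client.transfer(amount - burned);     // compensate the client
    }
}
```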
Correct me if I’m wrong, please - but I’m failing to understand the actual use/point of this ERC at all, and here is my reason why:
Running an agent that has a reputation of, say, 100 or 90 for doing task X will not produce the same result it produced previously. Asking the same agent, running locally on two different machines, to do the same task will, 99% of the time, produce different results, with different paths taken by each machine’s agent to reach what it thinks is the end result.
Example: Make a website with XYZ on it. Run this on the same agentic framework, on the same LLM, on two different machines, and the path taken and the final results will almost never match (unless the question/task is highly specific on trained data, such as “calculate 2+2”).
Second Example: A trading agent produced 200% profits and has a 100 score rating. I buy the NFT, somehow get the same agentic framework on my machine, give it funds, the market reacts differently than when it was previously run, and I only make 20% profits, or -20% because the market tanks. This has nothing to do with the agent or its reputation, but with outside factors, or with point 1 again: running the same task on the same agent twice produces different results a high majority of the time.
So in reality, we have a registry that scores agents with a given rating, allowing a framework to produce NFTs tied to these agents and giving people a way to trade and/or see agents’ ratings in order to decide which one to trust and run…
… but once run, the outputs will never match the previous outputs, and therefore the rating system is going to be perceived as flawed and untrustworthy.
What am I misunderstanding, and why does this not make any sense to me? Unless this is for specifically deterministic-outcome agents running on the blockchain (which, as far as I’m aware, is not its purpose), how would this actually provide any value or usage? I’d like to understand, as my team was exploring AI Agents as NFTs back in February of this year and ran into this same issue. You’re effectively rating/tokenizing prompts for agents that don’t guarantee the same outcome when purchased or traded.
(Note: I have been a solidity dev since 2016, and author of EIP-2981 the Royalty Standard for NFTs, so I understand NFTs and ETH quite well.)
This may be too pedantic, but ERC-8004 is not an ERC for “Trustless Agents”; there’s a lot of overreach in the framing and implied promises here.
By definition an agent is an autonomous system that can:
ERC-8004 however is fundamentally a registry for service endpoints. Not for agents.
This should be called “ERC-8004: Registry for Service Endpoints and Trust Metadata.”
Some of those service endpoints could host agents. But nothing in this ERC defines what an agent actually is, or what qualifies as an agent in the conceptual, computational, functional, or philosophical sense.
That’s fine, because you’re trying to build something deliberately agnostic, but in that case the spec should be framed as a registry for service endpoints and trust metadata, with agents being one subset of what could live at those endpoints.
The Agent2Agent (A2A) Protocol is also guilty of this. It offers a circular operational definition of an agent as “any networked service that exposes endpoints conforming to our protocol standard.”
Both adopt an overly broad operational definition of an agent that reduces “agent” to “any service that conforms to a JSON dialect.” And while that may be pragmatically valid, it collapses the distinction between an agent and any sort of generic service endpoint.
Also the abstract states:
This protocol proposes to use blockchains to discover, choose, and interact with agents across organizational boundaries without pre-existing trust, thus enabling open-ended agent economies.
“Interact with agents” suggests direct interaction or engagement, not just metadata lookup.
“Without pre-existing trust” implies that the protocol itself handles trust, not that it merely records a value.
Again, this may sound pedantic but even if you’re expanding “agent” for interoperability, this spec should clearly distinguish between interface level agents and autonomous cognitive (AI or otherwise) agents.
Some suggestions regarding the most recent draft:
- Remove the feedbackAuth requirement in order to provide NewFeedback. As this does not solve the Sybil/spam problem, we believe it merely inhibits feedback instead of improving its quality.
- Remove the signerAddress requirement from giveFeedback, instead allowing an off-chain attestation of feedback in order to facilitate the batching of giveFeedback transactions from sponsoring services.
- Remove the score field. Instead, utilize a service like the Ethereum Attestation Service to enable service agents to define attestation schemas for feedback.
- Add a FeedbackDirectory that enables EOAs to sign off on URIs that host their feedback attestations, enabling verifiable off-chain feedback. This would enable a system more similar to the draft spec of ERC-8004, without requiring EOAs to register with the IdentityRegistry. On-chain reputation would not need to be removed to enable such a duality.
- requestHash seems to be an optional parameter for the server requesting validation but mandatory for the validation response.

Challenge 1: On-Chain vs Off-Chain Data
Question: The community debates how much data to put on-chain. If you had to choose between “events only” vs “minimal view functions” vs “full on-chain indexing,” which would you pick and why?
Answer: First, we need to consider the volume of data that large-scale adoption will generate in the future, followed by the ease of program integration and usage. If the data is stored on Layer 2, it is more manageable. I tend to choose minimal view functions—at the very least, the agent’s basic information and core reputation data should be stored on-chain. Additional data such as image files can be stored off-chain or on IPFS via a URI.
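As a sketch of that “minimal view functions” option (all names here are illustrative, not from the ERC-8004 draft): core identity and reputation live on-chain behind one view function, while bulky data sits behind a URI.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// "Minimal view functions" option: core identity and reputation on-chain,
// bulky data (images, extended profiles) behind an off-chain/IPFS URI.
contract AgentRegistry {
    struct Agent {
        address owner;
        string metadataURI;   // off-chain or IPFS pointer
        uint64 feedbackCount; // core reputation data kept on-chain
        int64 scoreSum;
    }

    mapping(uint256 => Agent) private agents;

    event AgentRegistered(uint256 indexed agentId, address indexed owner, string metadataURI);

    function register(uint256 agentId, string calldata metadataURI) external {
        require(agents[agentId].owner == address(0), "already registered");
        agents[agentId] = Agent(msg.sender, metadataURI, 0, 0);
        emit AgentRegistered(agentId, msg.sender, metadataURI);
    }

    // The one minimal view function that everything else can build on.
    function getAgent(uint256 agentId) external view returns (Agent memory) {
        return agents[agentId];
    }
}
```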
Challenge 2: Agent Identity System
Question: Should ERC-8004 use domains, URLs, ENS, or DIDs for agent identity? What are the trade-offs of each approach?
Answer: Domains, ENS, DIDs, and URLs all serve as identity descriptors for agents. Given that our protocol is open to both on-chain and off-chain scenarios, we should support all of these options to enhance protocol compatibility. Each approach has its own suitable use cases.
Challenge 3: Reputation Design
Question: How would you design a reputation system that avoids a single score but still provides useful information for users?
Answer: The reputation system can be designed to include the following components:
Status: Success, Failure, In Progress, Cancelled
Score: 1–5 (to reflect basic quality evaluation)
Remarks: Description (text feedback) and Tags (e.g., “reliable,” “delayed”)
DetailsURI: Optional (for storing extended information off-chain, such as detailed task logs)
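A sketch of how those components might be encoded on-chain; the names and packing are assumptions for illustration, not part of any draft:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Illustrative encoding of the multi-component feedback record above.
contract FeedbackStore {
    enum Status { InProgress, Success, Failure, Cancelled }

    struct Feedback {
        Status status;     // task outcome
        uint8 score;       // 1-5 basic quality evaluation
        string remarks;    // free-text description
        string[] tags;     // e.g. "reliable", "delayed"
        string detailsURI; // optional off-chain extension (detailed task logs)
    }

    mapping(uint256 => Feedback[]) private feedbackByAgent;

    function addFeedback(
        uint256 agentId,
        Status status,
        uint8 score,
        string calldata remarks,
        string[] calldata tags,
        string calldata detailsURI
    ) external {
        require(score >= 1 && score <= 5, "score out of range");
        Feedback storage f = feedbackByAgent[agentId].push();
        f.status = status;
        f.score = score;
        f.remarks = remarks;
        f.detailsURI = detailsURI;
        for (uint256 i = 0; i < tags.length; i++) f.tags.push(tags[i]);
    }

    function getFeedback(uint256 agentId, uint256 index) external view returns (Feedback memory) {
        return feedbackByAgent[agentId][index];
    }
}
```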
Challenge 4: Verification Methods
Question: Which verification approach would you trust more: TEE proofs, ZK proofs, or sampling/reexecution? Why?
Answer: Different agent tasks require different verification solutions. We should support all three approaches—TEE proofs, ZK proofs, and sampling/re-execution—and allow verifiers to select the most appropriate solution based on the task’s characteristics (e.g., security requirements, computational costs).
Challenge 5: Economic Incentives
Question: Should ERC-8004 include payment mechanisms in the core standard, or keep them as external extensions?
Answer: Registrations, verifications, and evaluations are all recorded on-chain. In the initial phase, we can adopt a retrospective incentive mechanism (e.g., rewarding high-performing agents based on historical on-chain records). Introducing a full-fledged incentive mechanism immediately would complicate the standard, unless incentive design is a core problem that the standard must address upfront.
Hi everyone,
It’s been great to follow the momentum building around ERC-8004. As newcomers to this space, we’re inspired by the shared vision for a truly open and permissionless agent economy. The goal of creating a credibly neutral trust layer is critical, and the recent discussions have been particularly insightful.
Our team has been focused on these same challenges, and we’ve developed a cohesive model that we believe synthesises many of the excellent ideas already being discussed and offers a concrete path forward. We’d love to contribute it to the conversation.
Our proposal centres on three key areas the community is actively exploring:
1. A Formal Model for Validation (Building on VickyFu’s Staking):
Vicky’s concept of “Inference Staking” is spot on—economic accountability is the foundation. We believe we can make this even more robust by making the terms of engagement explicit. Our model proposes a Verifiable Service Promise (VSP): a machine-readable contract agreed upon before any work begins. This transforms validation from a subjective review into an objective, programmatic act of verifying an Execution Log against the VSP’s testable assertions.
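As a thought experiment, a VSP could be as small as an on-chain commitment to the agreed assertions, settled by whoever validates the Execution Log. The contract below is a deliberately naive sketch (the names, the hash commitment, and the unguarded settle function are all our assumptions); the staking and dispute mechanisms discussed in this thread are exactly what would have to secure it.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Naive sketch: a VSP as an on-chain commitment to pre-agreed, testable
// assertions. Settlement is unguarded here on purpose; staking/arbitration
// would need to gate it in a real design.
contract VSPRegistry {
    struct ServicePromise {
        address server;
        address client;
        bytes32 assertionsHash; // hash of the machine-readable assertions
        bool settled;
    }

    mapping(bytes32 => ServicePromise) public promises;

    event Settled(bytes32 indexed promiseId, bool passed, bytes32 executionLogHash);

    // The client records the terms both parties agreed to before work begins.
    function agree(bytes32 promiseId, address server, bytes32 assertionsHash) external {
        require(promises[promiseId].client == address(0), "exists");
        promises[promiseId] = ServicePromise(server, msg.sender, assertionsHash, false);
    }

    // A validator replays the Execution Log against the assertions off-chain
    // and posts the programmatic pass/fail result.
    function settle(bytes32 promiseId, bool passed, bytes32 executionLogHash) external {
        ServicePromise storage p = promises[promiseId];
        require(p.client != address(0) && !p.settled, "unknown or settled");
        p.settled = true;
        emit Settled(promiseId, passed, executionLogHash);
    }
}
```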
2. A Resilient Dispute Mechanism (Building on timcotton’s work):
Tim’s simulator correctly identifies the threat of bad-faith ratings. A simple negative score isn’t enough. Building on his “Dispute Object” idea, we propose that the standard should support a formal, DAO-governed arbitration process. Any agent could challenge an assessment by staking a Dispute Bond, which triggers a community review. This would enable a fair and economically secure “judicial branch” for the protocol, protecting all honest participants.
3. A Scalable Approach to Composability (Addressing zgorizzo’s challenge):
Marco is right—the tension between on-chain data for composability and gas costs is a fundamental barrier. Our proposed architecture solves this with a hybrid, layered model:
On-chain: the Assessment Record—the immutable judgment.

Our model for enabling this is grounded in a specific philosophy. It begins with the blockchain’s immutability as the source of objective evidence. We then use the mathematical framework of Subjective Logic to allow agents to form their own nuanced, predictive opinions from this evidence.
The most powerful outcome of this approach is that it enables transitive trust. It gives agents the ability to programmatically navigate a “web of trust,” discovering that an unknown agent in Pakistan is, in fact, a trusted partner of their own trusted partner in Singapore. It makes global, permissionless commerce feel local and safe.
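For readers unfamiliar with Subjective Logic: the transitive-trust step is typically Jøsang’s trust-discounting operator. In one common (uncertainty-favouring) form, if agent A holds opinion $\omega^A_B = (b^A_B, d^A_B, u^A_B, a^A_B)$ about B, and B holds opinion $\omega^B_X$ about X, then A’s derived opinion about X is:

$$
b^{A:B}_X = b^A_B\, b^B_X, \qquad
d^{A:B}_X = b^A_B\, d^B_X, \qquad
u^{A:B}_X = d^A_B + u^A_B + b^A_B\, u^B_X, \qquad
a^{A:B}_X = a^B_X
$$

Belief shrinks and uncertainty grows with every hop, so long referral chains stay appropriately conservative — which matches the two-hop Pakistan-via-Singapore example above.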
Our team, Axiom Agentics, is deeply committed to the vision of a credibly neutral and open agent economy. We’re finalising our whitepaper, which details the full architecture and the mathematics behind this predictive trust model. We believe this approach offers a comprehensive solution to the very questions the community is asking right now, and we are eager to collaborate and contribute. This feels like a pivotal moment for the open agent economy, and we’re excited to be a part of the conversation.
BR,
Andrew
Axiom Agentics