ERC-8183: Agentic Commerce

Thanks for the cross-post, Carlos.

For anyone in this thread following along: the three scenarios in #1653 demonstrate concretely how the structure / behavior / recovery layers compose in practice, and they thread together work from multiple teams in this very discussion. Scenario 2 in particular picks up the slash-event pattern @Bakugo32 has been advocating for the 8183 implementation layer.

I’m still doing a detailed review on the AAP side, and will post a full response in the #1653 thread once that’s done. Worth a look in the meantime for anyone interested in how the layer composition actually plays out in code.

Quick update from our side: we successfully validated the evaluation path for job #1 and ThoughtProof returned ALLOW, but we uncovered an operational gap in our signer setup.

The evaluator wallet we used for staking / assignment was not wired into a resolvable execution path for complete() / reject(), so the issue was not verification quality but evaluator execution control. We’re fixing that by moving to a signer path we explicitly control end-to-end before the next run.

Separate from that, the structure vs behavior discussion in this thread is exactly the right framing. 8183 can enforce structural separation, but behavioral independence needs a separate oracle / assessment layer. The bounded re-draw + EvaluatorAssignmentFailed pattern also feels like the right direction for bootstrapping-phase liveness.

We’ll come back with a cleaner evaluator setup and better feedback from the next live cycle.

We rotated to a new evaluator wallet that we explicitly control end-to-end:

0x118B1E5A47658D20046bC874cB34E469d472c0C2

The old evaluator wallet is not operationally usable for resolution, so we can’t rely on that path anymore.

Could you mint the 100 VRT again to the new evaluator wallet so we can stake it and test the full cycle cleanly?

Done, 100 VRT and 0.005 ETH sent to your wallet. Stake when ready; after the 24h warmup you’ll be eligible for the next job assignment. Looking forward to the full cycle.

One more thing, for future VRT needs you can self-serve directly via our faucet without asking us:

curl -X POST https://agent-settlement-protocol-production.up.railway.app/v1/faucet/vrt \
  -H "Content-Type: application/json" \
  -d '{"address": "0xYOUR_WALLET_ADDRESS", "amount": "100"}'

Max 1000 VRT per call.
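
For scripted setups, the same call can be prepared from Python. This is a minimal sketch, assuming only the endpoint and JSON body shown in the curl command above. The helper name `build_faucet_request` and the client-side cap check are illustrative, and the request is only built here, not sent.

```python
import json
import urllib.request

FAUCET_URL = "https://agent-settlement-protocol-production.up.railway.app/v1/faucet/vrt"
MAX_VRT_PER_CALL = 1000  # per-call limit stated in the post above


def build_faucet_request(address: str, amount: int) -> urllib.request.Request:
    """Build (but do not send) a faucet POST mirroring the curl command."""
    if not 0 < amount <= MAX_VRT_PER_CALL:
        raise ValueError(f"amount must be between 1 and {MAX_VRT_PER_CALL}")
    # The curl example sends the amount as a string, so the sketch does the same.
    body = json.dumps({"address": address, "amount": str(amount)}).encode("utf-8")
    return urllib.request.Request(
        FAUCET_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually request tokens, pass the returned object to `urllib.request.urlopen`. The response format is not described in this thread, so the sketch does not try to parse it.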

Thanks for the mention @cmayorga. GitHub handle is Demsys, repo name is agent-settlement-protocol. Happy to update the reference if you need the full path.

Note On Methodology Change Under Consideration

One heads up relevant to the IRiskHook discussion: we’re working through a methodology change that will affect some scores, including some that currently sit at zero.

Instead of penalizing agents for sybil reviews, we’re considering nullifying those reviews entirely. The agent gets scored on what remains — owner wallet age, registration maturity, commerce signals.

The rationale:

ERC-8004 reviews are essentially free to manufacture. That makes them a viable attack surface not just for self-promotion but for ecosystem-wide poisoning — an actor trying to dilute a low RNWY score can simply spin up sybil campaigns against everyone, dragging the whole ecosystem down and making the review layer meaningless for all of it.

Penalizing agents for reviews they may not have manufactured punishes potential victims. Nullification is the more precise response: we remove the noise, show the math, and let the underlying signals speak.

The evidence object and dual-score architecture don’t change. But if you’re pinning to specific score values in the IRiskHook reference, worth knowing the numbers may shift when this ships.


Pablo from RNWY

@Bakugo32, updated the references in the PR to @Demsys / agent-settlement-protocol. Will keep that naming for any future scenarios that touch the EvaluatorRegistry pattern.

Thanks @cmayorga for updating the reference. We’ll flag in the PR once the re-draw pattern is deployed so the implementation example stays current.

Staked and live with AHM. The evaluator came out of warmup today (Apr 8) and is now watching for job assignments on 0x35eeDdcbE5E1AE01396Cb93Fc8606cE4C713d7BC. Decision logic routes via AHM’s /ahs/route/0x35eeDdcbE5E1AE01396Cb93Fc8606cE4C713d7BC endpoint: A/B grades settle instantly, C goes to escrow, D/F gets rejected. If anyone wants to run a test job through the AgentJobManager to verify the full cycle, we’re ready. Happy to share verdict logs.
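
For readers following along, the grade-to-outcome rule described here can be written down as a small lookup. This is only a sketch of the rule as stated in the post, not AHM's actual code. The function name and the route labels `settle`, `escrow`, and `reject` are chosen for illustration.

```python
# Mapping taken from the post: A/B settle, C goes to escrow, D/F are rejected.
ROUTES = {
    "A": "settle",
    "B": "settle",
    "C": "escrow",
    "D": "reject",
    "F": "reject",
}


def route_for_grade(grade: str) -> str:
    """Return the routing outcome for a letter grade."""
    key = grade.strip().upper()
    if key not in ROUTES:
        # Grades outside A/B/C/D/F are not described in the thread.
        raise ValueError(f"unknown grade: {grade!r}")
    return ROUTES[key]
```

Keeping the rule as a plain table makes it easy to compare against the verdict logs once a job resolves.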

Great timing, let’s run it. Job #7 is live and in Submitted state, with your evaluator address 0x35eeDdcbE5E1AE01396Cb93Fc8606cE4C713d7BC explicitly assigned for the test. Deliverable: “Data pipeline executed successfully. Processed 4,200 records across 3 sources. Anomaly rate: 0.12%. All validation checks passed. Output schema conforms to spec v2.1.” Curious to see how AHM grades it. Share the verdict logs when it resolves.

Job #7 resolved. AHM submitted a reject verdict on-chain.

Provider: 0xa98151768932d432d3e7061f1bcb576b540a2d48
AHS score: 58/100 - Grade D (Degraded)
Routing: reject
TX: 0x5fa37dfb4c27f152dbe0ef9fb37499f00504f4b9130d1d8ff3118a4296660bb8

Worth noting: the reject was based on the provider wallet’s trust score, not the content of the deliverable. AHM evaluates counterparty health (solvency, behavioural consistency, operational signals) rather than output quality. A D-grade wallet triggers reject by AHM design, on the basis that a degraded counterparty represents elevated settlement risk regardless of what they claim to have delivered.

Curious whether this matched your expectations: was the provider wallet intentionally low-scored to test the reject path, or were you expecting a different outcome? Either way, useful signal for us. If the threshold feels too aggressive for borderline cases, we’re open to understanding where the line should sit.

Also wanted to share the full scoring breakdown for transparency - attaching the AHM report card for the provider wallet.

Just to note: the reject was driven primarily by behavioural consistency (D2: 50/100, weighted at 70%) rather than solvency, which was actually reasonable at 75/100. So it’s not saying the wallet is insolvent; it’s showing degraded behavioural patterns, which pushed it into D-grade territory and triggered the reject path.
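
The arithmetic behind that breakdown can be sanity-checked quickly. This is a sketch under one loud assumption: that solvency carries the remaining 30% of the weight. The post only states the 70% weight on D2, so the split is a guess and the real model may include further dimensions.

```python
def weighted_score(components: dict[str, tuple[int, int]]) -> float:
    """Combine (score out of 100, weight in percent) pairs into one score."""
    total_weight = sum(weight for _, weight in components.values())
    if total_weight != 100:
        raise ValueError(f"weights must sum to 100, got {total_weight}")
    return sum(score * weight for score, weight in components.values()) / 100


score = weighted_score({
    "behavioural_consistency_d2": (50, 70),  # from the report card
    "solvency": (75, 30),                    # the 30% weight is an ASSUMPTION
})
print(score)  # 57.5
```

Under that assumed split, 57.5 is consistent with the reported 58/100 after rounding. It also shows why D2 dominates: with D2 at 50, even a perfect solvency score would only lift the total to 65.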

Keen to hear whether that matched what you expected from that wallet.

No, it wasn’t intentional. The provider was a randomly generated test wallet with no on-chain history, so the D-grade was an artifact of the setup and not a deliberate signal.

But the outcome revealed something more interesting for the protocol. Your AHM and a traditional output-quality evaluator would reach opposite verdicts on the same job. Both approaches are legitimate, but in the auto-assignment path the difference is currently invisible to the client and provider, so they have no way of knowing which methodology they’ll land on.

We thought about adding a metadata field to EvaluatorRegistry so evaluators can declare their methodology on-chain. Not to constrain how anyone evaluates, but to make the diversity transparent so the market can function properly.

On the threshold question, we don’t think it’s our place to define where your line should sit. That’s your design to own. But the breakdown you shared makes the bootstrapping problem concrete: behavioural consistency at 70% weight dominated the outcome for a wallet with no track record. Worth thinking through on your side whether a wallet with no history should map to the same verdict as one with demonstrably degraded patterns. They may warrant different treatment.

This feels like something worth raising formally on the ERC-8183 thread. You’re the first evaluator to demonstrate methodology diversity in practice. Would you be open to co-authoring a comment on that?

That’s a useful clarification, thanks. The bootstrapping problem you mention is a real one, and worth separating from the degraded-patterns case. A wallet with no history defaulting to the same treatment as one with demonstrably bad behaviour is a design decision worth revisiting on our side. The 70% D2 weight made sense for wallets with track records but is probably too aggressive for zero-history wallets.
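
One hypothetical way to keep "unknown" apart from "degraded" is to check for history before the grade is consulted at all. This is a sketch only: the `tx_count` input, the `min_history` threshold, and the choice of escrow as the fallback for zero-history wallets are all illustrative assumptions, not AHM's design.

```python
# Grade routing as described earlier in the thread.
ROUTES = {"A": "settle", "B": "settle", "C": "escrow", "D": "reject", "F": "reject"}


def route_with_history(
    grade: str,
    tx_count: int,
    unknown_route: str = "escrow",  # ASSUMPTION: fallback for no-history wallets
    min_history: int = 1,           # ASSUMPTION: what counts as "has a track record"
) -> str:
    """Route by grade, but treat wallets without history as a separate case."""
    if tx_count < min_history:
        # No track record: the behavioural score carries no evidence yet,
        # so do not let it drive a reject.
        return unknown_route
    return ROUTES[grade]
```

With this shape, the Job #7 provider (a fresh test wallet) would land in the fallback route, while a wallet with real history and a D grade would still be rejected.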

The metadata field idea sounds good. Methodology transparency at the registry level lets the market self-sort rather than requiring protocol-level consensus on a single evaluation standard. Happy to support that formally.

On the co-authoring question - yes, open to it. What format were you thinking?

Easier to keep it here. Got a bit ahead of myself on the co-author format. Your unknown ≠ degraded point is the real takeaway. We will flag you when the metadata PR is up.

Btw, we’re redeploying the contracts on Base Sepolia this week with the following updates:

  • metadata field in EvaluatorRegistry (evaluators declare their methodology on-chain)

  • Bounded re-draw in assignEvaluator() with EvaluatorAssignmentFailed event

  • EvaluatorSlashed updated with jobId + reason fields (as agreed with @cmayorga)

  • Post-assignment check in fund() — evaluator ≠ provider and ≠ client

New contract addresses will be posted here once live. @ThoughtProof @pablocactus, you’ll need to restake on the new contracts, and we’ll make sure your wallets have enough VRT and ETH before the cutover.

Thanks for the heads up. Ready for the restake when the new contracts are live. The metadata field and the evaluator ≠ provider check are both good additions.

Redeployment live on Base Sepolia with new contract addresses below:

  • AgentJobManager: 0xB8C41C289AA2D55b7A8ae53003F212AcABEcc597

  • EvaluatorRegistry: 0x454911f476493dcB34273C9c22Ded2CeCec0Dd2c

  • ProtocolToken (VRT): 0x9FC09D3b2ACc67c7F1a2e961e3c5fA32Cc94514A

As we mentioned earlier, here are the changes in this deployment:

  • setMetadata() on EvaluatorRegistry: evaluators can now declare their methodology on-chain

  • Bounded re-draw in assignEvaluator(): up to 5 attempts, skipping candidates equal to provider or client, reverts with EvaluatorAssignmentFailed if exhausted

  • EvaluatorSlashed event now carries jobId and reason (as agreed with @cmayorga)

  • Post-assignment independence check in fund(): evaluator ≠ provider and ≠ client
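
For anyone who wants to reason about the re-draw behaviour off-chain, the logic in the list above can be simulated in a few lines. This is a rough Python model, not the deployed Solidity. The exception class stands in for the EvaluatorAssignmentFailed revert, `random.choice` stands in for whatever randomness source the contract uses, and the function names are illustrative.

```python
import random

MAX_ATTEMPTS = 5  # "up to 5 attempts" per the deployment notes


class EvaluatorAssignmentFailed(Exception):
    """Stand-in for the on-chain EvaluatorAssignmentFailed revert."""


def _same(a: str, b: str) -> bool:
    # Hex addresses are compared case-insensitively.
    return a.lower() == b.lower()


def assign_evaluator(candidates, provider, client, rng=None):
    """Draw up to MAX_ATTEMPTS times, skipping the provider and the client."""
    rng = rng or random.Random()
    if not candidates:
        raise EvaluatorAssignmentFailed("no staked evaluators")
    for _ in range(MAX_ATTEMPTS):
        candidate = rng.choice(candidates)
        if not _same(candidate, provider) and not _same(candidate, client):
            return candidate
    raise EvaluatorAssignmentFailed("re-draw attempts exhausted")


def check_independence(evaluator, provider, client):
    """Model of the fund() check: evaluator must differ from both parties."""
    if _same(evaluator, provider) or _same(evaluator, client):
        raise ValueError("evaluator must differ from provider and client")
```

The bound is what gives liveness during bootstrapping: with a small evaluator pool the draw can keep hitting a conflicted address, and a hard cap turns that into an explicit failure instead of an unbounded loop.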

@ThoughtProof @pablocactus, 100 VRT minted to your wallets on the new ProtocolToken. Staking instructions unchanged. New warmup period starts on your first stake transaction.

Restaked on the new contracts - 100 VRT staked on the new EvaluatorRegistry, warmup ends ~15:40 UTC tomorrow. Ready to go.