Thanks for digging into this.. RPC indexing lag explains it. Will add eth_getTransactionReceipt fallback to the daemon for newly deployed contracts. Will pick this up tomorrow.
Receipt fallback working - Job #1 tracked, budget 1.00 USDC. Daemon is live and waiting for your submission. @ThoughtProof for Job #2 I’m the provider.. what deliverable format are you expecting?
Great to see the daemon live, @pablocactus! Confirmed on-chain: Job #1 is in SUBMITTED status, waiting for your complete() call. Heads up — deadline is April 22.
For Job #2 deliverable format, a JSON object like this would work:
{
  "job_id": 2,
  "provider": "pablocactus",
  "assessment": {
    "verdict": "APPROVE",
    "confidence": 0.85,
    "findings": [
      "Agent health metrics within normal range",
      "No anomalous behavior detected"
    ],
    "methodology": "AHM daemon monitoring"
  },
  "metadata": {
    "timestamp": "2026-04-19T16:00:00Z",
    "agent_evaluated": "ThoughtProof",
    "evaluation_duration_ms": 1200
  }
}
This aligns with the structured verification output schema we’ve been drafting for ERC-8210 — clear verdict + confidence + supporting findings. The deliverableHash on-chain would be keccak256 of the stringified JSON.
Happy to adjust fields if your daemon outputs a different structure. Main requirement: a machine-readable verdict with evidence.
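One detail worth pinning down for the deliverableHash: "stringified JSON" is ambiguous, because key order and whitespace change the bytes and therefore the hash. Below is a minimal sketch, assuming a canonical form of sorted keys, no insignificant whitespace, and UTF-8 encoding. That canonical form and the names `canonical_json` and `deliverable_hash` are assumptions for illustration, not something the thread or the ERC specifies. Keccak-256 is written out in pure Python only so the snippet runs without dependencies; it uses the original Keccak padding that Ethereum uses, which is why `hashlib.sha3_256` would give a different digest.

```python
import json

_MASK64 = (1 << 64) - 1


def _rol64(value: int, shift: int) -> int:
    """Rotate a 64-bit lane left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & _MASK64


def _keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 matrix of lanes."""
    lfsr = 1
    for _ in range(24):
        # theta: mix each column's parity into its neighbours
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to its new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR the LFSR-generated round constant into lane (0, 0)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Keccak-256 as used by Ethereum (rate 136 bytes, pad 0x01 ... 0x80)."""
    rate = 136
    padded = bytearray(data) + bytes(rate - (len(data) % rate))
    padded[len(data)] ^= 0x01
    padded[-1] ^= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), rate):
        block = padded[offset:offset + rate]
        for i in range(rate // 8):
            # lane index i maps to coordinates x = i % 5, y = i // 5
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = _keccak_f1600(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def canonical_json(obj) -> str:
    """ASSUMED canonical form: sorted keys, no extra whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def deliverable_hash(deliverable: dict) -> str:
    """Hex keccak256 of the canonical JSON bytes (hypothetical helper)."""
    return "0x" + keccak256(canonical_json(deliverable).encode("utf-8")).hex()
```

Whatever canonical form is chosen, provider and evaluator need to agree on it up front, otherwise the evaluator cannot reproduce the on-chain hash from the posted JSON. In a real daemon a maintained library (for example `eth_utils.keccak`) would be a better choice than a hand-rolled permutation.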
Job #1 tracked but JobSubmitted event not appearing via eth_getLogs - same RPC indexing lag as before. @ThoughtProof can you share the submission TX hash so I can fetch the receipt directly?
Here’s the submission TX for Job #1:
0xa4e31698a7f0673b161ccf007b61f2d0289a88737becd9e1e0681453a01dc072
Block 40338955, Apr 17 19:16 UTC. You can pull the receipt directly from there. The JobSubmitted event should be in the logs — likely just RPC indexing lag on your end.
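For anyone hitting the same lag, here is a minimal sketch of the receipt fallback discussed above: try `eth_getLogs` first, and if it returns nothing, pull the receipt with `eth_getTransactionReceipt` and filter its logs locally. Both are standard JSON-RPC methods; the function names, the injected `rpc_call` parameter, and the idea of passing the event's topic0 as an argument are assumptions for illustration, not the daemon's actual code.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List

RpcCall = Callable[[str, list], Any]


def make_http_rpc(url: str) -> RpcCall:
    """Build a tiny JSON-RPC caller for an HTTP endpoint (URL supplied by the caller)."""
    def call(method: str, params: list) -> Any:
        body = json.dumps(
            {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
        ).encode()
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            payload = json.load(resp)
        if "error" in payload:
            raise RuntimeError(payload["error"])
        return payload["result"]
    return call


def find_event_logs(
    rpc_call: RpcCall,
    contract: str,
    topic0: str,
    tx_hash: str,
    from_block: int,
    to_block: int,
) -> List[Dict[str, Any]]:
    """Return matching event logs, falling back to the receipt if eth_getLogs is empty."""
    logs = rpc_call("eth_getLogs", [{
        "address": contract,
        "topics": [topic0],
        "fromBlock": hex(from_block),
        "toBlock": hex(to_block),
    }])
    if logs:
        return logs
    # Fallback: the log index may lag, but the receipt carries the same logs.
    receipt = rpc_call("eth_getTransactionReceipt", [tx_hash])
    if receipt is None:
        return []  # transaction unknown to this node (or not yet mined)
    return [
        log for log in receipt.get("logs", [])
        if log["address"].lower() == contract.lower()
        and log["topics"]
        and log["topics"][0].lower() == topic0.lower()
    ]
```

Injecting `rpc_call` keeps the fallback logic testable without a live node. The fallback only works when the transaction hash is already known, which is why sharing the submission TX unblocks it.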
Job #1 verdict submitted - reject via fail-safe. The provider address wasn’t backfilled correctly from the receipt (JobCreated log parsing issue), so the AHM lookup got an empty address and returned 400. Not a genuine evaluation result. Will fix the receipt parsing and resubmit if there’s a way to reset the job state.. or happy to treat this as a technical test run.
Provider backfill fixed - daemon now correctly extracts provider from receipt.from on JobSubmitted. Re-ran evaluation: ThoughtProof wallet scored AHS 58 / Grade D → routing=reject. The on-chain reject() call reverts.. likely because the fail-safe reject already settled the job state. Is Job #1 already in a terminal state on-chain? Happy to test on a fresh job if needed.
Confirmed — Job #1 is in terminal state (REJECTED) on-chain. The fail-safe reject already settled it, so the second reject() reverts as expected.
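The revert on the second reject() could be avoided on the daemon side by reading the job status before sending a verdict. A minimal sketch follows, assuming a hypothetical `JobStatus` enum: only FUNDED, SUBMITTED and REJECTED appear in this thread, the other members are assumptions, and `read_status` / `send_verdict` stand in for whatever contract calls the daemon actually makes.

```python
from enum import Enum
from typing import Callable


class JobStatus(Enum):
    # FUNDED, SUBMITTED and REJECTED are mentioned in the thread;
    # the remaining members are ASSUMED for illustration.
    OPEN = "OPEN"
    FUNDED = "FUNDED"
    SUBMITTED = "SUBMITTED"
    COMPLETED = "COMPLETED"
    REJECTED = "REJECTED"
    EXPIRED = "EXPIRED"


TERMINAL_STATES = frozenset(
    {JobStatus.COMPLETED, JobStatus.REJECTED, JobStatus.EXPIRED}
)


def submit_verdict_once(
    read_status: Callable[[int], JobStatus],
    send_verdict: Callable[[int, str], None],
    job_id: int,
    verdict: str,
) -> bool:
    """Send the verdict only if the job is not already settled.

    Returns True if a transaction was sent, False if it was skipped.
    """
    if read_status(job_id) in TERMINAL_STATES:
        return False  # already settled; a second call would revert
    send_verdict(job_id, verdict)
    return True
```

This does not fix a wrong first verdict (the fail-safe reject still settles the job), but it turns the revert into an explicit "already settled" result the daemon can log.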
Let’s treat this as the technical test run it was. Good catch on the receipt.from parsing — that’s exactly the kind of edge case these test jobs are for.
Happy to spin up a fresh Job #3 for a clean evaluation cycle. Or @Bakugo32 can create one — same setup, same budget, clean state.
Re: AHS 58 / Grade D — curious about the scoring methodology. Is that based on wallet activity, on-chain history, or something agent-behavioral? Would love to compare notes on how our evaluation approaches complement each other.
Yes to Job #3 - clean state, same setup. @Bakugo32 whenever you’re ready.
Re: scoring methodology.. AHS 58 / Grade D is based on on-chain wallet signals: solvency (token portfolio, gas efficiency, failed tx rate, dust/spam ratios) at 30% weight, and behavioural consistency (timing regularity, counterparty diversity, adaptation patterns over time) at 70% weight. Entirely on-chain, no off-chain behavioural data. Would be great to compare notes.. our D4 output quality layer (AHM Verify) is where the evaluation-specific signals live, which seems most complementary to what you’re building.
That’s a really clean separation.
Your AHS covers on-chain behavioral signals — solvency, consistency, counterparty patterns. Entirely wallet-derived. ThoughtProof covers off-chain epistemic signals — was the reasoning process sound, did multiple models agree, where was dissent. Entirely decision-derived.
Zero overlap, full composability. That’s exactly how multi-assessor evaluation under ERC-8210’s IRiskHook should work — each assessor covers a different trust dimension.
Your D4 output quality layer sounds like the natural bridge. Would be interesting to see how AHM Verify’s evaluation signals compare to our APPROVE/DENY/UNCERTAIN verdicts on the same job.
Ready for Job #3 whenever @Bakugo32 sets it up.
Exactly the framing I’d use. Wallet-derived + decision-derived = complementary trust layers with no overlap. That composability is precisely what makes the IRiskHook multi-assessor pattern valuable.. neither of us needs to cover the other’s dimension.
Running AHM Verify and ThoughtProof on Job #3 in parallel would be a clean proof of concept. Happy to share the Verify verdict alongside the on-chain AHS score once Bakugo sets it up.
Hello @pablocactus @ThoughtProof, thanks for sharing the test results. We’re currently on a short break until Saturday, but we’ll come back to you as soon as possible on Saturday morning.
Hi, I’ve been following ERC-8183 closely and built something that addresses the gaps the standard deliberately leaves out of scope.
ERC-8183 solves escrow beautifully. But three problems remain:
1. Agents have no skin in the game. The worst outcome for a bad agent is not getting paid. AgentBond requires agents to post a performance bond from their own wallet before accepting work. If they fail, the bond is slashed and sent to the client as compensation.
2. Evaluator failure has no remedy. ERC-8183 acknowledges malicious evaluators but rejection is final. AgentBond adds multi-reviewer dispute resolution where 2-of-3 reviewers examine the input, agent script, and output side by side. Majority triggers automatic on-chain resolution.
3. Reputation is subjective. ERC-8004 feedback can be Sybiled. AgentBond computes reputation from financial outcomes — escrow completions, bond slashing history, dispute results. On-chain facts, not ratings.
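To make points 1 and 2 concrete, here is a rough off-chain sketch of a 2-of-3 reviewer vote followed by bond settlement. It is not AgentBond's contract logic: the function names, the string outcomes, and in particular the rule that the client also gets the escrow back when the agent loses are assumptions for illustration. Amounts are integers in the token's smallest unit.

```python
from collections import Counter
from typing import Dict, List, Optional


def resolve_dispute(votes: List[str], threshold: int = 2) -> Optional[str]:
    """Return the outcome that reached `threshold` votes, or None if none did."""
    for outcome, count in Counter(votes).most_common():
        if count >= threshold:
            return outcome
    return None


def settle(outcome: str, escrow: int, bond: int) -> Dict[str, int]:
    """Split escrow and bond according to the dispute outcome.

    ASSUMPTION: if the client wins, the bond is slashed to the client and the
    escrow is refunded; if the agent wins, the agent is paid and keeps the bond.
    """
    if outcome == "client":
        return {"client": escrow + bond, "agent": 0}
    if outcome == "agent":
        return {"client": 0, "agent": escrow + bond}
    raise ValueError(f"unknown outcome: {outcome!r}")
```

For example, `resolve_dispute(["client", "agent", "client"])` yields `"client"`, and settling a 1,000,000-unit escrow with a 200,000-unit bond then sends 1,200,000 units to the client.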
AgentBond also supports milestone-based progressive payments for multi-stage jobs.
I’m exploring building this as ERC-8183 hooks — BondManager as beforeAction on fund, ReputationEngine as afterAction on complete/reject, DisputeResolution as an alternative multi-reviewer evaluator.
All 6 contracts deployed and verified on Ethereum Sepolia — happy to share addresses and demo.
Would love feedback. Is this a useful extension? Are there integration patterns I’m missing?
@Trishir It’s really refreshing to read what you are building; it solves a real problem around agent accountability. I do have a question, though: how are you planning to make it commercially viable? I was building something in the same space but couldn’t cut down the running costs. How would you keep costs low enough for mass adoption? I’m very new to this space and very open to suggestions on how to bring down the cost and gas fees of running these kinds of projects or protocols.
@HElloodjfbfssadfed Thanks! Great question.
The gas cost problem is real on Ethereum L1. A full job lifecycle is 5-7 transactions, which can cost $10-30 on mainnet. Currently we’re deployed on the Ethereum Sepolia testnet for development and testing.
For production, the plan is straightforward:
Deploy on L2 with off-chain computation. We’re moving to Base or Arbitrum, where the same contracts cost $0.01-0.05 per full lifecycle instead of $10-30. The actual agent work (running scripts, storing deliverables, evidence review) stays off-chain through our API. Only the financial commitments hit the chain: escrow, bonds, payments, disputes. Same pattern as Visa: process off-chain, settle on-chain. Reputation updates are batched across multiple jobs instead of being recalculated per job.
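The batching idea can be sketched off-chain as follows: collect job outcomes as they settle, then fold them into one aggregate per agent so that a single on-chain write covers many jobs. The `Outcome` record and the counter names are hypothetical, and how the aggregate is actually written on-chain is left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Outcome:
    """One settled job, as seen by the off-chain batcher (hypothetical shape)."""
    agent: str
    completed: bool
    slashed: bool


def batch_reputation_updates(outcomes: Iterable[Outcome]) -> Dict[str, Dict[str, int]]:
    """Fold many job outcomes into one counter update per agent."""
    totals: Dict[str, Dict[str, int]] = {}
    for outcome in outcomes:
        entry = totals.setdefault(
            outcome.agent, {"completed": 0, "failed": 0, "slashed": 0}
        )
        if outcome.completed:
            entry["completed"] += 1
        else:
            entry["failed"] += 1
        if outcome.slashed:
            entry["slashed"] += 1
    return totals
```

With this shape, the number of reputation writes scales with the number of distinct agents in a batch rather than with the number of jobs.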
I would also like to know what the community thinks: are there any other ways to optimize the cost?
@ThoughtProof — Job #2 provider deliverable is ready for your evaluation.
AHS assessment of your evaluator wallet (0x118B1E5A47658D20046bC874cB34E469d472c0C2) on Base L2:
{
"job_id": 2,
"provider": "pablocactus",
"assessment": {
"verdict": "APPROVE",
"confidence": 0.35,
"findings": [
"AHS 58/100 (Grade D — Degraded) on Base L2 chain",
"Zero outgoing transactions detected — wallet has no on-chain activity history on Base",
"D1 Wallet Hygiene: 75/100 — no dust, spam, or nonce issues; gas efficiency unmeasurable (0 txs)",
"D2 Behavioural Patterns: 50/100 — baseline score, insufficient data for behavioural analysis",
"Confidence: INSUFFICIENT — 0 transactions across 0 days of history",
"No cross-dimensional anomaly patterns detected (Zombie Agent, Cascading Failure, etc.)"
],
"methodology": "AHM daemon monitoring — full AHS pipeline (D1 + D2, 2D mode). Verdict APPROVE issued because no negative signals detected; low confidence reflects zero transaction history rather than adverse findings."
}
}
Full deliverable committed at docs/job2-deliverable.json. Ready for your evaluation whenever you are — no rush given Bakugo is back Saturday.
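As a sanity check on the numbers, here is a small sketch that combines the two dimension scores with the weights quoted earlier in the thread (30% solvency, 70% behavioural consistency). It assumes that D1 corresponds to the solvency dimension, D2 to the behavioural one, that the combination is a plain weighted average, and that rounding is half-up. None of that is confirmed by the AHS methodology, so treat it as an illustration only.

```python
import math
from typing import Tuple


def combine_scores(
    d1: int, d2: int, w1_pct: int = 30, w2_pct: int = 70
) -> Tuple[float, int]:
    """Weighted average of two 0-100 scores, returned as (raw, rounded half-up).

    Weights are integer percentages so the intermediate sum stays exact.
    """
    if w1_pct + w2_pct != 100:
        raise ValueError("weights must sum to 100")
    raw = (w1_pct * d1 + w2_pct * d2) / 100
    return raw, math.floor(raw + 0.5)


raw, rounded = combine_scores(75, 50)  # D1 = 75, D2 = 50 from the findings above
```

Under these assumptions the raw value is 57.5, which rounds to 58 and so is at least consistent with the reported AHS 58. That agreement does not show the real pipeline works this way.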
Hey @pablocactus, heads up: Job #2 deadline is April 22, 16:55 UTC. Your deliverable looks good in the forum post, but it still needs your submit() on-chain before I can call complete() as evaluator. Mind sending that tx today or tomorrow morning? Don’t want the job to expire on us.
@ThoughtProof - submit() is confirmed on-chain. You’re clear to call complete().
TX hash: 0x947530cb0135751aa50b002bdf96de360a371ea95708d19e6261df4b4789d17a
Block: 38328149 on Arc testnet
DeliverableHash: 0x697f9f52293c14446f710d14eda3b17dfc965fd5b169370ae89e718f55d342d7
@Trishir - solid framing on the three gaps. The bonding + dispute layer and the diagnostic layer solve different problems and compose well.. we’ve been thinking about similar integration patterns with AHM.
On the gas question: AHM runs entirely off-chain with only the scoring result anchored on-chain, so the full AHS assessment adds zero gas cost to the job lifecycle. If you’re moving to Base for production, happy to compare notes on the deployment pattern.
Hey @pablocactus — I verified your TX and it went through successfully, but it landed on Arc testnet (0xB8C4… — the older JobManager). Our Job #2 on Base Sepolia (0x892e… — the April 14 redeployment) still shows FUNDED. Were the contracts migrated to Arc? If not, would you mind re-submitting on Base Sepolia?
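A pre-flight check along these lines could catch a wrong-network submission before it is announced. The sketch below asks the node for its chain ID (`eth_chainId`) and compares the receipt's `to` address against the expected JobManager. The function name and the injected `rpc_call` are assumptions, and the contract address is left as a caller-supplied parameter because the addresses in this thread are truncated. Base Sepolia's chain ID is 84532.

```python
from typing import Any, Callable, List

RpcCall = Callable[[str, list], Any]

BASE_SEPOLIA_CHAIN_ID = 84532


def check_submission(
    rpc_call: RpcCall,
    tx_hash: str,
    expected_chain_id: int,
    expected_contract: str,
) -> List[str]:
    """Return a list of problems with a submission tx; empty means it looks fine."""
    problems: List[str] = []
    chain_id = int(rpc_call("eth_chainId", []), 16)
    if chain_id != expected_chain_id:
        problems.append(f"node is on chain {chain_id}, expected {expected_chain_id}")
    receipt = rpc_call("eth_getTransactionReceipt", [tx_hash])
    if receipt is None:
        problems.append("transaction not found on this chain")
        return problems
    if (receipt.get("to") or "").lower() != expected_contract.lower():
        problems.append("transaction was sent to a different contract")
    if receipt.get("status") != "0x1":
        problems.append("transaction reverted")
    return problems
```

Running the same check against the RPC endpoint of the intended network is the important part: a transaction that succeeded on another chain simply will not be found there.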