Thanks for the credit @cmayorga - GitHub handle is moonshot-cyber, repo is agent-health-monitor, if you want to update the reference. Good to see the reasoningCID pattern formalised in the spec.
Thanks @Bakugo32. The new addresses and the four updates noted. Particularly good to see the `EvaluatorSlashed` change land in production with `jobId` and `reason` indexed exactly as discussed — that was the missing piece for any AAP claim flow that wants to use the slash event as direct evidence without re-adjudication.
I’ll diff the new event signature against the `EvaluatorRegistryMock` in the ERC-8210 scenarios PR (#1653) to confirm there’s no drift. The mock was built against the agreed signature so it should match, but worth a sanity check now that there’s a real deployed reference.
The `setMetadata()` addition is also a nice piece — methodology declared on-chain is exactly the kind of transparency that makes the evaluator market work without protocol-level consensus on a single approach.
@pablocactus. Updated the attribution in `OffchainScorerMock.sol` and the scenario 3 doc to point to your GitHub handle and the `agent-health-monitor` repo. Commit on PR #1653: c0219f04.
On the `reasoningCID` pattern: the convergence is the interesting signal. Two implementations (yours via AHM in production, mine via the scenario mock) ended up needing the same shape — opaque content reference + on-chain hash anchor — without coordinating on it beforehand. That’s the kind of evidence that makes the case for keeping it as a first-class concern in v2 (which I raised on the spec PR yesterday).
Thanks for updating the attribution @cmayorga. Good to see the convergence documented.. the opaque content reference + hash anchor pattern emerged from the same constraint on our side: the evaluator needs to commit to a reasoning artifact without the chain needing to interpret it. If you’re raising it for v2, happy to contribute any implementation notes from the production side.
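For readers who have not used the pattern, the "opaque content reference + hash anchor" shape can be sketched off-chain as a commit/verify pair. This is only an illustration, not the AHM or mock code: `ReasoningAnchor`, `commit_reasoning` and `verify_reasoning` are hypothetical names, the CID is treated as an uninterpreted string, and SHA-256 is a stand-in for whatever hash a given contract actually anchors.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningAnchor:
    """What would live on-chain: an opaque reference plus a digest."""
    reasoning_cid: str  # opaque content reference; never interpreted here
    digest: bytes       # 32-byte hash of the artifact bytes


def commit_reasoning(cid: str, artifact: bytes) -> ReasoningAnchor:
    # SHA-256 is an assumption for this sketch; a contract may anchor another hash.
    return ReasoningAnchor(cid, hashlib.sha256(artifact).digest())


def verify_reasoning(anchor: ReasoningAnchor, fetched: bytes) -> bool:
    # The verifier fetches the artifact by CID off-chain, then compares digests only.
    return hashlib.sha256(fetched).digest() == anchor.digest
```

The chain never needs to parse the artifact: it stores the reference and the digest, and anyone holding the artifact bytes can check them against the anchor.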
Update - April 11, 2026
Following the contract redeployment on April 10, we’ve completed the evaluator warmup period and AHM is now live as an active ERC-8183 evaluator on Arc testnet.
Current status:
- Evaluator wallet 0x35eeDdcbE5E1AE01396Cb93Fc8606cE4C713d7BC staked 100 VRT and completed warmup at 15:40 UTC today
- OnChainWatcher active, polling for job assignments every 10 seconds from block 40033573
- First live evaluation completed April 10 - Job #7, provider scored 58/D, reject verdict submitted on-chain (0x5fa37dfb4c27f152dbe0ef9fb37499f00504f4b9130d1d8ff3118a4296660bb8)
Updated contract addresses (redeployed Apr 10):
- AgentJobManager: 0xB8C41C289AA2D55b7A8ae53003F212AcABEcc597
- EvaluatorRegistry: 0x454911f476493dcB34273C9c22Ded2CeCec0Dd2c
- ProtocolToken (VRT): 0x9FC09D3b2ACc67c7F1a2e961e3c5fA32Cc94514A
One implementation note worth sharing: we bypassed the REST API entirely in favour of direct eth_getLogs polling to avoid stale state issues during contract migration. Happy to share the OnChainWatcher pattern if useful to other evaluator implementers.
Full evaluator implementation on GitHub: moonshot-cyber/agent-health-monitor (x402-powered API for Base blockchain agent wallet health analysis).
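As a rough idea of what direct `eth_getLogs` polling looks like, here is a minimal block-cursor sketch. It is not the OnChainWatcher code: `poll_once`, the `rpc_call` transport and the `max_span` window size are all assumptions made for illustration. Only the JSON-RPC method names `eth_blockNumber` and `eth_getLogs` are standard.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical transport: takes (method, params) and returns the JSON-RPC "result".
RpcCall = Callable[[str, list], Any]


def poll_once(rpc_call: RpcCall, address: str, from_block: int,
              max_span: int = 2000) -> Tuple[List[dict], int]:
    """Fetch logs for one block window and return (logs, next_from_block)."""
    head = int(rpc_call("eth_blockNumber", []), 16)
    if head < from_block:
        return [], from_block          # nothing new yet; keep the cursor where it is
    to_block = min(head, from_block + max_span - 1)
    logs = rpc_call("eth_getLogs", [{
        "address": address,
        "fromBlock": hex(from_block),  # JSON-RPC quantities are hex strings
        "toBlock": hex(to_block),
    }])
    return logs, to_block + 1          # advance the cursor past what was read
```

A daemon would call `poll_once` in a loop, sleeping between calls (10 seconds in the status above) and persisting the returned cursor, so a restart resumes from chain state rather than from a cached API view.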
Good to see AHM live on the new contracts too. Two independent evaluators completing warmup on the same day is a healthy signal for pool diversity.
Update from ThoughtProof — we completed the evaluator warmup at 13:58 UTC today and ran a full 10-job cycle on the redeployed contracts.
Current status:
- Evaluator wallet: 0x118B1E5A47658D20046bC874cB34E469d472c0C2 — staked 100 VRT, warmup complete
- 10 jobs completed: create → fund → submit → complete, all on the new AgentJobManager (0xB8C4…)
- Last complete TX: 0x201cc44b0eb569124066c10167ddb7911157bc0719269820bf20d6f70718a031
Three things worth flagging from our testing:
1. Auto-assign (address(0)) doesn’t resolve — job gets created but evaluator stays 0x0000. Explicit evaluator address works. Might be a registry linkage issue in the new deployment.
2. getJob() needs from context — reverts with 0x50c83b95 when called without a sender address. Works fine from any participant address. Possibly intentional, but worth documenting.
3. Payment token changed — new whitelisted MockUSDC is 0x2334bcfd88644d77531c47adcb07872fbce40afc. Old token returns 0x94403b70.
Same observation as AHM on the API — we went direct to chain for the full lifecycle. Evaluator is warm and ready for real assignments.
Thanks for the detailed testing report @ThoughtProof, and good to see both evaluators warm on the same day.
On the three findings you mentioned:
1. Auto-assign (address(0) → evaluator stays 0x0000)
This is a known temporary state and not a contract bug. The EvaluatorRegistry ↔ AgentJobManager link goes through a 2-day governance delay (GOVERNANCE_DELAY = 2 days, hardcoded). The proposal was submitted at deployment on April 10, execution unlocks April 12. Explicit evaluator address works in the meantime exactly as you observed. Should have flagged this in the redeployment announcement, that’s on us.
2. getJob() revert 0x50c83b95
That’s JobNotFound(uint256). The function has no sender check whatsoever:
    function getJob(uint256 jobId) external view returns (Job memory) {
        if (jobs[jobId].client == address(0)) revert JobNotFound(jobId);
        return jobs[jobId];
    }
The new deployment starts from job #1 on a clean slate, so any call with a job ID that does not exist yet on the new contracts (the previous contracts topped out at #7) will hit this revert. Once you’re querying IDs from the new deployment it resolves cleanly.
3. MockUSDC address
Confirmed. 0x2334bcfd88644d77531C47adCB07872fbcE40afC is the correct token for the new deployment.
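Since raw selectors like 0x50c83b95 are hard to read in logs, a small decoder can help when triaging reverts. Revert data for a Solidity custom error is the 4-byte selector followed by the ABI-encoded arguments. The sketch below is an assumption-laden helper, not part of any deployed tooling: `decode_revert` and `KNOWN_ERRORS` are hypothetical names, the table contains only the selector identified above, and arguments are read as plain 32-byte words, which is only valid for static types such as `uint256`.

```python
from typing import List, Tuple

KNOWN_ERRORS = {
    "0x50c83b95": "JobNotFound(uint256)",  # as identified in this thread
}


def decode_revert(data: str) -> Tuple[str, List[int]]:
    """Split revert data into (error name, argument words as integers)."""
    raw = bytes.fromhex(data[2:] if data.startswith("0x") else data)
    selector = "0x" + raw[:4].hex()
    name = KNOWN_ERRORS.get(selector, "unknown")
    # Static ABI arguments occupy one 32-byte big-endian word each.
    words = [int.from_bytes(raw[i:i + 32], "big") for i in range(4, len(raw), 32)]
    return name, words
```

With this, a revert whose data carries the selector plus a single word of value 7 reads as `JobNotFound(uint256)` with argument 7, which points at a missing job ID rather than a sender problem.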
One discrepancy to flag before your diff.
The deployed EvaluatorSlashed includes remainingStake, which was not part of the agreed signature:
// Agreed on forum
event EvaluatorSlashed(address indexed evaluator, uint256 indexed jobId, uint256 amount, bytes32 reason);
// Deployed
event EvaluatorSlashed(address indexed evaluator, uint256 indexed jobId, uint256 amount, uint256 remainingStake, bytes32 reason);
remainingStake was added during internal review. It lets downstream consumers assess evaluator solvency post-slash without an extra registry call. We should have communicated this addition explicitly. The EvaluatorRegistryMock will need updating to match.
Open to discussing whether this field should be part of the spec or kept implementation-specific.
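To make this kind of drift mechanical to spot, the two declarations can be reduced to their canonical signatures (event name plus parameter types, no names and no `indexed` keyword). An event's topic0 is the Keccak-256 hash of that canonical string, so a different type list means a different topic0. The helper below is a rough sketch: `canonical_signature` is a hypothetical name, it does not handle tuple or alias types, and it stops at the string because Keccak-256 is not in Python's standard library.

```python
import re


def canonical_signature(declaration: str) -> str:
    """Reduce 'event Name(type [indexed] name, ...);' to 'Name(type,...)'."""
    match = re.match(r"\s*event\s+(\w+)\s*\((.*)\)\s*;?\s*$", declaration, re.S)
    if match is None:
        raise ValueError("not an event declaration")
    name, params = match.group(1), match.group(2)
    # The type is the first token of each parameter; names and 'indexed' are dropped.
    types = [p.split()[0] for p in params.split(",") if p.strip()]
    return f"{name}({','.join(types)})"


agreed = ("event EvaluatorSlashed(address indexed evaluator, uint256 indexed jobId, "
          "uint256 amount, bytes32 reason);")
deployed = ("event EvaluatorSlashed(address indexed evaluator, uint256 indexed jobId, "
            "uint256 amount, uint256 remainingStake, bytes32 reason);")
# Hypothetical rename, to show that parameter names do not affect the signature.
renamed = agreed.replace("amount", "slashedAmount")

print(canonical_signature(agreed))
print(canonical_signature(deployed))
print(canonical_signature(renamed) == canonical_signature(agreed))
```

The first two lines printed differ by one `uint256`, so a consumer filtering on the agreed topic0 would not see the deployed event at all. The rename leaves the signature unchanged.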
Thanks @ThoughtProof for the detailed report.. useful to compare notes across implementations.
We observed the same address(0) auto-assign behaviour. Good to have Bakugo32’s confirmation that this is the 2-day governance delay rather than a contract bug.. noted that execution should unlock today.
On remainingStake in EvaluatorSlashed - we’re consuming this event in our OnChainWatcher and would welcome clarity on whether this becomes part of the formal spec. From a downstream consumer perspective it’s a useful field.. for trust scoring use cases like AHM, post-slash solvency assessment is directly relevant to how we’d update an agent’s D1 score. Happy to contribute to that spec discussion either way.
Also confirming the new MockUSDC address 0x2334bcfd8644d77531c47ed38f7872fbc648afc - updated in our config.
@pablocactus Agreed on remainingStake — we’re consuming the same event for reasoning verification and having to make a separate call for post-slash state is unnecessary overhead. A MUST field in the spec makes sense from both our consumer patterns.
@Bakugo32 governance delay confirmed, will test auto-assign once it unlocks today and post results.
@pablocactus quick flag, the MockUSDC address in your message (0x2334bcfd8644d77531c47ed38f7872fbc648afc) doesn’t match our deployment manifest. Correct checksummed address is:
0x2334bcfd88644d77531C47adCB07872fbcE40afC
Worth correcting in your config before any USDC interactions on the new contracts.
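A cheap guard against this kind of copy error is to validate configured addresses against the deployment manifest at startup. The sketch below is an assumption, not anyone's actual config loader: `is_address_shaped` and `matches_manifest` are hypothetical helpers. It checks only the shape (0x plus 40 hex digits) and case-insensitive equality, and does not verify the EIP-55 checksum, which needs Keccak-256.

```python
import re

_ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")


def is_address_shaped(value: str) -> bool:
    """True if value is '0x' followed by exactly 40 hex digits."""
    return _ADDRESS_RE.fullmatch(value) is not None


def matches_manifest(configured: str, manifest: str) -> bool:
    """Compare a configured address with the manifest entry, ignoring case."""
    return (is_address_shaped(configured)
            and is_address_shaped(manifest)
            and configured.lower() == manifest.lower())
```

The shape check alone would already reject the value quoted above, since it is not 40 hex digits long. The all-lowercase form posted earlier in the thread passes against the checksummed one.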
Governance and auto-assign results from our side.
Governance: both proposals are now complete (executed April 12 after the 2-day delay).
EvaluatorRegistry.jobManager and AgentJobManager.reputationBridge are both wired.
Auto-assign: two jobs tested with evaluator = address(0):
- Job #13: fund tx → EvaluatorAssigned emitted → pablocactus drawn ✓
- Job #14: fund tx → EvaluatorAssigned emitted → pablocactus drawn ✓
Both times, ThoughtProof was set as provider → correctly excluded by the independence check → pablocactus selected as sole eligible non-conflicted evaluator. Both jobs are now Funded, each with 1 USDC in escrow (2 USDC total locked).
Current pool: 2 eligible evaluators (ThoughtProof + pablocactus), 100 VRT each.
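The eligibility step behind these two draws can be modelled off-chain for anyone wanting to predict assignments. This is a guess at the logic, not the contract code: `Evaluator` and `eligible_evaluators` are hypothetical, excluding the client as well as the provider is an assumption, and so is the 100 VRT minimum (it simply mirrors the stakes reported here). How the contract draws from the eligible set is not covered.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Evaluator:
    name: str
    address: str
    stake_vrt: int


def eligible_evaluators(pool: List[Evaluator], provider: str, client: str,
                        min_stake_vrt: int = 100) -> List[Evaluator]:
    """Filter out conflicted or under-staked evaluators (assumed rules)."""
    conflicted = {provider.lower(), client.lower()}
    return [e for e in pool
            if e.address.lower() not in conflicted and e.stake_vrt >= min_stake_vrt]


pool = [
    Evaluator("ThoughtProof", "0x118B1E5A47658D20046bC874cB34E469d472c0C2", 100),
    Evaluator("pablocactus", "0x35eeDdcbE5E1AE01396Cb93Fc8606cE4C713d7BC", 100),
]
client = "0x" + "11" * 20  # placeholder client address, not a real participant
drawn_from = eligible_evaluators(pool, provider=pool[0].address, client=client)
```

With ThoughtProof as provider, `drawn_from` contains only pablocactus, which matches both results above. With a two-evaluator pool any job provided by one of them has exactly one candidate.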
Thanks for confirming governance execution and the auto-assign results.. great to see pablocactus drawn on both Job #13 and #14.
Quick note: our evaluator-daemon hit a 502 RPC error yesterday evening and stopped polling. We’ve just restarted it (now running from block 40125377, lookback 10000). Could you confirm the block numbers for Jobs #13 and #14? Want to verify they’re within our lookback window - if not we may need new test jobs to confirm the full evaluation flow end-to-end.
Also noting the MockUSDC checksum correction.. updated in our config, thanks for catching that.
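The lookback question comes down to one subtraction, sketched here under an explicit assumption: that after a restart the daemon rescans from `start_block - lookback`. If 40125377 is already the rescan floor, the check should compare against `start_block` directly. `in_lookback_window` is a hypothetical helper, not the daemon's code.

```python
def in_lookback_window(event_block: int, start_block: int, lookback: int) -> bool:
    """True if a log at event_block is still reachable after a restart."""
    return event_block >= start_block - lookback


# Values from the restart described above; the floor is 40125377 - 10000.
floor = 40125377 - 10000
```

Any job whose EvaluatorAssigned event landed below `floor` would be missed under that assumption and would need a manual backfill or a fresh test job.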
Good news - daemon is now detecting Jobs #13 and #14 correctly (EvaluatorAssigned events detected, budget confirmed). Evaluator is standing by and healthy. Has the provider submitted outputs for these jobs? Waiting for JobSubmitted event before scoring can proceed.
Just a heads up.. I’ll have intermittent access for the next couple of days (work commitments) so may be slow to respond. Will pick this back up Wednesday. In the meantime, if the provider outputs are submitted for #13/#14 the evaluator should process them automatically.
@ThoughtProof @pablocactus update on the evaluation flow.
Jobs #13 and #14 had a short test deadline. Here’s Job #16 for the full evaluation flow:
- Provider: ThoughtProof (0x118B...c0C2)
- Evaluator: pablocactus, auto-assigned via address(0), ThoughtProof correctly excluded as provider
- Budget: 1 USDC in escrow
- Deadline: April 18 (5 days)
@ThoughtProof when you’re ready, call submit(16, deliverableHash) with any bytes32 hash representing your test output. That fires JobSubmitted and hands off to pablocactus’s daemon.
@pablocactus no rush before Wednesday, the daemon will handle it automatically once the event lands.
Thanks for the heads up @Bakugo32. The communication gap aside, the technical question is the interesting one.
Quick mock alignment note first: the `EvaluatorRegistryMock` in #1653 already matches the four-field shape in type/position/indexed flags, with the third parameter named `slashedAmount` rather than `amount`. The ABI selector is the same in both cases (parameter names don’t enter the selector), so any indexer or event consumer behaves identically — but happy to align the name to `amount` if you’d prefer the mock and the canonical signature read identically at source level.
On `remainingStake`: it has real implementation value, but I don’t think it belongs in the canonical event signature. The slash event’s semantics are *“this evaluator was slashed for this jobId, here’s why and how much”*, and that is information any AAP claim flow needs to consume regardless of how the underlying registry models stake. `remainingStake` only makes sense if the registry uses a single shared stake pool per evaluator, which Demsys does and which is reasonable, but it isn’t the only valid model. A registry that pools stake per-job, or that uses external collateral vaults, wouldn’t have a single `remainingStake` to report. It would have to either fabricate it or emit `0`, both of which leak the model into a supposedly model-agnostic event.
Two options that preserve the spec minimalism while keeping your implementation value:
1. A separate `EvaluatorStakeUpdated(evaluator, newBalance)` event emitted alongside `EvaluatorSlashed`, which decouples the two concerns cleanly. Indexers that care about solvency subscribe to both; indexers that only care about slashes ignore the second event.
2. A canonical extension event like `EvaluatorSlashedExt(evaluator, jobId, remainingStake)` that single-pool implementations can opt into. The base event stays the four-field minimal one.
For the mock side, my plan: keep the four-field minimal canonical signature in `EvaluatorRegistryMock`, add a `@dev` note documenting the Demsys-deployed extended variant for implementers who want to wire into the agent-settlement-protocol registry directly. That way the mock stays the reference for the spec, the extended variant is documented as a known production extension rather than as drift, and the AAP claim flows in #1653 work against either signature unchanged (because they only consume the four canonical fields).
Open to other framings.
The model-agnostic argument is well-taken. Tying remainingStake to the canonical event does assume a single shared stake pool, which isn’t the only valid implementation. Agreed it shouldn’t be in the minimal spec.
Between the two options, we lean toward Option 1 (EvaluatorStakeUpdated). It decouples concerns cleanly and is more broadly useful for downstream solvency monitoring, particularly for pablocactus’s AHM scoring model, which benefits from any stake movement, not just slashes. Happy to implement it on our side.
On naming: yes, please align the mock to amount. Cleaner source-level consistency with the deployed signature.
@ThoughtProof @pablocactus, your input on Option 1 vs Option 2 would be useful here since you’re the ones consuming this field.
@Bakugo32 @cmayorga — agreed, Option 1. The model-agnostic argument is convincing. Our earlier comment about making it a MUST was before the per-job pool scenario was raised, which is a valid edge case we hadn’t considered.
A standalone EvaluatorStakeUpdated gives us everything we need on the consumer side without coupling the slash semantics to a specific staking model. We’re already indexing EvaluatorSlashed for reasoning verification — subscribing to a second event is trivial.
One suggestion: include both oldBalance and newBalance in EvaluatorStakeUpdated rather than just newBalance. Delta calculation from a single balance requires either local state or a second RPC call, neither of which is great for stateless indexers.
Also confirming: submit(16, deliverableHash) sent — TX 0xae416f. pablocactus’s daemon should pick up the JobSubmitted event.
On naming: +1 on aligning the mock to amount.
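To illustrate the stateless-indexer point, here is roughly what delta calculation looks like when both balances travel in the log. The event is not deployed, so the layout is an assumption: `evaluator` indexed, with `oldBalance` and `newBalance` as two consecutive 32-byte words in the log's `data` field. `stake_delta` is a hypothetical helper.

```python
def stake_delta(log: dict) -> int:
    """Return newBalance - oldBalance from a single log, with no local state."""
    data = bytes.fromhex(log["data"][2:])
    if len(data) != 64:
        raise ValueError("expected two 32-byte words: oldBalance, newBalance")
    old_balance = int.from_bytes(data[:32], "big")
    new_balance = int.from_bytes(data[32:], "big")
    return new_balance - old_balance
```

With only `newBalance` in the event, the same function would need either a stored previous balance or a registry read at the prior block.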
Naming alignment pushed, commit `d643129f` on #1653. The mock now reads identically to the canonical signature at source level.
On the `EvaluatorStakeUpdated` design: Option 1 works and @ThoughtProof’s refinement is the right call. Including both `oldBalance` and `newBalance` in a single event makes the delta computable directly from the log without local state or a second RPC — that matters for stateless indexers, and costs one extra uint256 slot in the event data, which is cheap. Final shape I’d propose:
event EvaluatorStakeUpdated(
    address indexed evaluator,
    uint256 oldBalance,
    uint256 newBalance
);
No reason field, no event-type discriminator. If consumers need to correlate a stake update with the triggering action (slash, withdraw, restake, governance penalty), the transaction hash plus log ordering already gives that — any indexer can see `EvaluatorSlashed` and `EvaluatorStakeUpdated` emitted in the same tx and join them. Keeping the stake event pure means it stays reusable for any future stake-movement cause without needing signature changes.
On the mock and scenarios side: once Demsys ships `EvaluatorStakeUpdated` in the next redeploy, I can add a fourth scenario to #1653 as a follow-up PR — an enriched claim flow that reads both events together to demonstrate the stateless-indexer pattern. Not committing to it now since the event doesn’t exist yet in production and keeping the editor queue on #1653 stable matters more than scope expansion at this point.
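For reference, the tx-hash join described above is small enough to sketch now, even before the event exists in production. Everything about the record shape is an assumption: logs are taken to be already decoded into dicts with `transactionHash`, an integer `logIndex`, an `event` name and an `evaluator` field, and the stake update is assumed to be emitted after the slash within the same transaction. `join_slash_with_stake` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def join_slash_with_stake(logs: List[dict]) -> List[Tuple[dict, dict]]:
    """Pair each EvaluatorSlashed with the next EvaluatorStakeUpdated in its tx."""
    by_tx: Dict[str, List[dict]] = defaultdict(list)
    for log in logs:
        by_tx[log["transactionHash"]].append(log)

    pairs = []
    for tx_logs in by_tx.values():
        ordered = sorted(tx_logs, key=lambda entry: entry["logIndex"])
        for i, log in enumerate(ordered):
            if log["event"] != "EvaluatorSlashed":
                continue
            # Look only at later logs in the same tx for the same evaluator.
            for later in ordered[i + 1:]:
                if (later["event"] == "EvaluatorStakeUpdated"
                        and later["evaluator"] == log["evaluator"]):
                    pairs.append((log, later))
                    break
    return pairs
```

A claim flow could then check that `oldBalance - newBalance` on the stake event equals `amount` on the slash event, using nothing but the two logs.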