ERC-8312: Bounded Agent Actions

@TMerlini wrote up the ERC-8312 Consumption criterion in the same per-step format as the others: https://gist.github.com/0x2kNJ/55969e47c286f5cb52a5661c252f1457. It sits beside the anchor as the cross-cutting invariant, since cumulative draw crosses every value-drawing step rather than living in one of them. The implementation line points at the per-leaf profile we co-designed: leafSpent against a static capabilityRoot, recomputable from the advance events.

The criterion keeps two conditions apart on purpose, in line with where you and @KBryan landed. The cursor meters and rejects a draw that would exceed the bound. The substrate, separately, is what makes going through the cursor unavoidable. So it reads the cursor as the meter and never the enforcer.

One thing to reconcile when you drop it in: I wrote the advance event as LeafAdvanced, your reference emits EnvelopeAdvanced. Let’s use whichever name the reference emits. I’ll align it once you confirm.

@blockbird, thanks for taking the consumption criterion, and good catch on the event name. that mismatch is on my reference impl, not your text.

the clean reconciliation keeps both, as a base/profile split (the same pattern we run everywhere):

  • EnvelopeAdvanced = the base event, the cursor advancing against the envelope/bound. surface-agnostic, shared by every profile (“the base is really just a cursor any surface can read”).

  • LeafAdvanced = the per-leaf profile’s event, the same advance at leaf granularity, carrying the leaf index. strictly more specific, so it earns its own name; a LeafAdvanced implies an EnvelopeAdvanced.

so your criterion is right to use LeafAdvanced, and good news, the reference profile already does this split, I just made it explicit rather than implied. HierarchicalBudgetPerLeaf emits both per draw: LeafAdvanced (leaf-granular, carries scopeId + newSpent) and the base EnvelopeAdvanced from IBoundedAgentAction. so there’s nothing to rename, your criterion’s LeafAdvanced matches what the profile emits at the leaf level, and leafSpent is recomputable from its log. I pushed a NatSpec note on both events spelling out the base/profile relationship so you can cite it directly (TMerlini/erc1833-hierarchical-profile, a8d541e).
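A minimal Python sketch of that recompute, assuming only what is stated above: each LeafAdvanced entry carries scopeId and newSpent. Treating newSpent as a cumulative, non-decreasing value is an assumption, and the dictionary shape and function name are hypothetical, not taken from the reference profile.

```python
from typing import Dict, Iterable


def recompute_leaf_spent(events: Iterable[dict]) -> Dict[int, int]:
    """Replay the log in order; the latest newSpent per scope is leafSpent."""
    leaf_spent: Dict[int, int] = {}
    for ev in events:
        if ev["name"] != "LeafAdvanced":
            continue  # base EnvelopeAdvanced entries carry no leaf index
        scope, new_spent = ev["scopeId"], ev["newSpent"]
        # Assumption: newSpent is cumulative, so it never moves backwards.
        if new_spent < leaf_spent.get(scope, 0):
            raise ValueError(f"non-monotonic newSpent for scope {scope}")
        leaf_spent[scope] = new_spent
    return leaf_spent


# Hypothetical log: every draw emits LeafAdvanced plus the base event.
log = [
    {"name": "LeafAdvanced", "scopeId": 1, "newSpent": 40},
    {"name": "EnvelopeAdvanced"},
    {"name": "LeafAdvanced", "scopeId": 2, "newSpent": 10},
    {"name": "EnvelopeAdvanced"},
    {"name": "LeafAdvanced", "scopeId": 1, "newSpent": 70},
    {"name": "EnvelopeAdvanced"},
]
leaf_spent = recompute_leaf_spent(log)
print(leaf_spent)
```

The replay needs nothing but the log, which is the sense in which leafSpent is recomputable without asking the contract.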

and it holds the line we’ve kept throughout: the cursor is the meter, never the enforcer. the event just records that the meter moved; the substrate is what makes going through it unavoidable.


@orbmis on the 8001/8312 ↔ 8301 wiring, yes, please sketch the POC, I’m in. shape from our side: capabilityRoot binds to the 8001 intent (it can reference the _agentIntentDigest directly, that’s the envelope the cursor measures against), agreementHash commits the mandate the parties accepted, and 8301 sequences the steps that draw against the cursor. so: 8001 records authority → 8312 meters → substrate enforces → 8301 orders. happy to wire a reference once you’ve got the skeleton.

and +1 on keeping contested out of the 8312 state machine: active / revoked / expired is the right minimal set; contestation is a substrate/workflow concern, same reason the cursor doesn’t enforce: it doesn’t adjudicate either.

I’m out until Saturday so I’ll be slow to reply, but the per-leaf profile + the naming split are already in place; happy to wire the POC reference once your skeleton’s up.

I built a reference implementation of ERC-8312, with the failure modes worth folding into the spec.

What it demonstrates: a stateless check (recompute spent-so-far from on-chain reads, compare to the cap) cannot hold an aggregate across calls or venues. Two actions of 100 against a cap of 150, each checked before either settles, both clear (neither has written anything the other can read), the total reaches 200, nothing reverts. The cursor holds because it commits a reservation each action advances, so the second reads the first and reverts. Recompute is a read; an aggregate bound needs a write to one canonical home, which is ERC-8312. The proof is an on-chain reverted transaction on Base Sepolia.
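The 100 + 100 against 150 case can be sketched in a few lines of Python. This is an illustration of the argument, not the reference implementation: the names are hypothetical, and a returned False stands in for the on-chain revert.

```python
CAP = 150


def stateless_check(settled_total: int, amount: int) -> bool:
    # Recompute-and-compare: reads only what has already settled.
    return settled_total + amount <= CAP


class Cursor:
    """Commits a reservation at check time, so a later check can read it."""

    def __init__(self, cap: int) -> None:
        self.cap = cap
        self.reserved = 0

    def reserve(self, amount: int) -> bool:
        if self.reserved + amount > self.cap:
            return False  # stands in for the revert
        self.reserved += amount
        return True


# Two draws of 100, each checked before either settles.
settled = 0
stateless_results = [stateless_check(settled, 100), stateless_check(settled, 100)]
settled += 200  # both settle afterwards: 200 against a cap of 150

cursor = Cursor(CAP)
cursor_results = [cursor.reserve(100), cursor.reserve(100)]
print(stateless_results, settled, cursor_results, cursor.reserved)
```

Both stateless checks pass because neither has written anything the other can read; the cursor's second reserve fails because the first one is already committed.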

What I would fold into the spec, from adversarially auditing the metering contracts (a multi-agent auditor; cap-safety findings closed):

- Reservations bind to their creator. A pooled meter where anyone can confirm or cancel another actor’s reservation is exploitable: a third party frees in-flight room and the cap is bypassed. Confirm and cancel must be scoped to the reserver.

- Reserve, confirm, cancel, not a monotonic counter. The check fires before the action settles, so a reverting step returns its room; a bare increment burns cap against spends that never happen.

- The hook meters the agreed terms, not caller-supplied ones. If the metering call reads its amount from the caller instead of the stored job, the cap is satisfied with a nominal value while the real spend goes unmetered.

- The cross-relationship aggregate is the novel surface. ERC-4337 nonces meter per account, ERC-7710 spend caveats per delegation; none holds one cap across a principal’s many delegations and venues. ERC-8312 standardizes that aggregate, which the prior art has no construct for.
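The first two findings can be sketched together in Python: a reserve / confirm / cancel meter where confirm and cancel are scoped to the reserver. All class, method, and actor names here are hypothetical, and exceptions stand in for reverts; it is a sketch of the shape, not the audited contracts.

```python
class ReservationMeter:
    def __init__(self, cap: int) -> None:
        self.cap = cap
        self.reserved = 0    # in-flight room
        self.confirmed = 0   # settled spend
        self._next_id = 0
        self._open = {}      # reservation id -> (reserver, amount)

    def reserve(self, caller: str, amount: int) -> int:
        if self.reserved + self.confirmed + amount > self.cap:
            raise ValueError("cap exceeded")
        rid = self._next_id
        self._next_id += 1
        self._open[rid] = (caller, amount)
        self.reserved += amount
        return rid

    def _take(self, caller: str, rid: int) -> int:
        reserver, amount = self._open[rid]
        if caller != reserver:
            raise PermissionError("only the reserver may confirm or cancel")
        del self._open[rid]
        self.reserved -= amount
        return amount

    def confirm(self, caller: str, rid: int) -> None:
        self.confirmed += self._take(caller, rid)

    def cancel(self, caller: str, rid: int) -> None:
        self._take(caller, rid)  # a reverting step returns its room


meter = ReservationMeter(cap=150)
rid_a = meter.reserve("agent-A", 100)
try:
    meter.cancel("mallory", rid_a)   # a third party tries to free in-flight room
    third_party_freed_room = True
except PermissionError:
    third_party_freed_room = False
try:
    meter.reserve("agent-B", 100)    # 100 in flight + 100 > 150
    second_reserve_ok = True
except ValueError:
    second_reserve_ok = False
meter.cancel("agent-A", rid_a)       # A's step reverted, room comes back
rid_b = meter.reserve("agent-B", 100)
meter.confirm("agent-B", rid_b)
```

A bare monotonic counter would have left A's 100 burned after the cancel, and B's later reserve would have failed against spend that never happened.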

One open question: where does the hook fire, the escrow’s fund leg, the execute step, or a separate metering contract the others call? Feedback welcome.

@blockbird ran the reference rather than take the findings on description: zero-human-loop @ 5530a9c, clean clone + forge test 56/56, and the four findings aren’t asserted, they’re encoded as passing tests: bind-to-creator → test_close_lease_requires_owner / test_spend_requires_operator / test_reclaim_regrant_cannot_double_spend; reserve/cancel-not-monotonic → test_reclaim_rebalances_without_exceeding_cap / test_reclaim_cannot_free_already_spent_room; the cross-relationship aggregate → test_two_venues_capped_by_lease_global_never_exceeds_cap / test_venue_lease_cannot_exceed_home_reservation. Base Sepolia 0x7140… confirmed status 0. So the cap-safety set holds up under recompute.

all four fold in correctly. the one I’d promote from “security fix” to “load-bearing for the standard,” because it changes what 8312 is:

bind-to-creator (finding 1) is what makes the aggregate recomputable. the cursor’s value isn’t only that it enforces live, it’s that the reservation sequence is on-chain, so anyone re-derives “did the running sum ever cross the cap across all venues/delegations?” from the public log, after the fact, no trust in the meter. but that only re-derives to one answer if confirm/cancel are scoped to the reserver. unscoped, a third party frees in-flight room and the log no longer re-computes to a sound verdict, the audit trail becomes forgeable. so finding 1 isn’t just exploit-prevention; it’s the precondition for the cursor being a recomputable cap-conservation invariant rather than a meter you have to trust. worth saying in the spec in those terms, it’s the property a verifier (or a mesh re-runner) leans on.

(and that suggests a clean conformance check for the kit: cap-conservation over the reservation log — public inputs, one determinate verdict “aggregate never exceeded the cap,” recurring per relationship. recomputable precisely because finding 1 holds.)
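A rough Python sketch of what such a check could look like over a reservation log. The event shape (kind, actor, id, amount) is hypothetical, not the kit's format; the check fails both when the running sum crosses the cap and when a confirm or cancel comes from someone other than the reserver.

```python
def conserves_cap(log, cap: int) -> bool:
    """Replay reserve/confirm/cancel entries and check the running sum."""
    open_res = {}             # reservation id -> (reserver, amount)
    in_flight = confirmed = 0
    for ev in log:
        kind, actor, rid = ev["kind"], ev["actor"], ev["id"]
        if kind == "reserve":
            open_res[rid] = (actor, ev["amount"])
            in_flight += ev["amount"]
        else:
            reserver, amount = open_res.pop(rid)
            if actor != reserver:
                return False  # foreign confirm/cancel: trajectory is unsound
            in_flight -= amount
            if kind == "confirm":
                confirmed += amount
        if in_flight + confirmed > cap:
            return False      # the aggregate crossed the cap at this step
    return True


good_log = [
    {"kind": "reserve", "actor": "A", "id": 1, "amount": 100},
    {"kind": "confirm", "actor": "A", "id": 1},
    {"kind": "reserve", "actor": "B", "id": 2, "amount": 50},
    {"kind": "cancel", "actor": "B", "id": 2},
]
forged_log = [
    {"kind": "reserve", "actor": "A", "id": 1, "amount": 100},
    {"kind": "cancel", "actor": "mallory", "id": 1},
    {"kind": "reserve", "actor": "B", "id": 2, "amount": 100},
]
print(conserves_cap(good_log, 150), conserves_cap(forged_log, 150))
```

The inputs are public and the verdict is determinate, which is the property the parenthetical asks of a conformance check.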

finding 4 is the real novelty and I’d state it as strongly as you did — 4337 meters per account, 7710 per delegation, neither holds one cap across a principal’s many delegations and venues. that cross-relationship aggregate is the construct the prior art doesn’t have, and it’s exactly the surface a “running sum no single step can see” needs.

on placement (and it’s the same answer I gave on the composition-note thread, which is a good sign the two converge): a separate metering gate the drawing steps call, pre-spend, not the escrow fund leg, not execute. and finding 3 forces it: the hook has to meter the stored agreed terms, not caller-supplied amounts, so it can’t live inside a step that’s handed the amount, it has to read the committed job itself. that’s a gate with its own view of the agreement, called before the draw. settle is too late (post-spend), execute couples it; a dedicated metering contract the others call is the seam that satisfies finding 3 and keeps the cap orthogonal to what any step computes.
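To make the finding-3 argument concrete, here is a small Python sketch contrasting a hook that meters a caller-supplied amount with a gate that reads the stored job. Both classes and the job table are hypothetical illustrations, not code from the reference.

```python
class NaiveHook:
    """Anti-pattern: meters whatever amount the caller passes in."""

    def __init__(self, cap: int) -> None:
        self.cap, self.metered = cap, 0

    def meter(self, caller_amount: int) -> None:
        if self.metered + caller_amount > self.cap:
            raise ValueError("cap exceeded")
        self.metered += caller_amount


class MeteringGate:
    """Takes no amount from the caller; reads the committed job instead."""

    def __init__(self, cap: int, jobs: dict) -> None:
        self.cap, self.metered = cap, 0
        self.jobs = dict(jobs)  # job id -> agreed amount, stored beforehand

    def meter(self, job_id: str) -> int:
        amount = self.jobs[job_id]
        if self.metered + amount > self.cap:
            raise ValueError("cap exceeded")
        self.metered += amount
        return amount


jobs = {"job-1": 100, "job-2": 100}  # real spend per job is 100

naive = NaiveHook(cap=150)
naive.meter(1)  # caller reports a nominal 1 for job-1
naive.meter(1)  # and again for job-2: 200 spent, 2 metered

gate = MeteringGate(cap=150, jobs=jobs)
gate.meter("job-1")
try:
    gate.meter("job-2")
    second_job_metered = True
except ValueError:
    second_job_metered = False
```

Because the gate's signature has no amount parameter, there is nothing for a drawing step to understate.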

solid spec-hardening. the reference runs, the findings are test-backed, and the aggregate-with-recomputable-audit is a genuinely new primitive.

ADD (cap-conservation prototype — runnable backing for the parenthetical above):

and I made that audit half runnable, against your own deploy — a self-contained recompute that proves reserved + confirmed ≤ cap from storage (eth_getProof Merkle-inclusion vs the block’s stateRoot, not a trusted meter read), pinned to your hardened StatefulBound @ 0x3f79…faac / mandateId 0x9e3e…05bbe on Base Sepolia. the mandateId is decoded from Agent A’s real check() reserve tx, and it lines up with the on-chain trajectory in your ONCHAIN.md (A reserves 100 → status 1; B’s second reserve → over cap → status 0). verdict is tri-state (good / cap-exceeded / unverifiable-never-a-pass). it re-derives to a sound verdict precisely because finding 1 scopes confirm/cancel — that’s the marriage made executable.

https://github.com/Echo-Merlini/cap-conservation-audit

prototype, not a frozen verify-step: it pins the artifact (5530a9c), but the conformance-vector freeze waits on finding 1 landing in the 8312 text. it’s the scope/cap-conservation leg of a small “recompute, don’t trust” kit, reduced to one file so it runs without the kit.
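For readers who have not opened the repo, the tri-state verdict can be sketched as below. This is not the prototype's code: the storage-proof verification is abstracted into a boolean the caller supplies, and the function and enum names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    GOOD = "good"
    CAP_EXCEEDED = "cap-exceeded"
    UNVERIFIABLE = "unverifiable"


def cap_conservation(
    reserved: Optional[int],
    confirmed: Optional[int],
    cap: Optional[int],
    proofs_ok: bool,
) -> Verdict:
    # Assumption: proofs_ok is True only if every value was checked
    # against the block's stateRoot; that check is not shown here.
    if not proofs_ok or None in (reserved, confirmed, cap):
        return Verdict.UNVERIFIABLE  # never reported as a pass
    if reserved + confirmed <= cap:
        return Verdict.GOOD
    return Verdict.CAP_EXCEEDED


print(cap_conservation(100, 0, 150, True))
print(cap_conservation(100, 100, 150, True))
print(cap_conservation(100, 0, 150, False))
```

Keeping "unverifiable" separate from both other outcomes means a missing or failed proof can never be mistaken for a conserved cap.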

This decomposition looks right to me.

One boundary I would like to keep explicit is the difference between:

- authority granted,

- authority consumed,

- action witnessed,

- history recorded.

A delegation / caveat can say what an agent is allowed to do.

A cursor can track how much of that allowance was consumed.

A witness / receipt can make a specific action recomputable.

But the longer-term history of those receipts should probably stay as a separate portable layer, not become scoring or reputation inside the authority layer.

So the stack could look like:

grant / caveat → cursor / consumption → witness / receipt → portable history → reputation

The important part for me is:

History records facts.

Reputation interprets them later.

That keeps bounded actions auditable without forcing the base standard to define trust scores.

By the way, I have already prototyped part of this flow:

execution evidence → portable proof object → verified receipt root → portable history entry

No scoring, no certification, no reputation inside the receipt layer.

Just recomputable facts that other systems can interpret later.

That looks great. Out of your five-rung stack (authority granted, then consumed, witnessed, recorded, and reputation) four can be verified after the fact and recomputed from public data by anyone. The one that can’t is consumption: the running total against the cap. That is the rung ERC-8312 defines.

A grant is authority anyone re-derives, a witness recomputes, a history replays, and reputation interprets those facts a step later. The cap is the exception, and @TMerlini’s point is why: it is a sum across steps, and at check time the sibling has written nothing the read can see. A read confirms a value; it can’t hold a sum across draws that have not committed. That rung needs an ordered write at the moment of the draw, and the write is the cursor.

So the cursor orders the draws and records the result, and does no more than that. It does not grant, it does not witness the action, it does not score. What it records stays recomputable for the same reason the rest do: anyone re-derives whether the running sum ever crossed the cap from the public advances, with no trusted accountant. The precondition for that re-derivation has just landed in the text. A new Security clause requires that where an advance is a cancellable reservation, confirm and cancel bind to the reserver, so a third party cannot free in-flight room and forge the trajectory. @TMerlini, your cap-conservation audit rests on exactly that clause, so the conformance vector now has its spec dependency. Freeze it against the artifact you pinned whenever the kit is ready.

@pipavlo82, the same boundary is what keeps your history portable. Because the cursor records facts and does not score, a history can read that record and a reputation can interpret it later, and the base defines neither. Your prototype composes in the right direction: it reads recomputable facts instead of asking the authority layer to remember or judge them.

Only the cursor has to be singular. A witness, a history, a reputation can each be built many ways and still interoperate; a cap exists only when every surface advances the same object. That is why this rung, alone among the five, has to be written down once. The boundary I would hand back to you both, since it falls between your two layers: should the witness, the advance, and the history entry share a single receipt id, so a verifier walks witness to consumption to history with no join, or stay separately addressable and linked only by recompute?

Good question.

I would keep them separately addressable and link them through recomputation.

A shared receipt id can be useful metadata, but the link itself should come from recomputable facts, roots, and references.

The five-rung decomposition reads right to me, and “history records facts, reputation interprets them later” is the same line the recompute side holds from the other end: a recompute produces a fact, never a score, interpretation is always a later, separate layer. Good to see it stated as a boundary rather than a convention.

@blockbird on the spec dependency, that closes the loop exactly. The cap-conservation audit re-derives “did the running sum ever cross the cap?” from the public advances with no trusted accountant, and the one thing that made that re-derivation sound rather than forgeable was reservations binding to their reserver, without it a third party frees in-flight room and the trajectory no longer re-computes to one answer. With that now a normative Security clause, the conformance vector has its artifact-independent spec dependency, so I’ll freeze it against the pinned reference (zero-human-loop @ 5530a9c) and wire it in. The cursor is the one rung a read can’t hold — a sum across draws the siblings haven’t committed, so it’s right that it’s the one written down once; the audit just makes its record recomputable like the other four.

On the boundary question, +1 to @pipavlo82: separately addressable, linked by recompute, not a shared receipt id. The reasoning is the same discipline the whole stack rests on: a shared id is a join you assert; a recompute-link is a join you verify. If the rungs are recomputable facts with no trusted accountant, the link between them should be held to the same bar, re-derivable, not taken on faith that one id was issued honestly across three independently-built surfaces.

And you don’t lose the verifier ergonomics for it. The “walk witness → consumption → history with no join” still works if each rung commits the prior’s recomputable root as a reference: the witness carries its evidence root, the advance references that root, the history entry references the advance’s digest. A verifier then walks the chain by re-deriving each reference — joinless and trustless. A shared id gives you the walk but asks you to trust the namespace; the root-reference gives you the walk and lets you check it. (It’s what Fede’s ledger-recompute leg already does — the OTS anchor binds the recomputed id, so the link is itself a fact you reproduce.)
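A small Python sketch of that root-reference walk. Every field name is hypothetical, and SHA-256 over canonical JSON stands in for whatever commitment scheme each surface actually uses.

```python
import hashlib
import json


def digest(obj) -> str:
    # Stand-in commitment: SHA-256 over key-sorted JSON.
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


# Hypothetical records; none of these shapes come from a spec.
evidence = {"tx": "0xabc", "amount": 100}
witness = {"evidenceRoot": digest(evidence)}
advance = {"witnessRoot": witness["evidenceRoot"], "newSpent": 100}
history = {"advanceDigest": digest(advance)}


def walk(evidence, witness, advance, history) -> bool:
    """Re-derive each reference instead of trusting a shared id."""
    return (
        witness["evidenceRoot"] == digest(evidence)
        and advance["witnessRoot"] == witness["evidenceRoot"]
        and history["advanceDigest"] == digest(advance)
    )


tampered_advance = dict(advance, newSpent=1)
print(walk(evidence, witness, advance, history))
print(walk(evidence, witness, tampered_advance, history))
```

Each record stays separately addressable; the only thing tying them together is a value the verifier recomputes.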

It also preserves the property @blockbird drew the line on: only the cursor must be singular. A shared id would force witness, consumption, and history to coordinate one namespace — a coupling — whereas linking by recomputed roots lets each stay independently addressable and built many ways, and still interoperate. The cursor stays the single object every surface advances; everything else stays plural and joins by recompute. That keeps the base defining neither history nor reputation, which is where I think you both want it.

Glad we’re converging on the same boundary.

Facts should be independently recomputable. History should be built from those facts. Reputation can be layered on later, but shouldn’t be required to establish the history itself.

That’s the portability property I’m interested in preserving.

Hey @TMerlini - sorry for the delay, I was AFK for a few days.

I built a PoC to explore how 8001, 8312 and 8301 compose in a bounded autonomous DeFi workflow:

https://github.com/orbmis/headroom

The PoC models an autonomous portfolio rebalancer operating across mock ERC-4626 vaults. The user does not give the agent unrestricted authority. Instead, the user and agent accept a bounded portfolio mandate, and the agent can only rebalance while each action remains inside that mandate.

The architecture is:

  • ERC-8001 records the accepted authority: the portfolio mandate agreed by the user and agent.

  • ERC-8312 meters that authority: the accepted mandate is bound into a bounded-action envelope, and the cursor tracks live consumption of the mandate.

  • ERC-8301 sequences the workflow: rebalance tasks move through proposal, verification, execution, completion, or rejection.

  • ExecutionSubstrate enforces the mandate: it controls the mock vault portfolio, executes valid rebalances, and advances the cursor when execution succeeds.

One design change that became clearer while building the PoC was the need to separate orchestration from execution. In the POC, the PortfolioManager contract owns the ERC-8301-shaped workflow lifecycle. It creates tasks, accepts proposals, tracks status, and prevents invalid step transitions.

The ExecutionSubstrate contract owns the portfolio and enforcement surface. It interacts with the mock vaults, checks the mandate and cursor, moves funds only for valid rebalances, and advances the ERC-8312 cursor. That separation feels important, which is why I took a couple of attempts at creating this POC. The PortfolioManager orders the work. ExecutionSubstrate enforces the bounds.
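As an illustration of the orchestration side only, a workflow lifecycle that rejects invalid step transitions could be sketched like this in Python. The status names and the allowed-transition table are assumptions derived from the stages listed above, not the PoC's actual contract.

```python
# Hypothetical transition table: which status may follow which.
ALLOWED = {
    "proposed": {"verified", "rejected"},
    "verified": {"executed", "rejected"},
    "executed": {"completed"},
    "completed": set(),
    "rejected": set(),
}


class PortfolioManager:
    """Orders the work; holds no funds and enforces no bounds."""

    def __init__(self) -> None:
        self.status = {}

    def create_task(self, task_id: str) -> None:
        self.status[task_id] = "proposed"

    def transition(self, task_id: str, new_status: str) -> None:
        current = self.status[task_id]
        if new_status not in ALLOWED[current]:
            raise ValueError(f"invalid transition {current} -> {new_status}")
        self.status[task_id] = new_status


pm = PortfolioManager()
pm.create_task("t1")
try:
    pm.transition("t1", "executed")  # skips verification
    skipped_verification_rejected = False
except ValueError:
    skipped_verification_rejected = True
for step in ("verified", "executed", "completed"):
    pm.transition("t1", step)
print(pm.status["t1"])
```

Nothing in this class touches a vault or the cursor, which is the point of the split: the enforcement surface can change without touching the lifecycle.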

The PoC includes both happy-path and failure-path scenarios:

  • a valid rebalance to a higher-yield vault

  • rejection when the agent exceeds the per-vault allocation cap

  • rejection when the agent exceeds the aggregate higher-risk vault exposure cap

  • rejection when the agent exceeds the cumulative turnover budget

  • rejection when the agent attempts an invalid workflow transition

The cumulative turnover scenario is the most useful demonstration of 8312 imho. The proposed rebalance is valid in isolation: the vault is approved, the yield improvement is sufficient, and the resulting allocation is within concentration limits. However . . . it is still rejected because previous rebalances have already consumed most of the mandate.

That’s the value of the cursor: it answers a question that a stateless policy check can’t. 🙌

A rough summary of the composition is:

ERC-8001 records what was agreed.

ERC-8312 meters what remains.

ERC-8301 orders the workflow.

ExecutionSubstrate enforces the result.

The repository is obviously intended as an exploration rather than a reference implementation, but I’d be very interested in feedback on:

  • whether the ERC-8001 → ERC-8312 → ERC-8301 composition feels natural

  • whether the split between PortfolioManager and ExecutionSubstrate is the right boundary

  • whether the cursor semantics demonstrate the value of bounded agent authority

  • whether the failure scenarios are convincing enough

Future iterations could also leverage ERC-8281/OCP and ERC-8299/WYRIWE, but I’m not sure they really add value at this point.

@orbmis this is great, and I ran it rather than just read it, because the turnover case is exactly the thing worth checking on real state. demo:turnover confirms it: cursor at 7,700/8,000 consumed, 300 headroom, a 500 move that passes the vault/yield/risk checks in isolation gets rejected by the aggregate, funds don’t move, cursorRoot stays put. That’s the whole argument for 8312 in one scenario, you’re right to call it the central one. A stateless policy can’t answer “is this valid given everything already drawn,” and that’s the question the cursor exists for.
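The numbers from that run fit in a few lines of Python. The class below is a hypothetical stand-in for the PoC's cursor, and the isolated vault/yield/risk checks are reduced to a boolean.

```python
class TurnoverCursor:
    def __init__(self, budget: int, consumed: int = 0) -> None:
        self.budget = budget
        self.consumed = consumed

    @property
    def headroom(self) -> int:
        return self.budget - self.consumed

    def advance(self, amount: int) -> bool:
        if amount > self.headroom:
            return False  # rejected: the cursor does not move
        self.consumed += amount
        return True


cursor = TurnoverCursor(budget=8_000, consumed=7_700)
move = 500
valid_in_isolation = True  # stands in for the vault, yield and risk checks
accepted = valid_in_isolation and cursor.advance(move)
print(accepted, cursor.consumed, cursor.headroom)
```

The move passes every check that looks at it alone and still fails, because only the cursor knows that 7,700 of the 8,000 is already gone.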

On your four:

Composition (8001→8312→8301) feels natural, yes, and cleanly, because each layer answers a different question and never reaches into another’s: 8001 records what was agreed, 8312 meters what remains, 8301 orders the workflow, the substrate enforces the result. That maps one-to-one onto where this thread landed (grant → cursor → witness → order → enforce); the boundaries holding on their own is the sign the decomposition is right.

The PortfolioManager / ExecutionSubstrate split is the right boundary, and there’s a load-bearing reason beyond clean code. The thread’s conclusion was that only the cursor has to be singular; everything else can be built many ways and still interoperate. Your ExecutionSubstrate owns the one object every action advances (the cursor), while PortfolioManager orchestration stays swappable. Keeping the cursor advance inside the enforcement surface, same path that checks headroom, advancing only on success, never on a rejected proposal, is what makes it a serialized write at the moment of the draw, which is the one property 8312 actually needs. You arrived at the split the spec wants.

Cursor semantics demonstrate bounded authority, yes, the turnover reject is the proof. The other three (per-vault, risk-bucket, workflow) are convincing but a stateless check could do them too; turnover is the one that only the cursor can do, so I’d lead the narrative with it.

On 8281/OCP and 8299/WYRIWE: Your instinct is right for enforcement, but they’re the layer this POC doesn’t exercise yet: the audit half. The cursor rejects live regardless of provenance, so for the bounded-authority demo they add nothing, agreed. Where they’d matter is recomputability. Right now an outside party trusts the substrate’s record of what was drawn and the verifier’s read of the vault yields. WYRIWE binds the cursor’s consumption to the committed input the agent acted on (not an agent-reported amount), and OCP anchors the cursorRoot advances to a system the committer doesn’t control, so “did cumulative turnover ever exceed the mandate?” becomes re-derivable by anyone from the public advances, no trusted accountant. That’s the difference between enforced live and enforced live AND independently sound. Same boundary blockbird and pipavlo drew up-thread: the cursor enforces with a write; the on-chain advances make it auditable by recompute.

Concretely, the audit half is already runnable, the recompute-kit has an 8312/cap-conservation check that proves reserved + confirmed ≤ cap straight from storage (eth_getProof vs stateRoot, no trusted meter read). Happy to point it at a Solidity port of your ExecutionSubstrate cursor whenever you take headroom on-chain, find-then-recompute: your substrate enforces, the recompute verifies. Genuinely nice piece of work.

Great feedback, thank you! And yes, this was the idea: you can swap out the execution substrate for an ERC-4337 Smart Account, Coinbase Agentic Wallet, Safe Module or whatever. I think that’s what makes this approach of using small, tightly scoped ERCs so interesting: it becomes extremely modular.

I like your take on how to use WYRIWE / OCP primitives as well - it might be a good addition now that I think of it. Regarding the Solidity port: I updated the POC earlier so it now runs on a local EVM chain, with deployable contracts from Solidity source code. Next step I’ll deploy to Base Sepolia if that’s helpful and/or if you want to jam on it.

@orbmis yes, and “swap the substrate for a 4337 account / Coinbase agentic wallet / Safe module / whatever” is exactly the property that makes the cursor the only singular thing: every other surface is an implementation choice, the one object every action has to advance is the cap cursor. Keep that swappable and you’ve got interop without a canonical wallet, which is the whole point.

On deploying it, let’s jam. Once the ExecutionSubstrate cursor is on Base Sepolia, I’d point the recompute side straight at it. The recompute-kit has an 8312/cap-conservation check that proves reserved[mandate] + confirmed[mandate] ≤ cap from contract storage (eth_getProof against the stateRoot), no call to the substrate and no reliance on a reported number. So your substrate enforces the bound live at the moment of the draw; the recipe re-derives that the bound was never exceeded, independently, from the same chain anyone can read. Enforced-and-auditable, not enforced-and-trusted. Drop the deployed address + the storage slots for reserved/confirmed/cap when it’s up and I’ll run it against your live cursor, find-then-recompute, your contract is the find.

And WYRIWE/OCP as an addition is the right instinct, because it closes the one gap the cursor alone can’t: the cursor meters an amount, but it trusts that the amount corresponds to the input the agent actually acted on. WYRIWE binds the consumption to the committed input (the inputHash the model received, anchored before execution), so “did cumulative turnover ever exceed the mandate” becomes re-derivable from the committed inputs themselves, not from an agent-reported draw. That’s the audit half sitting cleanly on top of your enforcement half, same boundary @KBryan drew in the composition-note thread (8001 records authority · 8312 meters it · 8275 settles), just with the provenance binding making the meter’s input itself recomputable.

Happy to wire a worked example once it’s on Sepolia — your demo:turnover reject was already the convincing scenario; the on-chain version of it, with cap-conservation recomputing the cursor and a WYRIWE commitment on the draw’s input, is the full enforced+audited loop in one trace. Genuinely good momentum on this.

Hey @TMerlini, sorry for the delay in getting back to you. I’m travelling for a few days so it will likely be next week before I get back to you with a Base Sepolia deployment. I just didn’t want to leave you hanging without a response. I’ll ping you as soon as I’m back!