Cost & Escalation Subsystem¶
The cost subsystem enforces API budget limits, gates expensive operations through a user-facing decision tree, validates manually-built branches, surfaces missing tool gaps, and provides runtime dashboard overrides for all cost-related controls.
Realizes:
spec_v3.md §13,docs/superpowers/specs/manual-escalation.md
Overview¶
Donna operates under a $100/month hard cap on Claude API spend with a $20/day pause threshold. The cost subsystem (src/donna/cost/) is the enforcement layer: every LLM call passes through BudgetGuard pre-flight, and any operation that would exceed the daily budget or the per-task approval threshold triggers the EscalationGate to present the user with a decision.
The subsystem has grown across slices 17--24 into a substantial module (20+ files) covering five major concerns: cost tracking and aggregation, budget enforcement, the escalation decision tree (Approve/Manual/Pause/Cancel), the claude_code validation pipeline for manually-built branches, and tool-gap detection with surfacing. All runtime controls are overridable through the dashboard via the DashboardSettingResolver, which reads dashboard_setting rows with YAML defaults as fallback.
The design philosophy is "safety first, dial back later": autonomous spend is gated by default, budget extensions require explicit user clicks, and tool builds require human merge. Every decision point writes audit events for traceability.
Key Concepts¶
| Concept | Description |
|---|---|
| CostTracker | Queries invocation_log for daily/monthly cost aggregations by task type or model. Foundation for all budget checks. |
| BudgetGuard | Pre-call gate injected into ModelRouter. Enforces both the daily pause threshold (plus any approved extensions) and the $100 monthly hard cap, and fires the 90% monthly warning. Raises BudgetPausedError (with period = daily or monthly) on breach. |
| BudgetPausedError | Exception that halts all autonomous LLM work. Caught by the orchestrator to transition tasks to paused state. |
| EscalationGate | Estimate-driven gate. When a task's pre-flight cost estimate exceeds budget or the per-task threshold, it creates an escalation_request row, posts a Discord view with buttons, and awaits resolution. |
| GateOutcome | Result of fire_and_wait: indicates whether the caller should proceed (api_extended), park the task (pause/cancel), or hand off to the user (chat/claude_code). |
| EscalationRepository | aiosqlite CRUD for escalation_request and dashboard_setting tables. Supports idempotent resolution, optimistic-lock writes, and status lifecycle queries. |
| BudgetExtensionRepository | Manages daily_budget_extension rows. Idempotent grant with daily and monthly ceiling enforcement. |
| DashboardSettingResolver | Resolution layer: dashboard_setting table overrides -> YAML defaults. Supports legacy key aliases from pre-unification slices. |
| ClaudeCodePoller | Background coroutine (60s tick) that validates submitted claude_code branches: branch existence, SHA pin (force-push protection), diff scope check, skill/tool validation. |
| DiffValidator | Validates that a branch only touches files within the declared target_paths globs. |
| ManualValidationRouter | Routes validated branches to skill ingestion or tool lint depending on the task type. |
| ToolGap | Value object representing a missing tool. Carries severity (high/speculative), detection point, proposed signature. |
| ToolGapSurfacer | Routes ToolGap records to the tool_request table with dedup, promotes severity on re-emission, and triggers Discord pings for high-severity gaps. |
| ToolLint | AST-based lint pipeline for tool builds: anthropic import check, import-time I/O check, secrets scan, metadata validation, allowlist check, inert-test check. |
Architecture¶
flowchart TD
subgraph PreCall["Pre-Call Budget Enforcement"]
A[ModelRouter.complete] --> B[BudgetGuard.check_pre_call]
B -->|Under limit| C[Proceed with LLM call]
B -->|Over limit| D[BudgetPausedError]
end
subgraph Gate["Escalation Decision Tree"]
E[Agent estimates cost] --> F[EscalationGate.fire_and_wait]
F -->|Under threshold| G[Proceed]
F -->|Over threshold| H[Create escalation_request]
H --> I[Post Discord View]
I --> J{User Choice}
J -->|Approve $X| K[grant_budget_extension]
J -->|Manual chat| L[Render prompt, park task]
J -->|Manual claude_code| M[Render spec, park task]
J -->|Pause| N[Task → paused]
J -->|Cancel| O[Task → cancelled]
end
subgraph Poller["Claude Code Validation"]
P[ClaudeCodePoller.tick_once] --> Q[List submitted rows]
Q --> R[Branch exists?]
R -->|Yes| S[SHA pin check]
S --> T[DiffValidator scope check]
T --> U[ManualValidationRouter.validate]
U -->|Pass| V[status → validated]
U -->|Fail| W[status → failed]
W --> X{Iteration cap?}
X -->|Yes| Y[status → cancelled, human_review]
X -->|No| Z[User re-submits]
end
subgraph ToolGaps["Tool Gap Surfacing"]
TG1[Detection Point] --> TG2[ToolGapSurfacer.surface]
TG2 --> TG3[Upsert tool_request]
TG3 -->|high severity| TG4[Discord ping]
TG3 -->|speculative| TG5[Morning digest]
TG4 -->|File request click| TG6[open_tool_build_escalation]
TG6 --> TG7[Render spec + validation]
end
Budget Enforcement Flow¶
BudgetGuard.check_pre_call() runs before every LLM API call. It queries today's spend from invocation_log (excluding zero-cost audit rows like escalation_lifecycle and tool_gap_lifecycle), factors in any approved daily_budget_extension rows to raise the effective cap, and raises BudgetPausedError(period="daily") if spend exceeds the limit. It then checks monthly spend against the $100 hard cap and raises BudgetPausedError(period="monthly") on breach; below the cap, the 90% monthly warning fires (once per calendar month). Both caps are enforced regardless of the escalation-gate posture below.
Gate posture (shadow vs enforce)¶
config/manual_escalation.yaml → gate.mode selects the escalation-gate posture:
shadow(deployed default): the gate is consulted on every LLM call — the router derives a deterministic cost floor (cost.estimate_output_tokens× output rate + prompt-token input cost) when a caller supplies noestimate_usd, so the gate is never dark — but it only logs calls that would escalate (escalation_shadow_would_fire). No rows, no Discord prompts, no blocking. Used to calibrate estimate accuracy.enforce: runs the full interactive Approve/Manual/Pause/Cancel decision tree described below.
Prior to 2026-06-11 the gate fired only when a caller passed
estimate_usd— and no production caller did, so the decision tree was unreachable and the monthly cap unenforced. See2026-06-11-cost-escalation-fable-critique-design.md.
Escalation Modes¶
The gate dynamically builds the set of offered modes based on configuration and preconditions:
| Mode | Preconditions | Effect |
|---|---|---|
api_extended |
Budget extension enabled, daily/monthly headroom available | Grants a budget extension for the estimated amount. Caller proceeds with API call. |
chat |
Task type declares manual_escalation.mode: chat, prompt builder wired, original prompt supplied |
Renders the prompt to disk + DB. User pastes into Claude manually. |
claude_code |
Task type declares manual_escalation.mode: claude_code, host repo mounted, spec builder wired |
Renders a spec with worktree command, acceptance criteria. User builds in a git worktree. |
pause |
Always available | Task transitions to paused. No spend. |
cancel |
Always available | Task transitions to cancelled. No spend. |
Slice 23 added per-task-type override controls (force_api, force_manual, disabled) that filter the offered modes after precondition checks.
Tool Gap Detection Points¶
Five sites feed through ToolGapSurfacer:
| Site | Severity | When |
|---|---|---|
| Boot capability check | speculative | pending_review capability declares an unregistered tool |
| AutomationDispatcher | high | Skill path about to run; required tool missing |
| Discord automation creation | high | User creates an automation backed by a capability needing an unregistered tool |
| AutoDrafter pre-flight | speculative | Drafted skill YAML references unregistered tool |
| SkillExecutor runtime | high | Mid-run dispatch against unregistered tool name |
Tool Lint Pipeline¶
When a tool build branch is submitted, ManualValidationRouter._validate_tool runs six AST/regex checks:
anthropic_import-- rejectimport anthropicoutsidesrc/donna/llm/import_io-- reject module-level network/disk I/Osecrets-- curated regex for API keys, PEM headers, vault-key patternsmetadata-- requirerequires_rebuild: bool+default_timeout_seconds: intallowlist-- diff must touch a config allowlist file with the tool nameinert_test-- branch must includetests/skills/tools/test_<name>.pycallingis_inert_at_import
Configuration¶
Primary config: config/manual_escalation.yaml
Key sections:
enabled-- global kill switchmodes.chat.enabled/modes.claude_code.enabled-- per-mode togglesbudget_extension.max_daily_extension_usd/budget_extension.hard_monthly_ceiling_usd-- extension limitstriggers.task_approval_threshold_usd-- estimate threshold that triggers escalationtriggers.manual_iteration_limit-- max resubmissions before routing to human reviewtool_gap.*-- reping cooldowns, Discord channels, lint defaults
Budget thresholds: config/donna_models.yaml under cost:
daily_pause_threshold_usd: 20.0monthly_budget_usd: 100.0monthly_warning_pct: 0.90
Dashboard overrides: All settings in manual_escalation.yaml can be overridden at runtime via dashboard_setting rows. The DashboardSettingResolver checks the DB first, falls back to YAML. Slice 23 added optimistic-lock writes (set_dashboard_setting_with_lock) so concurrent dashboard edits do not silently clobber each other.
API¶
| Class / Function | Module | Description |
|---|---|---|
CostTracker |
tracker.py |
get_daily_cost(), get_monthly_cost(), get_projected_monthly_spend(), get_cost_by_task_type(), get_cost_by_agent() |
BudgetGuard |
budget.py |
check_pre_call() (raises BudgetPausedError), check_monthly_warning() |
EscalationGate |
escalation_gate.py |
fire_and_wait(), record_user_resolution(), grant_budget_extension(), record_manual_handoff(), open_tool_build_escalation() |
EscalationRepository |
escalation_repository.py |
CRUD for escalation_request + dashboard_setting. Supports idempotent resolve, optimistic-lock writes, lifecycle queries. |
DashboardSettingResolver |
dashboard_setting.py |
get(key, default) -- resolves overrides with legacy alias support. |
ClaudeCodePoller |
claude_code_poller.py |
run() (background loop), tick_once() (single pass for tests). |
DiffValidator |
diff_validator.py |
validate(diff_paths, target_paths) -- scope check returning matched/out-of-scope files. |
ManualValidationRouter |
manual_validation_router.py |
validate(row, branch, diff_paths) -- routes to skill ingestion or tool lint. |
ToolGapSurfacer |
tool_gap_surfacer.py |
surface(gap) -- dedup-aware upsert + Discord ping for high severity. |
lint_tool_branch |
tool_lint/__init__.py |
Runs all six lint rules against a branch's diff. Returns LintResult with failures/warnings. |
See Also¶
- Domain: Task Management -- manual escalation terminal and task state transitions
- Domain: Skill System -- skill lifecycle states, auto-drafting, evolution
- Domain: Orchestrator -- dispatches tasks that may trigger escalation
- Domain: Observability --
invocation_logtable that feeds cost tracking