Changelog¶

Recent changes, summarized from commits and PRs.

2026-06-11¶

Added¶

Personal-context injection. New orchestrator/task_context.py assembles a compact context block from vault notes (semantic search) + active learned-preference rules, injected into the parse prompt via a {{ personal_context }} slot. Degrades gracefully when the vault is empty. PreferenceApplier and memory_store are now wired into the live parser.
Domain/duration edit pathway. PATCH /tasks/{id} (API) and PATCH /admin/tasks/{id} (dashboard) now accept domain and estimated_duration, with an inline editor in the dashboard task detail panel. These edits fire the CorrectionSubscriber learning loop, which was previously dormant for these fields.
Skill-subsystem alerting: new skills/alerting.py raises fallback alerts for skill-lifecycle paths that were previously silent — degradation demotions, run-persistence failures, shadow-sample-loss streaks, and evolution park-in-draft. (Skill System)
Pending-approval surfacing: a Discord ping fires when the nightly auto-draft creates skills awaiting approval, and a standing "⏳ Pending your approval" section in the EOD digest lists every skill parked in draft (auto-drafted or evolution-parked) until acted on. (Skill System)

Changed¶

Task parsing is now local-first. parse_task routes to the local model (local_parser, qwen2.5:32b) as primary, with confidence-gated escalation to the cloud reasoner via a new parse_task_cloud route when the local parse confidence is below 0.7. Most tasks now parse at zero marginal cost; ambiguous ones get a cloud second opinion.
Calibrated duration estimates. prompts/parse_task.md gained explicit duration anchors (quick comms 15 min / errands 30 / focused work 60), fixing the prior "every task is ~1 hour" behavior. The domain rubric was sharpened and now leans on injected personal context to disambiguate work vs personal.
confidence_threshold re-added with its consumer. The Model-Layer audit (below) removed confidence_threshold from donna_models.yaml / the RoutingEntry model as read-nowhere config; it is re-added in this release together with its consuming logic — ModelRouter.confidence_threshold_for and the confidence-gated parse escalation above. (Model Layer)
Skill auto-draft now human-gated by default: auto-drafted skills default to requires_human_gate=1, so a single human approval at draft → sandbox is mandatory before a skill leaves draft; the sandbox→shadow_primary→trusted promotions thereafter stay automatic, gated only by the §23.4 run-validity and shadow-agreement thresholds. spec_v3.md §23.5 and lifecycle.md were reconciled to match. (Skill System, spec_v3.md §23.5)
Serialized placement choke point: Scheduler.schedule_task / schedule_dependency_chain run the read→find-slot→create-event section under an asyncio.Lock, realizing the spec_v3.md §3.7.1 double-booking guard (the earlier "async queue" wording was a design target). (Scheduling, spec_v3.md §3.7.1)
Budget enforcement: BudgetGuard.check_pre_call now enforces the $100/month hard cap (previously daily-only — the monthly check was dead code) and fires the 90% monthly warning. BudgetPausedError carries a period (daily / monthly). (Cost & Escalation)
Escalation-gate posture: new config/manual_escalation.yaml → gate.mode. The default shadow consults the gate on every call and logs would-escalate events (escalation_shadow_would_fire) without prompting, persisting, or blocking; enforce runs the interactive decision tree. The router now derives a deterministic cost floor when a caller omits estimate_usd, so the gate is no longer dark. (Cost & Escalation)
Ledger integrity at the model choke point: ModelRouter.complete() is now the accounting boundary, not just dispatch. Production routers are built via build_model_router(), which requires an invocation_logger, and complete() raises RoutingError rather than make an unlogged billed call — so all spend reaches invocation_log, the table BudgetGuard reads for the $100/month cap. Chat and bot routers are now wired through the factory. (Model Layer)
Config-driven pricing: per-call cost_usd is computed from the per-alias config rates (input/output_cost_per_token_usd) instead of hardcoded Sonnet $3/$15; the Anthropic provider fails loud on an unpriced model id rather than silently mispricing. (Model Layer)
Dead-config audit: confidence_threshold was flagged as read-nowhere and briefly removed from donna_models.yaml / the RoutingEntry model — then re-added in this same release with its consuming logic (confidence-gated parse escalation, see Changed above). (Model Layer)
Tool-validation allowlist can no longer be bypassed: ToolRegistry.execute now requires task_type+agent_name; previously omitting task_type skipped the allowlist entirely (principle #6). (Orchestrator)
§7.2 sub-agent pipeline documented as dormant: spec_v3.md §7.2, docs/domain/orchestrator.md, and docs/domain/agents.md now state that AgentDispatcher + the PM/Prep/Scheduler/Decomposition agents + the agent-layer ToolRegistry are built-but-unwired, and describe the real live flow (DiscordIntentDispatcher → ChallengerAgent → ClaudeNoveltyJudge → AutoScheduler).

Fixed¶

Timezone-correct slot placement: Scheduler.find_next_slot now steps candidates in UTC (DST-safe) but evaluates every time-window against the configured calendar.yaml zone, so the absolute blackout and domain windows are enforced on the user's wall clock instead of UTC — a work task can no longer land at ~4 AM local, and confirmations show the correct local time. (Scheduling, spec_v3.md §6.3)
Deadline-aware horizon: the search horizon is clamped to the task's deadline / earliest bound (honoring a constrained weekday); an unplaceable dated task now surfaces as needs_scheduling instead of being placed late within a flat 14-day window. (Scheduling)
Fail-closed calendar reads: placement now builds its busy-set from the union of all configured calendars (personal + work + family) and raises CalendarReadError (with a fallback alert) on any read error, rather than booking blind against an empty calendar. (Scheduling)
Billed spend dropped on token-limit truncation: a token-capped extension call raised TokenLimitReachedError before the invocation_log write, dropping real spend from budget accounting. The raise now happens after the log + payload writes; auto_drafter / evolution catch it; log-write failures now alert via fallback_alert_fn instead of warning silently. (Cost & Escalation)
Skill trust-gate evidence loop wired: the production executor factory now constructs SkillRunRepository and injects the bundle's ShadowSampler into SkillExecutor, with a boot invariant that alerts loudly if skills run live without run-persistence/sampler — previously the statistical trust gates ran on data that was never produced, so promotion and auto-demotion were inert in production. (Skill System)
Skill trust-gate landmines: the requires_human_gate check no longer blocks system-actor demotions (it is scoped to a promotion-destination allowlist); gate evidence is keyed on skill_version_id, so an evolved version no longer inherits its predecessor's track record; and a run counts as valid only if it succeeded with no continued/step_failed/skill_failed step, with a config-driven failure-rate ceiling guarding shadow→trusted. (Skill System)
Dead evolution transition removed: the contextlib.suppress(IllegalTransitionError) around an always-failing hop in evolution.py was deleted; an evolved version now parks in draft with an explicit alert, and transition() rejects reason="human_approval" from a system actor. (Skill System)
Discord clarification replies no longer vanish: DiscordIntentDispatcher._resume discarded the pending draft before checking the re-parse status, so a clarification reply that re-parsed to escalate_to_claude (or ready+chat) fell through to no_action and was silently dropped — no judge, no task, no message. _resume now mirrors dispatch(): escalations route to the novelty judge, ready/chat returns chat, an unknown status asks the user to rephrase, and the draft is discarded only on a terminal outcome. (Orchestrator)
Challenger fail-open is no longer silent: the three fail-open paths (transport error, OSError, execute() exception) now emit dispatch_fallback_alert, and a schema-validation failure degrades to escalate_to_claude instead of proceeding on unvalidated model output. Fail-open is kept (the Challenger must never block task creation). (Agents)
Atomic task-state transitions: Database.transition_task_state read+validated status before taking the write lock (TOCTOU); read + validate + write now happen inside the lock, honoring the spec §3.7.1 atomicity guarantee.

Notes¶

spec_v3.md model-routing and task-parsing sections describe the old cloud-first parsing; reconciliation tracked as S25 in followups.md.

2026-06-06¶

Added¶

Time intent: the parser now emits a structured time_intent classifying when a task happens (exact / window / constrained / recurring / none), persisted as tasks.time_intent_json (Alembic migration). deadline / deadline_type are derived from it. An LLM-free fallback re-extracts common date phrasings when the model omits it. (Task System)
Routing gate: a deterministic, LLM-free gate routes captured tasks to the scheduler (time-bound), automation (recurring), or backlog (undated). (Scheduling)
needs_scheduling state: time-bound tasks the scheduler can't place before their deadline surface in needs_scheduling instead of stranding in backlog. (Task System)
Persona-voice capture confirmations: slot-aware Discord confirmations (template-based, zero-token) replace the static "Scheduled: pending." reply. (Capture a Task)

Fixed¶

Strand bug: time-bound tasks are now scheduled immediately by the routing gate and no longer deferred for the Challenger, fixing cases where dated tasks stranded in backlog. (Scheduling)

2026-05-18¶

Added¶

Documentation system: global update-docs skill and docs-updater agent for bootstrapping, updating, and auditing docs across projects (Domain)

Changed¶

Documentation cleanup: extracted all inline "not implemented" / "deferred" / "obsolete" callouts from 11 domain docs into open-backlog.md with stable gap IDs (G-1 through G-29)
Skill System docs refactored: split 769-line skill-system.md into 5 focused subpages (index, setup, lifecycle, evolution, reference) under domain/skill-system/
Memory Vault docs refactored: split 312-line memory-vault.md into 4 focused subpages (index, semantic, episodic, templates) under domain/memory-vault/
Management GUI docs refactored: split 495-line management-gui.md into 4 focused subpages (index, api, pages, reference) under domain/management-gui/
Domain index enhanced: added Mermaid architecture diagram and 7-step "Start Here" reading guide to domain/index.md
spec_v3.md: added §0 Implementation Status Matrix, removed 5 inline status blocks, moved Phase 6 details to appendix
followups.md: archived 28 closed items, trimmed to open items only
properdocs.yml: link validation tightened to warn (CI catches broken links via --strict)
Clarified distinct purposes of open-backlog.md (feature gaps) vs followups.md (spec questions)

Fixed¶

PayloadWriter correctly wired into all ModelRouter instances
Removed copy-paste error in cost.md (contained skill-system.md content)
Fixed backup-recovery.md stub with proper intro text
Updated stale "in flight" marker on slice 15 in slices.md
Trimmed planned donna_logs.db schema from observability.md (archived to archive/)
Expanded slices.md with slices 16–24 (escalation, budget, dashboard, chat, tool gaps)

2026-05-17¶

Added¶

Claude Inspector: full forensics UI for browsing LLM calls, comparing payloads, and analyzing cost/performance insights (Insights, Management GUI)
Payload collection subsystem: PayloadWriter captures full request/response payloads; PayloadEvictor enforces disk budget (Collection)
Claude Inspector API endpoints for call browsing, payload retrieval, and insights queries
Deep-link query parameter support on Claude Inspector page
Claude Code project skills, agents, and hooks for development automation

Changed¶

Chat engine now supports session persistence, grouping, and optimistic message rendering

Fixed¶

Docker: added DONNA_PAYLOAD_DIR env var for payload storage path
CI: resolved lint and typecheck failures

2026-05-16¶

Changed¶

Preferences: migrated to event-driven correction pipeline via CorrectionSubscriber and TaskEventBus (Preferences)

Fixed¶

UI: migrated drawers to inline expansion (Tasks, Preferences, Candidates) and CenterDialog (Logs, SkillSystem, Shadow)

2026-05-15¶

Added¶

Chat action system: ActionRegistry with handlers for tasks, vault, skills, automations, and debug commands (Chat)
Quick Chat panel with floating button and Cmd+J toggle
CorrectionSubscriber for event-driven preference correction logging
DashboardContext provider and CenterDialog primitive for UI
Product watch v3 triage cascade with tool_use wiring
Unit tests for executor tool_use loop and correction event flow e2e test

Fixed¶

Automations: atomic success reset in advance_schedule; success un-pauses, failure notifications routed to donna-debug
Discord: text-based done intent, thread message routing guards
Skills: on_failure added to claude_with_triage, fallback condition fix
Calendar: delete Google Calendar event when task is cancelled
Migration: widen overdue_thread_map snowflake column to BigInteger

2026-05-14¶

Added¶

Automation alert pipeline: defaults, notification channels, multi-channel routing

Fixed¶

Discord: text-based done intent and thread message routing
Scheduler: authenticate calendar client in orchestrator startup

2026-05-13¶

Added¶

Calendar page: week view with time slots, data fetching, week navigation, completed task styling (Management GUI)
Calendar added to sidebar navigation and routing

Fixed¶

Calendar: auth token persistence, nginx proxy, day placement, DST index, overflow clipping
Auth: allow internal-network requests to user routes without Immich login
Docker: mount vault volume in API container
UI: sentinel value for Vault folder select (Radix empty-string fix)
Chat: use chat_respond template with JSON output instructions