Changelog¶
Recent changes, summarized from commits and PRs.
2026-06-11¶
Added¶
- Personal-context injection. New `orchestrator/task_context.py` assembles a compact context block from vault notes (semantic search) and active learned-preference rules, injected into the parse prompt via a `{{ personal_context }}` slot. Degrades gracefully when the vault is empty. `PreferenceApplier` and `memory_store` are now wired into the live parser.
- Domain/duration edit pathway. `PATCH /tasks/{id}` (API) and `PATCH /admin/tasks/{id}` (dashboard) now accept `domain` and `estimated_duration`, with an inline editor in the dashboard task detail panel. These edits fire the `CorrectionSubscriber` learning loop, which was previously dormant for these fields.
- Skill-subsystem alerting: new `skills/alerting.py` raises fallback alerts for skill-lifecycle paths that were previously silent: degradation demotions, run-persistence failures, shadow-sample-loss streaks, and evolution park-in-draft. (Skill System)
- Pending-approval surfacing: a Discord ping fires when the nightly auto-draft creates skills awaiting approval, and a standing "⏳ Pending your approval" section in the EOD digest lists every skill parked in `draft` (auto-drafted or evolution-parked) until acted on. (Skill System)
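The personal-context injection described in this list can be pictured with a minimal sketch. It is not the contents of `orchestrator/task_context.py`: the function names, the section labels, the `(none)` placeholder, and the use of plain string replacement for the `{{ personal_context }}` slot are all assumptions made for illustration.

```python
from typing import List


def build_personal_context(notes: List[str], rules: List[str], max_items: int = 5) -> str:
    """Assemble a compact context block; returns "" when there is nothing to add."""
    sections = []
    if notes:
        # Hypothetical label; the real block format is not given in the changelog.
        sections.append("Relevant notes:\n" + "\n".join(f"- {n}" for n in notes[:max_items]))
    if rules:
        sections.append("Learned preferences:\n" + "\n".join(f"- {r}" for r in rules[:max_items]))
    return "\n\n".join(sections)


def render_prompt(template: str, personal_context: str) -> str:
    """Fill the slot; an empty context degrades to a neutral placeholder."""
    return template.replace("{{ personal_context }}", personal_context or "(none)")


template = "Personal context:\n{{ personal_context }}\n\nTask: {{ task }}"
context = build_personal_context(
    notes=["Dentist is Dr. Lee"],
    rules=["Gym tasks are personal"],
)
print(render_prompt(template, context))
print(render_prompt(template, build_personal_context([], [])))  # empty-vault case
```

The point of the sketch is the empty-vault path: the prompt still renders with the slot filled, so the parser never sees a raw template token.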
Changed¶
- Task parsing is now local-first. `parse_task` routes to the local model (`local_parser`, qwen2.5:32b) as primary, with confidence-gated escalation to the cloud `reasoner` via a new `parse_task_cloud` route when the local parse confidence is below 0.7. Most tasks now parse at zero marginal cost; ambiguous ones get a cloud second opinion.
- Calibrated duration estimates. `prompts/parse_task.md` gained explicit duration anchors (quick comms 15 min / errands 30 / focused work 60), fixing the prior "every task is ~1 hour" behavior. The domain rubric was sharpened and now leans on injected personal context to disambiguate work vs personal.
- `confidence_threshold` re-added with its consumer. The Model-Layer audit (see "Dead-config audit" below) removed `confidence_threshold` from `donna_models.yaml` / the `RoutingEntry` model as read-nowhere config; it is re-added in this release together with its consuming logic: `ModelRouter.confidence_threshold_for` and the confidence-gated parse escalation above. (Model Layer)
- Skill auto-draft now human-gated by default: auto-drafted skills default to `requires_human_gate=1`, so a single human approval at `draft → sandbox` is mandatory before a skill leaves draft; the sandbox→shadow_primary→trusted promotions thereafter stay automatic, gated only by the §23.4 run-validity and shadow-agreement thresholds. `spec_v3.md` §23.5 and `lifecycle.md` were reconciled to match. (Skill System, `spec_v3.md` §23.5)
- Serialized placement choke point: `Scheduler.schedule_task` / `schedule_dependency_chain` run the read→find-slot→create-event section under an `asyncio.Lock`, realizing the `spec_v3.md §3.7.1` double-booking guard (the earlier "async queue" wording was a design target). (Scheduling, `spec_v3.md` §3.7.1)
- Budget enforcement: `BudgetGuard.check_pre_call` now enforces the `$100/month` hard cap (previously daily-only; the monthly check was dead code) and fires the 90% monthly warning. `BudgetPausedError` carries a `period` (daily/monthly). (Cost & Escalation)
- Escalation-gate posture: new `config/manual_escalation.yaml` → `gate.mode`. The default `shadow` consults the gate on every call and logs would-escalate events (`escalation_shadow_would_fire`) without prompting, persisting, or blocking; `enforce` runs the interactive decision tree. The router now derives a deterministic cost floor when a caller omits `estimate_usd`, so the gate is no longer dark. (Cost & Escalation)
- Ledger integrity at the model choke point: `ModelRouter.complete()` is now the accounting boundary, not just dispatch. Production routers are built via `build_model_router()`, which requires an `invocation_logger`, and `complete()` raises `RoutingError` rather than make an unlogged billed call, so all spend reaches `invocation_log`, the table `BudgetGuard` reads for the `$100/month` cap. Chat and bot routers are now wired through the factory. (Model Layer)
- Config-driven pricing: per-call `cost_usd` is computed from the per-alias config rates (`input/output_cost_per_token_usd`) instead of hardcoded Sonnet `$3/$15`; the Anthropic provider fails loud on an unpriced model id rather than silently mispricing. (Model Layer)
- Dead-config audit: `confidence_threshold` was flagged as read-nowhere and briefly removed from `donna_models.yaml` / the `RoutingEntry` model, then re-added in this same release with its consuming logic (confidence-gated parse escalation, see the first entry in this section). (Model Layer)
- Tool-validation allowlist can no longer be bypassed: `ToolRegistry.execute` now requires `task_type` + `agent_name`; previously omitting `task_type` skipped the allowlist entirely (principle #6). (Orchestrator)
- §7.2 sub-agent pipeline documented as dormant: `spec_v3.md §7.2`, `docs/domain/orchestrator.md`, and `docs/domain/agents.md` now state that `AgentDispatcher` + the PM/Prep/Scheduler/Decomposition agents + the agent-layer `ToolRegistry` are built-but-unwired, and describe the real live flow (`DiscordIntentDispatcher` → `ChallengerAgent` → `ClaudeNoveltyJudge` → `AutoScheduler`).
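The local-first parsing entry in this list reduces to a small decision rule: accept the local parse unless its confidence is below the threshold, in which case ask the cloud model. The sketch below illustrates only that rule. `ParseResult`, `parse_with_escalation`, and the two stand-in parsers are hypothetical names, not the project's `parse_task` or `ModelRouter` API; the 0.7 threshold is the one value taken from the entry.

```python
from dataclasses import dataclass
from typing import Callable

CONFIDENCE_THRESHOLD = 0.7  # value from the changelog; how it is loaded is not shown here


@dataclass
class ParseResult:
    title: str
    confidence: float
    source: str  # "local" or "cloud"


def parse_with_escalation(
    text: str,
    local_parse: Callable[[str], ParseResult],
    cloud_parse: Callable[[str], ParseResult],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> ParseResult:
    """Try the local model first; escalate only when confidence is below threshold."""
    local = local_parse(text)
    if local.confidence >= threshold:
        return local  # zero marginal cost path
    return cloud_parse(text)  # ambiguous input gets a cloud second opinion


# Hypothetical stand-ins for the two model routes.
def fake_local(text: str) -> ParseResult:
    confidence = 0.9 if len(text.split()) <= 4 else 0.4
    return ParseResult(title=text, confidence=confidence, source="local")


def fake_cloud(text: str) -> ParseResult:
    return ParseResult(title=text, confidence=0.95, source="cloud")


print(parse_with_escalation("email Sam today", fake_local, fake_cloud).source)
print(parse_with_escalation("sort out the vendor thing at some point", fake_local, fake_cloud).source)
```

In this sketch a confidence of exactly 0.7 stays local, matching the "below 0.7" wording of the entry.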
Fixed¶
- Timezone-correct slot placement: `Scheduler.find_next_slot` now steps candidates in UTC (DST-safe) but evaluates every time-window against the configured `calendar.yaml` zone, so the absolute blackout and domain windows are enforced on the user's wall clock instead of UTC. A work task can no longer land at ~4 AM local, and confirmations show the correct local time. (Scheduling, `spec_v3.md` §6.3)
- Deadline-aware horizon: the search horizon is clamped to the task's deadline / `earliest` bound (honoring a `constrained` weekday); an unplaceable dated task now surfaces as `needs_scheduling` instead of being placed late within a flat 14-day window. (Scheduling)
- Fail-closed calendar reads: placement now builds its busy-set from the union of all configured calendars (personal + work + family) and raises `CalendarReadError` (with a fallback alert) on any read error, rather than booking blind against an empty calendar. (Scheduling)
- Billed spend dropped on token-limit truncation: a token-capped extension call raised `TokenLimitReachedError` before the `invocation_log` write, dropping real spend from budget accounting. The raise now happens after the log + payload writes; `auto_drafter`/`evolution` catch it; log-write failures now alert via `fallback_alert_fn` instead of warning silently. (Cost & Escalation)
- Skill trust-gate evidence loop wired: the production executor factory now constructs `SkillRunRepository` and injects the bundle's `ShadowSampler` into `SkillExecutor`, with a boot invariant that alerts loudly if skills run live without run-persistence/sampler. Previously the statistical trust gates ran on data that was never produced, so promotion and auto-demotion were inert in production. (Skill System)
- Skill trust-gate landmines: the `requires_human_gate` check no longer blocks system-actor demotions (it is scoped to a promotion-destination allowlist); gate evidence is keyed on `skill_version_id`, so an evolved version no longer inherits its predecessor's track record; and a run counts as valid only if it succeeded with no `continued`/`step_failed`/`skill_failed` step, with a config-driven failure-rate ceiling guarding shadow→trusted. (Skill System)
- Dead evolution transition removed: the `contextlib.suppress(IllegalTransitionError)` around an always-failing hop in `evolution.py` was deleted; an evolved version now parks in `draft` with an explicit alert, and `transition()` rejects `reason="human_approval"` from a system actor. (Skill System)
- Discord clarification replies no longer vanish: `DiscordIntentDispatcher._resume` discarded the pending draft before checking the re-parse status, so a clarification reply that re-parsed to `escalate_to_claude` (or `ready`+`chat`) fell through to `no_action` and was silently dropped: no judge, no task, no message. `_resume` now mirrors `dispatch()`: escalations route to the novelty judge, `ready`/`chat` returns chat, an unknown status asks the user to rephrase, and the draft is discarded only on a terminal outcome. (Orchestrator)
- Challenger fail-open is no longer silent: the three fail-open paths (transport error, OSError, `execute()` exception) now emit `dispatch_fallback_alert`, and a schema-validation failure degrades to `escalate_to_claude` instead of proceeding on unvalidated model output. Fail-open is kept (the Challenger must never block task creation). (Agents)
- Atomic task-state transitions: `Database.transition_task_state` read+validated status before taking the write lock (TOCTOU); read + validate + write now happen inside the lock, honoring the spec §3.7.1 atomicity guarantee.
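The timezone fix at the top of this list (step candidates in UTC, evaluate windows on the user's wall clock) can be illustrated with a minimal sketch. This is not `Scheduler.find_next_slot`: the signature, the 30-minute step, and the single allowed window are assumptions, and the fixed UTC-4 offset stands in for whatever zone `calendar.yaml` configures.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo
from typing import Optional


def find_next_slot(
    start_utc: datetime,
    local_tz: tzinfo,
    window_start: time,
    window_end: time,
    step: timedelta = timedelta(minutes=30),
    horizon: timedelta = timedelta(days=14),
) -> Optional[datetime]:
    """Return the first UTC candidate whose local wall-clock time is in the window."""
    candidate = start_utc
    limit = start_utc + horizon
    while candidate < limit:
        # Step in UTC, but judge the candidate on the user's wall clock.
        local = candidate.astimezone(local_tz)
        if window_start <= local.time() < window_end:
            return candidate
        candidate += step
    return None  # a caller could surface this as needs_scheduling


# Fixed UTC-4 offset keeps the example deterministic (hypothetical user zone).
user_tz = timezone(timedelta(hours=-4))
start = datetime(2026, 6, 11, 8, 0, tzinfo=timezone.utc)  # 04:00 on the user's clock
slot = find_next_slot(start, user_tz, time(9, 0), time(17, 0))
print(slot, "->", slot.astimezone(user_tz))
```

Comparing the window against the UTC time instead would accept the 08:00 UTC candidate only once it reached 09:00 UTC, which is 05:00 local here; that is the class of bug the entry describes. Passing a `zoneinfo.ZoneInfo` instance as `local_tz` would make the wall-clock check follow DST changes, and a shorter `horizon` models the deadline clamp from the second entry.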
Notes¶
- `spec_v3.md` model-routing and task-parsing sections describe the old cloud-first parsing; reconciliation tracked as S25 in `followups.md`.
2026-06-06¶
Added¶
- Time intent: the parser now emits a structured `time_intent` classifying when a task happens (exact/window/constrained/recurring/none), persisted as `tasks.time_intent_json` (Alembic migration). `deadline`/`deadline_type` are derived from it. An LLM-free fallback re-extracts common date phrasings when the model omits it. (Task System)
- Routing gate: a deterministic, LLM-free gate routes captured tasks to the scheduler (time-bound), automation (recurring), or backlog (undated). (Scheduling)
- `needs_scheduling` state: time-bound tasks the scheduler can't place before their deadline surface in `needs_scheduling` instead of stranding in backlog. (Task System)
- Persona-voice capture confirmations: slot-aware Discord confirmations (template-based, zero-token) replace the static "Scheduled: pending." reply. (Capture a Task)
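A deterministic routing gate of the kind described in this list can be as small as a lookup on the `time_intent` class. The sketch below is an illustration only: `Route` and `route_task` are hypothetical names, and treating `exact`, `window`, and `constrained` as the "time-bound" classes is an assumption the changelog does not spell out.

```python
from enum import Enum


class Route(str, Enum):
    SCHEDULER = "scheduler"
    AUTOMATION = "automation"
    BACKLOG = "backlog"


# Assumed grouping of time_intent classes into "time-bound".
TIME_BOUND = {"exact", "window", "constrained"}


def route_task(time_intent_kind: str) -> Route:
    """Pick a destination from the time_intent class alone; no model call involved."""
    if time_intent_kind == "recurring":
        return Route.AUTOMATION
    if time_intent_kind in TIME_BOUND:
        return Route.SCHEDULER
    if time_intent_kind == "none":
        return Route.BACKLOG
    raise ValueError(f"unknown time_intent kind: {time_intent_kind!r}")


for kind in ("exact", "window", "constrained", "recurring", "none"):
    print(kind, "->", route_task(kind).value)
```

Raising on an unknown class, rather than defaulting to backlog, keeps a malformed parse from quietly stranding a dated task.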
Fixed¶
- Strand bug: time-bound tasks are now scheduled immediately by the routing gate and no longer deferred for the Challenger, fixing cases where dated tasks stranded in `backlog`. (Scheduling)
2026-05-18¶
Added¶
- Documentation system: global `update-docs` skill and `docs-updater` agent for bootstrapping, updating, and auditing docs across projects (Domain)
Changed¶
- Documentation cleanup: extracted all inline "not implemented" / "deferred" / "obsolete" callouts from 11 domain docs into `open-backlog.md` with stable gap IDs (G-1 through G-29)
- Skill System docs refactored: split 769-line `skill-system.md` into 5 focused subpages (index, setup, lifecycle, evolution, reference) under `domain/skill-system/`
- Memory Vault docs refactored: split 312-line `memory-vault.md` into 4 focused subpages (index, semantic, episodic, templates) under `domain/memory-vault/`
- Management GUI docs refactored: split 495-line `management-gui.md` into 4 focused subpages (index, api, pages, reference) under `domain/management-gui/`
- Domain index enhanced: added Mermaid architecture diagram and 7-step "Start Here" reading guide to `domain/index.md`
- `spec_v3.md`: added §0 Implementation Status Matrix, removed 5 inline status blocks, moved Phase 6 details to appendix
- `followups.md`: archived 28 closed items, trimmed to open items only
- `properdocs.yml`: link validation tightened to `warn` (CI catches broken links via `--strict`)
- Clarified distinct purposes of `open-backlog.md` (feature gaps) vs `followups.md` (spec questions)
Fixed¶
- PayloadWriter correctly wired into all ModelRouter instances
- Removed copy-paste error in `cost.md` (contained skill-system.md content)
- Fixed `backup-recovery.md` stub with proper intro text
- Updated stale "in flight" marker on slice 15 in `slices.md`
- Trimmed planned `donna_logs.db` schema from `observability.md` (archived to `archive/`)
- Expanded `slices.md` with slices 16–24 (escalation, budget, dashboard, chat, tool gaps)
2026-05-17¶
Added¶
- Claude Inspector: full forensics UI for browsing LLM calls, comparing payloads, and analyzing cost/performance insights (Insights, Management GUI)
- Payload collection subsystem: `PayloadWriter` captures full request/response payloads; `PayloadEvictor` enforces disk budget (Collection)
- Claude Inspector API endpoints for call browsing, payload retrieval, and insights queries
- Deep-link query parameter support on Claude Inspector page
- Claude Code project skills, agents, and hooks for development automation
Changed¶
- Chat engine now supports session persistence, grouping, and optimistic message rendering
Fixed¶
- Docker: added `DONNA_PAYLOAD_DIR` env var for payload storage path
- CI: resolved lint and typecheck failures
2026-05-16¶
Changed¶
- Preferences: migrated to event-driven correction pipeline via `CorrectionSubscriber` and `TaskEventBus` (Preferences)
Fixed¶
- UI: migrated drawers to inline expansion (Tasks, Preferences, Candidates) and CenterDialog (Logs, SkillSystem, Shadow)
2026-05-15¶
Added¶
- Chat action system: `ActionRegistry` with handlers for tasks, vault, skills, automations, and debug commands (Chat)
- Quick Chat panel with floating button and Cmd+J toggle
- `CorrectionSubscriber` for event-driven preference correction logging
- `DashboardContext` provider and `CenterDialog` primitive for UI
- Product watch v3 triage cascade with tool_use wiring
- Unit tests for executor tool_use loop and correction event flow e2e test
Fixed¶
- Automations: atomic success reset in `advance_schedule`; success un-pauses, failure notifications routed to donna-debug
- Discord: text-based done intent, thread message routing guards
- Skills: `on_failure` added to `claude_with_triage`, fallback condition fix
- Calendar: delete Google Calendar event when task is cancelled
- Migration: widen `overdue_thread_map` snowflake column to BigInteger
2026-05-14¶
Added¶
- Automation alert pipeline: defaults, notification channels, multi-channel routing
Fixed¶
- Discord: text-based done intent and thread message routing
- Scheduler: authenticate calendar client in orchestrator startup
2026-05-13¶
Added¶
- Calendar page: week view with time slots, data fetching, week navigation, completed task styling (Management GUI)
- Calendar added to sidebar navigation and routing
Fixed¶
- Calendar: auth token persistence, nginx proxy, day placement, DST index, overflow clipping
- Auth: allow internal-network requests to user routes without Immich login
- Docker: mount vault volume in API container
- UI: sentinel value for Vault folder select (Radix empty-string fix)
- Chat: use `chat_respond` template with JSON output instructions