Sub-Agent System

Split from Donna Project Spec v3.0 — Sections 7, 8

Agent Hierarchy

The Orchestrator (core process, not a sub-agent) receives all tasks and determines routing.

Agent	Responsibilities	Tool Access	Autonomy Level
Scheduler	Calendar management, time slots, rescheduling, reminders, weekly planning	Google Calendar (read-write), Task DB (read-write)	High — auto-schedules priority 1–3
Research / Prep	Web research, info compilation, resource gathering before flagged tasks	Web search (MCP), Gmail (read-only), Filesystem (MCP read), GitHub (MCP read)	High — runs autonomously when prep flagged
Project Manager	Task decomposition, requirements assessment, interrogation, work packaging	Task DB (read-write), all agents (dispatch)	Medium — can decompose and route, must confirm requirements with user
Coding	Code generation, file editing, project scaffolding	Filesystem (MCP sandboxed read-write), GitHub (MCP read-write), Claude Code CLI	Low — output for review only. Never pushes to main. Never deletes.
Communication / Drafting	Email drafts, message drafts, document creation	Gmail (draft only; send behind feature flag), Docs/markdown (write), Discord/Slack (specific channels only)	Low — always drafts. Never sends without explicit approval.
Challenger	Task quality evaluation, follow-up questions on vague tasks	Task DB (read-only)	Medium — probes task quality, returns questions to user via Discord thread
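The Tool Access column works as a per-agent allowlist. The sketch below encodes it in Python under stated assumptions: the tool identifiers (`calendar_rw`, `gmail_draft`, `chat_post`, and so on), the `AgentSpec` type, and the `can_use` / `output_needs_review` helpers are all hypothetical. The spec names the integrations, not how they are represented in code.

```python
from dataclasses import dataclass
from enum import Enum


class Autonomy(Enum):
    LOW = "low"        # output for review only
    MEDIUM = "medium"  # may act, but confirms or probes with the user
    HIGH = "high"      # runs autonomously within its tool scope


@dataclass(frozen=True)
class AgentSpec:
    tools: frozenset
    autonomy: Autonomy


# Hypothetical tool identifiers mirroring the "Tool Access" column.
AGENTS = {
    "scheduler": AgentSpec(frozenset({"calendar_rw", "task_db_rw"}), Autonomy.HIGH),
    "research": AgentSpec(frozenset({"web_search", "gmail_read", "fs_read", "github_read"}), Autonomy.HIGH),
    "project_manager": AgentSpec(frozenset({"task_db_rw", "dispatch"}), Autonomy.MEDIUM),
    "coding": AgentSpec(frozenset({"fs_sandbox_rw", "github_rw", "claude_code_cli"}), Autonomy.LOW),
    "communication": AgentSpec(frozenset({"gmail_draft", "docs_write", "chat_post"}), Autonomy.LOW),
    "challenger": AgentSpec(frozenset({"task_db_read"}), Autonomy.MEDIUM),
}


def can_use(agent: str, tool: str) -> bool:
    """Deny by default: unknown agents and unlisted tools are both rejected."""
    spec = AGENTS.get(agent)
    return spec is not None and tool in spec.tools


def output_needs_review(agent: str) -> bool:
    """Low-autonomy agents only produce artifacts for the user to review."""
    return AGENTS[agent].autonomy is Autonomy.LOW
```

Keeping the allowlist in data rather than in agent prompts matches the spec's stance that constraints are enforced by the system, not by the model.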

Agent Execution Flow

1. Orchestrator receives task → routes to PM Agent for assessment.
2. PM Agent evaluates completeness. If missing info → sends targeted questions (not open-ended).
   Example: "For the Module A refactor, I need: (1) which API endpoints are affected, (2) should backward compatibility be maintained?"
3. User responds. PM Agent updates task.
4. Challenger Agent evaluates task quality. If the task is vague or missing critical context → opens a Discord thread with 1–3 probing questions about success criteria, dependencies, and scope. If the task is clear → passes through silently.
5. PM Agent packages task with full context, requirements, acceptance criteria, file references.
6. PM Agent dispatches to execution agent.
7. Execution agent works. Progress logged to activity log.
8. On completion → user receives summary via email + notification. Output available for review.
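A minimal, synchronous sketch of the flow above. The callbacks `ask_user`, `challenge`, and `execute` are hypothetical stand-ins for the Discord round-trip, the Challenger's LLM call, and the execution agents, and `REQUIRED_FIELDS` is an assumed completeness check. What it illustrates is the ordering: PM completeness questions first, then at most one Challenger round, then packaging and dispatch.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical fields the PM Agent requires before dispatch.
REQUIRED_FIELDS = ("agent", "acceptance_criteria")


@dataclass
class Task:
    title: str
    description: str = ""
    fields: dict = field(default_factory=dict)
    activity_log: list = field(default_factory=list)


def run_task(
    task: Task,
    ask_user: Callable[[str], str],
    challenge: Callable[[Task], list],
    execute: Callable[[str, dict], str],
) -> str:
    # PM assessment: one targeted question per missing field.
    for name in REQUIRED_FIELDS:
        if name not in task.fields:
            task.fields[name] = ask_user(f"For '{task.title}', I need: {name}")

    # Challenger: at most 3 questions, one round; the single reply is appended.
    questions = challenge(task)[:3]
    if questions:
        task.description += "\n" + ask_user("\n".join(questions))

    # PM packaging and dispatch to the execution agent.
    package = {"title": task.title, "description": task.description, **task.fields}
    task.activity_log.append(f"dispatched to {task.fields['agent']}")
    summary = execute(task.fields["agent"], package)
    task.activity_log.append("completed")
    return summary  # delivered to the user via email + notification
```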

Challenger Agent Details

The Challenger Agent runs on the local LLM (challenge_task task type → local_parser alias → Ollama qwen2.5:32b) at zero API cost. It sits in the dispatcher pipeline between PM assessment and execution agent dispatch.
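The two-hop routing (task type → model alias → concrete model) amounts to two lookups. The dictionaries below are hypothetical in-memory stand-ins for config/task_types.yaml and config/donna_models.yaml; their key names (`model`, `provider`) are assumptions, not the actual file schema.

```python
# Hypothetical in-memory stand-ins for config/task_types.yaml and
# config/donna_models.yaml; the real key names may differ.
TASK_TYPES = {
    "challenge_task": {"model": "local_parser"},
    "generate_nudge": {"model": "local_parser"},
    "generate_reminder": {"model": "local_parser"},
}
MODEL_ALIASES = {
    "local_parser": {"provider": "ollama", "model": "qwen2.5:32b"},
}


def resolve_model(task_type: str) -> tuple:
    """Return (provider, model) for a task type; raises KeyError if unrouted."""
    alias = TASK_TYPES[task_type]["model"]
    target = MODEL_ALIASES[alias]
    return target["provider"], target["model"]
```

The alias layer means swapping the local model is a one-line change in the model config, with no edits to any task type.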

Behavior:
- Evaluates the richness of the task description, not just whether fields are present (field-presence checks are the PM Agent's job).
- Generates 1–3 focused questions about: what "done" looks like, hidden dependencies, and scope boundaries.
- Returns status="complete" (no questions) if the task is well-specified.
- On LLM failure, silently passes through; never blocks task creation.

Discord Integration:
- When questions are needed, a Discord thread is created on the original task message.
- User replies in-thread are appended to task description/notes.
- One round of follow-up per task (thread closes after first reply).
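A minimal sketch of the Challenger's contract, assuming the local model is asked to reply with JSON shaped like `{"questions": [...]}`. The prompt wording, that JSON shape, and the `needs_input` status are assumptions; the spec only defines `status="complete"`. The function caps output at three questions and treats any failure as a silent pass-through.

```python
import json
from typing import Callable

PROMPT = (
    "Evaluate this task. If it is vague, return JSON {\"questions\": [...]} with 1-3 "
    "questions about what done looks like, hidden dependencies, or scope. "
    "If it is well-specified, return {\"questions\": []}.\n\nTask: "
)


def challenge_task(description: str, llm: Callable[[str], str]) -> dict:
    try:
        data = json.loads(llm(PROMPT + description))
        raw = data.get("questions", [])
    except Exception:
        # Any LLM, transport, or parse failure: pass through, never block creation.
        return {"status": "complete", "questions": []}
    if not isinstance(raw, list):
        raw = []
    questions = [q.strip() for q in raw if isinstance(q, str) and q.strip()]
    if not questions:
        return {"status": "complete", "questions": []}
    # "needs_input" is a hypothetical status name.
    return {"status": "needs_input", "questions": questions[:3]}
```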

LLM-Generated Nudges & Reminders

Overdue nudges and pre-task reminders are generated by the local LLM (generate_nudge / generate_reminder task types → local_parser). This replaces hardcoded template strings with contextual, Donna-persona messages at zero API cost.

Nudge generation (overdue.py):
- Prompt includes task title, domain, priority, overdue duration, nudge count, and reschedule count.
- Tone escalates based on nudge history: friendly → firm → assertive.
- If reschedule_count > 3, calls out the pattern directly.
- Fallback: original template string if Ollama is unreachable.
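A sketch of how the prompt and tone escalation might be assembled, under assumptions: the tier cut-offs (0, 1–2, 3+ prior nudges) are hypothetical because the spec fixes only the order, the task-dict keys are illustrative, and the fallback string is a stand-in since the original template is not quoted here. The returned boolean corresponds to the LLM flag stored per nudge.

```python
from typing import Callable, Tuple


def nudge_tier(nudge_count: int) -> str:
    # Hypothetical cut-offs; the spec fixes the order, not the thresholds.
    if nudge_count == 0:
        return "friendly"
    if nudge_count < 3:
        return "firm"
    return "assertive"


def build_nudge_prompt(task: dict) -> str:
    lines = [
        "You are Donna. Write a short overdue-task nudge.",
        f"Task: {task['title']} (domain: {task['domain']}, priority: {task['priority']})",
        f"Overdue by: {task['overdue_hours']} hours",
        f"Nudges so far: {task['nudge_count']}; reschedules: {task['reschedule_count']}",
        f"Tone: {nudge_tier(task['nudge_count'])}",
    ]
    if task["reschedule_count"] > 3:
        lines.append("Call out the repeated rescheduling pattern directly.")
    return "\n".join(lines)


def generate_nudge(task: dict, llm: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (message, llm_generated)."""
    try:
        return llm(build_nudge_prompt(task)), True
    except OSError:  # e.g. ConnectionError when Ollama is unreachable
        # Stand-in for the original template string, which is not quoted here.
        return f"'{task['title']}' is overdue.", False
```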

Reminder generation (reminders.py):
- Pre-task reminder 15 minutes before scheduled start.
- Prompt includes task context and description for personalized reminders.
- Fallback: `⏰ '{title}' starts in 15 minutes. Duration: {duration}.`
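A sketch of the reminder timing check and the fallback template, assuming a polling loop that calls `should_remind` for each scheduled task and tracks an `already_sent` flag. Both the polling approach and the flag are assumptions; the template string is the one given above.

```python
from datetime import datetime, timedelta

REMINDER_LEAD = timedelta(minutes=15)
FALLBACK = "⏰ '{title}' starts in 15 minutes. Duration: {duration}."


def should_remind(start: datetime, now: datetime, already_sent: bool) -> bool:
    """True inside the window [start - 15 min, start) if no reminder went out yet."""
    return not already_sent and start - REMINDER_LEAD <= now < start


def fallback_reminder(title: str, duration: str) -> str:
    # Braces inside `title` are safe: format() only expands the template itself.
    return FALLBACK.format(title=title, duration=duration)
```

Using a window rather than an exact timestamp means a reminder still fires if a poll lands a few minutes after the 15-minute mark.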

Nudge tracking:
- Every nudge is persisted to the nudge_events table (type, channel, tier, message, LLM flag).
- tasks.nudge_count is atomically incremented on each nudge.
- Stats available via Database.get_weekly_stats() for the weekly digest.
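A sketch of the tracking write using SQLite from the standard library, assuming a minimal schema (the column names are illustrative, not Donna's actual tables). The event insert and the counter increment share one transaction, and `nudge_count = nudge_count + 1` is evaluated inside the database rather than as a read-modify-write in Python.

```python
import sqlite3

# Hypothetical minimal schema; real column names may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id INTEGER PRIMARY KEY,
    title TEXT,
    nudge_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS nudge_events (
    id INTEGER PRIMARY KEY,
    task_id INTEGER NOT NULL REFERENCES tasks(id),
    nudge_type TEXT, channel TEXT, tier TEXT, message TEXT,
    llm_generated INTEGER NOT NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
"""


def record_nudge(conn, task_id, nudge_type, channel, tier, message, llm_generated):
    with conn:  # commits on success, rolls back both statements on error
        conn.execute(
            "INSERT INTO nudge_events (task_id, nudge_type, channel, tier, message, llm_generated)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (task_id, nudge_type, channel, tier, message, int(llm_generated)),
        )
        cur = conn.execute(
            "UPDATE tasks SET nudge_count = nudge_count + 1 WHERE id = ?", (task_id,)
        )
        if cur.rowcount != 1:
            raise LookupError(f"no task with id {task_id}")  # undoes the insert too
```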

Weekly Efficiency Digest

Fires every Sunday at 7 PM UTC via the WeeklyDigest class. Assembles task completion stats from the past 7 days and generates a Donna-voiced efficiency report.
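A sketch of how the next run time and the 7-day stats window could be computed. The helper names are hypothetical, and how WeeklyDigest is actually scheduled (cron, an async loop, or a scheduler library) is not specified here.

```python
from datetime import datetime, timedelta, timezone

SUNDAY = 6  # datetime.weekday(): Monday == 0 ... Sunday == 6


def next_digest_run(now: datetime) -> datetime:
    """Next Sunday 19:00 UTC strictly after `now` (which must be timezone-aware)."""
    now = now.astimezone(timezone.utc)
    days_ahead = (SUNDAY - now.weekday()) % 7
    run = (now + timedelta(days=days_ahead)).replace(hour=19, minute=0, second=0, microsecond=0)
    if run <= now:  # at or past 19:00 on a Sunday: roll to next week
        run += timedelta(days=7)
    return run


def digest_window(run: datetime) -> tuple:
    """The 7-day window whose stats the digest summarizes."""
    return run - timedelta(days=7), run
```

Normalizing to UTC first means a Sunday evening in a local timezone that is already Monday in UTC is handled correctly.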

Stats collected:
- Tasks completed vs created, completion rate percentage
- Average time to complete (hours)
- Top 5 most-nudged tasks (by nudge_count)
- Top 5 most-rescheduled tasks (by reschedule_count)
- Domain breakdown (completed, open, avg nudges per domain)
- Total nudges sent this week
- LLM cost this week (from invocation_log)
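A pure-Python sketch of the task-level aggregation, assuming the week's tasks have already been loaded into dicts with hypothetical keys (listed in the docstring). Total nudges and LLM cost come from nudge_events and invocation_log and are left out. This is an illustration, not the implementation of Database.get_weekly_stats().

```python
from collections import defaultdict
from statistics import mean


def weekly_stats(tasks: list) -> dict:
    """Summarize one week of tasks.

    Each task is a dict with hypothetical keys: title, domain, created_this_week,
    completed_this_week, is_open, hours_to_complete, nudge_count, reschedule_count.
    """
    created = sum(1 for t in tasks if t["created_this_week"])
    done = [t for t in tasks if t["completed_this_week"]]

    def top5(key):
        ranked = sorted((t for t in tasks if t[key] > 0), key=lambda t: t[key], reverse=True)
        return [t["title"] for t in ranked[:5]]

    domains = defaultdict(lambda: {"completed": 0, "open": 0, "nudges": 0, "tasks": 0})
    for t in tasks:
        d = domains[t["domain"]]
        d["tasks"] += 1
        d["nudges"] += t["nudge_count"]
        d["completed"] += int(t["completed_this_week"])
        d["open"] += int(t["is_open"])

    return {
        "created": created,
        "completed": len(done),
        # Can exceed 100% when older tasks are closed this week.
        "completion_rate": round(100 * len(done) / created, 1) if created else None,
        "avg_hours_to_complete": round(mean(t["hours_to_complete"] for t in done), 1) if done else None,
        "top_nudged": top5("nudge_count"),
        "top_rescheduled": top5("reschedule_count"),
        "by_domain": {
            name: {"completed": d["completed"], "open": d["open"],
                   "avg_nudges": round(d["nudges"] / d["tasks"], 2)}
            for name, d in domains.items()
        },
    }
```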

Output:
- LLM-generated report (generate_weekly_digest → local_parser) posted as a Discord embed in #donna-digest.
- Donna persona: 2–3 sentence summary, one observed pattern, one actionable suggestion.
- Fallback: plain-text stats table if Ollama is down.
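A sketch of the plain-text fallback, assuming a stats dict with the hypothetical keys shown (completed, created, completion_rate, avg_hours_to_complete, top_nudged, top_rescheduled). The prompt passed to `llm` is a placeholder, and catching `OSError` assumes an unreachable Ollama surfaces as a connection error.

```python
from typing import Callable


def render_fallback(stats: dict) -> str:
    """Plain-text digest used when the LLM report cannot be generated."""
    rate = stats["completion_rate"]
    rows = [
        ("Completed / created", f"{stats['completed']} / {stats['created']}"),
        ("Completion rate", f"{rate}%" if rate is not None else "n/a"),
        ("Avg hours to complete", str(stats["avg_hours_to_complete"])),
        ("Most nudged", ", ".join(stats["top_nudged"]) or "-"),
        ("Most rescheduled", ", ".join(stats["top_rescheduled"]) or "-"),
    ]
    width = max(len(label) for label, _ in rows)
    return "\n".join(f"{label.ljust(width)}  {value}" for label, value in rows)


def digest_text(stats: dict, llm: Callable[[str], str]) -> str:
    try:
        return llm(f"Write Donna's weekly digest from these stats: {stats}")
    except OSError:  # e.g. ConnectionError when Ollama is down
        return render_fallback(stats)
```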

Configuration: the generate_weekly_digest task type is defined in config/task_types.yaml and routed to local_parser, with parser (Claude) as the fallback, in config/donna_models.yaml.
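A sketch of trying a primary model alias and then its fallback. `ModelUnavailable` is a hypothetical exception that each backend wrapper would raise when its model cannot be reached, and the `(alias, call)` chain stands in for whatever client objects the real router builds from config/donna_models.yaml.

```python
from typing import Callable, List, Tuple


class ModelUnavailable(Exception):
    """Hypothetical error a backend wrapper raises when its model is unreachable."""


def call_with_fallback(prompt: str, chain: List[Tuple[str, Callable[[str], str]]]) -> Tuple[str, str]:
    """Try each (alias, call) in order; return (alias_used, output)."""
    errors = []
    for alias, call in chain:
        try:
            return alias, call(prompt)
        except ModelUnavailable as exc:
            errors.append(f"{alias}: {exc}")
    raise ModelUnavailable("all models failed: " + "; ".join(errors))
```

Returning the alias that actually answered lets the caller attribute cost correctly in invocation_log, since a fallback to parser (Claude) is the case that incurs API spend.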

Agent Safety Constraints (Non-Negotiable)

Enforced at the system level, not reliant on agent prompting:

Constraint	Enforcement
No sending emails externally	Gmail API scoped to draft-only by default. Send scope gated behind feature flag (disabled by default). Enabling requires config change + OAuth re-auth.
No deleting files	Filesystem is append/modify only. Deletes require explicit user command.
No pushing to main/production	GitHub API restricts to feature branches. Branch protection at GitHub level.
No purchases or financial transactions	No payment APIs integrated. No browser automation.
No modifying user-created calendar events	Scheduler only modifies events tagged `donnaManaged: true`.
Backup before code changes	Coding agent creates git stash or branch backup before any file modification.
Agent timeout	Configurable per invocation (default 10 min coding, 5 min research). Timeout → user notification + `agent_status = failed`.
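A sketch of three of these checks as orchestrator-side code. Assumptions: the `donnaManaged` tag is stored in a Google Calendar event's `extendedProperties.private` map (whose values are strings), the protected-branch set is illustrative (the real guarantee is GitHub branch protection, with this check as defense in depth), and `"completed"` is a hypothetical success status alongside the spec's `failed`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Tuple

PROTECTED_BRANCHES = {"main", "master", "production"}  # hypothetical list


def can_modify_event(event: dict) -> bool:
    """Only events Donna tagged itself; assumes the tag lives in
    extendedProperties.private, whose values are strings."""
    private = event.get("extendedProperties", {}).get("private", {})
    return private.get("donnaManaged") == "true"


def can_push(branch: str) -> bool:
    return branch not in PROTECTED_BRANCHES


async def run_agent(work: Awaitable[Any], timeout_min: float,
                    notify: Callable[[str], None]) -> Tuple[str, Any]:
    """Return (agent_status, result); a timeout cancels the work and notifies the user."""
    try:
        return "completed", await asyncio.wait_for(work, timeout=timeout_min * 60)
    except asyncio.TimeoutError:
        notify(f"Agent timed out after {timeout_min} min")
        return "failed", None
```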

Principle: All agents start minimal. Constraints relaxed only after reviewing logged performance and explicitly updating config.

Local LLM Tool Use Progression

An RTX 3090 is available for local inference. Before each stage begins, the local model must first pass validation on basic parsing.

Stage 1: Read-Only Tools, Single Call (Month 1)

Tools: task_db_read, calendar_read
Purpose: Context enrichment during parsing (dedup check, resolving "before my meeting" to actual time)
Evaluation: Offline harness + shadow mode with Claude
Promotion threshold: 90%+ accuracy on tool selection and parameters over 100+ samples
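A sketch of the Stage 1 promotion gate, assuming each evaluation sample pairs a reference tool call (from the offline harness or Claude in shadow mode) with the local model's call. The `ToolSample` type and the strict equality rule for parameters are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolSample:
    expected_tool: Optional[str]
    predicted_tool: Optional[str]
    expected_params: dict = field(default_factory=dict)
    predicted_params: dict = field(default_factory=dict)

    def correct(self) -> bool:
        # Both the tool choice and its parameters must match the reference.
        return (self.expected_tool == self.predicted_tool
                and self.expected_params == self.predicted_params)


def stage1_promotable(samples: list, threshold: float = 0.90, min_samples: int = 100) -> bool:
    if len(samples) < min_samples:
        return False  # too few samples to trust the accuracy estimate
    accuracy = sum(s.correct() for s in samples) / len(samples)
    return accuracy >= threshold
```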

Stage 2: Conditional Tool Use (Month 2)

Challenge: Model decides whether to use a tool. "buy milk" = no tool; "buy milk before my 3pm meeting" = calendar_read
Tracking: Log unnecessary tool calls (false positives) and missed tool calls (false negatives)
Promotion threshold: 85%+ precision and recall over 100+ samples
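A sketch of the Stage 2 metric, assuming each sample is reduced to a pair `(should_call, did_call)`: whether the reference used a tool, and whether the local model did. For example, "buy milk" should be `(False, False)` and "buy milk before my 3pm meeting" should be `(True, True)`. Treating an empty denominator as 1.0 is a convention chosen here, not something the spec states.

```python
def precision_recall(pairs: list) -> tuple:
    tp = sum(1 for should, did in pairs if should and did)
    fp = sum(1 for should, did in pairs if did and not should)  # unnecessary call
    fn = sum(1 for should, did in pairs if should and not did)  # missed call
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def stage2_promotable(pairs: list, threshold: float = 0.85, min_samples: int = 100) -> bool:
    precision, recall = precision_recall(pairs)
    return len(pairs) >= min_samples and precision >= threshold and recall >= threshold
```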

Stage 3: Write Tools with Guardrails (Month 3, if Stage 2 Is Solid)

Tools: task_db_write (create tasks directly)
Guardrails: Model proposes write → orchestrator validates against schema → rejects malformed entries. Model never writes to calendar or triggers notifications directly.
Evaluation: Compare model-proposed entries against Claude/human-created entries from same input.
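A sketch of the orchestrator-side schema check for a proposed task_db_write. The field names, types, and required flags in `TASK_SCHEMA` are hypothetical; the point is that malformed or unexpected fields are rejected before anything reaches the database.

```python
# Hypothetical write schema: field name -> (type, required).
TASK_SCHEMA = {
    "title": (str, True),
    "domain": (str, True),
    "priority": (int, False),
    "due": (str, False),
}


def validate_task_write(proposal: dict) -> list:
    """Return a list of problems; an empty list means the write may proceed."""
    errors = [f"missing field: {name}" for name, (_, required) in TASK_SCHEMA.items()
              if required and name not in proposal]
    for name, value in proposal.items():
        if name not in TASK_SCHEMA:
            errors.append(f"unknown field: {name}")
            continue
        expected = TASK_SCHEMA[name][0]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"bad type for {name}: expected {expected.__name__}")
    return errors
```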

Tool Execution Architecture

Model never calls tools directly. Flow:

1. Model outputs tool call request
2. Orchestrator validates (is this tool allowed for this task type? parameters well-formed?)
3. Orchestrator executes via integration module
4. Result fed back to model

Tool access per task type is defined in config/task_types.yaml. A task type with tools: [calendar_read] cannot result in a task_db_write call, regardless of what the model requests.
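A sketch of the request → validate → execute → feed-back loop. Assumptions: `TASK_TYPE_TOOLS` mirrors the `tools:` lists in config/task_types.yaml (the task type `resolve_time` is hypothetical), `PARAM_SPECS` is an invented per-tool parameter spec, and `model_step` is a hypothetical callable that returns either a tool request or a final answer. Rejections are fed back to the model as errors instead of being executed.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical mirror of the `tools:` lists in config/task_types.yaml.
TASK_TYPE_TOOLS = {
    "resolve_time": ["calendar_read"],
    "create_task": ["task_db_read", "task_db_write"],
}
# Hypothetical parameter specs: tool -> {param: type}.
PARAM_SPECS = {
    "calendar_read": {"date": str},
    "task_db_read": {"query": str},
    "task_db_write": {"title": str},
}


class ToolRejected(Exception):
    pass


def validate(task_type: str, request: dict) -> dict:
    """Return the validated params, or raise ToolRejected."""
    tool, params = request.get("tool"), request.get("params", {})
    if tool not in TASK_TYPE_TOOLS.get(task_type, []):
        raise ToolRejected(f"{tool!r} is not allowed for task type {task_type!r}")
    spec = PARAM_SPECS[tool]
    if (not isinstance(params, dict) or set(params) != set(spec)
            or not all(isinstance(params[k], t) for k, t in spec.items())):
        raise ToolRejected(f"malformed parameters for {tool!r}")
    return params


def run_with_tools(task_type: str, model_step: Callable[[Optional[dict]], dict],
                   integrations: Dict[str, Callable[..., Any]], max_rounds: int = 3) -> Any:
    """model_step(feedback) returns {"tool": ..., "params": ...} or {"answer": ...}."""
    feedback: Optional[dict] = None
    for _ in range(max_rounds):
        request = model_step(feedback)
        if "tool" not in request:
            return request["answer"]
        try:
            params = validate(task_type, request)
            result = integrations[request["tool"]](**params)
            feedback = {"tool": request["tool"], "result": result}
        except ToolRejected as exc:
            feedback = {"tool": request["tool"], "error": str(exc)}
    raise RuntimeError("model did not finish within max_rounds")
```

Because the allowlist check runs before any integration is looked up, a `task_db_write` request under `resolve_time` never touches the database even if an integration for it is registered.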