Insights Engine¶

The insights engine runs SQL analytics over the invocation_log table to surface cost centers, prompt group patterns, quality/cost mismatches, and token-bloat outliers for the Claude Inspector dashboard.

Realizes: spec_v3.md §27 (Admin API -- cost analytics), budget reporting requirements from spec_v3.md §13

Overview¶

The insights engine (src/donna/insights/engine.py) is a pure analytics module with no persistent state of its own. It accepts an open aiosqlite connection and a configurable look-back window, then executes four SQL queries against invocation_log to produce structured insight categories. Results are served at GET /admin/claude/insights and rendered in the management GUI's Claude Inspector panel.

All queries exclude shadow invocations (is_shadow = 0) so that evaluation/comparison runs do not distort production cost analysis. The engine uses window functions and temporary tables for the bloat-detection pass, computing per-task-type medians and then joining back to find individual calls that exceed twice the median token count. System-prompt grouping uses the input_hash column to deduplicate semantically identical prompts and extrapolate observed cost to weekly estimates.

The module is designed to be stateless and re-entrant. Each call computes results from scratch over the requested window. No caching or materialized views are used -- the invocation_log table is small enough (SQLite on NVMe) for real-time computation over typical look-back windows (1--90 days).

Key Concepts¶

Concept	Description
Cost center	A `task_type` value (e.g., `parse_task`, `challenger`, `prep_work`) aggregated by total cost, call count, and average tokens.
System-prompt group	A cluster of invocations sharing the same `input_hash` (SHA of the system prompt). Identifies expensive repeated prompts.
Quality/cost mismatch	A task type whose average cost exceeds the global average but whose average `quality_score` falls below 0.5. Candidates for model downgrade or prompt rewrite.
Token-bloat outlier	An individual invocation whose `tokens_in` exceeds twice the median for its task type. Flags runaway context or prompt injection.
Look-back window	Number of days to analyze (default 7, max 90). Applied as `WHERE timestamp >= ?`.
Shadow exclusion	All queries filter `is_shadow = 0` to analyze only production traffic.

Architecture¶

flowchart LR
    API["/admin/claude/insights"] -->|"days param"| Engine["compute_insights()"]
    Engine --> Q1["Top Cost Centers"]
    Engine --> Q2["System Prompt Groups"]
    Engine --> Q3["Quality/Cost Mismatches"]
    Engine --> Q4["Token Bloat Outliers"]

    Q1 --> IL["invocation_log"]
    Q2 --> IL
    Q3 --> IL
    Q4 --> IL
    Q4 -->|"temp table"| MT["median_tokens"]
    MT --> IL

Insight Categories¶

Top Cost Centers -- The 10 most expensive task types in the window, ranked by total USD spend.

Field	Description
`task_type`	The task type identifier
`total_cost`	Sum of `cost_usd` for this type
`call_count`	Number of invocations
`avg_tokens_in`	Average input tokens
`avg_tokens_out`	Average output tokens

System Prompt Groups -- The 10 most expensive prompt patterns, grouped by input_hash with a minimum of 5 occurrences. Weekly cost is extrapolated from the observed window.

Field	Description
`hash`	The `input_hash` value
`call_count`	Number of invocations with this hash
`avg_tokens_in`	Average input tokens per call
`estimated_weekly_cost`	Extrapolated weekly cost based on observed window
`sample_invocation_id`	One example invocation ID for drill-down

Quality/Cost Mismatches -- Task types that are both above-average cost and below 0.5 quality score, indicating poor value.

Field	Description
`task_type`	The task type identifier
`avg_cost`	Average cost per call
`avg_quality_score`	Average quality score (0.0--1.0)
`call_count`	Number of scored invocations

Token Bloat Outliers -- The 10 most expensive individual invocations where tokens_in exceeds twice the median for that task type.

Field	Description
`invocation_id`	The specific invocation
`task_type`	Task type of this call
`tokens_in`	Actual input tokens
`median_for_type`	Median input tokens for this task type
`ratio`	`tokens_in / median_for_type`
`cost_usd`	Cost of this specific call

Bloat Detection Algorithm¶

The token-bloat query uses a two-step approach because SQLite lacks a built-in MEDIAN() aggregate:

Median computation: A temporary table median_tokens is created using ROW_NUMBER() window function partitioned by task_type, picking the middle row (rn = (cnt + 1) / 2).
Outlier join: The main query joins invocation_log back to median_tokens and filters for rows where tokens_in > 2 * median_tokens_in.
Cleanup: The temporary table is dropped after the query completes.

Configuration¶

The insights engine has no dedicated config file. Its behavior is controlled by:

Parameter	Source	Default	Description
`days`	Query parameter on `/admin/claude/insights`	7	Look-back window in days (1--90)
`payload_dir`	`app.state.payload_dir`	`data/payloads`	Reserved for future payload-level analysis

The engine depends on the invocation_log table schema, which includes task_type, cost_usd, tokens_in, tokens_out, quality_score, input_hash, is_shadow, timestamp, and payload_path columns.

API¶

Interface	Module	Description
`compute_insights(conn, payload_dir, days)`	`donna.insights.engine`	Compute all four insight categories. Returns a dict with keys `top_cost_centers`, `system_prompt_groups`, `quality_cost_mismatches`, `token_bloat_outliers`.
`GET /admin/claude/insights?days=N`	`donna.api.routes.admin_claude`	REST endpoint exposing insights to the management GUI