donna.llm.queue
LLM queue worker — two-queue priority system for GPU access.
The internal queue (Donna tasks) always takes priority over the external queue (API gateway). During active hours, external requests that are already running are preempted. See docs/superpowers/specs/archive/2026-04-11-llm-gateway-queue-design.md.
QueueFullError
Bases: Exception
Raised when the external queue is at max depth.
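As a rough illustration of the max-depth check, here is a minimal sketch. `BoundedExternalQueue`, `max_depth`, and `push` are hypothetical names invented for this example; the real check lives inside `LLMQueueWorker` and may differ.

```python
from collections import deque


class QueueFullError(Exception):
    """Raised when the external queue is at max depth."""


class BoundedExternalQueue:
    """Hypothetical stand-in for the external queue, not Donna's implementation."""

    def __init__(self, max_depth: int) -> None:
        self.max_depth = max_depth  # assumed to come from the gateway config
        self._items = deque()

    def push(self, prompt: str) -> None:
        # Reject new work rather than letting the backlog grow without bound.
        if len(self._items) >= self.max_depth:
            raise QueueFullError(f"external queue is full ({self.max_depth} items)")
        self._items.append(prompt)

    def __len__(self) -> int:
        return len(self._items)
```

A caller would typically catch `QueueFullError` and report back-pressure to the client instead of retrying immediately.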
LLMQueueWorker
LLMQueueWorker(config: GatewayConfig, ollama: Any, inv_logger: Any, alerter: GatewayAlerter, rate_limiter: RateLimiter, anthropic: Any | None = None)
Two-queue worker with priority, preemption, and rate limiting.
The worker loop calls process_one() repeatedly. Each call pops one item from the appropriate queue and executes it.
Source code in src/donna/llm/queue.py
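The priority rule described above (internal before external, one item per `process_one()` call) can be sketched as follows. This is a toy model under stated assumptions: `TwoQueueWorker`, `make_job`, and `demo` are hypothetical, and rate limiting, preemption, and the Ollama call are left out.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable

Job = Callable[[], Awaitable[str]]


class TwoQueueWorker:
    """Toy worker: internal items are always served before external ones."""

    def __init__(self) -> None:
        self.internal: deque[Job] = deque()
        self.external: deque[Job] = deque()
        self.completed: list[str] = []

    async def process_one(self) -> bool:
        # Serve the internal queue first; use the external queue only when
        # no internal work is waiting.
        if self.internal:
            job = self.internal.popleft()
        elif self.external:
            job = self.external.popleft()
        else:
            return False  # both queues are empty
        self.completed.append(await job())
        return True


def make_job(label: str) -> Job:
    async def job() -> str:
        await asyncio.sleep(0)  # stand-in for the GPU-bound model call
        return label

    return job


async def demo() -> list[str]:
    worker = TwoQueueWorker()
    worker.external.append(make_job("external-1"))
    worker.internal.append(make_job("internal-1"))
    worker.external.append(make_job("external-2"))
    while await worker.process_one():
        pass
    return worker.completed
```

In this sketch the internal job runs first even though it was enqueued after `external-1`.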
enqueue_internal (async)
enqueue_internal(prompt: str, model: str, max_tokens: int, json_mode: bool, task_type: str, priority: Priority = Priority.NORMAL, task_id: str | None = None, user_id: str = 'system', is_chain_continuation: bool = False) -> asyncio.Future[Any]
Enqueue a Donna internal LLM call. Returns a Future for the result.
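The Future-returning pattern can be sketched like this: the caller gets a Future at enqueue time and awaits it later, while the worker resolves it when the item is processed. `FutureQueue` and its methods are hypothetical and greatly simplified; the uppercase transform stands in for a model response.

```python
import asyncio
from collections import deque
from typing import Any


class FutureQueue:
    """Hypothetical queue that hands each caller a Future for its result."""

    def __init__(self) -> None:
        self._pending = deque()  # holds (prompt, future) pairs

    async def enqueue(self, prompt: str) -> "asyncio.Future[Any]":
        # Create the Future on the running loop so the caller can await it.
        future = asyncio.get_running_loop().create_future()
        self._pending.append((prompt, future))
        return future

    async def process_one(self) -> bool:
        if not self._pending:
            return False
        prompt, future = self._pending.popleft()
        if not future.cancelled():
            future.set_result(prompt.upper())  # stand-in for model output
        return True


async def demo() -> str:
    queue = FutureQueue()
    future = await queue.enqueue("hello")
    assert not future.done()  # nothing has processed the item yet
    await queue.process_one()
    return await future
```

Separating enqueue from completion lets callers submit work without blocking on the GPU.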
enqueue_external (async)
enqueue_external(prompt: str, model: str, max_tokens: int, json_mode: bool, caller: str | None, allow_cloud: bool) -> asyncio.Future[Any]
Enqueue an external API call. Returns a Future for the result.
process_one (async)
Pop and execute one item from the appropriate queue.
Returns True if an item was processed, False if both queues are empty.
preempt_external (async)
Cancel the currently running external request and re-enqueue it.
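One way to picture cancel-and-re-enqueue is the sketch below. It assumes the running request is an `asyncio.Task` and that the preempted item returns to the front of the external queue; neither detail is confirmed by this page. `PreemptibleRunner` and `_call_model` are hypothetical.

```python
import asyncio
from collections import deque


class PreemptibleRunner:
    """Toy model of cancelling a running external request and re-queueing it."""

    def __init__(self) -> None:
        self.external = deque()
        self.running = None  # (prompt, task) while an external call is in flight

    async def _call_model(self, prompt: str) -> str:
        await asyncio.sleep(3600)  # stand-in for a long-running model call
        return prompt

    def start_external(self) -> None:
        prompt = self.external.popleft()
        task = asyncio.create_task(self._call_model(prompt))
        self.running = (prompt, task)

    async def preempt_external(self) -> bool:
        if self.running is None:
            return False
        prompt, task = self.running
        task.cancel()
        try:
            await task  # wait until the cancellation has taken effect
        except asyncio.CancelledError:
            pass
        self.running = None
        # Assumption: the preempted item keeps its place at the head of the queue.
        self.external.appendleft(prompt)
        return True


async def demo():
    runner = PreemptibleRunner()
    runner.external.extend(["first", "second"])
    runner.start_external()
    await asyncio.sleep(0)  # give the task a chance to start
    preempted = await runner.preempt_external()
    return preempted, list(runner.external)
```

After `demo()` the preempted prompt is back in the queue, so no external work is lost.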
run (async)
Main worker loop — runs for the lifetime of the process.
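A long-lived loop of this kind might look like the sketch below. The stop flag, the idle delay, and the `LoopingWorker` class are assumptions made for illustration; this page does not describe how `stop()` signals the real loop.

```python
import asyncio


class LoopingWorker:
    """Toy worker loop: process items until asked to stop."""

    def __init__(self, jobs, idle_delay: float = 0.01) -> None:
        self._jobs = list(jobs)
        self._idle_delay = idle_delay  # hypothetical back-off when idle
        self._stopping = False
        self.done = []

    async def process_one(self) -> bool:
        if not self._jobs:
            return False
        self.done.append(self._jobs.pop(0))
        return True

    async def run(self) -> None:
        while not self._stopping:
            # Sleep briefly when there is nothing to do, to avoid a busy loop.
            if not await self.process_one():
                await asyncio.sleep(self._idle_delay)

    async def stop(self) -> None:
        self._stopping = True


async def demo():
    worker = LoopingWorker(["a", "b"])
    runner = asyncio.create_task(worker.run())
    await asyncio.sleep(0.05)  # let the loop drain the jobs
    await worker.stop()
    await runner  # the loop exits once it sees the stop flag
    return worker.done
```

Running the loop as a background task and awaiting it after `stop()` gives a clean shutdown in this sketch.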
stop (async)
get_status
Return queue status for the /llm/queue/status endpoint.
get_item
Return full details for a queued or in-progress item by sequence number.
reload_config
Hot-reload configuration. Preserves queue contents and counters.
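The idea of swapping configuration while leaving state alone can be sketched as follows. `ToyConfig`, `max_external_depth`, `ReloadableQueue`, and `processed_count` are hypothetical names; the real `GatewayConfig` fields and counters are not listed on this page.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyConfig:
    max_external_depth: int  # hypothetical setting


class ReloadableQueue:
    """Toy object whose config can be replaced without touching its state."""

    def __init__(self, config: ToyConfig) -> None:
        self.config = config
        self.items = deque()
        self.processed_count = 0

    def reload_config(self, new_config: ToyConfig) -> None:
        # Replace only the config reference; queued items and counters stay put.
        self.config = new_config


queue = ReloadableQueue(ToyConfig(max_external_depth=10))
queue.items.extend(["a", "b"])
queue.processed_count = 5
queue.reload_config(ToyConfig(max_external_depth=20))
```

Because only the config reference changes, later depth checks use the new limit while pending items are untouched.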