Skip to content

Backup & Recovery

Source of truth: RECOVERY.md at the repo root (embedded below).

# Donna Recovery Procedures

## Database Recovery from Backup

### When to Use
- Database corruption detected (PRAGMA integrity_check fails)
- Accidental data deletion
- Failed migration that can't be rolled back via Alembic

### Procedure

1. **Stop the orchestrator:**
   ```bash
   docker compose -f docker/donna-core.yml down
   ```

2. **Identify the backup to restore:**
   ```bash
   ls -la /donna/backups/daily/
   # Pick the most recent valid backup
   ```

3. **Verify backup integrity:**
   ```bash
   sqlite3 /donna/backups/daily/donna_tasks_2026-03-17.db "PRAGMA integrity_check;"
   # Should return "ok"
   ```

4. **Replace the live database:**
   ```bash
   # Move corrupted DB aside (don't delete yet)
   mv /donna/db/donna_tasks.db /donna/db/donna_tasks.db.corrupted
   mv /donna/db/donna_tasks.db-wal /donna/db/donna_tasks.db-wal.corrupted 2>/dev/null
   mv /donna/db/donna_tasks.db-shm /donna/db/donna_tasks.db-shm.corrupted 2>/dev/null

   # Restore from backup
   cp /donna/backups/daily/donna_tasks_2026-03-17.db /donna/db/donna_tasks.db
   ```

5. **Restart the orchestrator:**
   ```bash
   docker compose -f docker/donna-core.yml up -d
   ```

6. **Verify recovery:**
   - Check `/health` endpoint returns 200
   - Check morning digest includes tasks
   - Orchestrator will auto-trigger Supabase full re-sync on detecting restored DB

7. **Clean up after confirming recovery:**
   ```bash
   rm /donna/db/donna_tasks.db.corrupted
   ```

### RPO (Recovery Point Objective)
- **Maximum data loss:** 24 hours (last daily backup)
- **Effective RPO:** Near-real-time if Supabase sync was operational (task data recoverable from cloud replica)

## Failed Alembic Migration Recovery

1. Alembic runner creates a pre-migration backup automatically.
2. If migration fails mid-apply:
   ```bash
   # Restore pre-migration backup
   cp /donna/backups/pre_migration_donna_tasks.db /donna/db/donna_tasks.db
   ```
3. Fix the migration script, then re-run:
   ```bash
   alembic upgrade head
   ```

## Circuit Breaker Recovery

If Claude API is down and the circuit breaker is open:
1. Donna automatically enters degraded mode (template-based digests, raw task capture).
2. Circuit breaker tests recovery every 5 minutes.
3. On first successful API response, circuit breaker closes automatically.
4. No manual intervention needed unless the outage exceeds 24 hours.

If the outage exceeds 24 hours:
1. Check Anthropic status page for incident reports.
2. All captured tasks during outage are flagged for re-parsing.
3. On recovery, the orchestrator processes the re-parse queue automatically.

## Full System Recovery (Server Failure)

1. Provision new server or restore from server backup.
2. Install Docker and Docker Compose.
3. Clone the Donna repo.
4. Copy `.env` file (from secure backup or recreate from `.env.example`).
5. Restore database backups from offsite storage:
   ```bash
   # From Backblaze B2 or Google Cloud Storage
   b2 download-file-by-name donna-backups donna_tasks_latest.db /donna/db/donna_tasks.db
   ```
6. Start all services:
   ```bash
   docker compose -f docker/donna-core.yml -f docker/donna-monitoring.yml up -d
   ```
7. Verify health, trigger Supabase re-sync.