Server Operations Guide
How to operate the OpenClaw server — the box that hosts the ReadyMode bot, the Aurora marketplace, and the WhatsApp owner-policy daemon. The audience is "an engineer (or Juanes) who has never SSH'd in before but knows their way around Linux." If Miguel is unavailable, this doc should let you connect, diagnose, and recover the server independently.
This doc focuses on the server. For bot internals see engineer-onboarding.md and architecture.md. For the bot's user-facing operations see operations.md. For the hardening posture and audit findings see security-audit.md.
1. Server Identity
| Field | Value |
|---|---|
| Hostname | OpenClow |
| Public IP | 159.89.179.179 |
| Provider | DigitalOcean droplet (KVM) |
| OS | Ubuntu 24.04.4 LTS |
| Kernel | 6.8.0-110-generic |
| Architecture | x86-64 |
| vCPU / RAM / Disk | 2 vCPU · 1.9 GiB RAM · 87 GiB root |
| Timezone | UTC |
| Public hostnames | agent.nheo.ai, clow.nheo.ai, aurora.nheo.ai (all via Cloudflare Tunnel) |
Single-server deployment — no staging box, no load balancer, no warm spare. Operating on this server is operating on production. Treat every change as a production change.
2. SSH Access
Connection
| Setting | Value | Source file |
|---|---|---|
| Port | 22022 (non-standard, NHE-29) | /etc/ssh/sshd_config.d/99-hardening.conf |
| Password auth | DISABLED | /etc/ssh/sshd_config.d/60-cloudimg-settings.conf |
| Root login | DISABLED | /etc/ssh/sshd_config.d/99-hardening.conf |
| X11 forwarding | DISABLED | 99-hardening.conf |
| TCP forwarding | DISABLED | 99-hardening.conf |
| MaxAuthTries | 3 | 99-hardening.conf |
| LoginGraceTime | 30s | 99-hardening.conf |
| Banner | /etc/issue.net | 99-hardening.conf |
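Common pitfall: "SSH timed out, must be the firewall." SSH listens on 22022, not 22. Before blaming firewall rules, verify the port from your workstation:
nc -zv 159.89.179.179 22022. If that connects, the firewall is not the problem; point your client at 22022. (The server's hardened firewall does not block egress either.)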
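To avoid the port pitfall entirely, each operator can pin the port in their own workstation's ~/.ssh/config. This is a minimal sketch: the host alias openclaw, the helper name, and the default key path ~/.ssh/id_ed25519 are all hypothetical, so substitute your own key. IdentitiesOnly yes makes ssh offer only that one key. That matters here because with MaxAuthTries 3, an ssh-agent holding several keys can use up all the attempts before the right key is tried.

```bash
# Hypothetical helper: print an ~/.ssh/config stanza for this box.
# Usage: openclaw_ssh_stanza <linux-user> [identity-file] >> ~/.ssh/config
#        then connect with: ssh openclaw
openclaw_ssh_stanza() {
  local user="$1"
  local key="${2:-$HOME/.ssh/id_ed25519}"
  cat <<EOF
Host openclaw
    HostName 159.89.179.179
    Port 22022
    User ${user}
    IdentityFile ${key}
    IdentitiesOnly yes
EOF
}
```

Append the stanza once. Running the helper twice adds a duplicate Host block.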
Accounts
| User | UID | Home | Shell | Purpose | Sudo? |
|---|---|---|---|---|---|
| miguel | 1002 | /home/miguel | bash | Operator (Miguel Legarda) | Yes (NOPASSWD: ALL via /etc/sudoers.d/90-miguel) |
| juanesteban | 1001 | /home/juanesteban | bash | Operator (Juan Esteban) | Yes (NOPASSWD: ALL via /etc/sudoers.d/90-juanesteban) |
| nat | 1003 | /home/nat | bash | Operator (Natalia), added 2026-05-18 | Yes (NOPASSWD: ALL via /etc/sudoers.d/90-nat) |
| agent | 1000 | /home/agent | bash | Service account: owns /home/agent/.openclaw/, runs gateway/worker/dispatcher | No (sudo only via the bg-worker allowlist) |
| do-agent | 999 | (system) | nologin | DigitalOcean droplet monitoring | No |
| root | 0 | /root | bash | System root; direct SSH disabled (PermitRootLogin no), reach it with sudo -i | n/a |
miguel, juanesteban, and nat are the three human operators. Decision is fixed (per the Phase-3 hardening sprint): all retain NOPASSWD: ALL. Any additional human account requires explicit approval — the user table is small and audited.
agent does not have general sudo and must not get it — the bot's blast radius is contained by that boundary. A small allowlist of bg-worker sudo commands lives at /etc/sudoers.d/openclaw-bg-allowlist (with the runtime config at /home/agent/.openclaw/workspace/runtime/bg_sudo_allowlist.json).
Existing SSH keys
Each account's key is in its own ~/.ssh/authorized_keys. Snapshot at the time this guide was written:
| Account | Key type | Comment / fingerprint hint |
|---|---|---|
| miguel | ssh-rsa | miguel@arpagrowth |
| juanesteban | ssh-ed25519 | juanespg03@gmail.com |
| nat | ssh-ed25519 | juanesteban (key generated on Nat's machine; added 2026-05-18 by Miguel) |
| agent | ssh-ed25519 | openclaw-do (used for cross-machine bot automation) |
| root | ssh-ed25519 | openclaw-do; inactive at runtime because PermitRootLogin no, kept for emergency single-user/recovery flows |
Nat's SSH command:
ssh -p 22022 -i ~/.ssh/<their_key> nat@159.89.179.179
Adding a new operator
Password auth is disabled — the only way in is an SSH key. The pattern is one Linux account per operator, with their own home and their own authorized_keys (do not pile keys onto someone else's account).
# As any sudo-capable operator (miguel, juanesteban, or nat):
NEW=jane
sudo useradd -m -s /bin/bash "$NEW"
sudo install -d -m 700 -o "$NEW" -g "$NEW" "/home/$NEW/.ssh"
sudo install -m 600 -o "$NEW" -g "$NEW" /dev/null "/home/$NEW/.ssh/authorized_keys"
echo 'ssh-ed25519 AAAA... jane@laptop' | sudo tee -a "/home/$NEW/.ssh/authorized_keys"
# Sudo (only if the operator is meant to administer the box).
# Syntax-check the drop-in BEFORE it lands in /etc/sudoers.d: a broken file there can break sudo for everyone.
TMP=$(mktemp)
echo "$NEW ALL=(ALL) NOPASSWD: ALL" > "$TMP"
sudo visudo -cf "$TMP" && sudo install -m 440 -o root -g root "$TMP" "/etc/sudoers.d/90-$NEW"
rm -f "$TMP"
Test from the operator's workstation: ssh -p 22022 -i ~/.ssh/<their_key> jane@159.89.179.179. Do not put the new person's key on an existing account; accountability and lastlog rely on one human per UID.
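A quick shape check on a pasted key catches the usual mistakes before the key reaches authorized_keys: a private key pasted by accident, a key wrapped across lines, or a missing type prefix. The helper name is hypothetical. The check is deliberately shallow: it validates the line format, not the key material. Where ssh-keygen is available, running ssh-keygen -lf <file> on a saved copy of the key is the stronger test.

```bash
# Hypothetical shape check for one public-key line (type, base64 blob,
# optional comment). Rejects multi-line input and anything that is not a
# recognised OpenSSH public key type. It does not verify the key itself.
is_pubkey_line() {
  local line="$1"
  local re='^(ssh-ed25519|ssh-rsa|ecdsa-sha2-nistp(256|384|521)|sk-ssh-ed25519@openssh\.com) [A-Za-z0-9+/]+={0,2}( .*)?$'
  [[ "$line" != *$'\n'* && "$line" =~ $re ]]
}

# Usage, in place of the plain echo | tee line:
#   KEY='ssh-ed25519 AAAA... jane@laptop'
#   is_pubkey_line "$KEY" && echo "$KEY" | sudo tee -a "/home/$NEW/.ssh/authorized_keys"
```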
3. Service Status
Long-running systemd units
| Unit | What it does | Owner | Safe to restart? |
|---|---|---|---|
| openclaw-gateway.service | Discord ↔ agent LLM bridge, the live bot serving Arpa Growth | agent | No: explicit signal from Miguel required |
| openclaw-bg-worker.service | Executes background /bg run jobs | agent | Yes (see RUNBOOK_BG_WORKER) |
| openclaw-bg-dispatcher.service | Routes /bg chat commands to the worker | agent | Yes (see RUNBOOK_BG_WORKER) |
| openclaw-owner-policy.service | WhatsApp command parser (owner policy daemon) | agent | Ask first |
| aurora.service | Aurora Next.js app (serves aurora.nheo.ai on 127.0.0.1:3000) | aurora user | Ask first |
| caddy.service | Reverse proxy (HTTP → upstream) | root | Yes: caddy reload is graceful |
| cloudflared.service | Cloudflare Tunnel (public hostnames → loopback ports) | root | Yes (stateless) |
| docker.service | Container runtime (Aurora DB + pgAdmin) | root | Yes: live-restore: true keeps containers up |
| fail2ban.service | SSH brute-force jail | root | Yes (stateless) |
| auditd.service | Kernel audit subsystem | root | Yes |
| unattended-upgrades.service | Auto security patches | root | Yes |
Quick status snapshot
# All openclaw + supporting services up?
systemctl --no-pager --type=service --state=running \
| grep -E 'openclaw|aurora|caddy|cloudflared|docker|fail2ban'
# One unit at a time
systemctl status openclaw-gateway --no-pager
systemctl status caddy --no-pager
systemctl status cloudflared --no-pager
# Anything failed?
systemctl --no-pager --failed
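The grep above lists only running units, so a unit that is stopped simply does not appear. This sketch checks an explicit list instead and names every unit that is not active, together with its state. The list is copied from the units table above; the variable and function names are hypothetical.

```bash
# Units this guide expects to be up (copied from the table above).
EXPECTED_UNITS=(
  openclaw-gateway openclaw-bg-worker openclaw-bg-dispatcher openclaw-owner-policy
  aurora caddy cloudflared docker fail2ban auditd unattended-upgrades
)

# Hypothetical helper: print each unit that is not active, with its state,
# and return non-zero if any were found.
check_units() {
  local unit state bad=0
  for unit in "$@"; do
    if ! systemctl is-active --quiet "$unit"; then
      state=$(systemctl is-active "$unit" 2>/dev/null || true)
      printf 'NOT ACTIVE: %s (%s)\n' "$unit" "${state:-unknown}"
      bad=1
    fi
  done
  return "$bad"
}

# Usage: check_units "${EXPECTED_UNITS[@]}" && echo "all expected units active"
```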
Docker containers
| Container | Image purpose | Bound port |
|---|---|---|
| aurora-postgres | Aurora marketplace Postgres | 127.0.0.1:5432 (loopback only) |
| aurora-pgadmin | Web admin for the DB | 127.0.0.1:5050 (loopback only) |
Both ports are deliberately loopback-only. Reach them from a workstation with an SSH tunnel — see § 6.
sudo docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
sudo docker stats --no-stream # CPU/RAM per container
sudo docker compose -f /opt/aurora-infra/docker-compose.yml ps
Bg worker heartbeat
sudo cat /home/agent/.openclaw/workspace/runtime/worker-heartbeat.json # < 60s old
sudo /home/agent/.openclaw/workspace/bin/bgctl.py queue # active jobs
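This sketch turns "< 60s old" into a pass/fail check by comparing the heartbeat file's modification time with the clock. It assumes the worker rewrites the file on every beat; if the JSON carries its own timestamp field, check that field instead. Both function names are hypothetical. Because the file sits under /home/agent, run the functions as root.

```bash
# Seconds since FILE was last modified (GNU stat, as on Ubuntu).
file_age_seconds() {
  local mtime
  mtime=$(stat -c %Y -- "$1") || return 2
  echo $(( $(date +%s) - mtime ))
}

# Succeeds if FILE was modified less than MAX seconds ago (default 60).
# Returns 1 when stale, 2 when the file cannot be read.
heartbeat_fresh() {
  local file="$1" max="${2:-60}" age
  age=$(file_age_seconds "$file") || return 2
  if (( age < max )); then
    echo "fresh: ${age}s old"
  else
    echo "STALE: ${age}s old (limit ${max}s)"
    return 1
  fi
}

# Usage (ship the functions into a root shell):
#   sudo bash -c "$(declare -f file_age_seconds heartbeat_fresh); \
#     heartbeat_fresh /home/agent/.openclaw/workspace/runtime/worker-heartbeat.json"
```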
ps truncates usernames longer than 8 characters, so cloudflared shows up as cloudfl+. In any verify script use ps -eo user:32,cmd to widen the column, or you'll think a service is missing when it's only truncated.
4. Logs
journalctl is the source of truth for every service on this box. The patterns below are the ones you'll use in 90% of incidents.
| Service | Live tail | Recent history |
|---|---|---|
| openclaw-gateway | sudo journalctl -u openclaw-gateway -f | sudo journalctl -u openclaw-gateway -n 200 --no-pager |
| openclaw-bg-worker | sudo journalctl -u openclaw-bg-worker -f | sudo journalctl -u openclaw-bg-worker -n 200 --no-pager |
| openclaw-bg-dispatcher | sudo journalctl -u openclaw-bg-dispatcher -f | sudo journalctl -u openclaw-bg-dispatcher -n 200 --no-pager |
| openclaw-owner-policy | sudo journalctl -u openclaw-owner-policy -f | sudo journalctl -u openclaw-owner-policy -n 200 --no-pager |
| aurora | sudo journalctl -u aurora -f | sudo journalctl -u aurora -n 200 --no-pager |
| caddy | sudo journalctl -u caddy -f | sudo journalctl -u caddy -n 200 --no-pager |
| cloudflared | sudo journalctl -u cloudflared -f | sudo journalctl -u cloudflared -n 200 --no-pager |
| fail2ban | sudo journalctl -u fail2ban -f | sudo fail2ban-client status sshd |
| Audit (kernel) | sudo tail -f /var/log/audit/audit.log | sudo ausearch -ts today \| tail |
Per-job bg logs
sudo /home/agent/.openclaw/workspace/bin/bgctl.py logs <job_id> --tail 120
sudo ls /home/agent/.openclaw/workspace/runtime/jobs/ # raw job log dir
Bot conversation history
# All sessions for a given agent (JSONL, one event per line — pipe through jq)
sudo ls /home/agent/.openclaw/agents/main/sessions/
sudo tail -n 200 /home/agent/.openclaw/agents/main/sessions/<file>.jsonl | jq
Time-bounded queries
# Last 10 minutes across the whole system
sudo journalctl --since '10 minutes ago' --no-pager
# Errors only, last hour, multiple units
sudo journalctl -p err --since '1 hour ago' \
-u openclaw-gateway -u aurora -u caddy -u cloudflared --no-pager
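During an incident the same error sweep tends to be run over and over, and a small wrapper saves retyping the -u flags. In this sketch the function name is hypothetical. It takes the time window first and any number of units after it; with no units, it searches the whole journal.

```bash
# Hypothetical wrapper: error-level journal entries since WHEN for the
# given units (all units if none are given).
# Usage: recent_errors '1 hour ago' openclaw-gateway aurora caddy cloudflared
recent_errors() {
  local since="$1"; shift
  local args=() unit
  for unit in "$@"; do
    args+=(-u "$unit")
  done
  sudo journalctl -p err --since "$since" "${args[@]}" --no-pager
}
```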
5. Manually Triggering Scripts
When the bot is broken, paused, or you need to sanity-check a single operation, you can run the bash scripts directly. The scripts live under the agent account.
Where the scripts are
/home/agent/.openclaw/workspace/skills/readymode-support/scripts/
├── _lib.sh # shared helpers (login, dash_link click, native value setter)
├── clear_licenses.sh # Operation 1
├── create_user.sh # Operation 3 (steps 1–4 only)
├── reset_leads.sh # Operation 2 (returns unavailable until G05)
└── upload_leads.sh # Operation 4
Running a script as agent
# Switch to the service account with sudo (operators don't log in as agent directly)
sudo -iu agent
# Inside agent's shell:
cd /home/agent/.openclaw/workspace
bash skills/readymode-support/scripts/clear_licenses.sh
Dry-run mode (no production hits)
Always start here when you're unsure:
sudo -iu agent
cd /home/agent/.openclaw/workspace
OPENCLAW_DRY_RUN=true OPENCLAW_LOG_LEVEL=debug \
bash skills/readymode-support/scripts/clear_licenses.sh
OPENCLAW_DRY_RUN=true skips CDP / browser actions and prints what the script would do. Pair with OPENCLAW_LOG_LEVEL=debug for verbose output.
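How the scripts honor OPENCLAW_DRY_RUN is defined in _lib.sh and is not reproduced here. As an illustration of the general pattern only, here is a hypothetical wrapper that prints a shell-quoted command instead of running it when the flag is true. It is not a copy of the real helper.

```bash
# Hypothetical dry-run wrapper: with OPENCLAW_DRY_RUN=true, print the command
# (shell-quoted, so it can be copy-pasted) instead of executing it; otherwise
# run it unchanged and pass its exit status through.
run_or_print() {
  if [[ "${OPENCLAW_DRY_RUN:-false}" == "true" ]]; then
    printf 'DRY-RUN:'
    printf ' %q' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Usage: run_or_print curl -fsS https://example.invalid/   # example URL is a placeholder
```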
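Running the gateway manually
To run the bot under your terminal instead of systemd (e.g. to see live logs while you reproduce a bug), stop the unit first. openclaw-gateway is marked not safe to restart without an explicit signal from Miguel (§ 3), so get that before stopping it: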
sudo systemctl stop openclaw-gateway
sudo -iu agent
/home/agent/.npm-global/bin/openclaw gateway \
--port 18789 --bind loopback --ws-log compact
Ctrl-C stops it. Exit the agent shell, then sudo systemctl start openclaw-gateway puts it back under systemd.
Don't run the bot in the foreground while the systemd unit is also active. Both try to bind the same port (18789): the second instance exits immediately or, worse, the running instance keeps stale state while you think you're testing a fresh one.
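A cheap pre-flight before starting the foreground gateway is to confirm that nothing is already listening on 18789. This sketch uses bash's /dev/tcp pseudo-device, so it needs bash, not sh; the function name is hypothetical. For the owning process rather than a yes/no answer, sudo ss -ltnp 'sport = :18789' gives the same information with process details.

```bash
# Hypothetical pre-flight: succeed only if nothing is accepting TCP
# connections on 127.0.0.1:PORT. A successful connect means the port is taken.
port_free() {
  local port="$1"
  if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
    return 1
  fi
  return 0
}

# Usage, before starting the foreground gateway:
#   port_free 18789 || { echo "18789 in use: is openclaw-gateway still active?" >&2; exit 1; }
```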
Managing background jobs from the CLI
The /bg chat command's job controls are also available through bgctl.py:
sudo /home/agent/.openclaw/workspace/bin/bgctl.py queue
sudo /home/agent/.openclaw/workspace/bin/bgctl.py status <job_id>
sudo /home/agent/.openclaw/workspace/bin/bgctl.py logs <job_id> --tail 120
sudo /home/agent/.openclaw/workspace/bin/bgctl.py cancel <job_id>
Job states: QUEUED | RUNNING | DONE | FAILED | CANCELED | TIMEOUT. Full contract in RUNBOOK_BG_WORKER.md on the server.
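For scripted use, this sketch polls bgctl.py status until the job reaches a terminal state (DONE, FAILED, CANCELED, TIMEOUT). The output format of status is not documented here, so the sketch assumes the state name appears as a whole word somewhere in that output. BGCTL and the function name are hypothetical, and BGCTL can be overridden.

```bash
# Hypothetical helper around bgctl.py. ASSUMPTION: `status <job_id>` prints
# the job state (DONE, FAILED, ...) as a whole word somewhere in its output.
BGCTL=${BGCTL:-"sudo /home/agent/.openclaw/workspace/bin/bgctl.py"}

# Poll until the job is terminal or MAX seconds pass (default 600, every 5s).
# Prints the final state; returns 0 for DONE, 1 for other terminal states,
# 2 if it never reached one.
wait_for_job() {
  local job="$1" max="${2:-600}" interval="${3:-5}" waited=0 out state
  while (( waited <= max )); do
    out=$($BGCTL status "$job" 2>&1 || true)   # $BGCTL word-splits on purpose
    for state in DONE FAILED CANCELED TIMEOUT; do
      if grep -qw "$state" <<<"$out"; then
        echo "$state"
        [[ "$state" == DONE ]]
        return
      fi
    done
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  echo "job $job not terminal after ${max}s" >&2
  return 2
}
```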
6. Database Access
The Aurora Postgres instance and the pgAdmin web UI run as Docker containers under /opt/aurora-infra/. Both are bound to localhost only — there is no public DB endpoint.
Where the credentials live
The env file at /opt/aurora-infra/.env defines:
| Variable | Used for |
|---|---|
| POSTGRES_DB | Database name |
| POSTGRES_USER | Postgres superuser |
| POSTGRES_PASSWORD | Postgres superuser password |
| PGADMIN_DEFAULT_EMAIL | pgAdmin login email |
| PGADMIN_DEFAULT_PASSWORD | pgAdmin login password |
Never commit, paste, or transcribe .env over an untrusted channel. If a teammate needs credentials, share them via SOPS (NHE-66) or a one-time-secret link, not Slack/email.
Option A: psql inside the container (fastest)
# Open psql in the postgres container as the configured user
# (sh -c works whether or not the image ships bash)
sudo docker exec -it aurora-postgres sh -c \
  'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB"'
# Or one-shot a query
sudo docker exec -i aurora-postgres psql -U postgres -d postgres \
-tAc 'SELECT COUNT(*) FROM "User";'
Option B: psql from the host
# Needs psql on the host (postgresql-client). Load the creds in a subshell so
# they never leak into your session; psql reads the password from PGPASSWORD.
(
  set -a; . /opt/aurora-infra/.env; set +a
  PGPASSWORD="$POSTGRES_PASSWORD" psql "host=127.0.0.1 port=5432 dbname=$POSTGRES_DB user=$POSTGRES_USER"
)
Option C: psql / pgAdmin from your workstation (SSH tunnel)
# Tunnel both DB (5432) and pgAdmin (5050) over SSH
ssh -p 22022 \
-L 5432:127.0.0.1:5432 \
-L 5050:127.0.0.1:5050 \
miguel@159.89.179.179
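TCP forwarding is disabled in 99-hardening.conf (§ 2), so these -L tunnels are refused unless forwarding is allowed for your account. Check the effective setting with sudo sshd -T -C user=<you>,host=localhost,addr=127.0.0.1 | grep -i allowtcpforwarding before relying on this option.
While the tunnel is open:
- psql: psql "host=127.0.0.1 port=5432 dbname=<db> user=<user>", with the password from /opt/aurora-infra/.env.
- pgAdmin: open http://127.0.0.1:5050 in your browser and log in with PGADMIN_DEFAULT_EMAIL / PGADMIN_DEFAULT_PASSWORD. Add a server pointing at host aurora-postgres (the container name), port 5432.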
Useful one-liners
# Health
sudo docker ps --filter name=aurora-postgres --format '{{.Status}}'
# Live size of the DB
sudo docker exec aurora-postgres psql -U postgres -d postgres \
-tAc "SELECT pg_size_pretty(pg_database_size(current_database()));"
# Top 10 tables by row count
sudo docker exec aurora-postgres psql -U postgres -d postgres -c \
"SELECT relname, n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC LIMIT 10;"
# Migrations applied
sudo docker exec aurora-postgres psql -U postgres -d postgres -c \
"SELECT migration_name, finished_at FROM _prisma_migrations ORDER BY finished_at DESC LIMIT 10;"
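The one-liners above hard-code -U postgres -d postgres, while Option A reads the user and database from the container's environment. This hypothetical wrapper takes the Option A approach and feeds the SQL on stdin, which avoids nested quoting around identifiers such as "User". It assumes the container exposes POSTGRES_USER and POSTGRES_DB, as Option A already does.

```bash
# Hypothetical wrapper: run the SQL in $1 inside aurora-postgres, using the
# user/database from the container's own environment (as Option A does).
# SQL travels on stdin, so quoted identifiers like "User" need no extra escaping.
# -X skips ~/.psqlrc; ON_ERROR_STOP=1 makes psql exit non-zero on the first error.
# Usage: aurora_sql 'SELECT COUNT(*) FROM "User";'
aurora_sql() {
  printf '%s\n' "$1" | sudo docker exec -i aurora-postgres \
    sh -c 'psql -X -v ON_ERROR_STOP=1 -tA -U "$POSTGRES_USER" -d "$POSTGRES_DB"'
}
```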
For backup/restore, see § 10 and RUNBOOK_BACKUP_RESTORE.md on the server.
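7. Reverse Proxy & Public DNS
Two components sit in the public path: Cloudflare Tunnel (cloudflared) and Caddy (local reverse proxy). Inbound 80/443 on the public IP is firewalled, so there is no direct ingress and everything public goes through the tunnel. Note that the UFW summary in § 9 lists 80/tcp and 443/tcp+udp as allowed. Run sudo ufw status verbose, and check any DigitalOcean cloud firewall, to confirm which layer does the blocking.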
Cloudflare Tunnel (cloudflared)
| Field | Value |
|---|---|
| Tunnel ID | 44d9537e-3633-4103-a970-1ab7737115d2 |
| Credentials | /etc/cloudflared/44d9537e-3633-4103-a970-1ab7737115d2.json |
| Config | /etc/cloudflared/config.yml |
| systemd unit | cloudflared.service |
Current ingress rules:
| Public hostname | Forwards to | Served by |
|---|---|---|
| agent.nheo.ai | http://127.0.0.1:18789 | openclaw-gateway |
| clow.nheo.ai | http://127.0.0.1:18789 | openclaw-gateway (alias) |
| aurora.nheo.ai | http://127.0.0.1:3000 | aurora.service (Next.js) |
| (anything else) | http_status:404 | n/a |
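# View / edit the tunnel config
sudo cat /etc/cloudflared/config.yml
sudoedit /etc/cloudflared/config.yml
# Validate the ingress rules before applying them
sudo cloudflared tunnel --config /etc/cloudflared/config.yml ingress validate
# Apply: reload if the unit supports it, otherwise restart (a restart briefly drops the tunnel)
sudo systemctl reload cloudflared || sudo systemctl restart cloudflared
sudo systemctl status cloudflared --no-pager
# Origin test: hits the upstream on loopback, bypassing Cloudflare
curl -fsS -H "Host: agent.nheo.ai" http://127.0.0.1:18789/healthz
curl -fsS -H "Host: aurora.nheo.ai" http://127.0.0.1:3000/
# End-to-end test through the tunnel (from any machine)
curl -fsS https://agent.nheo.ai/healthz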
DNS for *.nheo.ai is managed in Cloudflare. The tunnel handles TLS termination at the edge — there is no certificate on this server.
Caddy
Caddy is the local reverse proxy. It currently only fronts aurora.nheo.ai for direct (non-tunnel) HTTP access.
| Field | Value |
|---|---|
| Binary | /usr/bin/caddy |
| Config | /etc/caddy/Caddyfile |
| systemd unit | caddy.service |
Current Caddyfile:
Editing and reloading:
# Validate before reloading
sudo caddy validate --config /etc/caddy/Caddyfile
# Graceful reload (no dropped connections)
sudo systemctl reload caddy
# or, if you want Caddy itself to do the reload via its admin API:
sudo caddy reload --config /etc/caddy/Caddyfile
# Inspect what Caddy is actually serving
sudo systemctl status caddy --no-pager
sudo journalctl -u caddy -n 100 --no-pager
Adding a new public hostname
- Decide where it terminates: Cloudflare Tunnel (preferred, zero firewall changes) or Caddy (only if you need on-box TLS or special headers).
- Tunnel route: add an ingress block in /etc/cloudflared/config.yml above the catch-all http_status:404 line, then systemctl reload cloudflared. Add the DNS CNAME in Cloudflare to the tunnel.
- Caddy route: add a site block in /etc/caddy/Caddyfile, caddy validate, systemctl reload caddy.
- Test with curl -fsS -H "Host: <name>" http://127.0.0.1:<port>/.
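For the tunnel route step, this sketch adds the new rule above the catch-all without editing the YAML in place. It assumes the catch-all is the usual "- service: http_status:404" list item and copies that item's indentation. The result goes to stdout so the change can be diffed before it replaces the live file. The function name and the example hostname and port in the usage comments are hypothetical. To create the matching CNAME from the CLI, use cloudflared tunnel route dns <tunnel> <hostname>.

```bash
# Hypothetical helper: print FILE with a new ingress rule for HOST -> SERVICE
# inserted directly above the catch-all "- service: http_status:404" item,
# reusing that item's indentation. Exits 1 if no catch-all is found.
add_ingress_rule() {
  local file="$1" host="$2" service="$3"
  awk -v h="$host" -v s="$service" '
    !inserted && /^[ \t]*-[ \t]*service:[ \t]*http_status:404/ {
      match($0, /^[ \t]*/)
      ind = substr($0, 1, RLENGTH)
      print ind "- hostname: " h
      print ind "  service: " s
      inserted = 1
    }
    { print }
    END { if (!inserted) exit 1 }
  ' "$file"
}

# Usage (review before installing):
#   sudo cat /etc/cloudflared/config.yml > /tmp/cf.yml
#   add_ingress_rule /tmp/cf.yml new.nheo.ai http://127.0.0.1:8080 > /tmp/cf.new.yml
#   diff -u /tmp/cf.yml /tmp/cf.new.yml
#   sudo cp /tmp/cf.new.yml /etc/cloudflared/config.yml   # cp keeps the target's mode/owner
```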
8. Common Troubleshooting
Triage flow (when "something is wrong")
- systemctl --no-pager --failed: anything red?
- df -h /: disk full?
- free -h: out of memory?
- sudo docker ps: both Aurora containers up?
- sudo journalctl --since '10 minutes ago' -p err --no-pager: recent errors?
- Drill into the specific unit: journalctl -u <unit> -n 200 --no-pager.
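The first five triage steps are mechanical, so they can be bundled. This sketch (function names hypothetical) runs them in order, labels each section, and keeps going when a step fails. The whole snapshot can then be saved with tee and attached to the incident note. Step 6, drilling into a specific unit, stays manual.

```bash
# Run one triage step under a header; report a failing step's exit code
# instead of aborting the whole snapshot.
triage_step() {
  local title="$1"; shift
  printf '\n== %s ==\n' "$title"
  "$@" 2>&1 || printf '(exit %d)\n' "$?"
}

# Hypothetical bundle of triage steps 1-5.
# Usage: triage_snapshot | tee "/tmp/triage-$(date +%F-%H%M).txt"
triage_snapshot() {
  triage_step "1. failed units" systemctl --no-pager --failed
  triage_step "2. disk" df -h /
  triage_step "3. memory" free -h
  triage_step "4. containers" sudo docker ps
  triage_step "5. errors, last 10 min" sudo journalctl --since '10 minutes ago' -p err --no-pager
}
```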
Symptom → first check
| Symptom | First check | Likely cause |
|---|---|---|
| SSH "Connection timed out" | nc -zv 159.89.179.179 22022 | Wrong port (22 vs 22022) |
| SSH "Connection refused" | sudo fail2ban-client status sshd (from another operator's session or the DigitalOcean console) | Your IP banned by fail2ban (maxretry 3, bantime 1h); unban with sudo fail2ban-client set sshd unbanip <IP> |
| Bot stops responding in Discord | systemctl status openclaw-gateway + journalctl -u openclaw-gateway -n 200 | Gateway crashed; or ReadyMode session collision (NHE-56) |
| agent.nheo.ai returns 502 | curl -I http://127.0.0.1:18789/healthz; journalctl -u cloudflared -n 100 | Gateway down or tunnel disconnected |
| aurora.nheo.ai returns 502 | curl -I http://127.0.0.1:3000/; systemctl status aurora caddy | Aurora app or Caddy down |
| /bg commands hang | bgctl.py queue, journalctl -u openclaw-bg-worker -n 200 | Worker stuck or dispatcher down |
| Bot reports "license error" loop | Charlie hasn't assigned agents to Office Map | NHE-G05 (external blocker) |
| ReadyMode "logged out" mid-operation | Bot and Miguel share manager account | NHE-56 (G22): don't log into ReadyMode while the bot is running |
| Aurora DB unreachable from app | sudo docker ps; sudo docker logs aurora-postgres --tail 50 | Container restart loop or healthcheck failure |
| Disk filling up | sudo du -xhd 1 /var/log /home/agent/.openclaw/agents /opt/aurora-infra/backups | Docker JSON logs (capped 10m × 3), session JSONL growth, or stale dumps |
| RAM/CPU spike | top, sudo docker stats --no-stream, journalctl --since '10 minutes ago' | Bot retry loop, runaway bash subshell, or container OOM |
| sudo from agent denied | sudo cat /home/agent/.openclaw/workspace/runtime/bg_sudo_allowlist.json | Command not in allowlist (by design) |
Disk & memory checks
df -h / # root usage
sudo du -xhd 1 / 2>/dev/null | sort -h | tail -20 # biggest top-level dirs (sudo so unreadable dirs are counted)
sudo journalctl --disk-usage # journal size
sudo docker system df # docker disk usage
free -h # RAM + swap
sudo dmesg -T | tail -50 # kernel-level errors (OOM kills, etc.)
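The checks above show how full the disk is. Adding a threshold turns that into a yes/no answer a script can act on. This sketch reads POSIX df -P output. The function names and the 85% default are hypothetical; 85% is not a documented limit for this box.

```bash
# Use% of the filesystem holding PATH, as a bare integer. df -P (POSIX
# format) keeps each filesystem on one line, so field 5 is Use%.
disk_used_pct() {
  df -P -- "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical threshold check: returns 1 at or above LIMIT percent,
# 2 if the usage could not be read.
disk_check() {
  local path="${1:-/}" limit="${2:-85}" pct
  pct=$(disk_used_pct "$path")
  [[ "$pct" =~ ^[0-9]+$ ]] || { echo "could not read usage for $path" >&2; return 2; }
  if (( pct >= limit )); then
    echo "WARN: $path at ${pct}% (limit ${limit}%)"
    return 1
  fi
  echo "ok: $path at ${pct}%"
}
```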
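Restart what is safe to restart
# SSHD config change: validate first, then reload (keeps your session alive).
# Ubuntu's unit is ssh.service; the sshd alias may not exist.
sudo sshd -t && sudo systemctl reload ssh
# If sshd is socket-activated (ssh.socket active), Port changes also need:
#   sudo systemctl daemon-reload && sudo systemctl restart ssh.socket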
# UFW / fail2ban — stateless
sudo systemctl restart fail2ban
sudo ufw reload
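# Caddy: graceful reload. cloudflared: reload if the unit supports it, else restart
# (a restart briefly drops the tunnel and the public hostnames with it)
sudo systemctl reload caddy
sudo systemctl reload cloudflared || sudo systemctl restart cloudflared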
# Docker — live-restore keeps containers running
sudo systemctl restart docker
# Bg worker / dispatcher: RUNNING jobs are marked FAILED on the next start, so check bgctl.py queue first
sudo systemctl restart openclaw-bg-worker
sudo systemctl restart openclaw-bg-dispatcher
When the bot misbehaves but the box looks fine
The bot is built to fail loud, not to recover silently. If openclaw-gateway is running, journal is clean, but Discord users say "it's broken":
- Open the relevant agent's session JSONL. Run the glob inside sudo so it expands with root's permissions (your own shell may not be able to list /home/agent): sudo sh -c 'tail -q -n 200 /home/agent/.openclaw/agents/main/sessions/*.jsonl' | jq .
- Look for the last tool_use and its result: the failure is almost always at the script ↔ LLM boundary, not the kernel.
- Cross-reference RUNBOOK_READYMODE_DEDUP.md and the dedup timer pattern (NHE-64) before assuming a real outage.
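To jump straight to the last tool_use in a long session file, this sketch greps the raw text rather than assuming field names, since the session schema is not documented here. It prints the last line containing "tool_use" plus the few lines after it, on the assumption that the tool's result is logged shortly afterwards. The function name is hypothetical.

```bash
# Hypothetical helper: print the last line of a session JSONL file that
# contains the string "tool_use", plus the next AFTER lines (default 2),
# assumed to be where its result is logged. Pipe the output to jq.
last_tool_use() {
  local file="$1" after="${2:-2}" n
  n=$(grep -n '"tool_use"' -- "$file" | tail -n 1 | cut -d: -f1)
  if [[ -z "$n" ]]; then
    echo "no tool_use in $file" >&2
    return 1
  fi
  sed -n "${n},$(( n + after ))p" "$file"
}

# Usage (the session files are only readable as root):
#   sudo bash -c "$(declare -f last_tool_use); \
#     last_tool_use /home/agent/.openclaw/agents/main/sessions/<file>.jsonl 3" | jq .
```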
AppArmor edits
The bot's profile (openclaw-bg-dispatcher) is inherited by anything agent launches under sudo, including sudo python3. To edit /etc/apparmor.d/ files without the profile getting in the way:
Plain sudo python3 will inherit the profile and silently fail at the policy boundary.
9. Hardening Profile (Summary)
The full posture lives in SECURITY-RUNBOOK.md on the server (and the audit baseline lives in security-audit.md). Quick reference of what is enforced in production today:
| Layer | Control |
|---|---|
| Network (inbound) | UFW default-deny; allow 22022/tcp, 80/tcp, 443/tcp+udp only |
| Brute-force | fail2ban SSH jail (maxretry 3, bantime 1h) |
| SSH | Key-only, port 22022, no root, no X11/TCP fwd, MaxAuth 3, LoginGrace 30s |
| Kernel | dmesg_restrict=1, randomize_va_space=2, rp_filter=2, syncookies, no redirects |
| Filesystem | /dev/shm mounted noexec,nosuid,nodev (persisted in /etc/fstab) |
| Docker | no-new-privileges, icc=false, userland-proxy=false, log size cap, live-restore |
| Audit | auditd running; logs in /var/log/audit/ |
| Patches | unattended-upgrades (security only) |
| Secrets | SOPS + age (NHE-66); master key at /root/.config/sops/age/keys.txt, paper-backed |
Verify the live config with the commands in SECURITY-RUNBOOK.md § Verify Commands. Don't trust this table over the server itself — re-run the verifies before relying on a property.
Not yet enforced (deferred)
- Docker userns-remap: needs container recreation; planned maintenance window
- AIDE file-integrity monitoring: Phase 4
- Fail2ban recidive jail: optional
10. Backups & Recovery
Source of truth: RUNBOOK_BACKUP_RESTORE.md on the server. Quick reference:
| Component | What | Where | Cadence | RPO |
|---|---|---|---|---|
| Aurora Postgres | pg_dump --format=custom + sha256 sidecar | /opt/aurora-infra/backups/ | Daily ~03:18 UTC | 24h |
| Aurora offsite | restic snapshots to Cloudflare R2 (NHE-66) | bucket claw-aurora/postgres | Daily | 24h |
| Agent sessions | per-agent JSONL conversation logs | /home/agent/.openclaw/agents/<agent>/sessions/ | None today | n/a |
| SOPS age key | master decryption key | /root/.config/sops/age/keys.txt, paper + USB offsite | One-time backup | n/a |
Run the restore drill quarterly — a backup that has never been restored is not a backup. Drill commands and expected timing live in RUNBOOK_BACKUP_RESTORE.md. Append a row to its drill log every time.
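Between quarterly drills, a cheap daily sanity check is that the newest local dump exists, falls inside the 24h RPO, and matches its sha256 sidecar. This sketch rests on stated assumptions: dumps are named aurora_*.dump (as in the recovery steps below), and each sidecar is <dump>.sha256 in standard sha256sum format. The 26h default (24h plus slack for the ~03:18 UTC run) and the function name are hypothetical. This check does not replace a restore drill.

```bash
# Hypothetical check of the newest local dump. ASSUMPTIONS: dumps match
# aurora_*.dump and each has a sidecar <dump>.sha256 in sha256sum format.
# Returns 0 if fresh and verified, 1 if stale or unverifiable, 2 if none found.
check_latest_dump() {
  local dir="${1:-/opt/aurora-infra/backups}" max_h="${2:-26}"
  local newest="" f age
  for f in "$dir"/aurora_*.dump; do
    [[ -e "$f" ]] || continue              # glob matched nothing
    if [[ -z "$newest" || "$f" -nt "$newest" ]]; then
      newest="$f"
    fi
  done
  if [[ -z "$newest" ]]; then
    echo "no aurora_*.dump in $dir" >&2
    return 2
  fi
  age=$(( $(date +%s) - $(stat -c %Y -- "$newest") ))
  echo "newest: $newest ($(( age / 3600 ))h old)"
  if (( age > max_h * 3600 )); then
    echo "STALE: older than ${max_h}h" >&2
    return 1
  fi
  # Run from the backup dir so relative names in the sidecar resolve.
  ( cd -- "$dir" && sha256sum --quiet -c -- "$(basename -- "$newest").sha256" )
}
```

The backups directory is likely root-owned, so run it as root, e.g. sudo bash -c "$(declare -f check_latest_dump); check_latest_dump".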
Bare-metal recovery sketch
- Provision a fresh DigitalOcean droplet and install Docker with the Compose plugin (the commands here use docker compose).
- Restore /opt/aurora-infra/ from the most recent restic snapshot.
- Pull the latest aurora_*.dump from R2 if local disk is gone.
- docker compose up -d aurora-postgres, then pg_restore per the runbook.
- Restore the SOPS age key to /root/.config/sops/age/keys.txt from paper/USB.
- Decrypt openclaw.env.enc and place it at /etc/openclaw.env.
- Re-create the cloudflared tunnel config (or restore /etc/cloudflared/) and start cloudflared.service.
- Bring the bot up and run a no-op operation to confirm DB connectivity.
11. Maintenance & Change Control
- Change approval pattern: prepare the artifact (don't activate), present a short table — time to apply / downtime / rollback — and wait for the green light. This is how every hardening phase has shipped on this box.
- Audit before exec: plans more than a few weeks old need a fresh server-side audit (services, paths, perms) before applying. Report any drift before changing anything.
- Rotation deferral: when secret rotation needs an external console Miguel doesn't have access to, deliver the at-rest component (encrypted, deployed, decryptable) and defer the rotation step to a natural trigger.
- No autonomous LLM in prod: we do not give the agent a self-modifying loop on this server. Any "self-improving" feature lands as a scan + ask flow with a human in the loop.
12. Where to Look Next
| You want to | Read |
|---|---|
| Understand the bot's architecture | architecture.md, engineer-onboarding.md |
| Trace a specific bot operation | operations.md |
| See past incidents and lessons | incidents.md |
| Audit the security posture | security-audit.md + SECURITY-RUNBOOK.md (server-side) |
| Run a backup/restore drill | RUNBOOK_BACKUP_RESTORE.md (server-side) |
| Operate /bg jobs in detail | RUNBOOK_BG_WORKER.md (server-side) |
| Track open requirements | compliance-matrix.md, gap-analysis-roadmap.md |