Architecture¶

Execution Flow¶

The full request-to-response path:

Discord message (manager/agent)
        |
        v
OpenClaw Gateway (port 18789, localhost only)
        |
        v
Agent LLM — gpt-4.1-mini
  1. Understands intent (NLP)
  2. Extracts parameters (username, states, campaigns, etc.)
  3. Asks for confirmation if the operation is destructive
  4. Identifies the correct bash script via decision table
  5. Executes: exec script.sh [params] (yieldMs: 120,000ms)
        |
        v
Bash Script (autonomous — no LLM involvement)
  - Login via Chrome headless CDP
  - Navigate via UI clicks (direct URLs only for known-safe routes)
  - Fill forms, click buttons, handle popups
  - Output: single-line JSON {"success": true/false, "message": "..."}
        |
        v
Agent LLM
  6. Parses JSON result
  7. Responds to manager in natural language (ES or EN)
        |
        v
Discord response in #readymode-soporte
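
As an illustration of steps 6 and 7, the sketch below shows one way the single-line JSON contract could be interpreted. `parseScriptResult` and the `status` values are hypothetical names, not part of the existing workspace, and taking the last non-empty line of stdout is an assumption about how stray log output might be tolerated.

```javascript
// Hypothetical helper: interpret a script's stdout under the
// {"success": true/false, "message": "..."} contract.
function parseScriptResult(stdout) {
  // Assumption: the JSON is the last non-empty line of stdout.
  const lines = stdout.split('\n').map((line) => line.trim()).filter(Boolean);
  const last = lines[lines.length - 1];
  if (!last) {
    return { status: 'unknown', message: 'Script produced no output' };
  }
  try {
    const parsed = JSON.parse(last);
    if (typeof parsed.success !== 'boolean') {
      return { status: 'unknown', message: 'Output lacks a boolean "success" field' };
    }
    return {
      status: parsed.success ? 'ok' : 'failed',
      message: String(parsed.message ?? ''),
    };
  } catch {
    // Anything that is not parseable JSON is reported as unknown, never as success.
    return { status: 'unknown', message: 'Output was not valid JSON' };
  }
}
```

Mapping every unparseable or incomplete output to `unknown` keeps the agent from reporting success it cannot verify.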

Design Decisions¶

D1 — `tools.deny: ["browser"]`¶

Decision: The agent LLM is explicitly blocked from calling the browser tool directly. It can only execute bash scripts, which internally run `openclaw browser` as a subprocess.

Reasoning: The rule was introduced after the agent kept walking users through manual steps in Discord instead of executing the operations itself. Denying the browser tool forces a single controlled path: the agent's only job is to understand the request, extract parameters, and fire the right script. All DOM logic, selectors, and browser interactions live exclusively inside the scripts, never in the agent's context.

D2 — Dispatcher Pattern, Not Conversational Guide¶

Decision: The agent is a pure dispatcher: receive request → confirm if needed → execute script → report result. It never describes how operations work internally.

Reasoning: Early versions behaved as a conversational assistant and explained in Discord how to perform operations manually, which defeated the purpose of the automation. The agent was rewritten entirely. Without the dispatcher constraint, an agent that holds operational knowledge tends to share that knowledge conversationally rather than execute the operation.
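
The dispatcher pattern can be sketched as a lookup against a decision table. The script names below come from the workspace file listing; the intent keys, the `required` parameter lists, and the `dispatch` function are hypothetical and only illustrate the shape of the idea.

```javascript
// Hypothetical decision table: intent -> script and required parameters.
// Only the script file names are taken from the workspace; the rest is assumed.
const DECISION_TABLE = new Map([
  ['clear_licenses', { script: 'clear_licenses.sh', required: [] }],
  ['reset_leads', { script: 'reset_leads.sh', required: ['campaign'] }],
  ['create_user', { script: 'create_user.sh', required: ['username'] }],
  ['upload_leads', { script: 'upload_leads.sh', required: ['csvPath', 'campaign'] }],
]);

// A pure dispatcher: it either rejects, asks for missing parameters,
// or returns the command to execute. It never explains manual steps.
function dispatch(intent, params = {}) {
  const entry = DECISION_TABLE.get(intent);
  if (!entry) {
    return { action: 'reject', reason: `No script for intent "${intent}"` };
  }
  const missing = entry.required.filter((name) => !(name in params));
  if (missing.length > 0) {
    return { action: 'ask', missing };
  }
  const args = entry.required.map((name) => String(params[name]));
  return { action: 'exec', command: [entry.script, ...args] };
}
```

For example, `dispatch('create_user', { username: 'jdoe' })` yields an `exec` action for `create_user.sh`, while an unknown intent is rejected rather than answered conversationally.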

D3 — Navigation by UI Clicks, Not Direct URLs¶

Decision: Navigation happens by clicking links within the dashboard (using `a.dash_link` selectors). Direct URL navigation is used only for a small set of known-safe routes (`/+Team/ManageLicenses`).

Reasoning: ReadyMode is a React SPA. Navigating directly to URLs like `/+Team/ManageUsers` returns a blank DOM because the app hasn't bootstrapped its client-side state. Apart from the known-safe routes, navigation must therefore happen by clicking links in the dashboard, exactly as a human would (see Incident 5).
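
A minimal sketch of this rule, assuming the link is located by its visible text. The `a.dash_link` selector and the `/+Team/ManageLicenses` route come from the text above; `planNavigation`, `clickDashLink`, and the "Manage Users" label are hypothetical.

```javascript
// Routes known to survive direct URL navigation (from the decision above).
const DIRECT_URL_SAFE_ROUTES = new Set(['/+Team/ManageLicenses']);

// Decide how to reach a page: direct URL only for allowlisted routes,
// otherwise a click on a dashboard link.
function planNavigation(path, linkLabel) {
  if (DIRECT_URL_SAFE_ROUTES.has(path)) {
    return { mode: 'direct', path };
  }
  return { mode: 'click', selector: 'a.dash_link', label: linkLabel };
}

// Intended to run inside the page: find a dashboard link by its visible
// text and click it. Returns false when no matching link exists.
function clickDashLink(doc, label) {
  const links = Array.from(doc.querySelectorAll('a.dash_link'));
  const target = links.find((link) => link.textContent.trim() === label);
  if (!target) {
    return false;
  }
  target.click();
  return true;
}
```

Returning `false` instead of throwing lets the calling script decide whether to retry, dismiss an overlay first, or report failure.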

D4 — Native Value Setter for React Inputs¶

Decision: All form field inputs use the native value setter pattern plus input event dispatch.

Pattern:

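// `input` is the target <input> element; `value` is the string to set.
// Fetch the native setter from the prototype so React's per-element value tracking is bypassed.
const nativeInputValueSetter = Object.getOwnPropertyDescriptor(
  HTMLInputElement.prototype, 'value'
).set;
nativeInputValueSetter.call(input, value);
// Dispatch a bubbling input event so React notices the change and updates its state.
input.dispatchEvent(new Event('input', { bubbles: true }));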

Reasoning: ReadyMode's input fields are controlled React components. A plain `input.value = 'x'` assignment does not trigger React's state management, so the field appears filled but submits empty. The pattern is used in login, password setting, and user creation (see Incidents 3 & 10).

D5 — Anti-Hallucination Rules (Explicit)¶

Decision: AGENTS.md contains explicit prohibitions against fabricating results.

Key rule: "If the script output is ambiguous or still running, you must report that honestly. Never fabricate success."

Reasoning: The rule is backed by timing limits. A `yieldMs` of 120,000 ms (2 minutes) gives scripts enough time to complete before the agent evaluates the result. When the agent has to poll a script that is still running, it waits at least 20 seconds between polls and polls at most 4 times, which avoids saturating the gateway (see Incidents 7 & 9).
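
The polling limits can be illustrated with a bounded loop. This is a sketch under stated assumptions: `pollForResult`, `checkOnce` (which resolves to `null` while the script is still running), and the injectable `sleep` are hypothetical names; only the 20-second minimum and the 4-poll maximum come from the text.

```javascript
const MIN_POLL_INTERVAL_MS = 20_000; // minimum wait between polls
const MAX_POLLS = 4;                 // hard cap on polls per script run

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll until the script reports a result or the poll budget is spent.
// When the budget runs out the status is "unknown": success is never assumed.
async function pollForResult(checkOnce, { intervalMs = MIN_POLL_INTERVAL_MS, sleep = defaultSleep } = {}) {
  // Clamp shorter intervals up to the minimum to protect the gateway.
  const waitMs = Math.max(intervalMs, MIN_POLL_INTERVAL_MS);
  for (let attempt = 1; attempt <= MAX_POLLS; attempt += 1) {
    const result = await checkOnce();
    if (result !== null) {
      return { status: result.success ? 'ok' : 'failed', message: result.message, polls: attempt };
    }
    if (attempt < MAX_POLLS) {
      await sleep(waitMs);
    }
  }
  return { status: 'unknown', message: 'Script still running after the poll limit', polls: MAX_POLLS };
}
```

The `unknown` outcome gives the agent an honest state to report, in line with the key rule quoted above.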

D6 — Confirmation Before Destructive Operations¶

Decision: Before executing Clear Licenses or any operation that affects multiple agents, the bot asks the manager to confirm ("Confirmas?").

Reasoning: Operations like Reset Leads permanently delete queue data. A misunderstood command or a test message from a manager should never trigger irreversible actions without a clear confirmation step.
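
A confirmation gate along these lines might look as follows. The operation keys, the `needsConfirmation` and `isAffirmative` functions, and the list of accepted replies are all assumptions made for illustration; the text only establishes that Clear Licenses, Reset Leads, and multi-agent operations require confirmation.

```javascript
// Operations treated as destructive (hypothetical keys for Clear Licenses and Reset Leads).
const DESTRUCTIVE_OPERATIONS = new Set(['clear_licenses', 'reset_leads']);

// Confirmation is required for destructive operations and for anything
// that touches more than one agent.
function needsConfirmation(operation, affectedAgentCount = 1) {
  return DESTRUCTIVE_OPERATIONS.has(operation) || affectedAgentCount > 1;
}

// Assumed set of replies counted as a clear "yes" in Spanish or English.
const AFFIRMATIVE_REPLIES = ['si', 'sí', 'yes', 'confirmo', 'confirm'];

function isAffirmative(reply) {
  return AFFIRMATIVE_REPLIES.includes(reply.trim().toLowerCase());
}
```

Treating anything outside the explicit list as "not confirmed" means an ambiguous or test message never triggers an irreversible action.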

Server Infrastructure¶

Component	Detail
Provider	DigitalOcean NYC3
IP	159.89.179.179
OS	Ubuntu 24.04.3 LTS
RAM	2 GB + 2 GB swap
Disk	87 GB (15% used)
Swap config	`vm.swappiness=10`, `vm.vfs_cache_pressure=50`
Reverse proxy	Caddy (TLS, aurora.nheo.ai)
Tunnel	cloudflared (agent.nheo.ai)
Database	PostgreSQL 16 via Docker
DB admin	pgAdmin4 via Docker

Ports — Internal Only (localhost)¶

Port	Service
18789	OpenClaw Gateway
18792	OpenClaw Gateway WebSocket
3000	Aurora (Next.js)
5432	PostgreSQL
5050	pgAdmin4

Ports — Exposed to Internet¶

Port	Service
22	SSH
80	HTTP (redirects to 443)
443	HTTPS (Caddy)

systemd Services¶

Service	User	Description
`openclaw-gateway`	agent	Core gateway — receives Discord messages, dispatches to agent
`openclaw-bg-dispatcher`	agent	Processes background commands
`openclaw-bg-worker`	agent	Background task worker
`openclaw-owner-policy`	agent	WhatsApp message parser
`aurora`	agent	Next.js web application
`caddy`	caddy	Reverse proxy + TLS
`cloudflared`	root	Cloudflare tunnel (currently runs as root — flagged in security audit F-10)

Workspace File Structure¶

File	Purpose
`AGENTS.md`	Agent behavior rules: dispatcher pattern, confirmation logic, anti-hallucination, polling limits, bilingual response
`SKILL.md`	Decision table (which script maps to which intent) + full KB for conversational support
`SOUL.md`	Tone and personality definition
`IDENTITY.md`	Who the agent is (role, name, context)
`_lib.sh`	Shared bash functions used by all scripts
`clear_licenses.sh`	Automates the Clear Licenses flow
`reset_leads.sh`	Automates Reset Leads (currently blocked)
`create_user.sh`	Automates Create User steps 1–4 (playlist steps pending)
`upload_leads.sh`	Automates CSV upload and campaign assignment

`_lib.sh` — Shared Functions¶

Function	Purpose
`readymode_login()`	Navigates to the login page, fills credentials using the native value setter pattern, dismisses overlays post-login. Selectors: `#login-account`, `#login-password`, `.sign-in`
`readymode_logout()`	Appends `?logout=1` to the URL to force logout
`dismiss_blocking_overlays()`	Detects and closes modal overlays (e.g., `#phone_test_ui` with z-index 600) that block post-login clicks. These overlays appear inconsistently and were a significant source of early failures
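
The in-page logic behind two of these helpers could be sketched as below. The `#phone_test_ui` selector and the `?logout=1` parameter come from the table; hiding the overlay with `display: none` is an assumption (the real function may click a close control instead), and both function names are hypothetical JavaScript counterparts of the bash functions.

```javascript
// Overlays known to block post-login clicks (selector from the table above).
const BLOCKING_OVERLAY_SELECTORS = ['#phone_test_ui'];

// Hide every known blocking overlay that is present and report which
// selectors matched. Assumption: hiding is enough to unblock clicks.
function dismissBlockingOverlays(doc) {
  const dismissed = [];
  for (const selector of BLOCKING_OVERLAY_SELECTORS) {
    const overlay = doc.querySelector(selector);
    if (!overlay) {
      continue; // overlays appear inconsistently, so absence is normal
    }
    overlay.style.display = 'none';
    dismissed.push(selector);
  }
  return dismissed;
}

// Build the forced-logout URL by adding logout=1 to the query string.
function logoutUrl(currentUrl) {
  const url = new URL(currentUrl);
  url.searchParams.set('logout', '1');
  return url.toString();
}
```

Returning the list of dismissed selectors makes it easy for a script to log which overlay appeared on a given run.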

OpenClaw Configuration (`openclaw.json`)¶

Setting	Value	Reason
`tools.deny`	`["browser"]`	Force all browser interaction through bash scripts
`tools.exec.backgroundMs`	`90000`	Scripts can take 15–45 seconds; 90s gives 2x buffer
`yieldMs`	`120000`	Wait up to 2 minutes for script completion before evaluating
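
The relationship between these values can be checked with a few lines of arithmetic. The values are taken from the table; the object nesting (in particular where `yieldMs` lives), the 45-second worst case used as a constant, and the `checkTimeouts` function are assumptions for illustration and not the actual `openclaw.json` schema.

```javascript
// Values from the settings table; the nesting shown here is assumed.
const openclawSettings = {
  tools: { deny: ['browser'], exec: { backgroundMs: 90_000 } },
  yieldMs: 120_000,
};

// Upper end of the 15-45 second script runtime quoted in the table.
const WORST_CASE_SCRIPT_MS = 45_000;

// Return a list of problems; an empty list means the settings are
// consistent with the reasons given in the table.
function checkTimeouts(settings, worstCaseMs) {
  const problems = [];
  if (!settings.tools.deny.includes('browser')) {
    problems.push('browser tool is not denied');
  }
  if (settings.tools.exec.backgroundMs < 2 * worstCaseMs) {
    problems.push('backgroundMs is under 2x the worst-case script time');
  }
  if (settings.yieldMs < worstCaseMs) {
    problems.push('yieldMs is shorter than the worst-case script time');
  }
  return problems;
}
```

With the values above, `checkTimeouts(openclawSettings, WORST_CASE_SCRIPT_MS)` returns an empty list, since 90,000 ms is exactly twice the 45,000 ms worst case.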

Architecture Improvement Proposal — 3-Layer Memory¶

Proposed by: Miguel Legarda — 2026-04-16
Status: Not yet implemented — pending team decision
Estimated effort: ~2 days, low risk

The Problem¶

On every turn, the agent LLM currently receives the entire workspace context: AGENTS.md (164 lines), SKILL.md (175 lines), SOUL.md, IDENTITY.md, and more. Much of this is information the agent never uses directly: DOM selectors, React gotchas, historical incident notes, and full step-by-step operation flows. The scripts handle all of that, not the agent.

Approximately 40% of the agent's context window is consumed by irrelevant technical detail. Consequences: less space for actual conversation, higher latency per turn, the agent occasionally tries to "help" with DOM details it has no business mentioning, and adding a new operation means editing a large tangled AGENTS.md file.

The Proposal — 3-Layer Memory Separation¶

Layer	Contents	Used by	Lives in
Global Memory (persistent)	Available operations, required parameters, decision table, KB troubleshooting, support phone number	Agent LLM	`OPERATIONS.md` + `KNOWLEDGE.md`
Local Memory (per session)	Behavioral rules: language, Discord format, when to confirm, anti-hallucination rules, polling limits	Agent LLM	`BEHAVIOR.md`
Script Logic (autonomous)	DOM selectors, React gotchas, SPA navigation rules, overlay handling, incident workarounds, step-by-step flows	Only the bash scripts	`_lib.sh` + inline script comments

Proposed Execution Flow (Post-Refactor)¶

Discord message
    |
    v
Agent LLM (lightweight context: BEHAVIOR + OPERATIONS + KNOWLEDGE ~140 lines)
    1. Understands user intent
    2. Extracts required parameters
    3. Looks up decision table → identifies script
    4. exec script.sh with params (yieldMs: 120,000ms)
    |
    v
Bash Script (fully autonomous — all DOM logic internal)
    - Login, navigation, form fills, drag-and-drop
    - All gotchas and workarounds live here
    - Output: JSON {"success": true/false, "message": "..."}
    |
    v
Agent LLM
    5. Parses JSON
    6. Responds in natural language
    |
    v
Discord response

File Restructuring Plan¶

Current file	Proposed destination	What changes
`AGENTS.md` (164 lines)	`BEHAVIOR.md` (~50 lines)	Only behavioral rules remain: language, format, confirmation logic, anti-hallucination, polling limits
`SKILL.md` (175 lines)	`OPERATIONS.md` (~40 lines) + `KNOWLEDGE.md` (~50 lines)	Decision table + params go to OPERATIONS. KB troubleshooting goes to KNOWLEDGE. DOM flows and few-shot examples removed
DOM flows, selectors, gotchas, incidents (currently in AGENTS.md)	Deleted from agent context	Moved to inline comments inside each bash script
`SOUL.md` + `IDENTITY.md`	No change	Already lightweight (<40 lines total)

Net result: Agent context shrinks from ~340 lines to ~140 lines — a ~60% reduction.
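
The arithmetic behind this figure can be reproduced from the line counts in the table. The current counts are exact; the proposed counts are the approximate targets from the plan, and SOUL.md and IDENTITY.md are left out because they do not change.

```javascript
// Current line counts from the restructuring table.
const currentLines = { 'AGENTS.md': 164, 'SKILL.md': 175 };
// Approximate targets for the proposed files.
const proposedLines = { 'BEHAVIOR.md': 50, 'OPERATIONS.md': 40, 'KNOWLEDGE.md': 50 };

const sumLines = (files) => Object.values(files).reduce((total, n) => total + n, 0);

const before = sumLines(currentLines);  // 339, quoted as ~340
const after = sumLines(proposedLines);  // 140
const reductionPct = Math.round((1 - after / before) * 100); // 59, quoted as ~60%

console.log(`${before} -> ${after} lines (${reductionPct}% reduction)`);
```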

Benefits¶

Less hallucination risk. Without DOM selectors and technical flows in its context, the agent has no material from which to improvise selectors or explain manual steps.
Faster responses. Fewer tokens in context = lower latency per LLM call.
Easier maintenance. Adding a new operation = create a bash script + add one row to the decision table in OPERATIONS.md.
Clean debugging separation. Agent problems live in BEHAVIOR.md. Automation problems live in the scripts.
Scripts become self-documented. Moving gotchas into script comments means the next engineer opening create_user.sh will find everything they need.

Implementation Plan¶

Phase	Task	Effort
A	Create `BEHAVIOR.md`, `OPERATIONS.md`, `KNOWLEDGE.md` by splitting and trimming existing files	1 day
B	Move DOM details, selectors, and gotchas into inline comments in each bash script	0.5 days
C	Test that agent still handles all operations and KB scenarios correctly with reduced context	0.5 days
Total	Low risk — scripts do not change, only context files	~2 days

Decision needed: The team should decide whether to implement this before or after closing the remaining feature gaps (G01–G10). Since it's independent and low-risk, it can run in parallel. If G01–G04 (Create User playlist) are about to be implemented, it may be cleaner to do the refactor first so the new script is written into the already-clean architecture.