refactor: rename QA role to Tester and update related documentation

- Updated role references from "QA" to "Tester" in workflow and code comments.
- Revised documentation to reflect the new role structure, including role instructions and completion rules.
- Enhanced the testing guide with clearer instructions and examples for unit and E2E tests.
- Improved tools reference to align with the new role definitions and completion rules.
- Adjusted the roadmap to highlight recent changes in role configuration and workflow state machine.
Lauren ten Hoor
2026-02-16 13:55:38 +08:00
parent 371e760d94
commit f7aa47102f
8 changed files with 928 additions and 634 deletions


@@ -10,22 +10,22 @@ graph TB
direction TB direction TB
A_O["Orchestrator"] A_O["Orchestrator"]
A_GL[GitHub/GitLab Issues] A_GL[GitHub/GitLab Issues]
A_DEV["DEV (worker session)"] A_DEV["DEVELOPER (worker session)"]
A_QA["QA (worker session)"] A_TST["TESTER (worker session)"]
A_O -->|work_start| A_GL A_O -->|work_start| A_GL
A_O -->|dispatches| A_DEV A_O -->|dispatches| A_DEV
A_O -->|dispatches| A_QA A_O -->|dispatches| A_TST
end end
subgraph "Group Chat B" subgraph "Group Chat B"
direction TB direction TB
B_O["Orchestrator"] B_O["Orchestrator"]
B_GL[GitHub/GitLab Issues] B_GL[GitHub/GitLab Issues]
B_DEV["DEV (worker session)"] B_DEV["DEVELOPER (worker session)"]
B_QA["QA (worker session)"] B_TST["TESTER (worker session)"]
B_O -->|work_start| B_GL B_O -->|work_start| B_GL
B_O -->|dispatches| B_DEV B_O -->|dispatches| B_DEV
B_O -->|dispatches| B_QA B_O -->|dispatches| B_TST
end end
AGENT["Single OpenClaw Agent"] AGENT["Single OpenClaw Agent"]
@@ -33,7 +33,7 @@ graph TB
AGENT --- B_O AGENT --- B_O
``` ```
Worker sessions are expensive to start — each new spawn reads the full codebase (~50K tokens). DevClaw maintains **separate sessions per level per role** ([session-per-level design](#session-per-level-design)). When a medior dev finishes task A and picks up task B on the same project, the accumulated context carries over — no re-reading the repo. The plugin handles all session dispatch internally via OpenClaw CLI; the orchestrator agent never calls `sessions_spawn` or `sessions_send`. Worker sessions are expensive to start — each new spawn reads the full codebase (~50K tokens). DevClaw maintains **separate sessions per level per role** ([session-per-level design](#session-per-level-design)). When a medior developer finishes task A and picks up task B on the same project, the accumulated context carries over — no re-reading the repo. The plugin handles all session dispatch internally via OpenClaw CLI; the orchestrator agent never calls `sessions_spawn` or `sessions_send`.
```mermaid ```mermaid
sequenceDiagram sequenceDiagram
@@ -42,7 +42,7 @@ sequenceDiagram
participant IT as Issue Tracker participant IT as Issue Tracker
participant S as Worker Session participant S as Worker Session
O->>DC: work_start({ issueId: 42, role: "dev" }) O->>DC: work_start({ issueId: 42, role: "developer" })
DC->>IT: Fetch issue, verify label DC->>IT: Fetch issue, verify label
DC->>DC: Assign level (junior/medior/senior) DC->>DC: Assign level (junior/medior/senior)
DC->>DC: Check existing session for assigned level DC->>DC: Check existing session for assigned level
@@ -62,19 +62,20 @@ Understanding the OpenClaw model is key to understanding how DevClaw works:
### Session-per-level design ### Session-per-level design
Each project maintains **separate sessions per developer level per role**. A project's DEV might have a junior session, a medior session, and a senior session — each accumulating its own codebase context over time. Each project maintains **separate sessions per developer level per role**. A project's DEVELOPER might have a junior session, a medior session, and a senior session — each accumulating its own codebase context over time.
``` ```
Orchestrator Agent (configured in openclaw.json) Orchestrator Agent (configured in openclaw.json)
└─ Main session (long-lived, handles all projects) └─ Main session (long-lived, handles all projects)
├─ Project A ├─ Project A
│ ├─ DEV sessions: { junior: <key>, medior: <key>, senior: null } │ ├─ DEVELOPER sessions: { junior: <key>, medior: <key>, senior: null }
│ └─ QA sessions: { reviewer: <key>, tester: null } │ ├─ TESTER sessions: { junior: null, medior: <key>, senior: null }
│ └─ ARCHITECT sessions: { junior: <key>, senior: null }
└─ Project B └─ Project B
├─ DEV sessions: { junior: null, medior: <key>, senior: null } ├─ DEVELOPER sessions: { junior: null, medior: <key>, senior: null }
└─ QA sessions: { reviewer: <key>, tester: null } └─ TESTER sessions: { junior: null, medior: <key>, senior: null }
``` ```
Why per-level instead of switching models on one session: Why per-level instead of switching models on one session:
@@ -114,6 +115,18 @@ The agent's only job after `work_start` returns is to post the announcement to T
DevClaw provides equivalent guardrails for everything except auto-reporting, which the heartbeat handles. DevClaw provides equivalent guardrails for everything except auto-reporting, which the heartbeat handles.
## Roles
DevClaw ships with three built-in roles, defined in `lib/roles/registry.ts`. All roles draw their levels from the same scheme (junior/medior/senior), although the Architect uses only junior and senior. Levels describe task complexity, not the role.
| Role | ID | Levels | Default Level | Completion Results |
|---|---|---|---|---|
| Developer | `developer` | junior, medior, senior | medior | done, review, blocked |
| Tester | `tester` | junior, medior, senior | medior | pass, fail, refine, blocked |
| Architect | `architect` | junior, senior | junior | done, blocked |
Roles are extensible — add a new entry to `ROLE_REGISTRY` and corresponding workflow states to get a new role. The `workflow.yaml` config can also override levels, models, and emoji per role, or disable a role entirely (`architect: false`).
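As a rough illustration of the role table, a registry entry could be modeled as below. This is a sketch under assumptions, not the contents of `lib/roles/registry.ts`: the `RoleConfig` shape and the helpers `isValidResult` and `resolveLevel` are hypothetical, and falling back to the default level for an unsupported request is an assumed behavior. Only the role IDs, levels, default levels, and completion results are taken from the table.

```typescript
// Hypothetical shape of a registry entry; the real types may differ.
type Level = "junior" | "medior" | "senior";

interface RoleConfig {
  id: string;
  levels: readonly Level[];
  defaultLevel: Level;
  completionResults: readonly string[];
}

// Values copied from the roles table.
const ROLE_REGISTRY: Record<string, RoleConfig> = {
  developer: {
    id: "developer",
    levels: ["junior", "medior", "senior"],
    defaultLevel: "medior",
    completionResults: ["done", "review", "blocked"],
  },
  tester: {
    id: "tester",
    levels: ["junior", "medior", "senior"],
    defaultLevel: "medior",
    completionResults: ["pass", "fail", "refine", "blocked"],
  },
  architect: {
    id: "architect",
    levels: ["junior", "senior"],
    defaultLevel: "junior",
    completionResults: ["done", "blocked"],
  },
};

// True when `result` is a completion result the role accepts.
function isValidResult(roleId: string, result: string): boolean {
  const role = ROLE_REGISTRY[roleId];
  return role !== undefined && role.completionResults.includes(result);
}

// Uses the requested level if the role supports it, else the role default
// (assumed fallback behavior, for illustration only).
function resolveLevel(roleId: string, requested?: Level): Level {
  const role = ROLE_REGISTRY[roleId];
  if (role === undefined) throw new Error(`Unknown role: ${roleId}`);
  return requested !== undefined && role.levels.includes(requested)
    ? requested
    : role.defaultLevel;
}
```

Under this model, adding a role means adding one more key to the map, which matches the extensibility note above.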
## System overview ## System overview
```mermaid ```mermaid
@@ -127,10 +140,11 @@ graph TB
MS[Main Session<br/>orchestrator agent] MS[Main Session<br/>orchestrator agent]
GW[Gateway RPC<br/>sessions.patch / sessions.list] GW[Gateway RPC<br/>sessions.patch / sessions.list]
CLI[openclaw gateway call agent] CLI[openclaw gateway call agent]
DEV_J[DEV session<br/>junior] DEV_J[DEVELOPER session<br/>junior]
DEV_M[DEV session<br/>medior] DEV_M[DEVELOPER session<br/>medior]
DEV_S[DEV session<br/>senior] DEV_S[DEVELOPER session<br/>senior]
QA_R[QA session<br/>reviewer] TST_M[TESTER session<br/>medior]
ARCH[ARCHITECT session<br/>junior]
end end
subgraph "DevClaw Plugin" subgraph "DevClaw Plugin"
@@ -196,12 +210,13 @@ graph TB
CLI -->|sends task| DEV_J CLI -->|sends task| DEV_J
CLI -->|sends task| DEV_M CLI -->|sends task| DEV_M
CLI -->|sends task| DEV_S CLI -->|sends task| DEV_S
CLI -->|sends task| QA_R CLI -->|sends task| TST_M
CLI -->|sends task| ARCH
DEV_J -->|writes code, creates MRs| REPO DEV_J -->|writes code, creates PRs| REPO
DEV_M -->|writes code, creates MRs| REPO DEV_M -->|writes code, creates PRs| REPO
DEV_S -->|writes code, creates MRs| REPO DEV_S -->|writes code, creates PRs| REPO
QA_R -->|reviews code, tests| REPO TST_M -->|reviews code, tests| REPO
``` ```
## End-to-end flow: human to sub-agent ## End-to-end flow: human to sub-agent
@@ -216,7 +231,7 @@ sequenceDiagram
participant DC as DevClaw Plugin participant DC as DevClaw Plugin
participant GW as Gateway RPC participant GW as Gateway RPC
participant CLI as openclaw gateway call agent participant CLI as openclaw gateway call agent
participant DEV as DEV Session<br/>(medior) participant DEV as DEVELOPER Session<br/>(medior)
participant GL as Issue Tracker participant GL as Issue Tracker
Note over H,GL: Issue exists in queue (To Do) Note over H,GL: Issue exists in queue (To Do)
@@ -225,51 +240,51 @@ sequenceDiagram
TG->>MS: delivers message TG->>MS: delivers message
MS->>DC: status() MS->>DC: status()
DC->>GL: list issues by label "To Do" DC->>GL: list issues by label "To Do"
DC-->>MS: { toDo: [#42], dev: idle } DC-->>MS: { toDo: [#42], developer: idle }
Note over MS: Decides to pick up #42 for DEV as medior Note over MS: Decides to pick up #42 for DEVELOPER as medior
MS->>DC: work_start({ issueId: 42, role: "dev", level: "medior", ... }) MS->>DC: work_start({ issueId: 42, role: "developer", level: "medior", ... })
DC->>DC: resolve level "medior" → model ID DC->>DC: resolve level "medior" → model ID
DC->>DC: lookup dev.sessions.medior → null (first time) DC->>DC: lookup developer.sessions.medior → null (first time)
DC->>GL: transition label "To Do" → "Doing" DC->>GL: transition label "To Do" → "Doing"
DC->>GW: sessions.patch({ key: new-session-key, model: "anthropic/claude-sonnet-4-5" }) DC->>GW: sessions.patch({ key: new-session-key, model: "anthropic/claude-sonnet-4-5" })
DC->>CLI: openclaw gateway call agent --params { sessionKey, message } DC->>CLI: openclaw gateway call agent --params { sessionKey, message }
CLI->>DEV: creates session, delivers task CLI->>DEV: creates session, delivers task
DC->>DC: store session key in projects.json + append audit.log DC->>DC: store session key in projects.json + append audit.log
DC-->>MS: { success: true, announcement: "🔧 Spawning DEV (medior) for #42" } DC-->>MS: { success: true, announcement: "🔧 Spawning DEVELOPER (medior) for #42" }
MS->>TG: "🔧 Spawning DEV (medior) for #42: Add login page" MS->>TG: "🔧 Spawning DEVELOPER (medior) for #42: Add login page"
TG->>H: sees announcement TG->>H: sees announcement
Note over DEV: Works autonomously — reads code, writes code, creates MR Note over DEV: Works autonomously — reads code, writes code, creates PR
Note over DEV: Calls work_finish when done Note over DEV: Calls work_finish when done
DEV->>DC: work_finish({ role: "dev", result: "done", ... }) DEV->>DC: work_finish({ role: "developer", result: "done", ... })
DC->>GL: transition label "Doing" → "To Test" DC->>GL: transition label "Doing" → "To Test"
DC->>DC: deactivate worker (sessions preserved) DC->>DC: deactivate worker (sessions preserved)
DC-->>DEV: { announcement: "✅ DEV DONE #42" } DC-->>DEV: { announcement: "✅ DEVELOPER DONE #42" }
MS->>TG: "✅ DEV DONE #42 — moved to QA queue" MS->>TG: "✅ DEVELOPER DONE #42 — moved to TESTER queue"
TG->>H: sees announcement TG->>H: sees announcement
``` ```
On the **next DEV task** for this project that also assigns medior: On the **next DEVELOPER task** for this project that also assigns medior:
```mermaid ```mermaid
sequenceDiagram sequenceDiagram
participant MS as Main Session participant MS as Main Session
participant DC as DevClaw Plugin participant DC as DevClaw Plugin
participant CLI as openclaw gateway call agent participant CLI as openclaw gateway call agent
participant DEV as DEV Session<br/>(medior, existing) participant DEV as DEVELOPER Session<br/>(medior, existing)
MS->>DC: work_start({ issueId: 57, role: "dev", level: "medior", ... }) MS->>DC: work_start({ issueId: 57, role: "developer", level: "medior", ... })
DC->>DC: resolve level "medior" → model ID DC->>DC: resolve level "medior" → model ID
DC->>DC: lookup dev.sessions.medior → existing key! DC->>DC: lookup developer.sessions.medior → existing key!
Note over DC: No sessions.patch needed — session already exists Note over DC: No sessions.patch needed — session already exists
DC->>CLI: openclaw gateway call agent --params { sessionKey, message } DC->>CLI: openclaw gateway call agent --params { sessionKey, message }
CLI->>DEV: delivers task to existing session (has full codebase context) CLI->>DEV: delivers task to existing session (has full codebase context)
DC-->>MS: { success: true, announcement: "⚡ Sending DEV (medior) for #57" } DC-->>MS: { success: true, announcement: "⚡ Sending DEVELOPER (medior) for #57" }
``` ```
Session reuse saves ~50K tokens per task by not re-reading the codebase. Session reuse saves ~50K tokens per task by not re-reading the codebase.
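The spawn-versus-reuse decision shown in the two diagrams can be sketched roughly as follows. The `planDispatch` function, the `DispatchPlan` shape, and the `makeKey` callback are hypothetical names for this example; the plugin's actual lookup code is not shown in this document.

```typescript
type Level = "junior" | "medior" | "senior";
type SessionMap = Partial<Record<Level, string | null>>;

interface DispatchPlan {
  sessionKey: string;
  isNew: boolean; // true means sessions.patch must run before the task is sent
  verb: "Spawning" | "Sending"; // matches the announcement wording in the diagrams
}

// Looks up the session for a level and decides whether a new one is needed.
// A newly created key is stored in the map so the next task reuses it.
function planDispatch(
  sessions: SessionMap,
  level: Level,
  makeKey: () => string, // hypothetical key generator
): DispatchPlan {
  const existing = sessions[level];
  if (existing) {
    return { sessionKey: existing, isNew: false, verb: "Sending" };
  }
  const sessionKey = makeKey();
  sessions[level] = sessionKey;
  return { sessionKey, isNew: true, verb: "Spawning" };
}
```

Calling `planDispatch` twice for the same level returns a new session the first time and the stored key the second time, which is where the token saving comes from.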
@@ -304,7 +319,7 @@ sequenceDiagram
A->>QS: status({ projectGroupId: "-123" }) A->>QS: status({ projectGroupId: "-123" })
QS->>PJ: readProjects() QS->>PJ: readProjects()
PJ-->>QS: { dev: idle, qa: idle } PJ-->>QS: { developer: idle, tester: idle }
QS->>GL: list issues by label "To Do" QS->>GL: list issues by label "To Do"
GL-->>QS: [{ id: 42, title: "Add login page" }] GL-->>QS: [{ id: 42, title: "Add login page" }]
QS->>GL: list issues by label "To Test" QS->>GL: list issues by label "To Test"
@@ -312,12 +327,12 @@ sequenceDiagram
QS->>GL: list issues by label "To Improve" QS->>GL: list issues by label "To Improve"
GL-->>QS: [] GL-->>QS: []
QS->>AL: append { event: "status", ... } QS->>AL: append { event: "status", ... }
QS-->>A: { dev: idle, queue: { toDo: [#42] } } QS-->>A: { developer: idle, queue: { toDo: [#42] } }
``` ```
**Orchestrator decides:** DEV is idle, issue #42 is in To Do → pick it up. Evaluates complexity → assigns medior level. **Orchestrator decides:** DEVELOPER is idle, issue #42 is in To Do → pick it up. Evaluates complexity → assigns medior level.
### Phase 3: DEV pickup ### Phase 3: DEVELOPER pickup
The plugin handles everything end-to-end — level resolution, session lookup, label transition, state update, **and** task dispatch to the worker session. The agent's only job afterward is to post the announcement. The plugin handles everything end-to-end — level resolution, session lookup, label transition, state update, **and** task dispatch to the worker session. The agent's only job afterward is to post the announcement.
@@ -332,13 +347,13 @@ sequenceDiagram
participant PJ as projects.json participant PJ as projects.json
participant AL as audit.log participant AL as audit.log
A->>WS: work_start({ issueId: 42, role: "dev", projectGroupId: "-123", level: "medior" }) A->>WS: work_start({ issueId: 42, role: "developer", projectGroupId: "-123", level: "medior" })
WS->>PJ: readProjects() WS->>PJ: readProjects()
WS->>GL: getIssue(42) WS->>GL: getIssue(42)
GL-->>WS: { title: "Add login page", labels: ["To Do"] } GL-->>WS: { title: "Add login page", labels: ["To Do"] }
WS->>WS: Verify label is "To Do" WS->>WS: Verify label is "To Do"
WS->>TIER: resolve "medior" → "anthropic/claude-sonnet-4-5" WS->>TIER: resolve "medior" → "anthropic/claude-sonnet-4-5"
WS->>PJ: lookup dev.sessions.medior WS->>PJ: lookup developer.sessions.medior
WS->>GL: transitionLabel(42, "To Do", "Doing") WS->>GL: transitionLabel(42, "To Do", "Doing")
alt New session alt New session
WS->>GW: sessions.patch({ key: new-key, model: "anthropic/claude-sonnet-4-5" }) WS->>GW: sessions.patch({ key: new-key, model: "anthropic/claude-sonnet-4-5" })
@@ -351,98 +366,116 @@ sequenceDiagram
**Writes:** **Writes:**
- `Issue Tracker`: label "To Do" → "Doing" - `Issue Tracker`: label "To Do" → "Doing"
- `projects.json`: dev.active=true, dev.issueId="42", dev.level="medior", dev.sessions.medior=key - `projects.json`: workers.developer.active=true, issueId="42", level="medior", sessions.medior=key
- `audit.log`: 2 entries (work_start, model_selection) - `audit.log`: 2 entries (work_start, model_selection)
- `Session`: task message delivered to worker session via CLI - `Session`: task message delivered to worker session via CLI
### Phase 4: DEV works ### Phase 4: DEVELOPER works
``` ```
DEV sub-agent session → reads codebase, writes code, creates MR DEVELOPER sub-agent session → reads codebase, writes code, creates PR
DEV sub-agent session → calls work_finish({ role: "dev", result: "done", ... }) DEVELOPER sub-agent session → calls work_finish({ role: "developer", result: "done", ... })
``` ```
This happens inside the OpenClaw session. The worker calls `work_finish` directly for atomic state updates. If the worker discovers unrelated bugs, it calls `task_create` to file them. This happens inside the OpenClaw session. The worker calls `work_finish` directly for atomic state updates. If the worker discovers unrelated bugs, it calls `task_create` to file them.
### Phase 5: DEV complete (worker self-reports) ### Phase 5: DEVELOPER complete (worker self-reports)
```mermaid ```mermaid
sequenceDiagram sequenceDiagram
participant DEV as DEV Session participant DEV as DEVELOPER Session
participant WF as work_finish participant WF as work_finish
participant GL as Issue Tracker participant GL as Issue Tracker
participant PJ as projects.json participant PJ as projects.json
participant AL as audit.log participant AL as audit.log
participant REPO as Git Repo participant REPO as Git Repo
participant QA as QA Session
DEV->>WF: work_finish({ role: "dev", result: "done", projectGroupId: "-123", summary: "Login page with OAuth" }) DEV->>WF: work_finish({ role: "developer", result: "done", projectGroupId: "-123", summary: "Login page with OAuth" })
WF->>PJ: readProjects() WF->>PJ: readProjects()
PJ-->>WF: { dev: { active: true, issueId: "42" } } PJ-->>WF: { developer: { active: true, issueId: "42" } }
WF->>REPO: git pull WF->>REPO: git pull
WF->>PJ: deactivateWorker(-123, dev) WF->>PJ: deactivateWorker(-123, developer)
Note over PJ: active→false, issueId→null<br/>sessions map PRESERVED Note over PJ: active→false, issueId→null<br/>sessions map PRESERVED
WF->>GL: transitionLabel "Doing" → "To Test" WF->>GL: transitionLabel "Doing" → "To Test"
WF->>AL: append { event: "work_finish", role: "dev", result: "done" } WF->>AL: append { event: "work_finish", role: "developer", result: "done" }
WF->>WF: tick queue (fill free slots) WF->>WF: tick queue (fill free slots)
Note over WF: Scheduler sees "To Test" issue, QA slot free → dispatches QA Note over WF: Scheduler sees "To Test" issue, TESTER slot free → dispatches TESTER
WF-->>DEV: { announcement: "✅ DEV DONE #42", tickPickups: [...] } WF-->>DEV: { announcement: "✅ DEVELOPER DONE #42", tickPickups: [...] }
``` ```
**Writes:** **Writes:**
- `Git repo`: pulled latest (has DEV's merged code) - `Git repo`: pulled latest (has DEVELOPER's merged code)
- `projects.json`: dev.active=false, dev.issueId=null (sessions map preserved for reuse) - `projects.json`: workers.developer.active=false, issueId=null (sessions map preserved for reuse)
- `Issue Tracker`: label "Doing" → "To Test" - `Issue Tracker`: label "Doing" → "To Test"
- `audit.log`: 1 entry (work_finish) + tick entries if workers dispatched - `audit.log`: 1 entry (work_finish) + tick entries if workers dispatched
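The `projects.json` write above (active and issueId reset, sessions map preserved) could look roughly like this. The `WorkerState` interface is an assumed shape based on the fields named in this section, and leaving `level` untouched is an assumption, since the document does not say whether it is cleared.

```typescript
type Level = "junior" | "medior" | "senior";

// Assumed per-role worker record, based on the fields listed in this section.
interface WorkerState {
  active: boolean;
  issueId: string | null;
  level: Level | null;
  sessions: Partial<Record<Level, string | null>>;
}

// Marks the worker idle while keeping the sessions map for later reuse.
// `level` is left as-is here (assumption).
function deactivateWorker(worker: WorkerState): WorkerState {
  return { ...worker, active: false, issueId: null };
}
```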
### Phase 6: QA pickup ### Phase 5b: DEVELOPER requests review (alternative path)
Same as Phase 3, but with `role: "qa"`. Label transitions "To Test" → "Testing". Uses the reviewer level. Instead of merging the PR themselves, a developer can leave it open for human review:
### Phase 7: QA result (4 possible outcomes)
#### 7a. QA Pass
```mermaid ```mermaid
sequenceDiagram sequenceDiagram
participant QA as QA Session participant DEV as DEVELOPER Session
participant WF as work_finish
participant GL as Issue Tracker
participant PJ as projects.json
DEV->>WF: work_finish({ role: "developer", result: "review", ... })
WF->>GL: transitionLabel "Doing" → "In Review"
WF->>PJ: deactivateWorker (sessions preserved)
WF-->>DEV: { announcement: "👀 DEVELOPER REVIEW #42" }
```
The issue sits in "In Review" until the heartbeat's **review pass** detects the PR has been merged, then automatically transitions to "To Test".
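A minimal sketch of what such a review pass might do, assuming a hypothetical `getPrStatus` lookup supplied by the caller. The real implementation polls the issue tracker and is asynchronous; this synchronous version only illustrates the transition rule.

```typescript
interface TrackedIssue {
  id: number;
  label: string;
}

type PrStatus = "open" | "merged" | "closed";

// Advances every "In Review" issue whose PR has been merged to "To Test".
// Returns the IDs of the issues that were transitioned.
function reviewPass(
  issues: TrackedIssue[],
  getPrStatus: (issueId: number) => PrStatus, // hypothetical provider lookup
): number[] {
  const advanced: number[] = [];
  for (const issue of issues) {
    if (issue.label !== "In Review") continue;
    if (getPrStatus(issue.id) === "merged") {
      issue.label = "To Test";
      advanced.push(issue.id);
    }
  }
  return advanced;
}
```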
### Phase 6: TESTER pickup
Same as Phase 3, but with `role: "tester"`. Label transitions "To Test" → "Testing". Level selection determines which tester session is used.
### Phase 7: TESTER result (4 possible outcomes)
#### 7a. TESTER Pass
```mermaid
sequenceDiagram
participant TST as TESTER Session
participant WF as work_finish participant WF as work_finish
participant GL as Issue Tracker participant GL as Issue Tracker
participant PJ as projects.json participant PJ as projects.json
participant AL as audit.log participant AL as audit.log
QA->>WF: work_finish({ role: "qa", result: "pass", projectGroupId: "-123" }) TST->>WF: work_finish({ role: "tester", result: "pass", projectGroupId: "-123" })
WF->>PJ: deactivateWorker(-123, qa) WF->>PJ: deactivateWorker(-123, tester)
WF->>GL: transitionLabel(42, "Testing", "Done") WF->>GL: transitionLabel(42, "Testing", "Done")
WF->>GL: closeIssue(42) WF->>GL: closeIssue(42)
WF->>AL: append { event: "work_finish", role: "qa", result: "pass" } WF->>AL: append { event: "work_finish", role: "tester", result: "pass" }
WF-->>QA: { announcement: "🎉 QA PASS #42. Issue closed." } WF-->>TST: { announcement: "🎉 TESTER PASS #42. Issue closed." }
``` ```
**Ticket complete.** Issue closed, label "Done". **Ticket complete.** Issue closed, label "Done".
#### 7b. QA Fail #### 7b. TESTER Fail
```mermaid ```mermaid
sequenceDiagram sequenceDiagram
participant QA as QA Session participant TST as TESTER Session
participant WF as work_finish participant WF as work_finish
participant GL as Issue Tracker participant GL as Issue Tracker
participant PJ as projects.json participant PJ as projects.json
participant AL as audit.log participant AL as audit.log
QA->>WF: work_finish({ role: "qa", result: "fail", projectGroupId: "-123", summary: "OAuth redirect broken" }) TST->>WF: work_finish({ role: "tester", result: "fail", projectGroupId: "-123", summary: "OAuth redirect broken" })
WF->>PJ: deactivateWorker(-123, qa) WF->>PJ: deactivateWorker(-123, tester)
WF->>GL: transitionLabel(42, "Testing", "To Improve") WF->>GL: transitionLabel(42, "Testing", "To Improve")
WF->>GL: reopenIssue(42) WF->>GL: reopenIssue(42)
WF->>AL: append { event: "work_finish", role: "qa", result: "fail" } WF->>AL: append { event: "work_finish", role: "tester", result: "fail" }
WF-->>QA: { announcement: "❌ QA FAIL #42 — OAuth redirect broken. Sent back to DEV." } WF-->>TST: { announcement: "❌ TESTER FAIL #42 — OAuth redirect broken. Sent back to DEVELOPER." }
``` ```
**Cycle restarts:** Issue goes to "To Improve". Next heartbeat, DEV picks it up again (Phase 3, but from "To Improve" instead of "To Do"). **Cycle restarts:** Issue goes to "To Improve". Next heartbeat, DEVELOPER picks it up again (Phase 3, but from "To Improve" instead of "To Do").
#### 7c. QA Refine #### 7c. TESTER Refine
``` ```
Label: "Testing" → "Refining" Label: "Testing" → "Refining"
@@ -450,14 +483,14 @@ Label: "Testing" → "Refining"
Issue needs human decision. Pipeline pauses until human moves it to "To Do" or closes it. Issue needs human decision. Pipeline pauses until human moves it to "To Do" or closes it.
#### 7d. Blocked (DEV or QA) #### 7d. Blocked (DEVELOPER or TESTER)
``` ```
DEV Blocked: "Doing" → "To Do" DEVELOPER Blocked: "Doing" → "Refining"
QA Blocked: "Testing" → "To Test" TESTER Blocked: "Testing" → "Refining"
``` ```
Worker cannot complete (missing info, environment errors, etc.). Issue returns to queue for retry. The task is available for the next heartbeat pickup. Worker cannot complete (missing info, environment errors, etc.). The issue enters a hold state pending a human decision. The human can move it back to "To Do" to retry or take other action.
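The developer and tester outcomes described in Phases 5 through 7 (new side of this diff) can be collected into one lookup table. This is a sketch: the `TRANSITIONS` constant and `nextTransition` helper are hypothetical names, and architect transitions are omitted because this section does not describe them.

```typescript
interface Transition {
  from: string;
  to: string;
  issueAction?: "close" | "reopen";
}

// Label transitions per role and completion result, as described in this section.
const TRANSITIONS: Record<string, Record<string, Transition>> = {
  developer: {
    done: { from: "Doing", to: "To Test" },
    review: { from: "Doing", to: "In Review" },
    blocked: { from: "Doing", to: "Refining" },
  },
  tester: {
    pass: { from: "Testing", to: "Done", issueAction: "close" },
    fail: { from: "Testing", to: "To Improve", issueAction: "reopen" },
    refine: { from: "Testing", to: "Refining" },
    blocked: { from: "Testing", to: "Refining" },
  },
};

// Looks up the transition for a work_finish call; throws on unknown input.
function nextTransition(role: string, result: string): Transition {
  const byResult = TRANSITIONS[role];
  const transition = byResult === undefined ? undefined : byResult[result];
  if (transition === undefined) {
    throw new Error(`No transition for role "${role}" with result "${result}"`);
  }
  return transition;
}
```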
### Completion enforcement ### Completion enforcement
@@ -465,18 +498,19 @@ Three layers guarantee that `work_finish` always runs:
1. **Completion contract** — Every task message sent to a worker session includes a mandatory `## MANDATORY: Task Completion` section listing available results and requiring `work_finish` even on failure. Workers are instructed to use `"blocked"` if stuck. 1. **Completion contract** — Every task message sent to a worker session includes a mandatory `## MANDATORY: Task Completion` section listing available results and requiring `work_finish` even on failure. Workers are instructed to use `"blocked"` if stuck.
2. **Blocked result** — Both DEV and QA can use `"blocked"` to gracefully return a task to queue without losing work. DEV blocked: `Doing → To Do`. QA blocked: `Testing → To Test`. This gives workers an escape hatch instead of silently dying. 2. **Blocked result** — All roles can use `"blocked"` to gracefully hand off to a human. Developer blocked: `Doing → Refining`. Tester blocked: `Testing → Refining`. This gives workers an escape hatch instead of silently dying.
3. **Stale worker watchdog** — The heartbeat's health check detects workers active for >2 hours. With `fix=true`, it deactivates the worker and reverts the label back to queue. This catches sessions that crashed, ran out of context, or otherwise failed without calling `work_finish`. The `health` tool provides the same check for manual invocation. 3. **Stale worker watchdog** — The heartbeat's health check detects workers active for >2 hours. With `fix=true`, it deactivates the worker and reverts the label back to queue. This catches sessions that crashed, ran out of context, or otherwise failed without calling `work_finish`. The `health` tool provides the same check for manual invocation.
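The stale worker check in item 3 might be approximated as below. The `startedAt` timestamp and the `checkStale` function are assumptions made for this sketch, since the document does not show how activity time is tracked. Reverting the issue label is left to the caller.

```typescript
const STALE_AFTER_MS = 2 * 60 * 60 * 1000; // 2 hours, per the watchdog description

interface ActiveWorker {
  active: boolean;
  issueId: string | null;
  startedAt: number | null; // hypothetical field: epoch ms when the task started
}

interface HealthResult {
  stale: boolean;
  fixed: boolean;
}

// Flags a worker that has been active for more than 2 hours.
// With fix=true the worker is deactivated in place.
function checkStale(worker: ActiveWorker, nowMs: number, fix: boolean): HealthResult {
  const stale =
    worker.active &&
    worker.startedAt !== null &&
    nowMs - worker.startedAt > STALE_AFTER_MS;
  if (stale && fix) {
    worker.active = false;
    worker.issueId = null;
    worker.startedAt = null;
    return { stale: true, fixed: true };
  }
  return { stale, fixed: false };
}
```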
### Phase 8: Heartbeat (continuous) ### Phase 8: Heartbeat (continuous)
The heartbeat runs periodically (via background service or manual `work_heartbeat` trigger). It combines health check + queue scan: The heartbeat runs periodically (via background service or manual `work_heartbeat` trigger). It combines health check + review polling + queue scan:
```mermaid ```mermaid
sequenceDiagram sequenceDiagram
participant HB as Heartbeat Service participant HB as Heartbeat Service
participant SH as health check participant SH as health check
participant RV as review pass
participant TK as projectTick participant TK as projectTick
participant WS as work_start (dispatch) participant WS as work_start (dispatch)
Note over HB: Tick triggered (every 60s) Note over HB: Tick triggered (every 60s)
@@ -485,6 +519,10 @@ sequenceDiagram
Note over SH: Checks for zombies, stale workers Note over SH: Checks for zombies, stale workers
SH-->>HB: { fixes applied } SH-->>HB: { fixes applied }
HB->>RV: reviewPass per project
Note over RV: Polls PR status for "In Review" issues
RV-->>HB: { transitions made }
HB->>TK: projectTick per project HB->>TK: projectTick per project
Note over TK: Scans queue: To Improve > To Test > To Do Note over TK: Scans queue: To Improve > To Test > To Do
TK->>WS: dispatchTask (fill free slots) TK->>WS: dispatchTask (fill free slots)
@@ -492,6 +530,31 @@ sequenceDiagram
TK-->>HB: { pickups, skipped } TK-->>HB: { pickups, skipped }
``` ```
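The queue scan order in the diagram (To Improve > To Test > To Do) can be illustrated with a small scheduler sketch. It assumes one worker slot per role per project and maps each queue label to the role that picks it up. `projectTick` here is a simplified stand-in with a hypothetical signature, not the plugin's actual function.

```typescript
type Role = "developer" | "tester";

// Queue labels in pickup priority order, as stated in the heartbeat diagram.
const QUEUE_PRIORITY = ["To Improve", "To Test", "To Do"] as const;
type QueueLabel = (typeof QUEUE_PRIORITY)[number];

// Which role works on issues from each queue label.
const LABEL_ROLE: Record<QueueLabel, Role> = {
  "To Improve": "developer",
  "To Test": "tester",
  "To Do": "developer",
};

interface Pickup {
  issueId: number;
  role: Role;
  from: QueueLabel;
}

// Fills free role slots by scanning queues in priority order.
// Assumes at most one active worker per role (simplification).
function projectTick(
  queue: Record<QueueLabel, number[]>,
  busyRoles: readonly Role[],
): Pickup[] {
  const taken: Role[] = [...busyRoles];
  const pickups: Pickup[] = [];
  for (const label of QUEUE_PRIORITY) {
    const role = LABEL_ROLE[label];
    const next = queue[label][0];
    if (next === undefined || taken.includes(role)) continue;
    pickups.push({ issueId: next, role, from: label });
    taken.push(role);
  }
  return pickups;
}
```

With both roles idle, a failed issue waiting in "To Improve" takes the developer slot ahead of anything in "To Do".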
## Worker instructions (bootstrap hook)
Role-specific instructions (coding standards, deployment steps, completion rules) are injected into worker sessions via the `agent:bootstrap` hook — not appended to the task message.
```mermaid
sequenceDiagram
participant GW as Gateway
participant BH as Bootstrap Hook
participant FS as Filesystem
Note over GW: Worker session starts
GW->>BH: agent:bootstrap event (sessionKey, bootstrapFiles[])
BH->>BH: Parse session key → { projectName, role }
BH->>FS: Load role instructions (project-specific → default)
FS-->>BH: content + source path
BH->>BH: Push WORKER_INSTRUCTIONS.md into bootstrapFiles
BH-->>GW: bootstrapFiles now includes role instructions
```
**Resolution order:**
1. `devclaw/projects/<project>/prompts/<role>.md` (project-specific)
2. `devclaw/prompts/<role>.md` (workspace default)
The source path is logged for production traceability: `Bootstrap hook: injected developer instructions for project "my-app" from /path/to/prompts/developer.md`.
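The resolution order above amounts to a first-match search over two candidate paths. A minimal sketch follows, with the file existence check passed in as a function so the example stays independent of any filesystem API. `candidatePaths` and `resolveInstructions` are hypothetical names.

```typescript
// Candidate instruction files, most specific first.
function candidatePaths(project: string, role: string): string[] {
  return [
    `devclaw/projects/${project}/prompts/${role}.md`, // project-specific
    `devclaw/prompts/${role}.md`, // workspace default
  ];
}

// Returns the first candidate that exists, or null if neither is present.
function resolveInstructions(
  project: string,
  role: string,
  exists: (path: string) => boolean, // e.g. a wrapper around a filesystem check
): string | null {
  for (const path of candidatePaths(project, role)) {
    if (exists(path)) return path;
  }
  return null;
}
```

Returning the matched path along with the content is what makes the traceability log line possible.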
## Data flow map ## Data flow map
Every piece of data and where it lives: Every piece of data and where it lives:
@@ -503,15 +566,16 @@ Every piece of data and where it lives:
│ Issue #42: "Add login page" │ │ Issue #42: "Add login page" │
│ Labels: [Planning | To Do | Doing | To Test | Testing | ...] │ │ Labels: [Planning | To Do | Doing | To Test | Testing | ...] │
│ State: open / closed │ │ State: open / closed │
│ MRs/PRs: linked merge/pull requests │ │ PRs: linked pull/merge requests (status polled for In Review) │
│ Created by: orchestrator (task_create), workers, or humans │ │ Created by: orchestrator (task_create), workers, or humans │
└─────────────────────────────────────────────────────────────────┘ └─────────────────────────────────────────────────────────────────┘
↕ gh/glab CLI (read/write, auto-detected) ↕ gh/glab CLI (read/write, auto-detected)
↕ cockatiel resilience: retry + circuit breaker
┌─────────────────────────────────────────────────────────────────┐ ┌─────────────────────────────────────────────────────────────────┐
│ DevClaw Plugin (orchestration logic) │ │ DevClaw Plugin (orchestration logic) │
│ │ │ │
│ setup → agent creation + workspace + model config │ │ setup → agent creation + workspace + model config │
│ work_start → level + label + dispatch + role instr (e2e) │ │ work_start → level + label + dispatch (e2e) │
│ work_finish → label + state + git pull + tick queue │ │ work_finish → label + state + git pull + tick queue │
│ task_create → create issue in tracker │ │ task_create → create issue in tracker │
│ task_update → manual label state change │ │ task_update → manual label state change │
@@ -519,27 +583,38 @@ Every piece of data and where it lives:
│ status → read labels + read state │ │ status → read labels + read state │
│ health → check sessions + fix zombies │ │ health → check sessions + fix zombies │
│ project_register → labels + prompts + state init (one-time) │ │ project_register → labels + prompts + state init (one-time) │
│ design_task → architect dispatch │
│ │
│ Bootstrap hook → injects role instructions into worker sessions│
│ Review pass → polls PR status, auto-advances In Review │
│ Config loader → three-layer merge + Zod validation │
└─────────────────────────────────────────────────────────────────┘ └─────────────────────────────────────────────────────────────────┘
↕ atomic file I/O ↕ OpenClaw CLI (plugin shells out) ↕ atomic file I/O ↕ OpenClaw CLI (plugin shells out)
┌────────────────────────────────┐ ┌──────────────────────────────┐ ┌────────────────────────────────┐ ┌──────────────────────────────┐
projects/projects.json │ │ OpenClaw Gateway + CLI │ devclaw/projects.json │ │ OpenClaw Gateway + CLI │
│ │ │ (called by plugin, not agent)│ │ │ │ (called by plugin, not agent)│
│ Per project: │ │ │ │ Per project: │ │ │
dev: │ │ openclaw gateway call │ workers: │ │ openclaw gateway call │
active, issueId, level │ │ sessions.patch → create │ developer: │ │ sessions.patch → create │
sessions: │ │ sessions.list → health │ active, issueId, level │ │ sessions.list → health │
junior: <key> │ │ sessions.delete → cleanup │ sessions: │ │ sessions.delete → cleanup │
medior: <key> │ │ │ junior: <key> │ │ │
senior: <key> │ │ openclaw gateway call agent │ medior: <key> │ │ openclaw gateway call agent │
qa: │ │ --params { sessionKey, │ senior: <key> │ │ --params { sessionKey, │
active, issueId, level │ │ message, agentId } │ tester: │ │ message, agentId } │
sessions: │ │ → dispatches to session │ active, issueId, level │ │ → dispatches to session │
reviewer: <key> │ │ │ sessions: │ │ │
tester: <key> │ │ │ junior: <key> │ │ │
│ medior: <key> │ │ │
│ senior: <key> │ │ │
│ architect: │ │ │
│ sessions: │ │ │
│ junior: <key> │ │ │
│ senior: <key> │ │ │
└────────────────────────────────┘ └──────────────────────────────┘ └────────────────────────────────┘ └──────────────────────────────┘
↕ append-only ↕ append-only
┌─────────────────────────────────────────────────────────────────┐ ┌─────────────────────────────────────────────────────────────────┐
│ log/audit.log (observability) devclaw/log/audit.log (observability) │
│ │ │ │
│ NDJSON, one line per event: │ │ NDJSON, one line per event: │
│ work_start, work_finish, model_selection, │ │ work_start, work_finish, model_selection, │
@@ -553,21 +628,23 @@ Every piece of data and where it lives:
│ Telegram / WhatsApp (user-facing messages) │ │ Telegram / WhatsApp (user-facing messages) │
│ │ │ │
│ Per group chat: │ │ Per group chat: │
│ "🔧 Spawning DEV (medior) for #42: Add login page" │ "🔧 Spawning DEVELOPER (medior) for #42: Add login page" │
│ "⚡ Sending DEV (medior) for #57: Fix validation" │ "⚡ Sending DEVELOPER (medior) for #57: Fix validation" │
│ "✅ DEV DONE #42 — Login page with OAuth." │ "✅ DEVELOPER DONE #42 — Login page with OAuth." │
│ "🎉 QA PASS #42. Issue closed." │ "👀 DEVELOPER REVIEW #42 — PR open for review."
│ "❌ QA FAIL #42 — OAuth redirect broken." │ "🎉 TESTER PASS #42. Issue closed."
│ "🚫 DEV BLOCKED #42 — Missing dependencies." │ │ "❌ TESTER FAIL #42 — OAuth redirect broken." │
│ "🚫 QA BLOCKED #42 — Env not available." │ "🚫 DEVELOPER BLOCKED #42 — Missing dependencies."
│ "🚫 TESTER BLOCKED #42 — Env not available." │
└─────────────────────────────────────────────────────────────────┘ └─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐ ┌─────────────────────────────────────────────────────────────────┐
│ Git Repository (codebase) │ │ Git Repository (codebase) │
│ │ │ │
│ DEV sub-agent sessions: read code, write code, create MRs │ DEVELOPER sub-agent sessions: read code, write code, create PRs│
QA sub-agent sessions: read code, run tests, review MRs TESTER sub-agent sessions: read code, run tests, review PRs │
work_finish (DEV done): git pull to sync latest ARCHITECT sub-agent sessions: research, design, recommend
│ work_finish (developer done): git pull to sync latest │
└─────────────────────────────────────────────────────────────────┘ └─────────────────────────────────────────────────────────────────┘
``` ```
@@ -584,9 +661,12 @@ graph LR
SETUP[Agent + workspace setup] SETUP[Agent + workspace setup]
SD[Session dispatch<br/>create + send via CLI] SD[Session dispatch<br/>create + send via CLI]
AC[Scheduling<br/>tick queue after work_finish] AC[Scheduling<br/>tick queue after work_finish]
RI[Role instructions<br/>loaded per project] RI[Role instructions<br/>injected via bootstrap hook]
RV[Review polling<br/>PR status → auto-advance]
A[Audit logging] A[Audit logging]
Z[Zombie cleanup] Z[Zombie cleanup]
CFG[Config validation<br/>Zod + integrity checks]
RES[Provider resilience<br/>retry + circuit breaker]
end end
subgraph "Orchestrator handles (planning only)" subgraph "Orchestrator handles (planning only)"
@@ -600,7 +680,7 @@ graph LR
subgraph "Sub-agent sessions handle" subgraph "Sub-agent sessions handle"
CR[Code writing] CR[Code writing]
MR[MR creation/review] MR[PR creation/review]
WF_W[Task completion<br/>via work_finish] WF_W[Task completion<br/>via work_finish]
BUG[Bug filing<br/>via task_create] BUG[Bug filing<br/>via task_create]
end end
@@ -611,7 +691,7 @@ graph LR
end end
``` ```
**Key boundary:** The orchestrator is a planner and dispatcher — it never writes code. All implementation work (code edits, git operations, tests) must go through sub-agent sessions via the `task_create` → `work_start` pipeline. This ensures audit trails, tier selection, and QA review for every code change. **Key boundary:** The orchestrator is a planner and dispatcher — it never writes code. All implementation work (code edits, git operations, tests) must go through sub-agent sessions via the `task_create` → `work_start` pipeline. This ensures audit trails, level selection, and testing for every code change.
## IssueProvider abstraction ## IssueProvider abstraction
@@ -624,10 +704,13 @@ All issue tracker operations go through the `IssueProvider` interface, defined i
- `transitionLabel` — atomic label state transition (unlabel + label) - `transitionLabel` — atomic label state transition (unlabel + label)
- `closeIssue` / `reopenIssue` — issue lifecycle - `closeIssue` / `reopenIssue` — issue lifecycle
- `hasStateLabel` / `getCurrentStateLabel` — label inspection - `hasStateLabel` / `getCurrentStateLabel` — label inspection
- `getPrStatus` — get PR/MR state (open, merged, approved, none)
- `hasMergedMR` / `getMergedMRUrl` — MR/PR verification - `hasMergedMR` / `getMergedMRUrl` — MR/PR verification
- `addComment` — add comment to issue - `addComment` — add comment to issue
- `healthCheck` — verify provider connectivity - `healthCheck` — verify provider connectivity
**Provider resilience:** All provider calls are wrapped with cockatiel retry (3 attempts, exponential backoff) + circuit breaker (opens after 5 consecutive failures, half-opens after 30s). See `lib/providers/resilience.ts`.
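A minimal sketch of the breaker and backoff behavior described above, written without cockatiel so it stays self-contained. The names `CircuitBreaker` and `backoffDelaysMs`, the synchronous call style, and the 100 ms base delay are assumptions for illustration; they are not taken from `lib/providers/resilience.ts`.

```typescript
// Illustrative sketch only. The real logic is built on cockatiel;
// every name and the 100 ms base delay below are hypothetical.

type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly threshold: number = 5, // opens after 5 consecutive failures
    private readonly halfOpenAfterMs: number = 30000, // trial call allowed after 30s
    private readonly now: () => number = Date.now
  ) {}

  get state(): BreakerState {
    if (this.openedAt === null) return "closed";
    const elapsed = this.now() - this.openedAt;
    return elapsed >= this.halfOpenAfterMs ? "half-open" : "open";
  }

  // Runs fn unless the circuit is open, and tracks consecutive failures.
  execute<T>(fn: () => T): T {
    if (this.state === "open") throw new Error("circuit open: call rejected");
    try {
      const result = fn();
      this.consecutiveFailures = 0; // any success closes the circuit
      this.openedAt = null;
      return result;
    } catch (err) {
      this.consecutiveFailures += 1;
      if (this.consecutiveFailures >= this.threshold) this.openedAt = this.now();
      throw err;
    }
  }
}

// Waits between attempts: 3 attempts means 2 waits, doubling each time.
function backoffDelaysMs(attempts: number = 3, baseMs: number = 100): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts - 1; i++) delays.push(baseMs * Math.pow(2, i));
  return delays;
}
```

The clock is injected so the 30 second half-open window can be exercised without waiting. A failed trial call in the half-open state reopens the circuit, because the failure count is still at the threshold.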
**Current providers:** **Current providers:**
- **GitHub** (`lib/providers/github.ts`) — wraps `gh` CLI - **GitHub** (`lib/providers/github.ts`) — wraps `gh` CLI
- **GitLab** (`lib/providers/gitlab.ts`) — wraps `glab` CLI - **GitLab** (`lib/providers/gitlab.ts`) — wraps `glab` CLI
@@ -637,19 +720,34 @@ All issue tracker operations go through the `IssueProvider` interface, defined i
Provider selection is handled by `createProvider()` in `lib/providers/index.ts`. Auto-detects GitHub vs GitLab from the git remote URL. Provider selection is handled by `createProvider()` in `lib/providers/index.ts`. Auto-detects GitHub vs GitLab from the git remote URL.
## Configuration system
DevClaw uses a three-layer config system with `workflow.yaml` files:
```
Layer 1: Built-in defaults (ROLE_REGISTRY + DEFAULT_WORKFLOW)
Layer 2: Workspace: <workspace>/devclaw/workflow.yaml
Layer 3: Project: <workspace>/devclaw/projects/<project>/workflow.yaml
```
Each layer can override roles (levels, models, emoji), workflow states/transitions, and timeouts. Config is validated with Zod schemas at load time, with cross-reference integrity checks (transition targets exist, queue states have roles, terminal states have no outgoing transitions).
See [CONFIGURATION.md](CONFIGURATION.md) for the full reference.
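A rough sketch of how three layers could be merged so that a higher layer overrides only the fields it sets. The function `mergeLayers`, the choice to replace arrays wholesale, and the example values (`staleWorkerHours: 4`, `example/custom-model`) are assumptions for illustration, not the behavior of the actual loader.

```typescript
// Hypothetical sketch of layered config merging; not the real loader.
type Plain = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Plain {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Later layers win. Nested objects merge key by key; any other value
// (strings, numbers, arrays, `false`) replaces what was there before.
function mergeLayers(...layers: Plain[]): Plain {
  const result: Plain = {};
  for (const layer of layers) {
    for (const key of Object.keys(layer)) {
      const current = result[key];
      const incoming = layer[key];
      result[key] =
        isPlainObject(current) && isPlainObject(incoming)
          ? mergeLayers(current, incoming)
          : incoming;
    }
  }
  return result;
}

const builtInDefaults = {
  roles: {
    developer: {
      models: {
        junior: "anthropic/claude-haiku-4-5",
        senior: "anthropic/claude-opus-4-6",
      },
    },
  },
  timeouts: { staleWorkerHours: 2 },
};
const workspaceLayer = { timeouts: { staleWorkerHours: 4 } };
const projectLayer = {
  roles: { developer: { models: { senior: "example/custom-model" } } },
};

// Layer 1, then Layer 2, then Layer 3: the project layer has the final say.
const effective = mergeLayers(builtInDefaults, workspaceLayer, projectLayer);
```

Here the project layer changes only the senior developer model, so the junior model still comes from the built-in defaults and the workspace timeout override is kept.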
## Error recovery ## Error recovery
| Failure | Detection | Recovery | | Failure | Detection | Recovery |
|---|---|---| |---|---|---|
| Session dies mid-task | `health` checks via `sessions.list` Gateway RPC | `fix=true`: reverts label, clears active state. Next heartbeat picks up task again (creates fresh session for that level). | | Session dies mid-task | `health` checks via `sessions.list` Gateway RPC | `fix=true`: reverts label, clears active state. Next heartbeat picks up task again (creates fresh session for that level). |
| gh/glab command fails | Plugin tool throws error, returns to agent | Agent retries or reports to Telegram group | | gh/glab command fails | Cockatiel retry (3 attempts), then circuit breaker | Circuit opens after 5 consecutive failures, prevents hammering. Plugin catches and returns error. |
| `openclaw gateway call agent` fails | Plugin catches error during dispatch | Plugin rolls back: reverts label, clears active state. Returns error. No orphaned state. | | `openclaw gateway call agent` fails | Plugin catches error during dispatch | Plugin rolls back: reverts label, clears active state. Returns error. No orphaned state. |
| `sessions.patch` fails | Plugin catches error during session creation | Plugin rolls back label transition. Returns error. | | `sessions.patch` fails | Plugin catches error during session creation | Plugin rolls back label transition. Returns error. |
| projects.json corrupted | Tool can't parse JSON | Manual fix needed. Atomic writes (temp+rename) prevent partial writes. | | projects.json corrupted | Tool can't parse JSON | Manual fix needed. Atomic writes (temp+rename) prevent partial writes. File locking prevents concurrent races. |
| Label out of sync | `work_start` verifies label before transitioning | Throws error if label doesn't match expected state. | | Label out of sync | `work_start` verifies label before transitioning | Throws error if label doesn't match expected state. |
| Worker already active | `work_start` checks `active` flag | Throws error: "DEV already active on project". Must complete current task first. | | Worker already active | `work_start` checks `active` flag | Throws error: "DEVELOPER already active on project". Must complete current task first. |
| Stale worker (>2h) | `health` and heartbeat health check | `fix=true`: deactivates worker, reverts label to queue. Task available for next pickup. | | Stale worker (>2h) | `health` and heartbeat health check | `fix=true`: deactivates worker, reverts label to queue. Task available for next pickup. |
| Worker stuck/blocked | Worker calls `work_finish` with `"blocked"` | Deactivates worker, reverts label to queue. Issue available for retry. | | Worker stuck/blocked | Worker calls `work_finish` with `"blocked"` | Deactivates worker, transitions to "Refining" (hold state). Requires human decision to proceed. |
| Config invalid | Zod schema validation at load time | Clear error message with field path. Prevents startup with broken config. |
| `project_register` fails | Plugin catches error during label creation or state write | Clean error returned. Labels are idempotent, projects.json not written until all labels succeed. | | `project_register` fails | Plugin catches error during label creation or state write | Clean error returned. Labels are idempotent, projects.json not written until all labels succeed. |
## File locations ## File locations
@@ -659,8 +757,11 @@ Provider selection is handled by `createProvider()` in `lib/providers/index.ts`.
| Plugin source | `~/.openclaw/extensions/devclaw/` | Plugin code | | Plugin source | `~/.openclaw/extensions/devclaw/` | Plugin code |
| Plugin manifest | `~/.openclaw/extensions/devclaw/openclaw.plugin.json` | Plugin registration | | Plugin manifest | `~/.openclaw/extensions/devclaw/openclaw.plugin.json` | Plugin registration |
| Agent config | `~/.openclaw/openclaw.json` | Agent definition + tool permissions + model config | | Agent config | `~/.openclaw/openclaw.json` | Agent definition + tool permissions + model config |
| Worker state | `~/.openclaw/workspace-<agent>/projects/projects.json` | Per-project DEV/QA state | | Worker state | `<workspace>/devclaw/projects.json` | Per-project worker state |
| Role instructions | `~/.openclaw/workspace-<agent>/projects/roles/<project>/` | Per-project `dev.md` and `qa.md` | | Workflow config (workspace) | `<workspace>/devclaw/workflow.yaml` | Workspace-level role/workflow overrides |
| Audit log | `~/.openclaw/workspace-<agent>/log/audit.log` | NDJSON event log | | Workflow config (project) | `<workspace>/devclaw/projects/<project>/workflow.yaml` | Project-specific overrides |
| Default role instructions | `<workspace>/devclaw/prompts/<role>.md` | Default `developer.md`, `tester.md`, `architect.md` |
| Project role instructions | `<workspace>/devclaw/projects/<project>/prompts/<role>.md` | Per-project role instruction overrides |
| Audit log | `<workspace>/devclaw/log/audit.log` | NDJSON event log |
| Session transcripts | `~/.openclaw/agents/<agent>/sessions/<uuid>.jsonl` | Conversation history per session | | Session transcripts | `~/.openclaw/agents/<agent>/sessions/<uuid>.jsonl` | Conversation history per session |
| Git repos | `~/git/<project>/` | Project source code | | Git repos | `~/git/<project>/` | Project source code |


@@ -1,54 +1,236 @@
# DevClaw — Configuration Reference # DevClaw — Configuration Reference
All DevClaw configuration lives in two places: `openclaw.json` (plugin-level settings) and `projects.json` (per-project state). DevClaw uses a three-layer configuration system. All role, workflow, and timeout settings live in `workflow.yaml` files — not in `openclaw.json`.
## Plugin Configuration (`openclaw.json`) ## Three-Layer Config Resolution
DevClaw is configured under `plugins.entries.devclaw.config` in `openclaw.json`. ```
Layer 1: Built-in defaults (ROLE_REGISTRY + DEFAULT_WORKFLOW)
### Model Tiers Layer 2: Workspace: <workspace>/devclaw/workflow.yaml
Layer 3: Project: <workspace>/devclaw/projects/<project>/workflow.yaml
Override which LLM model powers each developer level:
```json
{
"plugins": {
"entries": {
"devclaw": {
"config": {
"models": {
"dev": {
"junior": "anthropic/claude-haiku-4-5",
"medior": "anthropic/claude-sonnet-4-5",
"senior": "anthropic/claude-opus-4-5"
},
"qa": {
"reviewer": "anthropic/claude-sonnet-4-5",
"tester": "anthropic/claude-haiku-4-5"
}
}
}
}
}
}
}
``` ```
**Resolution order** (per `lib/tiers.ts:resolveModel`): Each layer can partially override the one below it. Only the fields you specify are merged — everything else inherits from the layer below.
1. Plugin config `models.<role>.<level>` — explicit override **Source:** [`lib/config/loader.ts`](../lib/config/loader.ts)
2. `DEFAULT_MODELS[role][level]` — built-in defaults (table below)
3. Passthrough — treat the level string as a raw model ID **Validation:** Config is validated at load time with Zod schemas ([`lib/config/schema.ts`](../lib/config/schema.ts)). Integrity checks verify transition targets exist, queue states have roles, and terminal states have no outgoing transitions.
---
## Workflow Config (`workflow.yaml`)
The `workflow.yaml` file configures roles, workflow states, and timeouts. Place it at `<workspace>/devclaw/workflow.yaml` for workspace-wide settings, or at `<workspace>/devclaw/projects/<project>/workflow.yaml` for project-specific overrides.
### Role Configuration
Override which LLM model powers each level, customize levels, or disable roles entirely:
```yaml
roles:
developer:
models:
junior: anthropic/claude-haiku-4-5
medior: anthropic/claude-sonnet-4-5
senior: anthropic/claude-opus-4-6
tester:
models:
junior: anthropic/claude-haiku-4-5
medior: anthropic/claude-sonnet-4-5
senior: anthropic/claude-opus-4-6
architect:
models:
junior: anthropic/claude-sonnet-4-5
senior: anthropic/claude-opus-4-6
# Disable a role entirely:
# architect: false
```
**Role override fields** (all optional — only override what you need):
| Field | Type | Description |
|---|---|---|
| `levels` | string[] | Available levels for this role |
| `defaultLevel` | string | Default level when not specified |
| `models` | Record<string, string> | Model ID per level |
| `emoji` | Record<string, string> | Emoji per level for announcements |
| `completionResults` | string[] | Valid completion results |
**Default models:** **Default models:**
| Role | Level | Default model | | Role | Level | Default Model |
|---|---|---| |---|---|---|
| dev | junior | `anthropic/claude-haiku-4-5` | | developer | junior | `anthropic/claude-haiku-4-5` |
| dev | medior | `anthropic/claude-sonnet-4-5` | | developer | medior | `anthropic/claude-sonnet-4-5` |
| dev | senior | `anthropic/claude-opus-4-5` | | developer | senior | `anthropic/claude-opus-4-6` |
| qa | reviewer | `anthropic/claude-sonnet-4-5` | | tester | junior | `anthropic/claude-haiku-4-5` |
| qa | tester | `anthropic/claude-haiku-4-5` | | tester | medior | `anthropic/claude-sonnet-4-5` |
| tester | senior | `anthropic/claude-opus-4-6` |
| architect | junior | `anthropic/claude-sonnet-4-5` |
| architect | senior | `anthropic/claude-opus-4-6` |
**Source:** [`lib/roles/registry.ts`](../lib/roles/registry.ts)
**Model resolution order:**
1. Project `workflow.yaml` → `roles.<role>.models.<level>`
2. Workspace `workflow.yaml` → `roles.<role>.models.<level>`
3. Built-in defaults from `ROLE_REGISTRY`
4. Passthrough — treat the level string as a raw model ID
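The four-step resolution order above can be sketched as a lookup that walks the layers and falls back to the level string itself. The names `ModelMap` and `resolveModelSketch` are hypothetical, and `example/raw-model-id` is a placeholder; the sketch assumes each layer has already been parsed into a role-to-level-to-model map.

```typescript
// Hypothetical sketch of the model resolution order; names are illustrative.
type ModelMap = Record<string, Record<string, string>>; // role -> level -> model ID

function resolveModelSketch(
  role: string,
  level: string,
  defaults: ModelMap,
  workspace?: ModelMap,
  project?: ModelMap
): string {
  // Highest-priority layer first: project, then workspace, then built-ins.
  const layers: Array<ModelMap | undefined> = [project, workspace, defaults];
  for (const layer of layers) {
    const roleModels = layer ? layer[role] : undefined;
    const model = roleModels ? roleModels[level] : undefined;
    if (model) return model;
  }
  // Step 4: passthrough, the level string is treated as a raw model ID.
  return level;
}

const registryDefaults: ModelMap = {
  developer: {
    junior: "anthropic/claude-haiku-4-5",
    medior: "anthropic/claude-sonnet-4-5",
    senior: "anthropic/claude-opus-4-6",
  },
};
```

The passthrough step means a caller can hand in a full model ID where a level is expected and still get a usable answer.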
### Workflow States
The workflow section defines the state machine for issue lifecycle. Each state has a type, label, color, and optional transitions:
```yaml
workflow:
initial: planning
states:
planning:
type: hold
label: Planning
color: "#95a5a6"
on:
APPROVE: todo
todo:
type: queue
role: developer
label: To Do
color: "#428bca"
priority: 1
on:
PICKUP: doing
doing:
type: active
role: developer
label: Doing
color: "#f0ad4e"
on:
COMPLETE:
target: toTest
actions: [gitPull, detectPr]
REVIEW:
target: reviewing
actions: [detectPr]
BLOCKED: refining
toTest:
type: queue
role: tester
label: To Test
color: "#5bc0de"
priority: 2
on:
PICKUP: testing
testing:
type: active
role: tester
label: Testing
color: "#9b59b6"
on:
PASS:
target: done
actions: [closeIssue]
FAIL:
target: toImprove
actions: [reopenIssue]
REFINE: refining
BLOCKED: refining
toImprove:
type: queue
role: developer
label: To Improve
color: "#d9534f"
priority: 3
on:
PICKUP: doing
refining:
type: hold
label: Refining
color: "#f39c12"
on:
APPROVE: todo
reviewing:
type: review
label: In Review
color: "#c5def5"
check: prMerged
on:
APPROVED:
target: toTest
actions: [gitPull]
BLOCKED: refining
done:
type: terminal
label: Done
color: "#5cb85c"
toDesign:
type: queue
role: architect
label: To Design
color: "#0075ca"
priority: 1
on:
PICKUP: designing
designing:
type: active
role: architect
label: Designing
color: "#d4c5f9"
on:
COMPLETE: planning
BLOCKED: refining
```
**State types:**
| Type | Description |
|---|---|
| `queue` | Waiting for pickup. Must have a `role`. Has `priority` for ordering. |
| `active` | Worker is currently working on it. Must have a `role`. |
| `hold` | Paused, awaiting human decision. |
| `review` | Awaiting external check (PR merged/approved). Has `check` field. |
| `terminal` | Completed. No outgoing transitions. |
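The constraints in the state-type table lend themselves to a small cross-reference check. The sketch below is one way to express them in plain TypeScript; the real validation uses Zod schemas, and the type names and `checkWorkflowIntegrity` are assumptions for illustration.

```typescript
// Hypothetical sketch of workflow integrity checks; not the Zod-based code.
type StateType = "queue" | "active" | "hold" | "review" | "terminal";
type Transition = string | { target: string; actions?: string[] };

interface StateConfig {
  type: StateType;
  label: string;
  role?: string;
  on?: Record<string, Transition>;
}

interface WorkflowConfig {
  initial: string;
  states: Record<string, StateConfig>;
}

function checkWorkflowIntegrity(wf: WorkflowConfig): string[] {
  const errors: string[] = [];
  const hasState = (name: string): boolean =>
    Object.prototype.hasOwnProperty.call(wf.states, name);

  for (const name of Object.keys(wf.states)) {
    const state = wf.states[name];
    const transitions = state.on ?? {};
    const events = Object.keys(transitions);

    // Queue and active states must name the role that works on them.
    if ((state.type === "queue" || state.type === "active") && !state.role) {
      errors.push(`${state.type} state "${name}" has no role`);
    }
    // Terminal states must not define outgoing transitions.
    if (state.type === "terminal" && events.length > 0) {
      errors.push(`terminal state "${name}" has outgoing transitions`);
    }
    // Every transition target must be a defined state.
    for (const event of events) {
      const transition = transitions[event];
      const target = typeof transition === "string" ? transition : transition.target;
      if (!hasState(target)) {
        errors.push(`${name}.${event} targets unknown state "${target}"`);
      }
    }
  }
  return errors;
}
```

Returning a list of messages rather than throwing on the first problem lets a caller report every broken reference in one pass.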
**Built-in actions:**
| Action | Description |
|---|---|
| `gitPull` | Pull latest from the base branch |
| `detectPr` | Auto-detect PR URL from the issue |
| `closeIssue` | Close the issue |
| `reopenIssue` | Reopen the issue |
**Review checks:**
| Check | Description |
|---|---|
| `prMerged` | Transition when the issue's PR is merged |
| `prApproved` | Transition when the issue's PR is approved or merged |
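As a sketch, the two review checks reduce to a predicate over the PR state. The `PrState` values mirror the states listed for `getPrStatus` (open, merged, approved, none); the function name `reviewCheckPasses` is hypothetical.

```typescript
// Hypothetical sketch of how a review check could be evaluated.
type PrState = "open" | "merged" | "approved" | "none";
type ReviewCheck = "prMerged" | "prApproved";

function reviewCheckPasses(check: ReviewCheck, pr: PrState): boolean {
  if (check === "prMerged") {
    return pr === "merged";
  }
  // prApproved: an approved PR passes, and so does one that is already merged.
  return pr === "approved" || pr === "merged";
}
```

Under this reading, a state with `check: prMerged` keeps waiting while the PR is only approved, whereas `check: prApproved` advances as soon as approval lands.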
### Timeouts
```yaml
timeouts:
gitPullMs: 30000
gatewayMs: 120000
sessionPatchMs: 120000
dispatchMs: 120000
staleWorkerHours: 2
```
| Setting | Default | Description |
|---|---|---|
| `gitPullMs` | 30000 | Timeout for git pull operations |
| `gatewayMs` | 120000 | Timeout for gateway RPC calls |
| `sessionPatchMs` | 120000 | Timeout for session creation |
| `dispatchMs` | 120000 | Timeout for task dispatch |
| `staleWorkerHours` | 2 | Hours before a worker is considered stale |
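To illustrate how `staleWorkerHours` might be applied, here is a small sketch that compares a worker's ISO `startTime` against the current time. The function name `isWorkerStale` and the strict greater-than comparison are assumptions; the actual health check may treat the boundary differently.

```typescript
// Hypothetical sketch of a stale-worker test based on staleWorkerHours.
const MS_PER_HOUR = 60 * 60 * 1000;

function isWorkerStale(
  startTime: string | null,
  nowMs: number,
  staleWorkerHours: number = 2
): boolean {
  if (startTime === null) return false; // idle workers are never stale
  const startedMs = Date.parse(startTime);
  if (isNaN(startedMs)) return false; // unparsable timestamp: do not guess
  return nowMs - startedMs > staleWorkerHours * MS_PER_HOUR;
}
```

Passing the current time in as a parameter keeps the function deterministic, which makes the threshold easy to test.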
---
## Plugin Configuration (`openclaw.json`)
Some settings still live in `openclaw.json` under `plugins.entries.devclaw.config`:
### Project Execution Mode ### Project Execution Mode
@@ -73,8 +255,6 @@ Controls cross-project parallelism:
| `"parallel"` (default) | Multiple projects can have active workers simultaneously | | `"parallel"` (default) | Multiple projects can have active workers simultaneously |
| `"sequential"` | Only one project's workers active at a time. Useful for single-agent deployments. | | `"sequential"` | Only one project's workers active at a time. Useful for single-agent deployments. |
Enforced in `work_heartbeat` and the heartbeat service before dispatching.
### Heartbeat Service ### Heartbeat Service
Token-free interval-based health checks + queue dispatch: Token-free interval-based health checks + queue dispatch:
@@ -105,7 +285,7 @@ Token-free interval-based health checks + queue dispatch:
**Source:** [`lib/services/heartbeat.ts`](../lib/services/heartbeat.ts) **Source:** [`lib/services/heartbeat.ts`](../lib/services/heartbeat.ts)
The heartbeat service runs as a plugin service tied to the gateway lifecycle. Every tick: health pass (auto-fix zombies, stale workers) → tick pass (fill free slots by priority). Zero LLM tokens consumed. The heartbeat service runs as a plugin service tied to the gateway lifecycle. Every tick: health pass (auto-fix zombies, stale workers) → review pass (poll PR status for "In Review" issues) → tick pass (fill free slots by priority). Zero LLM tokens consumed.
### Notifications ### Notifications
@@ -157,7 +337,8 @@ Restrict DevClaw tools to your orchestrator agent:
"work_heartbeat", "work_heartbeat",
"project_register", "project_register",
"setup", "setup",
"onboard" "onboard",
"design_task"
] ]
} }
} }
@@ -170,7 +351,7 @@ Restrict DevClaw tools to your orchestrator agent:
## Project State (`projects.json`) ## Project State (`projects.json`)
All project state lives in `<workspace>/projects/projects.json`, keyed by group ID. All project state lives in `<workspace>/devclaw/projects.json`, keyed by group ID.
**Source:** [`lib/projects.ts`](../lib/projects.ts) **Source:** [`lib/projects.ts`](../lib/projects.ts)
@@ -187,26 +368,40 @@ All project state lives in `<workspace>/projects/projects.json`, keyed by group
"deployBranch": "development", "deployBranch": "development",
"deployUrl": "https://my-webapp.example.com", "deployUrl": "https://my-webapp.example.com",
"channel": "telegram", "channel": "telegram",
"provider": "github",
"roleExecution": "parallel", "roleExecution": "parallel",
"dev": { "workers": {
"active": false, "developer": {
"issueId": null, "active": false,
"startTime": null, "issueId": null,
"level": null, "startTime": null,
"sessions": { "level": null,
"junior": null, "sessions": {
"medior": "agent:orchestrator:subagent:my-webapp-dev-medior", "junior": null,
"senior": null "medior": "agent:orchestrator:subagent:my-webapp-developer-medior",
} "senior": null
}, }
"qa": { },
"active": false, "tester": {
"issueId": null, "active": false,
"startTime": null, "issueId": null,
"level": null, "startTime": null,
"sessions": { "level": null,
"reviewer": "agent:orchestrator:subagent:my-webapp-qa-reviewer", "sessions": {
"tester": null "junior": null,
"medior": "agent:orchestrator:subagent:my-webapp-tester-medior",
"senior": null
}
},
"architect": {
"active": false,
"issueId": null,
"startTime": null,
"level": null,
"sessions": {
"junior": null,
"senior": null
}
} }
} }
} }
@@ -225,29 +420,28 @@ All project state lives in `<workspace>/projects/projects.json`, keyed by group
| `deployBranch` | string | Branch that triggers deployment | | `deployBranch` | string | Branch that triggers deployment |
| `deployUrl` | string | Deployment URL | | `deployUrl` | string | Deployment URL |
| `channel` | string | Messaging channel (`"telegram"`, `"whatsapp"`, etc.) | | `channel` | string | Messaging channel (`"telegram"`, `"whatsapp"`, etc.) |
| `roleExecution` | `"parallel"` \| `"sequential"` | DEV/QA parallelism for this project | | `provider` | `"github"` \| `"gitlab"` | Issue tracker provider (auto-detected, stored for reuse) |
| `roleExecution` | `"parallel"` \| `"sequential"` | DEVELOPER/TESTER parallelism for this project |
### Worker state fields ### Worker state fields
Each project has `dev` and `qa` worker state objects: Each role in the `workers` record has a `WorkerState` object:
| Field | Type | Description | | Field | Type | Description |
|---|---|---| |---|---|---|
| `active` | boolean | Whether this role has an active worker | | `active` | boolean | Whether this role has an active worker |
| `issueId` | string \| null | Issue being worked on (as string) | | `issueId` | string \| null | Issue being worked on (as string) |
| `startTime` | string \| null | ISO timestamp when worker became active | | `startTime` | string \| null | ISO timestamp when worker became active |
| `level` | string \| null | Current level (`junior`, `medior`, `senior`, `reviewer`, `tester`) | | `level` | string \| null | Current level (`junior`, `medior`, `senior`) |
| `sessions` | Record<string, string \| null> | Per-level session keys | | `sessions` | Record<string, string \| null> | Per-level session keys |
**DEV session keys:** `junior`, `medior`, `senior`
**QA session keys:** `reviewer`, `tester`
### Key design decisions ### Key design decisions
- **Session-per-level** — each level gets its own worker session, accumulating context independently. Level selection maps directly to a session key. - **Session-per-level** — each level gets its own worker session, accumulating context independently. Level selection maps directly to a session key.
- **Sessions preserved on completion** — when a worker completes a task, the sessions map is preserved (only `active`, `issueId`, and `startTime` are cleared). This enables session reuse. - **Sessions preserved on completion** — when a worker completes a task, the sessions map is preserved (only `active`, `issueId`, and `startTime` are cleared). This enables session reuse.
- **Atomic writes** — all writes go through temp-file-then-rename to prevent corruption. - **Atomic writes** — all writes go through temp-file-then-rename to prevent corruption. File locking prevents concurrent read-modify-write races.
- **Sessions persist indefinitely** — no auto-cleanup. The `health` tool handles manual cleanup. - **Sessions persist indefinitely** — no auto-cleanup. The `health` tool handles manual cleanup.
- **Dynamic workers** — the `workers` record is keyed by role ID (e.g., `developer`, `tester`, `architect`). New roles are created automatically when dispatched.
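The session-preservation and dynamic-worker decisions above can be sketched as two small pure functions. The interface follows the worker state fields table; `completeWorker` and `getOrCreateWorker` are hypothetical names, not functions from `lib/projects.ts`.

```typescript
// Hypothetical sketch of worker state handling; names are illustrative.
interface WorkerState {
  active: boolean;
  issueId: string | null;
  startTime: string | null;
  level: string | null;
  sessions: Record<string, string | null>;
}

// On completion only active, issueId and startTime are cleared.
// The sessions map is kept so the next task can reuse the session.
function completeWorker(worker: WorkerState): WorkerState {
  return { ...worker, active: false, issueId: null, startTime: null };
}

// Workers are keyed by role ID; an unknown role gets a fresh idle entry.
function getOrCreateWorker(
  workers: Record<string, WorkerState>,
  role: string
): WorkerState {
  if (!Object.prototype.hasOwnProperty.call(workers, role)) {
    workers[role] = {
      active: false,
      issueId: null,
      startTime: null,
      level: null,
      sessions: {},
    };
  }
  return workers[role];
}
```

Keeping `sessions` intact on completion is what lets a medior developer pick up the next task without a fresh session.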
--- ---
@@ -255,37 +449,43 @@ Each project has `dev` and `qa` worker state objects:
``` ```
<workspace>/ <workspace>/
├── projects/ ├── devclaw/
│ ├── projects.json ← Project state (auto-managed) │ ├── projects.json ← Project state (auto-managed)
── roles/ ── workflow.yaml ← Workspace-level config overrides
├── my-webapp/ ← Per-project role instructions (editable) ├── prompts/
│ ├── dev.md │ │ ├── developer.md ← Default developer instructions
── qa.md │ │ ── tester.md ← Default tester instructions
── another-project/ ── architect.md ← Default architect instructions
│ ├── dev.md ├── projects/
── qa.md │ │ ── my-webapp/
└── default/ ← Fallback role instructions │ ├── workflow.yaml ← Project-specific config overrides
├── dev.md └── prompts/
── qa.md │ │ ── developer.md ← Project-specific developer instructions
├── log/ │ │ │ ├── tester.md ← Project-specific tester instructions
└── audit.log ← NDJSON event log (auto-managed) └── architect.md ← Project-specific architect instructions
├── AGENTS.md ← Agent identity documentation └── another-project/
└── HEARTBEAT.md ← Heartbeat operation guide │ │ └── prompts/
│ │ ├── developer.md
│ │ └── tester.md
│ └── log/
│ └── audit.log ← NDJSON event log (auto-managed)
├── AGENTS.md ← Agent identity documentation
└── HEARTBEAT.md ← Heartbeat operation guide
``` ```
### Role instruction files ### Role instruction files
`work_start` loads role instructions from `projects/roles/<project>/<role>.md` at dispatch time, falling back to `projects/roles/default/<role>.md`. These files are appended to the task message sent to worker sessions. Role instructions are injected into worker sessions via the `agent:bootstrap` hook at session startup. The hook loads instructions from `devclaw/projects/<project>/prompts/<role>.md`, falling back to `devclaw/prompts/<role>.md`.
Edit to customize: deployment steps, test commands, acceptance criteria, coding standards. Edit to customize: deployment steps, test commands, acceptance criteria, coding standards.
**Source:** [`lib/dispatch.ts:loadRoleInstructions`](../lib/dispatch.ts) **Source:** [`lib/bootstrap-hook.ts`](../lib/bootstrap-hook.ts)
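The project-then-default fallback for role instructions can be sketched as a path lookup. The function name `resolveRoleInstructionPath` and the injected `exists` predicate are assumptions made to keep the example free of filesystem access; the paths follow the new `devclaw/` layout described above.

```typescript
// Hypothetical sketch of the role instruction fallback; not the hook itself.
function resolveRoleInstructionPath(
  workspace: string,
  project: string,
  role: string,
  exists: (path: string) => boolean
): string | null {
  const candidates = [
    `${workspace}/devclaw/projects/${project}/prompts/${role}.md`, // project override
    `${workspace}/devclaw/prompts/${role}.md`, // workspace default
  ];
  for (const candidate of candidates) {
    if (exists(candidate)) return candidate;
  }
  return null; // no instruction file found for this role
}
```

In real use the predicate would typically wrap a filesystem existence check.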
--- ---
## Audit Log ## Audit Log
Append-only NDJSON at `<workspace>/log/audit.log`. Auto-truncated to 250 lines. Append-only NDJSON at `<workspace>/devclaw/log/audit.log`. Auto-truncated to 250 lines.
**Source:** [`lib/audit.ts`](../lib/audit.ts) **Source:** [`lib/audit.ts`](../lib/audit.ts)
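A sketch of append-with-truncation for an NDJSON log, assuming that "auto-truncated to 250 lines" means the newest 250 entries are kept. The function `appendAuditLine` works on the file contents as a string and is purely illustrative; it is not the implementation in `lib/audit.ts`.

```typescript
// Hypothetical sketch: append one NDJSON event and keep the newest lines.
function appendAuditLine(
  existing: string,
  event: Record<string, unknown>,
  maxLines: number = 250
): string {
  const lines = existing.split("\n").filter((line) => line.length > 0);
  lines.push(JSON.stringify(event)); // one JSON object per line
  return lines.slice(-maxLines).join("\n") + "\n";
}
```

Each call returns the full new file contents, so a caller could pair it with the temp-file-then-rename pattern used elsewhere for state writes.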
@@ -331,6 +531,8 @@ DevClaw uses an `IssueProvider` interface (`lib/providers/provider.ts`) to abstr
| GitHub | `gh` | Remote contains `github.com` | | GitHub | `gh` | Remote contains `github.com` |
| GitLab | `glab` | Remote contains `gitlab` | | GitLab | `glab` | Remote contains `gitlab` |
**Provider resilience:** All calls are wrapped with cockatiel retry (3 attempts, exponential backoff) + circuit breaker (opens after 5 consecutive failures, half-opens after 30s). See [`lib/providers/resilience.ts`](../lib/providers/resilience.ts).
**Planned:** Jira (via REST API) **Planned:** Jira (via REST API)
**Source:** [`lib/providers/index.ts`](../lib/providers/index.ts) **Source:** [`lib/providers/index.ts`](../lib/providers/index.ts)


@@ -19,7 +19,8 @@ DevClaw's level selection does exactly this. When a task comes in, the plugin ro
| Simple (typos, renames, copy) | Junior | The intern — just execute | | Simple (typos, renames, copy) | Junior | The intern — just execute |
| Standard (features, bug fixes) | Medior | Mid-level — think and build | | Standard (features, bug fixes) | Medior | Mid-level — think and build |
| Complex (architecture, security) | Senior | The architect — design and reason | | Complex (architecture, security) | Senior | The architect — design and reason |
| Review | Reviewer | Independent code reviewer |
All three roles (DEVELOPER, TESTER, and ARCHITECT) share one level scheme: DEVELOPER and TESTER use junior/medior/senior, while ARCHITECT uses only junior and senior. The orchestrator picks the level per task, and the plugin resolves it to the appropriate model via the role registry and workflow config.
This isn't just cost optimization. It mirrors what effective managers do instinctively: match the delegation level to the task, not to a fixed assumption about the delegate. This isn't just cost optimization. It mirrors what effective managers do instinctively: match the delegation level to the task, not to a fixed assumption about the delegate.
@@ -27,14 +28,15 @@ This isn't just cost optimization. It mirrors what effective managers do instinc
Classical management theory — later formalized by Bernard Bass in his work on Transformational Leadership — introduced a concept called Management by Exception (MBE). The principle: a manager should only be pulled back into a workstream when something deviates from the expected path. Classical management theory — later formalized by Bernard Bass in his work on Transformational Leadership — introduced a concept called Management by Exception (MBE). The principle: a manager should only be pulled back into a workstream when something deviates from the expected path.
DevClaw's task lifecycle is built on this. The orchestrator delegates a task via `work_start`, then steps away. It only re-engages in three scenarios: DevClaw's task lifecycle is built on this. The orchestrator delegates a task via `work_start`, then steps away. It only re-engages in specific scenarios:
1. **DEV completes work** → The label moves to `To Test`. The scheduler dispatches QA on the next tick. No orchestrator involvement needed. 1. **DEVELOPER completes work** → The label moves to `To Test`. The scheduler dispatches TESTER on the next tick. No orchestrator involvement needed.
2. **QA passes** → The issue closes. Pipeline complete. 2. **DEVELOPER requests review** → The label moves to `In Review`. The heartbeat polls PR status. When merged, the scheduler dispatches TESTER. No orchestrator involvement needed.
3. **QA fails** → The label moves to `To Improve`. The scheduler dispatches DEV on the next tick. The orchestrator may need to adjust the model level. 3. **TESTER passes** → The issue closes. Pipeline complete.
4. **QA refines** → The task enters a holding state that _requires human decision_. This is the explicit escalation boundary. 4. **TESTER fails** → The label moves to `To Improve`. The scheduler dispatches DEVELOPER on the next tick. The orchestrator may need to adjust the level.
5. **Any role is blocked** → The task enters `Refining` — a holding state that _requires human decision_. This is the explicit escalation boundary.
The "refine" state is the most interesting from a delegation perspective. It's a conscious architectural decision that says: some judgments should not be automated. When the QA agent determines that a task needs rethinking rather than just fixing, it escalates to the only actor who has the full business context — the human. The "Refining" state is the most interesting from a delegation perspective. It's a conscious architectural decision that says: some judgments should not be automated. When a TESTER determines that a task needs rethinking rather than just fixing, or when a DEVELOPER hits an obstacle that requires business context, it escalates to the only actor who has the full picture — the human.
This is textbook MBE. The person behind the keyboard isn't monitoring every task. They're only pulled in when the system encounters something beyond its delegation authority. This is textbook MBE. The person behind the keyboard isn't monitoring every task. They're only pulled in when the system encounters something beyond its delegation authority.
@@ -42,14 +44,17 @@ This is textbook MBE. The person behind the keyboard isn't monitoring every task
Henry Mintzberg's work on organizational structure identified five coordination mechanisms. The one most relevant to DevClaw is **standardization of work processes** — when coordination happens not through direct supervision but through predetermined procedures that everyone follows. Henry Mintzberg's work on organizational structure identified five coordination mechanisms. The one most relevant to DevClaw is **standardization of work processes** — when coordination happens not through direct supervision but through predetermined procedures that everyone follows.
DevClaw enforces a single, fixed lifecycle for every task across every project: DevClaw enforces a configurable but consistent lifecycle for every task. The default workflow:
``` ```
Planning → To Do → Doing → To Test → Testing → Done Planning → To Do → Doing → To Test → Testing → Done
↘ In Review → (PR merged) → To Test
↘ To Improve → Doing (fix cycle) ↘ To Improve → Doing (fix cycle)
↘ Refining → (human decision) ↘ Refining → (human decision)
``` ```
The ARCHITECT role adds a parallel track: `To Design → Designing → Planning`.
Every label transition, state update, and audit log entry happens atomically inside the plugin. The orchestrator agent cannot skip a step, forget a label, or corrupt session state — because those operations are deterministic code, not instructions an LLM follows imperfectly. Every label transition, state update, and audit log entry happens atomically inside the plugin. The orchestrator agent cannot skip a step, forget a label, or corrupt session state — because those operations are deterministic code, not instructions an LLM follows imperfectly.
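
One way to see why coded transitions are harder to get wrong than prompt instructions is to write the default lifecycle as a lookup table. This is a minimal sketch, not DevClaw's actual implementation: the `TRANSITIONS` table, the `nextLabel` function, and event names such as `approve`, `pickup`, and `merged` are hypothetical, chosen only to mirror the diagram above.

```typescript
// Hypothetical sketch: the default lifecycle as a transition table.
// State names come from the diagram; event names are illustrative.
type State =
  | "Planning" | "To Do" | "Doing" | "In Review" | "To Test"
  | "Testing" | "To Improve" | "Refining" | "Done";

const TRANSITIONS: Record<State, Record<string, State>> = {
  "Planning": { approve: "To Do" },
  "To Do": { pickup: "Doing" },
  "Doing": { done: "To Test", review: "In Review", blocked: "Refining" },
  "In Review": { merged: "To Test" },
  "To Test": { pickup: "Testing" },
  "Testing": { pass: "Done", fail: "To Improve", refine: "Refining", blocked: "Refining" },
  "To Improve": { pickup: "Doing" },
  "Refining": {}, // leaves this state only by human decision
  "Done": {},     // terminal
};

// Returns the next label, or throws when the move is not in the table.
function nextLabel(current: State, event: string): State {
  const target = TRANSITIONS[current][event];
  if (target === undefined) {
    throw new Error(`Illegal transition: ${current} --${event}-->`);
  }
  return target;
}
```

Because an unknown event throws instead of silently doing nothing, a caller cannot skip a step or invent a label; the table is the only source of legal moves.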
This is what allows a single orchestrator to manage multiple projects simultaneously. Management research has long debated the ideal span of control — typically cited as 5-9 direct reports for knowledge work. DevClaw sidesteps the constraint entirely by making every project follow identical processes. The orchestrator doesn't need to remember how Project A works versus Project B. They all work the same way. This is what allows a single orchestrator to manage multiple projects simultaneously. Management research has long debated the ideal span of control — typically cited as 5-9 direct reports for knowledge work. DevClaw sidesteps the constraint entirely by making every project follow identical processes. The orchestrator doesn't need to remember how Project A works versus Project B. They all work the same way.
@@ -60,9 +65,11 @@ One of the most common delegation failures is self-review. You don't ask the per
DevClaw enforces structural separation between development and review by design: DevClaw enforces structural separation between development and review by design:
- DEV and QA are separate sub-agent sessions with separate state. - DEVELOPER and TESTER are separate sub-agent sessions with separate state.
- QA uses the reviewer level, which can be a different model entirely, introducing genuine independence. - TESTER can use a different model entirely (e.g. senior for security reviews, junior for smoke tests), introducing genuine independence.
- The review happens after a clean label transition — QA picks up from `To Test`, not from watching DEV work in real time. - The review happens after a clean label transition — TESTER picks up from `To Test`, not from watching DEVELOPER work in real time.
For higher-stakes changes, the DEVELOPER can submit a PR for human review (`result: "review"`). The issue enters `In Review` and the heartbeat polls the PR until it's merged — only then does TESTER receive the work. This adds a human checkpoint without breaking the automated flow.
This mirrors a principle from organizational design: effective controls require independence between execution and verification. It's the same reason companies separate their audit function from their operations. This mirrors a principle from organizational design: effective controls require independence between execution and verification. It's the same reason companies separate their audit function from their operations.
@@ -72,7 +79,7 @@ Ronald Coase won a Nobel Prize for explaining why firms exist: transaction costs
DevClaw applies the same logic to AI sessions. Spawning a new sub-agent session costs approximately 50,000 tokens of context loading — the agent needs to read the full codebase before it can do useful work. That's the onboarding cost. DevClaw applies the same logic to AI sessions. Spawning a new sub-agent session costs approximately 50,000 tokens of context loading — the agent needs to read the full codebase before it can do useful work. That's the onboarding cost.
The plugin tracks session keys across task completions. When a DEV finishes task A and task B is ready on the same project, DevClaw detects the existing session and reuses it instead of spawning a new one. No re-onboarding. No context reload. The plugin tracks session keys across task completions. When a DEVELOPER finishes task A and task B is ready on the same project, DevClaw detects the existing session and reuses it instead of spawning a new one. No re-onboarding. No context reload. Each role maintains separate sessions per level, so a "medior developer" session accumulates project context independently from the "senior developer" session.
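
The reuse decision can be sketched in a few lines. This assumes a per-role `sessions` map keyed by level, similar in shape to the one stored in `projects.json`; the `resolveSession` function, the `Dispatch` type, and the session key format are hypothetical and stand in for whatever the plugin does internally.

```typescript
// Hypothetical sketch of session reuse per level within one role.
type Level = "junior" | "medior" | "senior";
type Sessions = Partial<Record<Level, string | null>>;

interface Dispatch {
  sessionKey: string;
  reused: boolean; // true when accumulated context carries over
}

function resolveSession(
  sessions: Sessions,
  level: Level,
  spawn: () => string, // stands in for the expensive cold start
): Dispatch {
  const existing = sessions[level];
  if (existing) {
    return { sessionKey: existing, reused: true };
  }
  const sessionKey = spawn();
  sessions[level] = sessionKey; // remember it for the next task
  return { sessionKey, reused: false };
}
```

Calling `resolveSession` twice for the same level spawns once and reuses once, while a different level gets its own session and pays its own onboarding cost.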
In management terms: keep your team stable. Reassigning the same person to the next task on their project is almost always cheaper than bringing in someone new — even if the new person is theoretically better qualified. In management terms: keep your team stable. Reassigning the same person to the next task on their project is almost always cheaper than bringing in someone new — even if the new person is theoretically better qualified.
@@ -85,15 +92,15 @@ The obvious saving is execution time: AI writes code faster than a human. But th
Without DevClaw, every task requires a human to make a series of small decisions: Without DevClaw, every task requires a human to make a series of small decisions:
- Which model should handle this? - Which model should handle this?
- Is the DEV session still alive, or do I need a new one? - Is the DEVELOPER session still alive, or do I need a new one?
- What label should this issue have now? - What label should this issue have now?
- Did I update the state file? - Did I update the state file?
- Did I log this transition? - Did I log this transition?
- Is the QA session free, or is it still working on something? - Is the TESTER session free, or is it still working on something?
None of these decisions are hard. But they accumulate. Each one consumes a small amount of the same cognitive resource you need for the decisions that actually matter — product direction, architecture choices, business priorities. None of these decisions are hard. But they accumulate. Each one consumes a small amount of the same cognitive resource you need for the decisions that actually matter — product direction, architecture choices, business priorities.
DevClaw eliminates entire categories of decisions by making them deterministic. The plugin picks the model. The plugin manages sessions. The plugin transitions labels. The plugin writes audit logs. The person behind the keyboard is left with only the decisions that require human judgment: what to build, what to prioritize, and what to do when QA says "this needs rethinking." DevClaw eliminates entire categories of decisions by making them deterministic. The plugin picks the model. The plugin manages sessions. The plugin transitions labels. The plugin writes audit logs. The person behind the keyboard is left with only the decisions that require human judgment: what to build, what to prioritize, and what to do when a worker says "this needs rethinking."
This is the deepest lesson from delegation theory: **good delegation isn't about getting someone else to do your work. It's about protecting your attention for the work only you can do.** This is the deepest lesson from delegation theory: **good delegation isn't about getting someone else to do your work. It's about protecting your attention for the work only you can do.**
@@ -101,11 +108,11 @@ This is the deepest lesson from delegation theory: **good delegation isn't about
Management research points to a few directions that could extend DevClaw's delegation model: Management research points to a few directions that could extend DevClaw's delegation model:
**Progressive delegation.** Blanchard's model suggests increasing task complexity for delegates as they prove competent. DevClaw could track QA pass rates per model level and automatically promote — if junior consistently passes QA on borderline tasks, start routing more work to it. This is how good managers develop their people, and it reduces cost over time. **Progressive delegation.** Blanchard's model suggests increasing task complexity for delegates as they prove competent. DevClaw could track TESTER pass rates per model level and automatically promote — if junior consistently passes TESTER on borderline tasks, start routing more work to it. This is how good managers develop their people, and it reduces cost over time.
**Delegation authority expansion.** The Vroom-Yetton decision model maps when a leader should decide alone versus consulting the team. Currently, sub-agents have narrow authority — they execute tasks but can't restructure the backlog. Selectively expanding this (e.g., allowing a DEV agent to split a task it judges too large) would reduce orchestrator bottlenecks, mirroring how managers gradually give high-performers more autonomy. **Delegation authority expansion.** The Vroom-Yetton decision model maps when a leader should decide alone versus consulting the team. Currently, sub-agents have narrow authority — they execute tasks but can't restructure the backlog. Selectively expanding this (e.g., allowing a DEVELOPER agent to split a task it judges too large) would reduce orchestrator bottlenecks, mirroring how managers gradually give high-performers more autonomy.
**Outcome-based learning.** Delegation research emphasizes that the _delegator_ learns from outcomes too. Aggregated metrics — QA fail rate by model level, average cycles to Done, time-in-state distributions — would help both the orchestrator agent and the human calibrate their delegation patterns over time. **Outcome-based learning.** Delegation research emphasizes that the _delegator_ learns from outcomes too. Aggregated metrics — TESTER fail rate by model level, average cycles to Done, time-in-state distributions — would help both the orchestrator agent and the human calibrate their delegation patterns over time.
--- ---


@@ -52,13 +52,16 @@ openclaw devclaw setup
The setup wizard walks you through: The setup wizard walks you through:
1. **Agent** — Create a new orchestrator agent or configure an existing one 1. **Agent** — Create a new orchestrator agent or configure an existing one
2. **Developer team** — Choose which LLM model powers each developer level: 2. **Developer team** — Choose which LLM model powers each level:
- **DEV junior** (fast, cheap tasks) — default: `anthropic/claude-haiku-4-5` - **Developer junior** (fast, cheap tasks) — default: `anthropic/claude-haiku-4-5`
- **DEV medior** (standard tasks) — default: `anthropic/claude-sonnet-4-5` - **Developer medior** (standard tasks) — default: `anthropic/claude-sonnet-4-5`
- **DEV senior** (complex tasks) — default: `anthropic/claude-opus-4-5` - **Developer senior** (complex tasks) — default: `anthropic/claude-opus-4-6`
- **QA reviewer** (code review) — default: `anthropic/claude-sonnet-4-5` - **Tester junior** (quick checks) — default: `anthropic/claude-haiku-4-5`
- **QA tester** (manual testing) — default: `anthropic/claude-haiku-4-5` - **Tester medior** (standard review) — default: `anthropic/claude-sonnet-4-5`
3. **Workspace** — Writes AGENTS.md, HEARTBEAT.md, role templates, and initializes state - **Tester senior** (thorough review) — default: `anthropic/claude-opus-4-6`
- **Architect junior** (standard design) — default: `anthropic/claude-sonnet-4-5`
- **Architect senior** (complex architecture) — default: `anthropic/claude-opus-4-6`
3. **Workspace** — Writes AGENTS.md, HEARTBEAT.md, workflow.yaml, role templates, and initializes state
Non-interactive mode: Non-interactive mode:
```bash ```bash
@@ -68,7 +71,7 @@ openclaw devclaw setup --new-agent "My Dev Orchestrator"
# Configure existing agent with custom models # Configure existing agent with custom models
openclaw devclaw setup --agent my-orchestrator \ openclaw devclaw setup --agent my-orchestrator \
--junior "anthropic/claude-haiku-4-5" \ --junior "anthropic/claude-haiku-4-5" \
--senior "anthropic/claude-opus-4-5" --senior "anthropic/claude-opus-4-6"
``` ```
### Option C: Tool call (agent-driven) ### Option C: Tool call (agent-driven)
@@ -86,12 +89,12 @@ setup({
"newAgentName": "My Dev Orchestrator", "newAgentName": "My Dev Orchestrator",
"channelBinding": "telegram", "channelBinding": "telegram",
"models": { "models": {
"dev": { "developer": {
"junior": "anthropic/claude-haiku-4-5", "junior": "anthropic/claude-haiku-4-5",
"senior": "anthropic/claude-opus-4-5" "senior": "anthropic/claude-opus-4-6"
}, },
"qa": { "tester": {
"reviewer": "anthropic/claude-sonnet-4-5" "medior": "anthropic/claude-sonnet-4-5"
} }
} }
}) })
@@ -151,8 +154,8 @@ Go to the Telegram/WhatsApp group for the project and tell the orchestrator agen
The agent calls `project_register`, which atomically: The agent calls `project_register`, which atomically:
- Validates the repo and auto-detects GitHub/GitLab from remote - Validates the repo and auto-detects GitHub/GitLab from remote
- Creates all 8 state labels (idempotent) - Creates all 11 state labels (idempotent)
- Scaffolds role instruction files (`projects/roles/<project>/dev.md` and `qa.md`) - Scaffolds role instruction files (`devclaw/projects/<project>/prompts/developer.md`, `tester.md`, `architect.md`)
- Adds the project entry to `projects.json` - Adds the project entry to `projects.json`
- Logs the registration event - Logs the registration event
@@ -168,20 +171,30 @@ The agent calls `project_register`, which atomically:
"baseBranch": "development", "baseBranch": "development",
"deployBranch": "development", "deployBranch": "development",
"channel": "telegram", "channel": "telegram",
"provider": "github",
"roleExecution": "parallel", "roleExecution": "parallel",
"dev": { "workers": {
"active": false, "developer": {
"issueId": null, "active": false,
"startTime": null, "issueId": null,
"level": null, "startTime": null,
"sessions": { "junior": null, "medior": null, "senior": null } "level": null,
}, "sessions": { "junior": null, "medior": null, "senior": null }
"qa": { },
"active": false, "tester": {
"issueId": null, "active": false,
"startTime": null, "issueId": null,
"level": null, "startTime": null,
"sessions": { "reviewer": null, "tester": null } "level": null,
"sessions": { "junior": null, "medior": null, "senior": null }
},
"architect": {
"active": false,
"issueId": null,
"startTime": null,
"level": null,
"sessions": { "junior": null, "senior": null }
}
} }
} }
} }
@@ -194,7 +207,7 @@ The agent calls `project_register`, which atomically:
Issues can be created in multiple ways: Issues can be created in multiple ways:
- **Via the agent** — Ask the orchestrator in the Telegram group: "Create an issue for adding a login page" (uses `task_create`) - **Via the agent** — Ask the orchestrator in the Telegram group: "Create an issue for adding a login page" (uses `task_create`)
- **Via workers** — DEV/QA workers can call `task_create` to file follow-up bugs they discover - **Via workers** — DEVELOPER/TESTER workers can call `task_create` to file follow-up bugs they discover
- **Via CLI** — `cd ~/git/my-project && gh issue create --title "My first task" --label "To Do"` (or `glab issue create`) - **Via CLI** — `cd ~/git/my-project && gh issue create --title "My first task" --label "To Do"` (or `glab issue create`)
- **Via web UI** — Create an issue and add the "To Do" label - **Via web UI** — Create an issue and add the "To Do" label
@@ -208,9 +221,9 @@ Ask the agent in the Telegram group:
The agent should call `status` and report the "To Do" issue. Then: The agent should call `status` and report the "To Do" issue. Then:
> "Pick up issue #1 for DEV" > "Pick up issue #1 for developer"
The agent calls `work_start`, which assigns a developer level, transitions the label to "Doing", creates or reuses a worker session, and dispatches the task — all in one call. The agent posts the announcement. The agent calls `work_start`, which assigns a level, transitions the label to "Doing", creates or reuses a worker session, and dispatches the task — all in one call. The agent posts the announcement.
## Adding more projects ## Adding more projects
@@ -220,17 +233,20 @@ Each project is fully isolated — separate queue, separate workers, separate st
## Developer levels ## Developer levels
DevClaw assigns tasks to developer levels instead of raw model names. This makes the system intuitive — you're assigning a "junior dev" to fix a typo, not configuring model parameters. DevClaw assigns tasks to developer levels instead of raw model names. This makes the system intuitive — you're assigning a "junior" to fix a typo, not configuring model parameters. All roles use the same level scheme.
| Role | Level | Default model | When to assign | | Role | Level | Default Model | When to assign |
|------|-------|---------------|----------------| |------|-------|---------------|----------------|
| DEV | **junior** | `anthropic/claude-haiku-4-5` | Typos, single-file fixes, CSS changes | | Developer | **junior** | `anthropic/claude-haiku-4-5` | Typos, single-file fixes, CSS changes |
| DEV | **medior** | `anthropic/claude-sonnet-4-5` | Features, bug fixes, multi-file changes | | Developer | **medior** | `anthropic/claude-sonnet-4-5` | Features, bug fixes, multi-file changes |
| DEV | **senior** | `anthropic/claude-opus-4-5` | Architecture, migrations, system-wide refactoring | | Developer | **senior** | `anthropic/claude-opus-4-6` | Architecture, migrations, system-wide refactoring |
| QA | **reviewer** | `anthropic/claude-sonnet-4-5` | Code review, test validation | | Tester | **junior** | `anthropic/claude-haiku-4-5` | Quick smoke tests, basic checks |
| QA | **tester** | `anthropic/claude-haiku-4-5` | Manual testing, smoke tests | | Tester | **medior** | `anthropic/claude-sonnet-4-5` | Standard code review, test validation |
| Tester | **senior** | `anthropic/claude-opus-4-6` | Thorough security review, complex edge cases |
| Architect | **junior** | `anthropic/claude-sonnet-4-5` | Standard design investigation |
| Architect | **senior** | `anthropic/claude-opus-4-6` | Complex architecture decisions |
Change which model powers each level in `openclaw.json` — see [Configuration](CONFIGURATION.md#model-tiers). Change which model powers each level in `workflow.yaml` — see [Configuration](CONFIGURATION.md#role-configuration).
## What the plugin handles vs. what you handle ## What the plugin handles vs. what you handle
@@ -239,17 +255,19 @@ Change which model powers each level in `openclaw.json` — see [Configuration](
| Plugin installation | You (once) | `openclaw plugins install @laurentenhoor/devclaw` | | Plugin installation | You (once) | `openclaw plugins install @laurentenhoor/devclaw` |
| Agent + workspace setup | Plugin (`setup`) | Creates agent, configures models, writes workspace files | | Agent + workspace setup | Plugin (`setup`) | Creates agent, configures models, writes workspace files |
| Channel binding migration | Plugin (`setup` with `migrateFrom`) | Automatically moves channel-wide bindings between agents | | Channel binding migration | Plugin (`setup` with `migrateFrom`) | Automatically moves channel-wide bindings between agents |
| Label setup | Plugin (`project_register`) | 8 labels, created idempotently via IssueProvider | | Label setup | Plugin (`project_register`) | 11 labels, created idempotently via IssueProvider |
| Prompt file scaffolding | Plugin (`project_register`) | Creates `projects/roles/<project>/dev.md` and `qa.md` | | Prompt file scaffolding | Plugin (`project_register`) | Creates `devclaw/projects/<project>/prompts/<role>.md` for each role |
| Project registration | Plugin (`project_register`) | Entry in `projects.json` with empty worker state | | Project registration | Plugin (`project_register`) | Entry in `projects.json` with empty worker state |
| Telegram group setup | You (once per project) | Add bot to group | | Telegram group setup | You (once per project) | Add bot to group |
| Issue creation | Plugin (`task_create`) | Orchestrator or workers create issues from chat | | Issue creation | Plugin (`task_create`) | Orchestrator or workers create issues from chat |
| Label transitions | Plugin | Atomic transitions via issue tracker CLI | | Label transitions | Plugin | Atomic transitions via issue tracker CLI |
| Developer assignment | Plugin | LLM-selected level by orchestrator, keyword heuristic fallback | | Developer assignment | Plugin | LLM-selected level by orchestrator, keyword heuristic fallback |
| State management | Plugin | Atomic read/write to `projects.json` | | State management | Plugin | Atomic read/write to `projects.json` with file locking |
| Session management | Plugin | Creates, reuses, and dispatches to sessions via CLI. Agent never touches session tools. | | Session management | Plugin | Creates, reuses, and dispatches to sessions via CLI. Agent never touches session tools. |
| Task completion | Plugin (`work_finish`) | Workers self-report. Scheduler dispatches next role. | | Task completion | Plugin (`work_finish`) | Workers self-report. Scheduler dispatches next role. |
| Prompt instructions | Plugin (`work_start`) | Loaded from `projects/roles/<project>/<role>.md`, appended to task message | | Role instructions | Plugin (bootstrap hook) | Injected into worker sessions via `agent:bootstrap` hook at session startup |
| Review polling | Plugin (heartbeat) | Auto-advances "In Review" issues when PR is merged |
| Config validation | Plugin | Zod schemas validate `workflow.yaml` at load time |
| Audit logging | Plugin | Automatic NDJSON append per tool call | | Audit logging | Plugin | Automatic NDJSON append per tool call |
| Zombie detection | Plugin | `health` checks active vs alive | | Zombie detection | Plugin | `health` checks active vs alive |
| Queue scanning | Plugin | `status` queries issue tracker per project | | Queue scanning | Plugin | `status` queries issue tracker per project |


@@ -20,7 +20,7 @@ task_comment({
projectGroupId: "<group-id>", projectGroupId: "<group-id>",
issueId: <issue-number>, issueId: <issue-number>,
body: "## QA Review\n\n**Tested:**\n- [List what you tested]\n\n**Results:**\n- [Pass/fail details]\n\n**Environment:**\n- [Test environment details]", body: "## QA Review\n\n**Tested:**\n- [List what you tested]\n\n**Results:**\n- [Pass/fail details]\n\n**Environment:**\n- [Test environment details]",
authorRole: "qa" authorRole: "tester"
}) })
``` ```
@@ -30,21 +30,21 @@ After posting your comment, call `work_finish`:
```javascript ```javascript
work_finish({ work_finish({
role: "qa", role: "tester",
projectGroupId: "<group-id>", projectGroupId: "<group-id>",
result: "pass", // or "fail", "refine", "blocked" result: "pass", // or "fail", "refine", "blocked"
summary: "Brief summary of review outcome" summary: "Brief summary of review outcome"
}) })
``` ```
## QA Results ## TESTER Results
| Result | Label transition | Meaning | | Result | Label transition | Meaning |
|---|---|---| |---|---|---|
| `"pass"` | Testing → Done | Approved. Issue closed. | | `"pass"` | Testing → Done | Approved. Issue closed. |
| `"fail"` | Testing → To Improve | Issues found. Issue reopened, sent back to DEV. | | `"fail"` | Testing → To Improve | Issues found. Issue reopened, sent back to DEVELOPER. |
| `"refine"` | Testing → Refining | Needs human decision. Pipeline pauses. | | `"refine"` | Testing → Refining | Needs human decision. Pipeline pauses. |
| `"blocked"` | Testing → To Test | Cannot complete (env issues, etc.). Returns to QA queue. | | `"blocked"` | Testing → Refining | Cannot complete (env issues, etc.). Awaits human decision. |
## Why Comments Are Required ## Why Comments Are Required
@@ -96,14 +96,14 @@ work_finish({
## Enforcement ## Enforcement
QA workers receive instructions via role templates to: TESTER workers receive instructions via role templates to:
- Always call `task_comment` BEFORE `work_finish` - Always call `task_comment` BEFORE `work_finish`
- Include specific details about what was tested - Include specific details about what was tested
- Document results, environment, and any notes - Document results, environment, and any notes
Prompt templates affected: Prompt templates affected:
- `projects/roles/<project>/qa.md` - `devclaw/projects/<project>/prompts/tester.md`
- All project-specific QA templates should follow this pattern - `devclaw/prompts/tester.md` (default)
## Best Practices ## Best Practices
@@ -116,5 +116,5 @@ Prompt templates affected:
## Related ## Related
- Tool: [`task_comment`](TOOLS.md#task_comment) — Add comments to issues - Tool: [`task_comment`](TOOLS.md#task_comment) — Add comments to issues
- Tool: [`work_finish`](TOOLS.md#work_finish) — Complete QA tasks - Tool: [`work_finish`](TOOLS.md#work_finish) — Complete TESTER tasks
- Config: [`projects/roles/<project>/qa.md`](CONFIGURATION.md#role-instruction-files) — QA role instructions - Config: [`devclaw/projects/<project>/prompts/tester.md`](CONFIGURATION.md#role-instruction-files) — Tester role instructions


@@ -1,53 +1,77 @@
# DevClaw — Roadmap # DevClaw — Roadmap
## Configurable Roles ## Recently Completed
Currently DevClaw has two hardcoded roles: **DEV** and **QA**. Each project gets one worker slot per role. The pipeline is fixed: DEV writes code, QA reviews it. ### Dynamic Roles and Role Registry
This works for the common case but breaks down when you want: Roles are no longer hardcoded. The `ROLE_REGISTRY` in `lib/roles/registry.ts` defines three built-in roles — **developer**, **tester**, **architect** — each with configurable levels, models, emoji, and completion results. Adding a new role means adding one entry to the registry; everything else (workers, sessions, labels, prompts) derives from it.
- A **design** role that creates mockups before DEV starts
- A **devops** role that handles deployment after QA passes
- A **PM** role that triages and prioritizes the backlog
- Multiple DEV workers in parallel (e.g. frontend + backend)
- A project with no QA step at all
### Planned: role configuration per project All roles use a unified junior/medior/senior level scheme (architect uses junior/senior). Per-role model overrides live in `workflow.yaml`.
Roles become a configurable list instead of a hardcoded pair. Each role defines: ### Workflow State Machine
- **Name** — e.g. `design`, `dev`, `qa`, `devops`
- **Levels** — which developer levels can be assigned (e.g. design only needs `medior`)
- **Pipeline position** — where it sits in the task lifecycle
- **Worker count** — how many concurrent workers (default: 1)
```json The issue lifecycle is now a configurable state machine defined in `workflow.yaml`. The default workflow defines 11 states:
{
"roles": { ```
"dev": { "levels": ["junior", "medior", "senior"], "workers": 1 }, Planning → To Do → Doing → To Test → Testing → Done
"qa": { "levels": ["reviewer", "tester"], "workers": 1 }, ↘ In Review → (PR merged) → To Test
"devops": { "levels": ["medior", "senior"], "workers": 1 } ↘ To Improve → Doing
}, ↘ Refining → (human decision)
"pipeline": ["dev", "qa", "devops"] To Design → Designing → Planning
}
``` ```
The pipeline definition replaces the hardcoded `Doing → To Test → Testing → Done` flow. Labels and transitions are generated from the pipeline config. The scheduler follows the pipeline order when filling free slots. States have types (`queue`, `active`, `hold`, `review`, `terminal`), transitions with actions (`gitPull`, `detectPr`, `closeIssue`, `reopenIssue`), and review checks (`prMerged`, `prApproved`).
### Open questions ### Three-Layer Configuration
- How do custom labels map? Generate from role names, or let users define? Config resolution follows three layers, each partially overriding the one below:
- Should roles have their own instruction files (`projects/roles/<project>/<role>.md`) — yes, this already works
- How to handle parallel roles (e.g. frontend + backend DEV in parallel before QA)? 1. **Built-in defaults**: `ROLE_REGISTRY` + `DEFAULT_WORKFLOW`
2. **Workspace**: `<workspace>/devclaw/workflow.yaml`
3. **Project**: `<workspace>/devclaw/projects/<project>/workflow.yaml`
Validated at load time with Zod schemas (`lib/config/schema.ts`). Integrity checks verify transition targets exist, queue states have roles, and terminal states have no outgoing transitions.
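
The layering described above can be pictured as a recursive merge in which each later layer overrides only the keys it sets. The sketch below is an assumption about the general technique, not the code in `lib/config/`: `mergeLayers`, `isPlainObject`, and the `roles.developer.models` key layout are hypothetical, and schema validation is left out.

```typescript
// Hypothetical sketch of three-layer resolution: later layers override
// earlier ones key by key, and nested objects merge recursively.
type Config = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Config {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function mergeLayers(...layers: Config[]): Config {
  const result: Config = {};
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer)) {
      const previous = result[key];
      result[key] =
        isPlainObject(previous) && isPlainObject(value)
          ? mergeLayers(previous, value) // partial override of a nested section
          : value;                       // scalars and arrays replace wholesale
    }
  }
  return result;
}

// Illustrative values only; the key layout is an assumption.
const builtIn = {
  roles: { developer: { models: { junior: "anthropic/claude-haiku-4-5", senior: "anthropic/claude-opus-4-6" } } },
};
const workspace = {
  roles: { developer: { models: { senior: "anthropic/claude-sonnet-4-5" } } },
};
const project = {};

const resolved = mergeLayers(builtIn, workspace, project);
```

In this example the workspace layer changes only the senior developer model; the junior model still comes from the built-in defaults, and the empty project layer changes nothing.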
### Provider Resilience
All issue tracker calls (GitHub via `gh`, GitLab via `glab`) are wrapped with cockatiel retry (3 attempts, exponential backoff) and circuit breaker (opens after 5 consecutive failures, half-opens after 30s). See `lib/providers/resilience.ts`.
### Bootstrap Hook for Role Instructions
Worker sessions receive role-specific instructions via the `agent:bootstrap` hook at session startup, not appended to the task message. The hook reads from `devclaw/projects/<project>/prompts/<role>.md`, falling back to `devclaw/prompts/<role>.md`. Supports source tracking with `loadRoleInstructions(dir, { withSource: true })`.
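
The project-then-default fallback can be sketched as an ordered candidate list. This is an illustration under stated assumptions, not the body of `loadRoleInstructions`: `resolvePromptPath` and its injected `exists` predicate are hypothetical, introduced so the lookup order can be shown without touching disk.

```typescript
// Hypothetical sketch of the project-then-default prompt lookup.
function resolvePromptPath(
  workspace: string,
  project: string,
  role: string,
  exists: (path: string) => boolean,
): string | null {
  const candidates = [
    `${workspace}/devclaw/projects/${project}/prompts/${role}.md`, // project override
    `${workspace}/devclaw/prompts/${role}.md`,                     // workspace default
  ];
  // The first existing candidate wins; null means no instructions were found.
  return candidates.find(exists) ?? null;
}
```

Passing `exists` as a parameter keeps the function pure, which makes the fallback order easy to check in a unit test with an in-memory set of paths.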
### In Review State and PR Polling
DEVELOPER can submit a PR for human review (`result: "review"`), which transitions the issue to `In Review`. The heartbeat's review pass polls PR status via `getPrStatus()` on the provider. When the PR is merged, the issue auto-transitions to `To Test` for TESTER pickup.
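
A single review pass might look roughly like the sketch below. The names are assumptions: `reviewPass`, `TrackedIssue`, and the `getPrState` callback are hypothetical stand-ins for the heartbeat logic and the provider lookup, and the real version is presumably asynchronous.

```typescript
// Hypothetical sketch of one heartbeat review pass.
type PrState = "open" | "merged" | "closed";

interface TrackedIssue {
  id: number;
  label: string;
}

function reviewPass(
  issues: TrackedIssue[],
  getPrState: (issueId: number) => PrState,
): TrackedIssue[] {
  return issues.map((issue) => {
    if (issue.label !== "In Review") return issue; // not waiting on a PR
    return getPrState(issue.id) === "merged"
      ? { ...issue, label: "To Test" } // hand over to TESTER
      : issue;                          // check again on the next tick
  });
}
```

Issues whose PR is still open keep their label, so the pass is safe to repeat on every heartbeat.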
### Architect Role
The architect role enables design investigations. `design_task` creates a `To Design` issue and dispatches an architect worker. The architect completes with `done` (→ Planning) or `blocked` (→ Refining).
### Workspace Layout Migration
Data directory moved from `<workspace>/projects/` to `<workspace>/devclaw/`. Automatic migration on first load — see `lib/setup/migrate-layout.ts`.
### E2E Test Infrastructure
Purpose-built test harness (`lib/testing/`) with:
- `TestProvider` — in-memory `IssueProvider` with call tracking
- `createTestHarness()` — scaffolds temp workspace, mock `runCommand`, test provider
- `simulateBootstrap()` — tests the full bootstrap hook chain without a live gateway
- `CommandInterceptor` — captures and filters CLI calls
--- ---
## Channel-agnostic Groups ## Planned
### Channel-agnostic Groups
Currently DevClaw maps projects to **Telegram group IDs**. The `projectGroupId` is a Telegram-specific negative number. This means: Currently DevClaw maps projects to **Telegram group IDs**. The `projectGroupId` is a Telegram-specific negative number. This means:
- WhatsApp groups can't be used as project channels (partially supported now via `channel` field) - WhatsApp groups can't be used as project channels (partially supported now via `channel` field)
- Discord, Slack, or other channels are excluded - Discord, Slack, or other channels are excluded
- The naming (`groupId`, `groupName`) is Telegram-specific - The naming (`groupId`, `groupName`) is Telegram-specific
### Planned: abstract channel binding **Planned: abstract channel binding**
Replace Telegram-specific group IDs with a generic channel identifier that works across any OpenClaw channel. Replace Telegram-specific group IDs with a generic channel identifier that works across any OpenClaw channel.
@@ -57,14 +81,12 @@ Replace Telegram-specific group IDs with a generic channel identifier that works
"whatsapp:120363140032870788@g.us": { "whatsapp:120363140032870788@g.us": {
"name": "my-project", "name": "my-project",
"channel": "whatsapp", "channel": "whatsapp",
"peer": "120363140032870788@g.us", "peer": "120363140032870788@g.us"
...
}, },
"telegram:-1234567890": { "telegram:-1234567890": {
"name": "other-project", "name": "other-project",
"channel": "telegram", "channel": "telegram",
"peer": "-1234567890", "peer": "-1234567890"
...
} }
} }
} }
@@ -79,7 +101,7 @@ Key changes:
 This enables any OpenClaw channel (Telegram, WhatsApp, Discord, Slack, etc.) to host a project.
-### Open questions
+#### Open questions
 - Should one project be bindable to multiple channels? (e.g. Telegram for devs, WhatsApp for stakeholder updates)
 - How does the orchestrator agent handle cross-channel context?
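The `channel:peer` keys shown in the planned config could be built and parsed with a pair of small helpers. This is a minimal sketch, assuming the key is always the channel name, one colon, then the peer ID; `ChannelKey`, `formatChannelKey`, and `parseChannelKey` are hypothetical names and not part of DevClaw.

```typescript
// Hypothetical helpers for the planned "channel:peer" project key (assumption).
interface ChannelKey {
  channel: string; // e.g. "telegram" or "whatsapp"
  peer: string; // channel-specific group identifier
}

function formatChannelKey(key: ChannelKey): string {
  return `${key.channel}:${key.peer}`;
}

function parseChannelKey(raw: string): ChannelKey {
  // Split on the first colon only, so the peer ID keeps any later punctuation.
  const idx = raw.indexOf(":");
  if (idx <= 0 || idx === raw.length - 1) {
    throw new Error(`Invalid channel key: ${raw}`);
  }
  return { channel: raw.slice(0, idx), peer: raw.slice(idx + 1) };
}
```

Splitting on the first colon keeps peer IDs such as `120363140032870788@g.us` or `-1234567890` intact, whatever characters they contain.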
@@ -89,8 +111,9 @@ This enables any OpenClaw channel (Telegram, WhatsApp, Discord, Slack, etc.) to
 ## Other Ideas
 - **Jira provider** — `IssueProvider` interface already abstracts GitHub/GitLab; Jira is the obvious next addition
-- **Deployment integration** — `work_finish` QA pass could trigger a deploy step via webhook or CLI
+- **Deployment integration** — `work_finish` TESTER pass could trigger a deploy step via webhook or CLI
 - **Cost tracking** — log token usage per task/level, surface in `status`
 - **Priority scoring** — automatic priority assignment based on labels, age, and dependencies
 - **Session archival** — auto-archive idle sessions after configurable timeout (currently indefinite)
-- **Progressive delegation** — track QA pass rates per level and auto-promote (see [Management Theory](MANAGEMENT.md))
+- **Progressive delegation** — track TESTER pass rates per level and auto-promote (see [Management Theory](MANAGEMENT.md))
+- **Custom workflow actions** — user-defined actions in `workflow.yaml` (e.g. deploy scripts, notifications)


@@ -1,216 +1,215 @@
 # DevClaw Testing Guide
-Comprehensive automated testing for DevClaw onboarding and setup.
+DevClaw uses Node.js built-in test runner (`node:test`) with `node:assert/strict` for all tests.
 ## Quick Start
 ```bash
-# Install dependencies
-npm install
 # Run all tests
-npm test
-# Run with coverage report
-npm run test:coverage
-# Run in watch mode (auto-rerun on changes)
-npm run test:watch
-# Run with UI (browser-based test explorer)
-npm run test:ui
+npx tsx --test lib/**/*.test.ts
+# Run a specific test file
+npx tsx --test lib/roles/registry.test.ts
+# Run E2E tests only
+npx tsx --test lib/services/*.e2e.test.ts
+# Build (also type-checks all test files)
+npm run build
 ```
-## Test Coverage
-### Scenario 1: New User (No Prior DevClaw Setup)
-**File:** `tests/setup/new-user.test.ts`
-**What's tested:**
-- First-time agent creation with default models
-- Channel binding creation (telegram/whatsapp)
-- Workspace file generation (AGENTS.md, HEARTBEAT.md, projects/, log/)
-- Plugin configuration initialization
-- Error handling: channel not configured
-- Error handling: channel disabled
+## Test Files
+### Unit Tests
+| File | What it tests |
+|---|---|
+| [lib/roles/registry.test.ts](../lib/roles/registry.test.ts) | Role registry: role lookup, level resolution, model defaults |
+| [lib/projects.test.ts](../lib/projects.test.ts) | Project state: read/write, worker state, atomic file operations |
+| [lib/bootstrap-hook.test.ts](../lib/bootstrap-hook.test.ts) | Bootstrap hook: role instruction loading, source tracking, overloads |
+| [lib/tools/task-update.test.ts](../lib/tools/task-update.test.ts) | Task update tool: label transitions, validation |
+| [lib/tools/design-task.test.ts](../lib/tools/design-task.test.ts) | Design task tool: architect dispatch |
+| [lib/tools/queue-status.test.ts](../lib/tools/queue-status.test.ts) | Queue status formatting |
+| [lib/setup/migrate-layout.test.ts](../lib/setup/migrate-layout.test.ts) | Workspace layout migration: `projects/` → `devclaw/` |
+### E2E Tests
+| File | What it tests |
+|---|---|
+| [lib/services/pipeline.e2e.test.ts](../lib/services/pipeline.e2e.test.ts) | Full pipeline: completion rules, label transitions, actions |
+| [lib/services/bootstrap.e2e.test.ts](../lib/services/bootstrap.e2e.test.ts) | Bootstrap hook chain: session key → parse → load instructions → inject |
+## Test Infrastructure
+### Test Harness (`lib/testing/`)
+The [`lib/testing/`](../lib/testing/) module provides E2E test infrastructure:
-**Example:**
 ```typescript
-// Before: openclaw.json has no DevClaw agents
-{
-  "agents": { "list": [{ "id": "main", ... }] },
-  "bindings": [],
-  "plugins": { "entries": {} }
-}
-// After: New orchestrator created
-{
-  "agents": {
-    "list": [
-      { "id": "main", ... },
-      { "id": "my-first-orchestrator", ... }
-    ]
+import { createTestHarness } from "../testing/index.js";
+const h = await createTestHarness({
+  projectName: "my-project",
+  groupId: "-1234567890",
+  workflow: DEFAULT_WORKFLOW,
+  workers: {
+    developer: { active: true, issueId: "42", level: "medior" },
   },
-  "bindings": [
-    { "agentId": "my-first-orchestrator", "match": { "channel": "telegram" } }
-  ],
-  "plugins": {
-    "entries": {
-      "devclaw": {
-        "config": {
-          "models": {
-            "dev": {
-              "junior": "anthropic/claude-haiku-4-5",
-              "medior": "anthropic/claude-sonnet-4-5",
-              "senior": "anthropic/claude-opus-4-5"
-            },
-            "qa": {
-              "reviewer": "anthropic/claude-sonnet-4-5",
-              "tester": "anthropic/claude-haiku-4-5"
-            }
-          }
-        }
-      }
-    }
-  }
+});
+try {
+  // ... run tests against h.provider, h.commands, etc.
+} finally {
+  await h.cleanup();
 }
 ```
-### Scenario 2: Existing User (Migration)
-**File:** `tests/setup/existing-user.test.ts`
-**What's tested:**
-- Channel conflict detection (existing channel-wide binding)
-- Binding migration from old agent to new agent
-- Custom model preservation during migration
-- Old agent preservation (not deleted)
-- Error handling: migration source doesn't exist
-- Error handling: migration source has no binding
-**Example:**
+**`createTestHarness()`** scaffolds:
+- Temporary workspace directory with `devclaw/` data dir and `log/` subdirectory
+- `projects.json` with test project and configurable worker state
+- Mock `runCommand` via `CommandInterceptor` (captures all CLI calls)
+- `TestProvider` — in-memory `IssueProvider` with call tracking
+### TestProvider
+In-memory implementation of `IssueProvider` for testing. Tracks all provider method calls and maintains in-memory issue state:
 ```typescript
-// Before: Old orchestrator has telegram binding
-{
-  "agents": {
-    "list": [
-      { "id": "main", ... },
-      { "id": "old-orchestrator", ... }
-    ]
-  },
-  "bindings": [
-    { "agentId": "old-orchestrator", "match": { "channel": "telegram" } }
-  ]
-}
-// After: Binding migrated to new orchestrator
-{
-  "agents": {
-    "list": [
-      { "id": "main", ... },
-      { "id": "old-orchestrator", ... },
-      { "id": "new-orchestrator", ... }
-    ]
-  },
-  "bindings": [
-    { "agentId": "new-orchestrator", "match": { "channel": "telegram" } }
-  ]
-}
+const h = await createTestHarness();
+h.provider.seedIssue(42, {
+  title: "Fix the bug",
+  labels: ["Doing"],
+  state: "open",
+});
+// After running pipeline code:
+const calls = h.provider.calls; // All method invocations
 ```
-### Scenario 3: Power User (Multiple Agents)
-**File:** `tests/setup/power-user.test.ts`
-**What's tested:**
-- No conflicts with group-specific bindings
-- Channel-wide binding creation alongside group bindings
-- Multiple orchestrators coexisting
-- Routing logic (specific bindings win over channel-wide)
-- WhatsApp support
-- Scale testing (12+ orchestrators)
-**Example:**
+### CommandInterceptor
+Captures all `runCommand` calls during tests. Provides filtering and extraction helpers:
 ```typescript
-// Before: Two project orchestrators with group-specific bindings
-{
-  "agents": {
-    "list": [
-      { "id": "project-a-orchestrator", ... },
-      { "id": "project-b-orchestrator", ... }
-    ]
-  },
-  "bindings": [
-    {
-      "agentId": "project-a-orchestrator",
-      "match": { "channel": "telegram", "peer": { "kind": "group", "id": "-1001234567890" } }
-    },
-    {
-      "agentId": "project-b-orchestrator",
-      "match": { "channel": "telegram", "peer": { "kind": "group", "id": "-1009876543210" } }
-    }
-  ]
-}
-// After: Channel-wide orchestrator added (no conflicts)
-{
-  "agents": {
-    "list": [
-      { "id": "project-a-orchestrator", ... },
-      { "id": "project-b-orchestrator", ... },
-      { "id": "global-orchestrator", ... }
-    ]
-  },
-  "bindings": [
-    {
-      "agentId": "project-a-orchestrator",
-      "match": { "channel": "telegram", "peer": { "kind": "group", "id": "-1001234567890" } }
-    },
-    {
-      "agentId": "project-b-orchestrator",
-      "match": { "channel": "telegram", "peer": { "kind": "group", "id": "-1009876543210" } }
-    },
-    {
-      "agentId": "global-orchestrator",
-      "match": { "channel": "telegram" } // Channel-wide (no peer)
-    }
-  ]
-}
-// Routing: Group messages go to specific agents, everything else goes to global
+// All captured commands
+h.commands.commands;
+// Filter by command name
+h.commands.commandsFor("openclaw");
+// Extract task messages dispatched to workers
+h.commands.taskMessages();
+// Extract session creation patches
+h.commands.sessionPatches();
+// Reset between test cases
+h.commands.reset();
 ```
-## Test Architecture
-### Mock File System
-The tests use an in-memory mock file system (`MockFileSystem`) that simulates:
-- Reading/writing openclaw.json
-- Creating/reading workspace files
-- Tracking command executions (openclaw agents add)
-**Why?** Tests run in isolation without touching the real file system, making them:
-- Fast (no I/O)
-- Reliable (no file conflicts)
-- Repeatable (clean state every test)
-### Fixtures
-Pre-built configurations for different user types:
-- `createNewUserConfig()` - Empty slate
-- `createCommonUserConfig()` - One orchestrator with binding
-- `createPowerUserConfig()` - Multiple orchestrators with group bindings
-- `createNoChannelConfig()` - Channel not configured
-- `createDisabledChannelConfig()` - Channel disabled
-### Assertions
-Reusable assertion helpers that make tests readable:
+### simulateBootstrap
+Tests the full bootstrap hook chain without a live OpenClaw gateway:
 ```typescript
-assertAgentExists(mockFs, "my-agent", "My Agent");
-assertChannelBinding(mockFs, "my-agent", "telegram");
-assertWorkspaceFilesExist(mockFs, "my-agent");
-assertDevClawConfig(mockFs, { junior: "anthropic/claude-haiku-4-5" });
+// Write a project-specific prompt
+await h.writePrompt("developer", "Custom dev instructions", "my-project");
+// Simulate bootstrap for a developer session
+const files = await h.simulateBootstrap(
+  "agent:orchestrator:subagent:my-project-developer-medior"
+);
+// Verify injected bootstrap files
+assert.strictEqual(files.length, 1);
+assert.strictEqual(files[0].content, "Custom dev instructions");
+```
+## Writing Tests
+### Pattern: Unit Test
+```typescript
+import { describe, it } from "node:test";
+import assert from "node:assert/strict";
+describe("my feature", () => {
+  it("should do something", () => {
+    const result = myFunction("input");
+    assert.strictEqual(result, "expected");
+  });
+});
+```
+### Pattern: E2E Pipeline Test
+```typescript
+import { describe, it, afterEach } from "node:test";
+import assert from "node:assert/strict";
+import { createTestHarness, type TestHarness } from "../testing/index.js";
+import { executeCompletion } from "./pipeline.js";
+describe("pipeline completion", () => {
+  let h: TestHarness;
+  afterEach(async () => {
+    if (h) await h.cleanup();
+  });
+  it("developer:done transitions Doing → To Test", async () => {
+    h = await createTestHarness({
+      workers: {
+        developer: { active: true, issueId: "42", level: "medior" },
+      },
+    });
+    h.provider.seedIssue(42, { labels: ["Doing"], state: "open" });
+    const result = await executeCompletion({
+      workspaceDir: h.workspaceDir,
+      groupId: h.groupId,
+      project: h.project,
+      workflow: h.workflow,
+      provider: h.provider,
+      role: "developer",
+      result: "done",
+    });
+    assert.strictEqual(result.rule.to, "To Test");
+  });
+});
+```
+### Pattern: Bootstrap Hook Test
+```typescript
+import { describe, it, afterEach } from "node:test";
+import assert from "node:assert/strict";
+import { createTestHarness, type TestHarness } from "../testing/index.js";
+describe("bootstrap instructions", () => {
+  let h: TestHarness;
+  afterEach(async () => {
+    if (h) await h.cleanup();
+  });
+  it("injects project-specific prompt for developer", async () => {
+    h = await createTestHarness({ projectName: "webapp" });
+    await h.writePrompt("developer", "Build with React", "webapp");
+    const files = await h.simulateBootstrap(
+      "agent:orchestrator:subagent:webapp-developer-medior"
+    );
+    assert.strictEqual(files.length, 1);
+    assert.ok(files[0].content?.includes("React"));
+  });
+});
 ```
 ## CI/CD Integration
 ### GitHub Actions
 ```yaml
 name: Test
 on: [push, pull_request]
@@ -218,122 +217,52 @@ jobs:
   test:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v3
-      - uses: actions/setup-node@v3
+      - uses: actions/checkout@v4
+      - uses: actions/setup-node@v4
         with:
           node-version: 20
       - run: npm ci
-      - run: npm test
-      - run: npm run test:coverage
-      - uses: codecov/codecov-action@v3
-        with:
-          files: ./coverage/coverage-final.json
+      - run: npm run build
+      - run: npx tsx --test lib/**/*.test.ts
 ```
 ### GitLab CI
 ```yaml
 test:
   image: node:20
   script:
     - npm ci
-    - npm test
-    - npm run test:coverage
-  coverage: '/Lines\s*:\s*(\d+\.\d+)%/'
-  artifacts:
-    reports:
-      coverage_report:
-        coverage_format: cobertura
-        path: coverage/cobertura-coverage.xml
+    - npm run build
+    - npx tsx --test lib/**/*.test.ts
 ```
 ## Debugging Tests
 ### Run specific test
 ```bash
-npm test -- new-user # Run all new-user tests
-npm test -- "should create agent" # Run tests matching pattern
+# Run by file
+npx tsx --test lib/roles/registry.test.ts
+# Run by name pattern
+npx tsx --test --test-name-pattern "should have all expected roles" lib/**/*.test.ts
 ```
 ### Debug with Node inspector
 ```bash
-node --inspect-brk node_modules/.bin/vitest run
+node --inspect-brk node_modules/.bin/tsx --test lib/roles/registry.test.ts
 ```
-Then open Chrome DevTools at `chrome://inspect`
+Then open Chrome DevTools at `chrome://inspect`.
-### View coverage report
-```bash
-npm run test:coverage
-open coverage/index.html
-```
-## Adding Tests
-### 1. Choose the right test file
-- New feature → `tests/setup/new-user.test.ts`
-- Migration feature → `tests/setup/existing-user.test.ts`
-- Multi-agent feature → `tests/setup/power-user.test.ts`
-### 2. Write the test
-```typescript
-import { describe, it, expect, beforeEach } from "vitest";
-import { MockFileSystem } from "../helpers/mock-fs.js";
-import { createNewUserConfig } from "../helpers/fixtures.js";
-import { assertAgentExists } from "../helpers/assertions.js";
-describe("My new feature", () => {
-  let mockFs: MockFileSystem;
-  beforeEach(() => {
-    mockFs = new MockFileSystem(createNewUserConfig());
-  });
-  it("should do something useful", async () => {
-    // GIVEN: initial state (via fixture)
-    const beforeCount = countAgents(mockFs);
-    // WHEN: execute the operation
-    const config = mockFs.getConfig();
-    config.agents.list.push({
-      id: "test-agent",
-      name: "Test Agent",
-      workspace: "/home/test/.openclaw/workspace-test-agent",
-      agentDir: "/home/test/.openclaw/agents/test-agent/agent",
-    });
-    mockFs.setConfig(config);
-    // THEN: verify the outcome
-    assertAgentExists(mockFs, "test-agent", "Test Agent");
-    expect(countAgents(mockFs)).toBe(beforeCount + 1);
-  });
-});
-```
-### 3. Run your test
-```bash
-npm test -- "should do something useful"
-```
 ## Best Practices
-### ✅ DO
-- Test one thing per test
-- Use descriptive test names ("should create agent with telegram binding")
-- Use fixtures for initial state
-- Use assertion helpers for readability
-- Test error cases
-### ❌ DON'T
-- Test implementation details (test behavior, not internals)
-- Share state between tests (use beforeEach)
-- Mock everything (only mock file system and commands)
-- Write brittle tests (avoid hard-coded UUIDs, timestamps)
-## Test Metrics
-Current coverage:
-- **Lines:** Target 80%+
-- **Functions:** Target 90%+
-- **Branches:** Target 75%+
-Run `npm run test:coverage` to see detailed metrics.
+- **Use `node:test` + `node:assert/strict`** — no test framework dependencies
+- **Use `createTestHarness()`** for any test that needs workspace state, providers, or command interception
+- **Always call `h.cleanup()`** in `afterEach` to remove temp directories
+- **Seed provider state** with `h.provider.seedIssue()` before testing pipeline operations
+- **Use `h.commands`** to verify what CLI commands were dispatched without actually running them
+- **One assertion focus per test** — test one behavior, not the whole pipeline
+- **Test error cases** — invalid roles, missing projects, bad state transitions


@@ -1,6 +1,6 @@
 # DevClaw — Tools Reference
-Complete reference for all 11 tools registered by DevClaw. See [`index.ts`](../index.ts) for registration.
+Complete reference for all tools registered by DevClaw. See [`index.ts`](../index.ts) for registration.
 ## Worker Lifecycle
@@ -17,9 +17,9 @@ Pick up a task from the issue queue. Handles level assignment, label transition,
 | Parameter | Type | Required | Description |
 |---|---|---|---|
 | `issueId` | number | No | Issue ID. If omitted, picks next by priority. |
-| `role` | `"dev"` \| `"qa"` | No | Worker role. Auto-detected from issue label if omitted. |
+| `role` | `"developer"` \| `"tester"` \| `"architect"` | No | Worker role. Auto-detected from issue label if omitted. |
 | `projectGroupId` | string | No | Project group ID. Auto-detected from group context. |
-| `level` | string | No | Developer level (`junior`, `medior`, `senior`, `reviewer`). Auto-detected if omitted. |
+| `level` | string | No | Level (`junior`, `medior`, `senior`). Auto-detected if omitted. |
 **What it does atomically:**
@@ -28,15 +28,14 @@ Pick up a task from the issue queue. Handles level assignment, label transition,
 3. Fetches issue from tracker, verifies correct label state
 4. Assigns level (LLM-chosen via `level` param → label detection → keyword heuristic fallback)
 5. Resolves level to model ID via config or defaults
-6. Loads prompt instructions from `projects/roles/<project>/<role>.md`
-7. Looks up existing session for assigned level (session-per-level)
-8. Transitions label (e.g. `To Do` → `Doing`)
-9. Creates session via Gateway RPC if new (`sessions.patch`)
-10. Dispatches task to worker session via CLI (`openclaw gateway call agent`)
-11. Updates `projects.json` state (active, issueId, level, session key)
-12. Writes audit log entries (work_start + model_selection)
-13. Sends notification
-14. Returns announcement text
+6. Looks up existing session for assigned level (session-per-level)
+7. Transitions label (e.g. `To Do` → `Doing`)
+8. Creates session via Gateway RPC if new (`sessions.patch`)
+9. Dispatches task to worker session via CLI (`openclaw gateway call agent`)
+10. Updates `projects.json` state (active, issueId, level, session key)
+11. Writes audit log entries (work_start + model_selection)
+12. Sends notification
+13. Returns announcement text
 **Level selection priority:**
@@ -55,7 +54,7 @@ Pick up a task from the issue queue. Handles level assignment, label transition,
 ### `work_finish`
-Complete a task with a result. Called by workers (DEV/QA sub-agent sessions) directly, or by the orchestrator.
+Complete a task with a result. Called by workers (DEVELOPER/TESTER/ARCHITECT sub-agent sessions) directly, or by the orchestrator.
 **Source:** [`lib/tools/work-finish.ts`](../lib/tools/work-finish.ts)
@@ -63,7 +62,7 @@ Complete a task with a result. Called by workers (DEV/QA sub-agent sessions) dir
 | Parameter | Type | Required | Description |
 |---|---|---|---|
-| `role` | `"dev"` \| `"qa"` | Yes | Worker role |
+| `role` | `"developer"` \| `"tester"` \| `"architect"` | Yes | Worker role |
 | `result` | string | Yes | Completion result (see table below) |
 | `projectGroupId` | string | Yes | Project group ID |
 | `summary` | string | No | Brief summary for the announcement |
@@ -73,12 +72,15 @@ Complete a task with a result. Called by workers (DEV/QA sub-agent sessions) dir
 | Role | Result | Label transition | Side effects |
 |---|---|---|---|
-| DEV | `"done"` | Doing → To Test | git pull, auto-detect PR URL |
-| DEV | `"blocked"` | Doing → To Do | Task returns to queue |
-| QA | `"pass"` | Testing → Done | Issue closed |
-| QA | `"fail"` | Testing → To Improve | Issue reopened |
-| QA | `"refine"` | Testing → Refining | Awaits human decision |
-| QA | `"blocked"` | Testing → To Test | Task returns to QA queue |
+| developer | `"done"` | Doing → To Test | git pull, auto-detect PR URL |
+| developer | `"review"` | Doing → In Review | auto-detect PR URL, heartbeat polls for merge |
+| developer | `"blocked"` | Doing → Refining | Awaits human decision |
+| tester | `"pass"` | Testing → Done | Issue closed |
+| tester | `"fail"` | Testing → To Improve | Issue reopened |
+| tester | `"refine"` | Testing → Refining | Awaits human decision |
+| tester | `"blocked"` | Testing → Refining | Awaits human decision |
+| architect | `"done"` | Designing → Planning | Design complete |
+| architect | `"blocked"` | Designing → Refining | Awaits human decision |
 **What it does atomically:**
@@ -111,7 +113,7 @@ Create a new issue in the project's issue tracker.
 | `description` | string | No | Full issue body (markdown) |
 | `label` | StateLabel | No | State label. Defaults to `"Planning"`. |
 | `assignees` | string[] | No | GitHub/GitLab usernames to assign |
-| `pickup` | boolean | No | If true, immediately pick up for DEV after creation |
+| `pickup` | boolean | No | If true, immediately pick up for DEVELOPER after creation |
 **Use cases:**
@@ -138,7 +140,7 @@ Change an issue's state label manually without going through the full pickup/com
 | `state` | StateLabel | Yes | New state label |
 | `reason` | string | No | Audit log reason for the change |
-**Valid states:** `Planning`, `To Do`, `Doing`, `To Test`, `Testing`, `Done`, `To Improve`, `Refining`
+**Valid states:** `Planning`, `To Do`, `Doing`, `To Test`, `Testing`, `Done`, `To Improve`, `Refining`, `In Review`, `To Design`, `Designing`
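The eleven states in the updated list map naturally onto a string-literal union with a runtime guard. This is a sketch only: `StateLabel` is the type name used in the parameter tables, while `STATE_LABELS` and `isStateLabel` are hypothetical names, and the real definition in the codebase may differ.

```typescript
// Assumed shape: the eleven labels from the updated "Valid states" list.
const STATE_LABELS = [
  "Planning", "To Do", "Doing", "To Test", "Testing", "Done",
  "To Improve", "Refining", "In Review", "To Design", "Designing",
] as const;

type StateLabel = (typeof STATE_LABELS)[number];

// Type guard: narrows an arbitrary string to StateLabel at runtime.
function isStateLabel(value: string): value is StateLabel {
  return (STATE_LABELS as readonly string[]).includes(value);
}
```

A tool handler could call `isStateLabel(input)` before attempting a transition, so an unknown label is rejected before any provider call is made.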
 **Use cases:**
@@ -161,12 +163,12 @@ Add a comment to an issue for feedback, notes, or discussion.
 | `projectGroupId` | string | Yes | Project group ID |
 | `issueId` | number | Yes | Issue ID to comment on |
 | `body` | string | Yes | Comment body (markdown) |
-| `authorRole` | `"dev"` \| `"qa"` \| `"orchestrator"` | No | Attribution role prefix |
+| `authorRole` | `"developer"` \| `"tester"` \| `"orchestrator"` | No | Attribution role prefix |
 **Use cases:**
-- QA adds review feedback before pass/fail decision
-- DEV posts implementation notes or progress updates
+- TESTER adds review feedback before pass/fail decision
+- DEVELOPER posts implementation notes or progress updates
 - Orchestrator adds summary comments
 When `authorRole` is provided, the comment is prefixed with a role emoji and attribution label.
@@ -191,7 +193,7 @@ Lightweight queue + worker state dashboard.
 **Returns per project:**
-- Worker state: active/idle, current issue, level, start time
+- Worker state per role: active/idle, current issue, level, start time
 - Queue counts: To Do, To Test, To Improve
 - Role execution mode
@@ -226,7 +228,7 @@ Worker health scan with optional auto-fix.
 ### `work_heartbeat`
-Manual trigger for heartbeat: health fix + queue dispatch. Same logic as the background heartbeat service, but invoked on demand.
+Manual trigger for heartbeat: health fix + review polling + queue dispatch. Same logic as the background heartbeat service, but invoked on demand.
 **Source:** [`lib/tools/work-heartbeat.ts`](../lib/tools/work-heartbeat.ts)
@@ -239,15 +241,16 @@ Manual trigger for heartbeat: health fix + queue dispatch. Same logic as the bac
 | `maxPickups` | number | No | Max worker dispatches per tick. |
 | `activeSessions` | string[] | No | Active session IDs for zombie detection. |
-**Two-pass sweep:**
+**Three-pass sweep:**
 1. **Health pass** — Runs `checkWorkerHealth` per project per role. Auto-fixes zombies, stale workers, orphaned state.
-2. **Tick pass** — Calls `projectTick` per project. Fills free worker slots by priority (To Improve > To Test > To Do).
+2. **Review pass** — Polls PR status for issues in "In Review" state. Transitions to "To Test" when PR is merged.
+3. **Tick pass** — Calls `projectTick` per project. Fills free worker slots by priority (To Improve > To Test > To Do).
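The review pass described in the sweep can be pictured as a filter over tracked issues. This is an illustrative sketch, not the heartbeat implementation: `TrackedIssue`, `PrMergedCheck`, and `reviewPass` are hypothetical names, and the merge check is a plain synchronous callback here, whereas a real one would presumably query the GitHub/GitLab provider asynchronously.

```typescript
interface TrackedIssue {
  id: number;
  label: string;
}

// Hypothetical callback standing in for a provider PR-status query (assumption).
type PrMergedCheck = (issueId: number) => boolean;

// Moves every "In Review" issue whose PR is merged to "To Test".
// Returns the IDs of the issues that were transitioned.
function reviewPass(issues: TrackedIssue[], isPrMerged: PrMergedCheck): number[] {
  const moved: number[] = [];
  for (const issue of issues) {
    if (issue.label !== "In Review") continue; // other states are untouched
    if (isPrMerged(issue.id)) {
      issue.label = "To Test";
      moved.push(issue.id);
    }
  }
  return moved;
}
```

Running this pass before the tick pass means an issue whose PR was just merged is already in "To Test" when free worker slots are filled.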
 **Execution guards:**
 - `projectExecution: "sequential"` — only one project active at a time
-- `roleExecution: "sequential"` — only one role (DEV or QA) active at a time per project (enforced in `projectTick`)
+- `roleExecution: "sequential"` — only one role active at a time per project
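The `roleExecution` guard amounts to a small predicate evaluated before each dispatch. The following is a sketch under assumptions: `canDispatchRole` is a hypothetical function, and it assumes one worker slot per role per project, which the source does not state explicitly.

```typescript
type ExecutionMode = "parallel" | "sequential";
type Role = "developer" | "tester" | "architect";

// Decides whether a worker for `candidate` may start, given the roles that are
// currently active in the same project. Assumes one slot per role (assumption).
function canDispatchRole(
  mode: ExecutionMode,
  activeRoles: ReadonlySet<Role>,
  candidate: Role,
): boolean {
  if (activeRoles.has(candidate)) return false; // that role's slot is busy
  if (mode === "parallel") return true; // other roles may run alongside
  return activeRoles.size === 0; // sequential: nothing else may be active
}
```

The same shape would apply one level up for `projectExecution`, with active projects in place of active roles.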
 ---
@@ -272,18 +275,16 @@ One-time project setup. Creates state labels, scaffolds prompt files, adds proje
 | `baseBranch` | string | Yes | Base branch for development |
 | `deployBranch` | string | No | Deploy branch. Defaults to baseBranch. |
 | `deployUrl` | string | No | Deployment URL |
-| `roleExecution` | `"parallel"` \| `"sequential"` | No | DEV/QA parallelism. Default: `"parallel"`. |
+| `roleExecution` | `"parallel"` \| `"sequential"` | No | DEVELOPER/TESTER parallelism. Default: `"parallel"`. |
 **What it does atomically:**
 1. Validates project not already registered
 2. Resolves repo path, auto-detects GitHub/GitLab from git remote
 3. Verifies provider health (CLI installed and authenticated)
-4. Creates all 8 state labels (idempotent — safe to run again)
-5. Adds project entry to `projects.json` with empty worker state
-   - DEV sessions: `{ junior: null, medior: null, senior: null }`
-   - QA sessions: `{ reviewer: null, tester: null }`
-6. Scaffolds prompt files: `projects/roles/<project>/dev.md` and `qa.md`
+4. Creates all 11 state labels (idempotent — safe to run again)
+5. Adds project entry to `projects.json` with empty worker state for all registered roles
+6. Scaffolds prompt files: `devclaw/projects/<project>/prompts/<role>.md` for each role
 7. Writes audit log
 ---
@@ -301,7 +302,7 @@ Agent + workspace initialization.
 | `newAgentName` | string | No | Create a new agent. Omit to configure current workspace. |
 | `channelBinding` | `"telegram"` \| `"whatsapp"` | No | Channel to bind (with `newAgentName` only) |
 | `migrateFrom` | string | No | Agent ID to migrate channel binding from |
-| `models` | object | No | Model overrides per role and level (see [Configuration](CONFIGURATION.md#model-tiers)) |
+| `models` | object | No | Model overrides per role and level (see [Configuration](CONFIGURATION.md#role-configuration)) |
 | `projectExecution` | `"parallel"` \| `"sequential"` | No | Project execution mode |
 **What it does:**
@@ -309,8 +310,8 @@ Agent + workspace initialization.
 1. Creates a new agent or configures existing workspace
 2. Optionally binds messaging channel (Telegram/WhatsApp)
 3. Optionally migrates channel binding from another agent
-4. Writes workspace files: AGENTS.md, HEARTBEAT.md, `projects/projects.json`
-5. Configures model tiers in `openclaw.json`
+4. Writes workspace files: AGENTS.md, HEARTBEAT.md, `devclaw/projects.json`, `devclaw/workflow.yaml`
+5. Scaffolds default prompt files for all roles
 ---
@@ -328,34 +329,47 @@ Conversational onboarding guide. Returns step-by-step instructions for the agent
 |---|---|---|---|
 | `mode` | `"first-run"` \| `"reconfigure"` | No | Auto-detected from current state |
-**Flow:**
-1. Call `onboard` — returns QA-style step-by-step instructions
-2. Agent walks user through: agent selection, channel binding, model tiers
-3. Agent calls `setup` with collected answers
-4. User registers projects via `project_register` in group chats
+---
+### `design_task`
+Spawn an architect for a design investigation. Creates a "To Design" issue and dispatches an architect worker.
+**Source:** [`lib/tools/design-task.ts`](../lib/tools/design-task.ts)
+**Parameters:**
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | Yes | Project group ID |
+| `title` | string | Yes | Design task title |
+| `description` | string | No | Design problem description |
+| `level` | `"junior"` \| `"senior"` | No | Architect level. Default: `"junior"`. |
 ---
 ## Completion Rules Reference
-The pipeline service (`lib/services/pipeline.ts`) defines declarative completion rules:
+The pipeline service (`lib/services/pipeline.ts`) derives completion rules from the workflow config:
 ```
-dev:done    → Doing   → To Test    (git pull, detect PR)
-dev:blocked → Doing   → To Do      (return to queue)
-qa:pass     → Testing → Done       (close issue)
-qa:fail     → Testing → To Improve (reopen issue)
-qa:refine   → Testing → Refining   (await human decision)
-qa:blocked  → Testing → To Test    (return to QA queue)
+developer:done    → Doing     → To Test    (git pull, detect PR)
+developer:review  → Doing     → In Review  (detect PR, heartbeat polls for merge)
+developer:blocked → Doing     → Refining   (awaits human decision)
+tester:pass       → Testing   → Done       (close issue)
+tester:fail       → Testing   → To Improve (reopen issue)
+tester:refine     → Testing   → Refining   (awaits human decision)
+tester:blocked    → Testing   → Refining   (awaits human decision)
+architect:done    → Designing → Planning   (design complete)
+architect:blocked → Designing → Refining   (awaits human decision)
 ```
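The updated rules listing reads as a lookup table keyed by `role:result`. The sketch below hardcodes that table for illustration; per the text, the real pipeline derives its rules from the workflow config, so `COMPLETION_RULES`, `CompletionRule`, and `getCompletionRule` are hypothetical stand-ins.

```typescript
interface CompletionRule {
  from: string; // label the issue must currently carry
  to: string; // label applied on completion
  note: string; // side effect, as described in the listing
}

// Hardcoded copy of the updated listing (assumption: real rules come from config).
const COMPLETION_RULES: Record<string, CompletionRule> = {
  "developer:done": { from: "Doing", to: "To Test", note: "git pull, detect PR" },
  "developer:review": { from: "Doing", to: "In Review", note: "detect PR, heartbeat polls for merge" },
  "developer:blocked": { from: "Doing", to: "Refining", note: "awaits human decision" },
  "tester:pass": { from: "Testing", to: "Done", note: "close issue" },
  "tester:fail": { from: "Testing", to: "To Improve", note: "reopen issue" },
  "tester:refine": { from: "Testing", to: "Refining", note: "awaits human decision" },
  "tester:blocked": { from: "Testing", to: "Refining", note: "awaits human decision" },
  "architect:done": { from: "Designing", to: "Planning", note: "design complete" },
  "architect:blocked": { from: "Designing", to: "Refining", note: "awaits human decision" },
};

function getCompletionRule(role: string, result: string): CompletionRule {
  const rule = COMPLETION_RULES[`${role}:${result}`];
  if (!rule) throw new Error(`No completion rule for ${role}:${result}`);
  return rule;
}
```

Keeping the rules as data rather than branching code makes an invalid pair such as `tester:done` fail loudly instead of falling through to a default transition.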
 ## Issue Priority Order
 When the heartbeat or `work_heartbeat` fills free worker slots, issues are prioritized:
-1. **To Improve** — QA failures get fixed first (highest priority)
-2. **To Test** — Completed DEV work gets reviewed next
+1. **To Improve** — Tester failures get fixed first (highest priority)
+2. **To Test** — Completed developer work gets reviewed next
 3. **To Do** — Fresh tasks are picked up last
 This ensures the pipeline clears its backlog before starting new work.
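The priority order can be expressed as an ordered list of labels that is scanned front to back. This is a simplified sketch: `QUEUE_PRIORITY`, `QueuedIssue`, and `nextIssue` are hypothetical names, and the sketch ignores roles and worker slots, which the real tick logic also has to consider.

```typescript
// Highest priority first, matching the order given above.
const QUEUE_PRIORITY = ["To Improve", "To Test", "To Do"] as const;
type QueueLabel = (typeof QUEUE_PRIORITY)[number];

interface QueuedIssue {
  id: number;
  label: QueueLabel;
}

// Returns the first issue found in the highest-priority non-empty queue.
function nextIssue(queue: QueuedIssue[]): QueuedIssue | undefined {
  for (const label of QUEUE_PRIORITY) {
    const match = queue.find((issue) => issue.label === label);
    if (match) return match;
  }
  return undefined; // nothing queued
}
```

Because the labels are scanned in priority order, a single "To Improve" issue is picked ahead of any number of "To Do" issues.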