refactor: rename QA role to Tester and update related documentation

- Updated role references from "QA" to "Tester" in workflow and code comments.
- Revised documentation to reflect the new role structure, including role instructions and completion rules.
- Enhanced the testing guide with clearer instructions and examples for unit and E2E tests.
- Improved tools reference to align with the new role definitions and completion rules.
- Adjusted the roadmap to highlight recent changes in role configuration and workflow state machine.
2026-02-16 13:55:38 +08:00
parent 371e760d94
commit f7aa47102f
8 changed files with 928 additions and 634 deletions
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -10,22 +10,22 @@ graph TB
        direction TB
        A_O["Orchestrator"]
        A_GL[GitHub/GitLab Issues]
-        A_DEV["DEV (worker session)"]
+        A_DEV["DEVELOPER (worker session)"]
-        A_QA["QA (worker session)"]
+        A_TST["TESTER (worker session)"]
        A_O -->|work_start| A_GL
        A_O -->|dispatches| A_DEV
-        A_O -->|dispatches| A_QA
+        A_O -->|dispatches| A_TST
    end
    subgraph "Group Chat B"
        direction TB
        B_O["Orchestrator"]
        B_GL[GitHub/GitLab Issues]
-        B_DEV["DEV (worker session)"]
+        B_DEV["DEVELOPER (worker session)"]
-        B_QA["QA (worker session)"]
+        B_TST["TESTER (worker session)"]
        B_O -->|work_start| B_GL
        B_O -->|dispatches| B_DEV
-        B_O -->|dispatches| B_QA
+        B_O -->|dispatches| B_TST
    end
    AGENT["Single OpenClaw Agent"]
@@ -33,7 +33,7 @@ graph TB
    AGENT --- B_O
 ```
-Worker sessions are expensive to start — each new spawn reads the full codebase (~50K tokens). DevClaw maintains **separate sessions per level per role** ([session-per-level design](#session-per-level-design)). When a medior dev finishes task A and picks up task B on the same project, the accumulated context carries over — no re-reading the repo. The plugin handles all session dispatch internally via OpenClaw CLI; the orchestrator agent never calls `sessions_spawn` or `sessions_send`.
+Worker sessions are expensive to start — each new spawn reads the full codebase (~50K tokens). DevClaw maintains **separate sessions per level per role** ([session-per-level design](#session-per-level-design)). When a medior developer finishes task A and picks up task B on the same project, the accumulated context carries over — no re-reading the repo. The plugin handles all session dispatch internally via OpenClaw CLI; the orchestrator agent never calls `sessions_spawn` or `sessions_send`.
 ```mermaid
 sequenceDiagram
@@ -42,7 +42,7 @@ sequenceDiagram
    participant IT as Issue Tracker
    participant S as Worker Session
-    O->>DC: work_start({ issueId: 42, role: "dev" })
+    O->>DC: work_start({ issueId: 42, role: "developer" })
    DC->>IT: Fetch issue, verify label
    DC->>DC: Assign level (junior/medior/senior)
    DC->>DC: Check existing session for assigned level
@@ -62,19 +62,20 @@ Understanding the OpenClaw model is key to understanding how DevClaw works:
 ### Session-per-level design
-Each project maintains **separate sessions per developer level per role**. A project's DEV might have a junior session, a medior session, and a senior session — each accumulating its own codebase context over time.
+Each project maintains **separate sessions per developer level per role**. A project's DEVELOPER might have a junior session, a medior session, and a senior session — each accumulating its own codebase context over time.
 ```
 Orchestrator Agent (configured in openclaw.json)
  └─ Main session (long-lived, handles all projects)
       │
       ├─ Project A
-       │    ├─ DEV sessions: { junior: <key>, medior: <key>, senior: null }
+       │    ├─ DEVELOPER sessions: { junior: <key>, medior: <key>, senior: null }
-       │    └─ QA sessions:  { reviewer: <key>, tester: null }
+       │    ├─ TESTER sessions:    { junior: null, medior: <key>, senior: null }
       │    └─ ARCHITECT sessions: { junior: <key>, senior: null }
       │
       └─ Project B
-            ├─ DEV sessions: { junior: null, medior: <key>, senior: null }
+            ├─ DEVELOPER sessions: { junior: null, medior: <key>, senior: null }
-            └─ QA sessions:  { reviewer: <key>, tester: null }
+            └─ TESTER sessions:    { junior: null, medior: <key>, senior: null }
 ```
 Why per-level instead of switching models on one session:
@@ -114,6 +115,18 @@ The agent's only job after `work_start` returns is to post the announcement to T
 DevClaw provides equivalent guardrails for everything except auto-reporting, which the heartbeat handles.
 ## Roles
 DevClaw ships with three built-in roles, defined in `lib/roles/registry.ts`. All roles use the same level scheme (junior/medior/senior) — levels describe task complexity, not the role.
 | Role | ID | Levels | Default Level | Completion Results |
 |---|---|---|---|---|
 | Developer | `developer` | junior, medior, senior | medior | done, review, blocked |
 | Tester | `tester` | junior, medior, senior | medior | pass, fail, refine, blocked |
 | Architect | `architect` | junior, senior | junior | done, blocked |
 Roles are extensible — add a new entry to `ROLE_REGISTRY` and corresponding workflow states to get a new role. The `workflow.yaml` config can also override levels, models, and emoji per role, or disable a role entirely (`architect: false`).
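As a rough illustration of the table above, the registry can be pictured as a map from role ID to its levels, default level, and allowed completion results. This is only a sketch: the type `RoleConfig`, the constant `ROLE_REGISTRY_SKETCH`, and the helper `isValidResult` are hypothetical names, and the real `ROLE_REGISTRY` in `lib/roles/registry.ts` may be shaped differently.

```typescript
// Hypothetical shape; the real ROLE_REGISTRY may differ.
type RoleConfig = {
  id: string;
  levels: readonly string[];
  defaultLevel: string;
  completionResults: readonly string[];
};

// Values copied from the roles table; the structure itself is an assumption.
const ROLE_REGISTRY_SKETCH: Record<string, RoleConfig> = {
  developer: {
    id: "developer",
    levels: ["junior", "medior", "senior"],
    defaultLevel: "medior",
    completionResults: ["done", "review", "blocked"],
  },
  tester: {
    id: "tester",
    levels: ["junior", "medior", "senior"],
    defaultLevel: "medior",
    completionResults: ["pass", "fail", "refine", "blocked"],
  },
  architect: {
    id: "architect",
    levels: ["junior", "senior"],
    defaultLevel: "junior",
    completionResults: ["done", "blocked"],
  },
};

// Returns true when `result` is an allowed completion result for `roleId`.
function isValidResult(roleId: string, result: string): boolean {
  const role = ROLE_REGISTRY_SKETCH[roleId];
  return role !== undefined && role.completionResults.indexOf(result) !== -1;
}
```

With this shape, adding a role would amount to adding one more entry plus the matching workflow states.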
 ## System overview
 ```mermaid
@@ -127,10 +140,11 @@ graph TB
        MS[Main Session<br/>orchestrator agent]
        GW[Gateway RPC<br/>sessions.patch / sessions.list]
        CLI[openclaw gateway call agent]
-        DEV_J[DEV session<br/>junior]
+        DEV_J[DEVELOPER session<br/>junior]
-        DEV_M[DEV session<br/>medior]
+        DEV_M[DEVELOPER session<br/>medior]
-        DEV_S[DEV session<br/>senior]
+        DEV_S[DEVELOPER session<br/>senior]
-        QA_R[QA session<br/>reviewer]
+        TST_M[TESTER session<br/>medior]
        ARCH[ARCHITECT session<br/>junior]
    end
    subgraph "DevClaw Plugin"
@@ -196,12 +210,13 @@ graph TB
    CLI -->|sends task| DEV_J
    CLI -->|sends task| DEV_M
    CLI -->|sends task| DEV_S
-    CLI -->|sends task| QA_R
+    CLI -->|sends task| TST_M
    CLI -->|sends task| ARCH
-    DEV_J -->|writes code, creates MRs| REPO
+    DEV_J -->|writes code, creates PRs| REPO
-    DEV_M -->|writes code, creates MRs| REPO
+    DEV_M -->|writes code, creates PRs| REPO
-    DEV_S -->|writes code, creates MRs| REPO
+    DEV_S -->|writes code, creates PRs| REPO
-    QA_R -->|reviews code, tests| REPO
+    TST_M -->|reviews code, tests| REPO
 ```
 ## End-to-end flow: human to sub-agent
@@ -216,7 +231,7 @@ sequenceDiagram
    participant DC as DevClaw Plugin
    participant GW as Gateway RPC
    participant CLI as openclaw gateway call agent
-    participant DEV as DEV Session<br/>(medior)
+    participant DEV as DEVELOPER Session<br/>(medior)
    participant GL as Issue Tracker
    Note over H,GL: Issue exists in queue (To Do)
@@ -225,51 +240,51 @@ sequenceDiagram
    TG->>MS: delivers message
    MS->>DC: status()
    DC->>GL: list issues by label "To Do"
-    DC-->>MS: { toDo: [#42], dev: idle }
+    DC-->>MS: { toDo: [#42], developer: idle }
-    Note over MS: Decides to pick up #42 for DEV as medior
+    Note over MS: Decides to pick up #42 for DEVELOPER as medior
-    MS->>DC: work_start({ issueId: 42, role: "dev", level: "medior", ... })
+    MS->>DC: work_start({ issueId: 42, role: "developer", level: "medior", ... })
    DC->>DC: resolve level "medior" → model ID
-    DC->>DC: lookup dev.sessions.medior → null (first time)
+    DC->>DC: lookup developer.sessions.medior → null (first time)
    DC->>GL: transition label "To Do" → "Doing"
    DC->>GW: sessions.patch({ key: new-session-key, model: "anthropic/claude-sonnet-4-5" })
    DC->>CLI: openclaw gateway call agent --params { sessionKey, message }
    CLI->>DEV: creates session, delivers task
    DC->>DC: store session key in projects.json + append audit.log
-    DC-->>MS: { success: true, announcement: "🔧 Spawning DEV (medior) for #42" }
+    DC-->>MS: { success: true, announcement: "🔧 Spawning DEVELOPER (medior) for #42" }
-    MS->>TG: "🔧 Spawning DEV (medior) for #42: Add login page"
+    MS->>TG: "🔧 Spawning DEVELOPER (medior) for #42: Add login page"
    TG->>H: sees announcement
-    Note over DEV: Works autonomously — reads code, writes code, creates MR
+    Note over DEV: Works autonomously — reads code, writes code, creates PR
    Note over DEV: Calls work_finish when done
-    DEV->>DC: work_finish({ role: "dev", result: "done", ... })
+    DEV->>DC: work_finish({ role: "developer", result: "done", ... })
    DC->>GL: transition label "Doing" → "To Test"
    DC->>DC: deactivate worker (sessions preserved)
-    DC-->>DEV: { announcement: "✅ DEV DONE #42" }
+    DC-->>DEV: { announcement: "✅ DEVELOPER DONE #42" }
-    MS->>TG: "✅ DEV DONE #42 — moved to QA queue"
+    MS->>TG: "✅ DEVELOPER DONE #42 — moved to TESTER queue"
    TG->>H: sees announcement
 ```
-On the **next DEV task** for this project that also assigns medior:
+On the **next DEVELOPER task** for this project that also assigns medior:
 ```mermaid
 sequenceDiagram
    participant MS as Main Session
    participant DC as DevClaw Plugin
    participant CLI as openclaw gateway call agent
-    participant DEV as DEV Session<br/>(medior, existing)
+    participant DEV as DEVELOPER Session<br/>(medior, existing)
-    MS->>DC: work_start({ issueId: 57, role: "dev", level: "medior", ... })
+    MS->>DC: work_start({ issueId: 57, role: "developer", level: "medior", ... })
    DC->>DC: resolve level "medior" → model ID
-    DC->>DC: lookup dev.sessions.medior → existing key!
+    DC->>DC: lookup developer.sessions.medior → existing key!
    Note over DC: No sessions.patch needed — session already exists
    DC->>CLI: openclaw gateway call agent --params { sessionKey, message }
    CLI->>DEV: delivers task to existing session (has full codebase context)
-    DC-->>MS: { success: true, announcement: "⚡ Sending DEV (medior) for #57" }
+    DC-->>MS: { success: true, announcement: "⚡ Sending DEVELOPER (medior) for #57" }
 ```
 Session reuse saves ~50K tokens per task by not re-reading the codebase.
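The lookup step in both diagrams can be sketched as a small function that reuses a stored key when one exists and otherwise records a new one. The names `SessionMap`, `resolveSession`, and `createKey` are hypothetical; the actual plugin also calls `sessions.patch` and persists to `projects.json`, which this sketch leaves out.

```typescript
// Hypothetical per-role session map: level -> session key (null when none yet).
type SessionMap = Record<string, string | null>;

// Reuse the stored session for `level` if present; otherwise create and store one.
function resolveSession(
  sessions: SessionMap,
  level: string,
  createKey: () => string,
): { key: string; reused: boolean } {
  const existing = sessions[level];
  if (existing) {
    return { key: existing, reused: true };
  }
  const key = createKey();
  sessions[level] = key; // remembered so the next task at this level reuses it
  return { key, reused: false };
}
```

The first call for a level corresponds to the "Spawning" announcement and later calls to the "Sending" one.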
@@ -304,7 +319,7 @@ sequenceDiagram
    A->>QS: status({ projectGroupId: "-123" })
    QS->>PJ: readProjects()
-    PJ-->>QS: { dev: idle, qa: idle }
+    PJ-->>QS: { developer: idle, tester: idle }
    QS->>GL: list issues by label "To Do"
    GL-->>QS: [{ id: 42, title: "Add login page" }]
    QS->>GL: list issues by label "To Test"
@@ -312,12 +327,12 @@ sequenceDiagram
    QS->>GL: list issues by label "To Improve"
    GL-->>QS: []
    QS->>AL: append { event: "status", ... }
-    QS-->>A: { dev: idle, queue: { toDo: [#42] } }
+    QS-->>A: { developer: idle, queue: { toDo: [#42] } }
 ```
-**Orchestrator decides:** DEV is idle, issue #42 is in To Do → pick it up. Evaluates complexity → assigns medior level.
+**Orchestrator decides:** DEVELOPER is idle, issue #42 is in To Do → pick it up. Evaluates complexity → assigns medior level.
-### Phase 3: DEV pickup
+### Phase 3: DEVELOPER pickup
 The plugin handles everything end-to-end — level resolution, session lookup, label transition, state update, **and** task dispatch to the worker session. The agent's only job afterward is to post the announcement.
@@ -332,13 +347,13 @@ sequenceDiagram
    participant PJ as projects.json
    participant AL as audit.log
-    A->>WS: work_start({ issueId: 42, role: "dev", projectGroupId: "-123", level: "medior" })
+    A->>WS: work_start({ issueId: 42, role: "developer", projectGroupId: "-123", level: "medior" })
    WS->>PJ: readProjects()
    WS->>GL: getIssue(42)
    GL-->>WS: { title: "Add login page", labels: ["To Do"] }
    WS->>WS: Verify label is "To Do"
    WS->>TIER: resolve "medior" → "anthropic/claude-sonnet-4-5"
-    WS->>PJ: lookup dev.sessions.medior
+    WS->>PJ: lookup developer.sessions.medior
    WS->>GL: transitionLabel(42, "To Do", "Doing")
    alt New session
        WS->>GW: sessions.patch({ key: new-key, model: "anthropic/claude-sonnet-4-5" })
@@ -351,98 +366,116 @@ sequenceDiagram
 **Writes:**
 - `Issue Tracker`: label "To Do" → "Doing"
- `projects.json`: dev.active=true, dev.issueId="42", dev.level="medior", dev.sessions.medior=key
+- `projects.json`: workers.developer.active=true, issueId="42", level="medior", sessions.medior=key
 - `audit.log`: 2 entries (work_start, model_selection)
 - `Session`: task message delivered to worker session via CLI
-### Phase 4: DEV works
+### Phase 4: DEVELOPER works
 ```
-DEV sub-agent session → reads codebase, writes code, creates MR
+DEVELOPER sub-agent session → reads codebase, writes code, creates PR
-DEV sub-agent session → calls work_finish({ role: "dev", result: "done", ... })
+DEVELOPER sub-agent session → calls work_finish({ role: "developer", result: "done", ... })
 ```
 This happens inside the OpenClaw session. The worker calls `work_finish` directly for atomic state updates. If the worker discovers unrelated bugs, it calls `task_create` to file them.
-### Phase 5: DEV complete (worker self-reports)
+### Phase 5: DEVELOPER complete (worker self-reports)
 ```mermaid
 sequenceDiagram
-    participant DEV as DEV Session
+    participant DEV as DEVELOPER Session
    participant WF as work_finish
    participant GL as Issue Tracker
    participant PJ as projects.json
    participant AL as audit.log
    participant REPO as Git Repo
    participant QA as QA Session
-    DEV->>WF: work_finish({ role: "dev", result: "done", projectGroupId: "-123", summary: "Login page with OAuth" })
+    DEV->>WF: work_finish({ role: "developer", result: "done", projectGroupId: "-123", summary: "Login page with OAuth" })
    WF->>PJ: readProjects()
-    PJ-->>WF: { dev: { active: true, issueId: "42" } }
+    PJ-->>WF: { developer: { active: true, issueId: "42" } }
    WF->>REPO: git pull
-    WF->>PJ: deactivateWorker(-123, dev)
+    WF->>PJ: deactivateWorker(-123, developer)
    Note over PJ: active→false, issueId→null<br/>sessions map PRESERVED
    WF->>GL: transitionLabel "Doing" → "To Test"
-    WF->>AL: append { event: "work_finish", role: "dev", result: "done" }
+    WF->>AL: append { event: "work_finish", role: "developer", result: "done" }
    WF->>WF: tick queue (fill free slots)
-    Note over WF: Scheduler sees "To Test" issue, QA slot free → dispatches QA
+    Note over WF: Scheduler sees "To Test" issue, TESTER slot free → dispatches TESTER
-    WF-->>DEV: { announcement: "✅ DEV DONE #42", tickPickups: [...] }
+    WF-->>DEV: { announcement: "✅ DEVELOPER DONE #42", tickPickups: [...] }
 ```
 **Writes:**
- `Git repo`: pulled latest (has DEV's merged code)
+- `Git repo`: pulled latest (has DEVELOPER's merged code)
- `projects.json`: dev.active=false, dev.issueId=null (sessions map preserved for reuse)
+- `projects.json`: workers.developer.active=false, issueId=null (sessions map preserved for reuse)
 - `Issue Tracker`: label "Doing" → "To Test"
 - `audit.log`: 1 entry (work_finish) + tick entries if workers dispatched
-### Phase 6: QA pickup
+### Phase 5b: DEVELOPER requests review (alternative path)
-Same as Phase 3, but with `role: "qa"`. Label transitions "To Test" → "Testing". Uses the reviewer level.
+Instead of merging the PR themselves, a developer can leave it open for human review:
 ### Phase 7: QA result (4 possible outcomes)
 #### 7a. QA Pass
 ```mermaid
 sequenceDiagram
-    participant QA as QA Session
+    participant DEV as DEVELOPER Session
    participant WF as work_finish
    participant GL as Issue Tracker
    participant PJ as projects.json
    DEV->>WF: work_finish({ role: "developer", result: "review", ... })
    WF->>GL: transitionLabel "Doing" → "In Review"
    WF->>PJ: deactivateWorker (sessions preserved)
    WF-->>DEV: { announcement: "👀 DEVELOPER REVIEW #42" }
 ```
 The issue sits in "In Review" until the heartbeat's **review pass** detects that the PR has been merged, at which point the issue automatically transitions to "To Test".
 ### Phase 6: TESTER pickup
 Same as Phase 3, but with `role: "tester"`. Label transitions "To Test" → "Testing". Level selection determines which tester session is used.
 ### Phase 7: TESTER result (4 possible outcomes)
 #### 7a. TESTER Pass
 ```mermaid
 sequenceDiagram
    participant TST as TESTER Session
    participant WF as work_finish
    participant GL as Issue Tracker
    participant PJ as projects.json
    participant AL as audit.log
-    QA->>WF: work_finish({ role: "qa", result: "pass", projectGroupId: "-123" })
+    TST->>WF: work_finish({ role: "tester", result: "pass", projectGroupId: "-123" })
-    WF->>PJ: deactivateWorker(-123, qa)
+    WF->>PJ: deactivateWorker(-123, tester)
    WF->>GL: transitionLabel(42, "Testing", "Done")
    WF->>GL: closeIssue(42)
-    WF->>AL: append { event: "work_finish", role: "qa", result: "pass" }
+    WF->>AL: append { event: "work_finish", role: "tester", result: "pass" }
-    WF-->>QA: { announcement: "🎉 QA PASS #42. Issue closed." }
+    WF-->>TST: { announcement: "🎉 TESTER PASS #42. Issue closed." }
 ```
 **Ticket complete.** Issue closed, label "Done".
-#### 7b. QA Fail
+#### 7b. TESTER Fail
 ```mermaid
 sequenceDiagram
-    participant QA as QA Session
+    participant TST as TESTER Session
    participant WF as work_finish
    participant GL as Issue Tracker
    participant PJ as projects.json
    participant AL as audit.log
-    QA->>WF: work_finish({ role: "qa", result: "fail", projectGroupId: "-123", summary: "OAuth redirect broken" })
+    TST->>WF: work_finish({ role: "tester", result: "fail", projectGroupId: "-123", summary: "OAuth redirect broken" })
-    WF->>PJ: deactivateWorker(-123, qa)
+    WF->>PJ: deactivateWorker(-123, tester)
    WF->>GL: transitionLabel(42, "Testing", "To Improve")
    WF->>GL: reopenIssue(42)
-    WF->>AL: append { event: "work_finish", role: "qa", result: "fail" }
+    WF->>AL: append { event: "work_finish", role: "tester", result: "fail" }
-    WF-->>QA: { announcement: "❌ QA FAIL #42 — OAuth redirect broken. Sent back to DEV." }
+    WF-->>TST: { announcement: "❌ TESTER FAIL #42 — OAuth redirect broken. Sent back to DEVELOPER." }
 ```
-**Cycle restarts:** Issue goes to "To Improve". Next heartbeat, DEV picks it up again (Phase 3, but from "To Improve" instead of "To Do").
+**Cycle restarts:** Issue goes to "To Improve". Next heartbeat, DEVELOPER picks it up again (Phase 3, but from "To Improve" instead of "To Do").
-#### 7c. QA Refine
+#### 7c. TESTER Refine
 ```
 Label: "Testing" → "Refining"
@@ -450,14 +483,14 @@ Label: "Testing" → "Refining"
 Issue needs human decision. Pipeline pauses until human moves it to "To Do" or closes it.
-#### 7d. Blocked (DEV or QA)
+#### 7d. Blocked (DEVELOPER or TESTER)
 ```
-DEV Blocked: "Doing" → "To Do"
+DEVELOPER Blocked: "Doing" → "Refining"
-QA Blocked:  "Testing" → "To Test"
+TESTER Blocked:    "Testing" → "Refining"
 ```
-Worker cannot complete (missing info, environment errors, etc.). Issue returns to queue for retry. The task is available for the next heartbeat pickup.
+The worker cannot complete the task (missing info, environment errors, etc.). The issue enters a hold state pending a human decision; the human can move it back to "To Do" to retry or take other action.
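The label transitions described in Phases 5 through 7 can be summarised as a lookup table keyed by role and result. This is a sketch only: `COMPLETION_TRANSITIONS` and `resolveTransition` are hypothetical names, the architect role is omitted because its label transitions are not stated here, and the real state machine lives in the workflow config.

```typescript
type Transition = { from: string; to: string };

// Label transitions per role and completion result, as listed in this document.
const COMPLETION_TRANSITIONS: Record<string, Record<string, Transition>> = {
  developer: {
    done: { from: "Doing", to: "To Test" },
    review: { from: "Doing", to: "In Review" },
    blocked: { from: "Doing", to: "Refining" },
  },
  tester: {
    pass: { from: "Testing", to: "Done" },
    fail: { from: "Testing", to: "To Improve" },
    refine: { from: "Testing", to: "Refining" },
    blocked: { from: "Testing", to: "Refining" },
  },
};

// Looks up the transition for a role/result pair; throws on unknown combinations.
function resolveTransition(role: string, result: string): Transition {
  const byResult = COMPLETION_TRANSITIONS[role];
  const transition = byResult ? byResult[result] : undefined;
  if (!transition) {
    throw new Error(`No transition for ${role}/${result}`);
  }
  return transition;
}
```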
 ### Completion enforcement
@@ -465,18 +498,19 @@ Three layers guarantee that `work_finish` always runs:
 1. **Completion contract** — Every task message sent to a worker session includes a mandatory `## MANDATORY: Task Completion` section listing available results and requiring `work_finish` even on failure. Workers are instructed to use `"blocked"` if stuck.
-2. **Blocked result** — Both DEV and QA can use `"blocked"` to gracefully return a task to queue without losing work. DEV blocked: `Doing → To Do`. QA blocked: `Testing → To Test`. This gives workers an escape hatch instead of silently dying.
+2. **Blocked result** — All roles can use `"blocked"` to gracefully hand off to a human. Developer blocked: `Doing → Refining`. Tester blocked: `Testing → Refining`. This gives workers an escape hatch instead of silently dying.
 3. **Stale worker watchdog** — The heartbeat's health check detects workers active for >2 hours. With `fix=true`, it deactivates the worker and reverts the label back to queue. This catches sessions that crashed, ran out of context, or otherwise failed without calling `work_finish`. The `health` tool provides the same check for manual invocation.
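The staleness test in layer 3 comes down to a timestamp comparison. A minimal sketch, assuming each worker record carries a start timestamp (the `startedAtMs` field and the `isStale` helper are hypothetical; the document only states the 2-hour threshold):

```typescript
// Threshold from the text above: workers active for more than 2 hours are stale.
const STALE_AFTER_MS = 2 * 60 * 60 * 1000;

// Hypothetical worker record; field names are assumptions for illustration.
type WorkerState = { active: boolean; startedAtMs: number | null };

// Returns true when the worker is active and has been running longer than the threshold.
function isStale(worker: WorkerState, nowMs: number): boolean {
  if (!worker.active || worker.startedAtMs === null) {
    return false;
  }
  return nowMs - worker.startedAtMs > STALE_AFTER_MS;
}
```

A health check with `fix=true` would then deactivate any worker for which this returns true and revert its label.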
 ### Phase 8: Heartbeat (continuous)
-The heartbeat runs periodically (via background service or manual `work_heartbeat` trigger). It combines health check + queue scan:
+The heartbeat runs periodically (via background service or manual `work_heartbeat` trigger). It combines health check + review polling + queue scan:
 ```mermaid
 sequenceDiagram
    participant HB as Heartbeat Service
    participant SH as health check
    participant RV as review pass
    participant TK as projectTick
    participant WS as work_start (dispatch)
    Note over HB: Tick triggered (every 60s)
@@ -485,6 +519,10 @@ sequenceDiagram
    Note over SH: Checks for zombies, stale workers
    SH-->>HB: { fixes applied }
    HB->>RV: reviewPass per project
    Note over RV: Polls PR status for "In Review" issues
    RV-->>HB: { transitions made }
    HB->>TK: projectTick per project
    Note over TK: Scans queue: To Improve > To Test > To Do
    TK->>WS: dispatchTask (fill free slots)
@@ -492,6 +530,31 @@ sequenceDiagram
    TK-->>HB: { pickups, skipped }
 ```
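The queue scan order noted in the diagram (To Improve > To Test > To Do) can be sketched as a priority pick. The names `QUEUE_PRIORITY`, `QueuedIssue`, and `pickNext` are hypothetical, and the sketch ignores whether the matching role has a free slot, which the real `projectTick` presumably checks.

```typescript
// Scan order taken from the heartbeat diagram.
const QUEUE_PRIORITY = ["To Improve", "To Test", "To Do"] as const;
type QueueLabel = (typeof QUEUE_PRIORITY)[number];

type QueuedIssue = { id: number; label: QueueLabel };

// Returns the first issue found in the highest-priority non-empty queue.
function pickNext(issues: QueuedIssue[]): QueuedIssue | undefined {
  for (const label of QUEUE_PRIORITY) {
    const match = issues.find((issue) => issue.label === label);
    if (match) {
      return match;
    }
  }
  return undefined;
}
```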
 ## Worker instructions (bootstrap hook)
 Role-specific instructions (coding standards, deployment steps, completion rules) are injected into worker sessions via the `agent:bootstrap` hook — not appended to the task message.
 ```mermaid
 sequenceDiagram
    participant GW as Gateway
    participant BH as Bootstrap Hook
    participant FS as Filesystem
    Note over GW: Worker session starts
    GW->>BH: agent:bootstrap event (sessionKey, bootstrapFiles[])
    BH->>BH: Parse session key → { projectName, role }
    BH->>FS: Load role instructions (project-specific → default)
    FS-->>BH: content + source path
    BH->>BH: Push WORKER_INSTRUCTIONS.md into bootstrapFiles
    BH-->>GW: bootstrapFiles now includes role instructions
 ```
 **Resolution order:**
 1. `devclaw/projects/<project>/prompts/<role>.md` (project-specific)
 2. `devclaw/prompts/<role>.md` (workspace default)
 The source path is logged for production traceability: `Bootstrap hook: injected developer instructions for project "my-app" from /path/to/prompts/developer.md`.
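The resolution order above is a first-match search over two candidate paths. A minimal sketch, with the file-existence check injected so the function stays pure (`resolveInstructionsPath` and the `exists` parameter are hypothetical; the real hook reads from the filesystem):

```typescript
// Returns the first existing instructions file for a project/role, or null.
// `exists` is injected so the lookup can be exercised without touching disk.
function resolveInstructionsPath(
  project: string,
  role: string,
  exists: (path: string) => boolean,
): string | null {
  const candidates = [
    `devclaw/projects/${project}/prompts/${role}.md`, // project-specific
    `devclaw/prompts/${role}.md`, // workspace default
  ];
  for (const candidate of candidates) {
    if (exists(candidate)) {
      return candidate;
    }
  }
  return null;
}
```

Returning the matched path alongside the content is what makes the traceability log line possible.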
 ## Data flow map
 Every piece of data and where it lives:
@@ -503,15 +566,16 @@ Every piece of data and where it lives:
 │  Issue #42: "Add login page"                                    │
 │  Labels: [Planning | To Do | Doing | To Test | Testing | ...]   │
 │  State: open / closed                                           │
-│  MRs/PRs: linked merge/pull requests                            │
+│  PRs: linked pull/merge requests (status polled for In Review)  │
 │  Created by: orchestrator (task_create), workers, or humans     │
 └─────────────────────────────────────────────────────────────────┘
        ↕ gh/glab CLI (read/write, auto-detected)
        ↕ cockatiel resilience: retry + circuit breaker
 ┌─────────────────────────────────────────────────────────────────┐
 │ DevClaw Plugin (orchestration logic)                            │
 │                                                                 │
 │  setup          → agent creation + workspace + model config     │
-│  work_start     → level + label + dispatch + role instr (e2e)   │
+│  work_start     → level + label + dispatch (e2e)                │
 │  work_finish    → label + state + git pull + tick queue          │
 │  task_create    → create issue in tracker                       │
 │  task_update    → manual label state change                     │
@@ -519,27 +583,38 @@ Every piece of data and where it lives:
 │  status         → read labels + read state                      │
 │  health         → check sessions + fix zombies                  │
 │  project_register → labels + prompts + state init (one-time)    │
 │  design_task    → architect dispatch                            │
 │                                                                 │
 │  Bootstrap hook → injects role instructions into worker sessions│
 │  Review pass    → polls PR status, auto-advances In Review      │
 │  Config loader  → three-layer merge + Zod validation            │
 └─────────────────────────────────────────────────────────────────┘
        ↕ atomic file I/O          ↕ OpenClaw CLI (plugin shells out)
 ┌────────────────────────────────┐ ┌──────────────────────────────┐
-│ projects/projects.json         │ │ OpenClaw Gateway + CLI       │
+│ devclaw/projects.json          │ │ OpenClaw Gateway + CLI       │
 │                                │ │ (called by plugin, not agent)│
 │  Per project:                  │ │                              │
-│    dev:                        │ │  openclaw gateway call       │
+│    workers:                    │ │  openclaw gateway call       │
-│      active, issueId, level    │ │    sessions.patch → create   │
+│      developer:                │ │    sessions.patch → create   │
-│      sessions:                 │ │    sessions.list  → health   │
+│        active, issueId, level  │ │    sessions.list  → health   │
-│        junior: <key>           │ │    sessions.delete → cleanup │
+│        sessions:               │ │    sessions.delete → cleanup │
-│        medior: <key>           │ │                              │
+│          junior: <key>         │ │                              │
-│        senior: <key>           │ │  openclaw gateway call agent │
+│          medior: <key>         │ │  openclaw gateway call agent │
-│    qa:                         │ │    --params { sessionKey,    │
+│          senior: <key>         │ │    --params { sessionKey,    │
-│      active, issueId, level    │ │      message, agentId }      │
+│      tester:                   │ │      message, agentId }      │
-│      sessions:                 │ │    → dispatches to session   │
+│        active, issueId, level  │ │    → dispatches to session   │
-│        reviewer: <key>         │ │                              │
+│        sessions:               │ │                              │
-│        tester: <key>           │ │                              │
+│          junior: <key>         │ │                              │
 │          medior: <key>         │ │                              │
 │          senior: <key>         │ │                              │
 │      architect:                │ │                              │
 │        sessions:               │ │                              │
 │          junior: <key>         │ │                              │
 │          senior: <key>         │ │                              │
 └────────────────────────────────┘ └──────────────────────────────┘
        ↕ append-only
 ┌─────────────────────────────────────────────────────────────────┐
-│ log/audit.log (observability)                                   │
+│ devclaw/log/audit.log (observability)                           │
 │                                                                 │
 │  NDJSON, one line per event:                                    │
 │  work_start, work_finish, model_selection,                      │
@@ -553,21 +628,23 @@ Every piece of data and where it lives:
 │ Telegram / WhatsApp (user-facing messages)                      │
 │                                                                 │
 │  Per group chat:                                                │
-│    "🔧 Spawning DEV (medior) for #42: Add login page"          │
+│    "🔧 Spawning DEVELOPER (medior) for #42: Add login page"    │
-│    "⚡ Sending DEV (medior) for #57: Fix validation"            │
+│    "⚡ Sending DEVELOPER (medior) for #57: Fix validation"      │
-│    "✅ DEV DONE #42 — Login page with OAuth."                   │
+│    "✅ DEVELOPER DONE #42 — Login page with OAuth."             │
-│    "🎉 QA PASS #42. Issue closed."                              │
+│    "👀 DEVELOPER REVIEW #42 — PR open for review."              │
-│    "❌ QA FAIL #42 — OAuth redirect broken."                    │
+│    "🎉 TESTER PASS #42. Issue closed."                          │
-│    "🚫 DEV BLOCKED #42 — Missing dependencies."                │
+│    "❌ TESTER FAIL #42 — OAuth redirect broken."                │
-│    "🚫 QA BLOCKED #42 — Env not available."                    │
+│    "🚫 DEVELOPER BLOCKED #42 — Missing dependencies."          │
 │    "🚫 TESTER BLOCKED #42 — Env not available."                │
 └─────────────────────────────────────────────────────────────────┘
 ┌─────────────────────────────────────────────────────────────────┐
 │ Git Repository (codebase)                                       │
 │                                                                 │
-│  DEV sub-agent sessions: read code, write code, create MRs      │
+│  DEVELOPER sub-agent sessions: read code, write code, create PRs│
-│  QA sub-agent sessions: read code, run tests, review MRs        │
+│  TESTER sub-agent sessions: read code, run tests, review PRs    │
-│  work_finish (DEV done): git pull to sync latest                │
+│  ARCHITECT sub-agent sessions: research, design, recommend      │
 │  work_finish (developer done): git pull to sync latest          │
 └─────────────────────────────────────────────────────────────────┘
 ```
@@ -584,9 +661,12 @@ graph LR
        SETUP[Agent + workspace setup]
        SD[Session dispatch<br/>create + send via CLI]
        AC[Scheduling<br/>tick queue after work_finish]
-        RI[Role instructions<br/>loaded per project]
+        RI[Role instructions<br/>injected via bootstrap hook]
        RV[Review polling<br/>PR status → auto-advance]
        A[Audit logging]
        Z[Zombie cleanup]
        CFG[Config validation<br/>Zod + integrity checks]
        RES[Provider resilience<br/>retry + circuit breaker]
    end
    subgraph "Orchestrator handles (planning only)"
@@ -600,7 +680,7 @@ graph LR
    subgraph "Sub-agent sessions handle"
        CR[Code writing]
-        MR[MR creation/review]
+        MR[PR creation/review]
        WF_W[Task completion<br/>via work_finish]
        BUG[Bug filing<br/>via task_create]
    end
@@ -611,7 +691,7 @@ graph LR
    end
 ```
-**Key boundary:** The orchestrator is a planner and dispatcher — it never writes code. All implementation work (code edits, git operations, tests) must go through sub-agent sessions via the `task_create` → `work_start` pipeline. This ensures audit trails, tier selection, and QA review for every code change.
+**Key boundary:** The orchestrator is a planner and dispatcher — it never writes code. All implementation work (code edits, git operations, tests) must go through sub-agent sessions via the `task_create` → `work_start` pipeline. This ensures audit trails, level selection, and testing for every code change.
 ## IssueProvider abstraction
@@ -624,10 +704,13 @@ All issue tracker operations go through the `IssueProvider` interface, defined i
 - `transitionLabel` — atomic label state transition (unlabel + label)
 - `closeIssue` / `reopenIssue` — issue lifecycle
 - `hasStateLabel` / `getCurrentStateLabel` — label inspection
 - `getPrStatus` — get PR/MR state (open, merged, approved, none)
 - `hasMergedMR` / `getMergedMRUrl` — MR/PR verification
 - `addComment` — add comment to issue
 - `healthCheck` — verify provider connectivity
 **Provider resilience:** All provider calls are wrapped with cockatiel retry (3 attempts, exponential backoff) + circuit breaker (opens after 5 consecutive failures, half-opens after 30s). See `lib/providers/resilience.ts`.
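To make the breaker behaviour concrete, here is a hand-rolled sketch of a consecutive-failure circuit breaker using the thresholds quoted above (5 failures, half-open after 30s). It is not cockatiel's API and not the code in `lib/providers/resilience.ts`; the class `ConsecutiveFailureBreaker` and its methods are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Illustrative only: opens after N consecutive failures, half-opens after a delay.
class ConsecutiveFailureBreaker {
  private failures = 0;
  private openedAtMs: number | null = null;
  private readonly threshold: number;
  private readonly halfOpenAfterMs: number;

  constructor(threshold = 5, halfOpenAfterMs = 30000) {
    this.threshold = threshold;
    this.halfOpenAfterMs = halfOpenAfterMs;
  }

  state(nowMs: number): BreakerState {
    if (this.openedAtMs === null) return "closed";
    return nowMs - this.openedAtMs >= this.halfOpenAfterMs ? "half-open" : "open";
  }

  // Calls are rejected only while the breaker is fully open.
  canAttempt(nowMs: number): boolean {
    return this.state(nowMs) !== "open";
  }

  // Any success resets the failure count and closes the breaker.
  recordSuccess(): void {
    this.failures = 0;
    this.openedAtMs = null;
  }

  // Reaching the threshold opens the breaker; a failed half-open probe reopens it.
  recordFailure(nowMs: number): void {
    this.failures += 1;
    if (this.failures >= this.threshold) {
      this.openedAtMs = nowMs;
    }
  }
}
```

In the half-open state a single trial call is allowed through: success closes the breaker, failure restarts the 30s wait.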
 **Current providers:**
 - **GitHub** (`lib/providers/github.ts`) — wraps `gh` CLI
 - **GitLab** (`lib/providers/gitlab.ts`) — wraps `glab` CLI
@@ -637,19 +720,34 @@ All issue tracker operations go through the `IssueProvider` interface, defined i
 Provider selection is handled by `createProvider()` in `lib/providers/index.ts`, which auto-detects GitHub vs GitLab from the git remote URL.
 ## Configuration system
 DevClaw uses a three-layer config system with `workflow.yaml` files:
 ```
 Layer 1: Built-in defaults (ROLE_REGISTRY + DEFAULT_WORKFLOW)
 Layer 2: Workspace:  <workspace>/devclaw/workflow.yaml
 Layer 3: Project:    <workspace>/devclaw/projects/<project>/workflow.yaml
 ```
 Each layer can override roles (levels, models, emoji), workflow states/transitions, and timeouts. Config is validated with Zod schemas at load time, with cross-reference integrity checks (transition targets exist, queue states have roles, terminal states have no outgoing transitions).
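A minimal sketch of how the three layers might be merged for the `roles` section, assuming later layers win field by field and that `false` disables a role. The type names and `mergeRoles` are hypothetical; the actual loader lives in `lib/config/loader.ts` and may differ.

```typescript
// Hypothetical shapes for illustration; not the plugin's real types.
interface RoleConfig {
  defaultLevel?: string;
  models?: Record<string, string>;
}
type RolesLayer = Record<string, RoleConfig | false>;

// Merge layers left to right: built-in defaults, then workspace, then project.
function mergeRoles(...layers: RolesLayer[]): RolesLayer {
  const merged: RolesLayer = {};
  for (const layer of layers) {
    for (const [role, override] of Object.entries(layer)) {
      const current = merged[role];
      if (override === false) {
        // `role: false` disables the role entirely.
        merged[role] = false;
      } else if (current === undefined || current === false) {
        merged[role] = { ...override, models: { ...override.models } };
      } else {
        // Only the fields present in the override replace inherited values.
        merged[role] = {
          ...current,
          ...override,
          models: { ...current.models, ...override.models },
        };
      }
    }
  }
  return merged;
}
```

Called as `mergeRoles(builtinDefaults, workspaceLayer, projectLayer)`, a project file that sets a single model leaves every other level and role inherited from the layers below.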
 See [CONFIGURATION.md](CONFIGURATION.md) for the full reference.
 ## Error recovery
 | Failure | Detection | Recovery |
 |---|---|---|
 | Session dies mid-task | `health` checks via `sessions.list` Gateway RPC | `fix=true`: reverts label, clears active state. Next heartbeat picks up task again (creates fresh session for that level). |
-| gh/glab command fails | Plugin tool throws error, returns to agent | Agent retries or reports to Telegram group |
+| gh/glab command fails | Cockatiel retry (3 attempts), then circuit breaker | The circuit opens after 5 consecutive failures to avoid hammering the provider. The plugin catches the error and returns it. |
 | `openclaw gateway call agent` fails | Plugin catches error during dispatch | Plugin rolls back: reverts label, clears active state. Returns error. No orphaned state. |
 | `sessions.patch` fails | Plugin catches error during session creation | Plugin rolls back label transition. Returns error. |
-| projects.json corrupted | Tool can't parse JSON | Manual fix needed. Atomic writes (temp+rename) prevent partial writes. |
+| projects.json corrupted | Tool can't parse JSON | Manual fix needed. Atomic writes (temp+rename) prevent partial writes. File locking prevents concurrent races. |
 | Label out of sync | `work_start` verifies label before transitioning | Throws error if label doesn't match expected state. |
-| Worker already active | `work_start` checks `active` flag | Throws error: "DEV already active on project". Must complete current task first. |
+| Worker already active | `work_start` checks `active` flag | Throws error: "DEVELOPER already active on project". Must complete current task first. |
 | Stale worker (>2h) | `health` and heartbeat health check | `fix=true`: deactivates worker, reverts label to queue. Task available for next pickup. |
-| Worker stuck/blocked | Worker calls `work_finish` with `"blocked"` | Deactivates worker, reverts label to queue. Issue available for retry. |
+| Worker stuck/blocked | Worker calls `work_finish` with `"blocked"` | Deactivates worker, transitions to "Refining" (hold state). Requires human decision to proceed. |
 | Config invalid | Zod schema validation at load time | Clear error message with field path. Prevents startup with broken config. |
 | `project_register` fails | Plugin catches error during label creation or state write | Clean error returned. Labels are idempotent, projects.json not written until all labels succeed. |
 ## File locations
@@ -659,8 +757,11 @@ Provider selection is handled by `createProvider()` in `lib/providers/index.ts`.
 | Plugin source | `~/.openclaw/extensions/devclaw/` | Plugin code |
 | Plugin manifest | `~/.openclaw/extensions/devclaw/openclaw.plugin.json` | Plugin registration |
 | Agent config | `~/.openclaw/openclaw.json` | Agent definition + tool permissions + model config |
-| Worker state | `~/.openclaw/workspace-<agent>/projects/projects.json` | Per-project DEV/QA state |
+| Worker state | `<workspace>/devclaw/projects.json` | Per-project worker state |
-| Role instructions | `~/.openclaw/workspace-<agent>/projects/roles/<project>/` | Per-project `dev.md` and `qa.md` |
+| Workflow config (workspace) | `<workspace>/devclaw/workflow.yaml` | Workspace-level role/workflow overrides |
-| Audit log | `~/.openclaw/workspace-<agent>/log/audit.log` | NDJSON event log |
+| Workflow config (project) | `<workspace>/devclaw/projects/<project>/workflow.yaml` | Project-specific overrides |
 | Default role instructions | `<workspace>/devclaw/prompts/<role>.md` | Default `developer.md`, `tester.md`, `architect.md` |
 | Project role instructions | `<workspace>/devclaw/projects/<project>/prompts/<role>.md` | Per-project role instruction overrides |
 | Audit log | `<workspace>/devclaw/log/audit.log` | NDJSON event log |
 | Session transcripts | `~/.openclaw/agents/<agent>/sessions/<uuid>.jsonl` | Conversation history per session |
 | Git repos | `~/git/<project>/` | Project source code |
--- a/docs/CONFIGURATION.md
+++ b/docs/CONFIGURATION.md
@@ -1,54 +1,236 @@
 # DevClaw — Configuration Reference
-All DevClaw configuration lives in two places: `openclaw.json` (plugin-level settings) and `projects.json` (per-project state).
+DevClaw uses a three-layer configuration system. All role, workflow, and timeout settings live in `workflow.yaml` files — not in `openclaw.json`.
-## Plugin Configuration (`openclaw.json`)
+## Three-Layer Config Resolution
-DevClaw is configured under `plugins.entries.devclaw.config` in `openclaw.json`.
+```
-
+Layer 1: Built-in defaults (ROLE_REGISTRY + DEFAULT_WORKFLOW)
-### Model Tiers
+Layer 2: Workspace:  <workspace>/devclaw/workflow.yaml
-
+Layer 3: Project:    <workspace>/devclaw/projects/<project>/workflow.yaml
 Override which LLM model powers each developer level:
 ```json
 {
  "plugins": {
    "entries": {
      "devclaw": {
        "config": {
          "models": {
            "dev": {
              "junior": "anthropic/claude-haiku-4-5",
              "medior": "anthropic/claude-sonnet-4-5",
              "senior": "anthropic/claude-opus-4-5"
            },
            "qa": {
              "reviewer": "anthropic/claude-sonnet-4-5",
              "tester": "anthropic/claude-haiku-4-5"
            }
          }
        }
      }
    }
  }
 }
 ```
-**Resolution order** (per `lib/tiers.ts:resolveModel`):
+Each layer can partially override the one below it. Only the fields you specify are merged — everything else inherits from the layer below.
-1. Plugin config `models.<role>.<level>` — explicit override
+**Source:** [`lib/config/loader.ts`](../lib/config/loader.ts)
-2. `DEFAULT_MODELS[role][level]` — built-in defaults (table below)
+
-3. Passthrough — treat the level string as a raw model ID
+**Validation:** Config is validated at load time with Zod schemas ([`lib/config/schema.ts`](../lib/config/schema.ts)). Integrity checks verify transition targets exist, queue states have roles, and terminal states have no outgoing transitions.
 ---
 ## Workflow Config (`workflow.yaml`)
 The `workflow.yaml` file configures roles, workflow states, and timeouts. Place it at `<workspace>/devclaw/workflow.yaml` for workspace-wide settings, or at `<workspace>/devclaw/projects/<project>/workflow.yaml` for project-specific overrides.
 ### Role Configuration
 Override which LLM model powers each level, customize levels, or disable roles entirely:
 ```yaml
 roles:
  developer:
    models:
      junior: anthropic/claude-haiku-4-5
      medior: anthropic/claude-sonnet-4-5
      senior: anthropic/claude-opus-4-6
  tester:
    models:
      junior: anthropic/claude-haiku-4-5
      medior: anthropic/claude-sonnet-4-5
      senior: anthropic/claude-opus-4-6
  architect:
    models:
      junior: anthropic/claude-sonnet-4-5
      senior: anthropic/claude-opus-4-6
  # Disable a role entirely:
  # architect: false
 ```
 **Role override fields** (all optional — only override what you need):
 | Field | Type | Description |
 |---|---|---|
 | `levels` | string[] | Available levels for this role |
 | `defaultLevel` | string | Default level when not specified |
 | `models` | Record<string, string> | Model ID per level |
 | `emoji` | Record<string, string> | Emoji per level for announcements |
 | `completionResults` | string[] | Valid completion results |
 **Default models:**
-| Role | Level | Default model |
+| Role | Level | Default Model |
 |---|---|---|
-| dev | junior | `anthropic/claude-haiku-4-5` |
+| developer | junior | `anthropic/claude-haiku-4-5` |
-| dev | medior | `anthropic/claude-sonnet-4-5` |
+| developer | medior | `anthropic/claude-sonnet-4-5` |
-| dev | senior | `anthropic/claude-opus-4-5` |
+| developer | senior | `anthropic/claude-opus-4-6` |
-| qa | reviewer | `anthropic/claude-sonnet-4-5` |
+| tester | junior | `anthropic/claude-haiku-4-5` |
-| qa | tester | `anthropic/claude-haiku-4-5` |
+| tester | medior | `anthropic/claude-sonnet-4-5` |
 | tester | senior | `anthropic/claude-opus-4-6` |
 | architect | junior | `anthropic/claude-sonnet-4-5` |
 | architect | senior | `anthropic/claude-opus-4-6` |
 **Source:** [`lib/roles/registry.ts`](../lib/roles/registry.ts)
 **Model resolution order:**
 1. Project `workflow.yaml` → `roles.<role>.models.<level>`
 2. Workspace `workflow.yaml` → `roles.<role>.models.<level>`
 3. Built-in defaults from `ROLE_REGISTRY`
 4. Passthrough — treat the level string as a raw model ID
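The resolution order above can be sketched as a chain of lookups that falls through to the raw level string. The `ModelMap` type and this function signature are assumptions made for the example, not the plugin's actual API.

```typescript
// role -> level -> model ID (hypothetical shape for illustration)
type ModelMap = Record<string, Record<string, string>>;

function resolveModel(
  role: string,
  level: string,
  project: ModelMap,
  workspace: ModelMap,
  builtin: ModelMap,
): string {
  return (
    project[role]?.[level] ??   // 1. project workflow.yaml
    workspace[role]?.[level] ?? // 2. workspace workflow.yaml
    builtin[role]?.[level] ??   // 3. built-in defaults
    level                       // 4. passthrough: treat the level as a raw model ID
  );
}
```

The passthrough step means a caller can hand in a full model ID in place of a level name and still get a usable result.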
 ### Workflow States
 The workflow section defines the state machine for issue lifecycle. Each state has a type, label, color, and optional transitions:
 ```yaml
 workflow:
  initial: planning
  states:
    planning:
      type: hold
      label: Planning
      color: "#95a5a6"
      on:
        APPROVE: todo
    todo:
      type: queue
      role: developer
      label: To Do
      color: "#428bca"
      priority: 1
      on:
        PICKUP: doing
    doing:
      type: active
      role: developer
      label: Doing
      color: "#f0ad4e"
      on:
        COMPLETE:
          target: toTest
          actions: [gitPull, detectPr]
        REVIEW:
          target: reviewing
          actions: [detectPr]
        BLOCKED: refining
    toTest:
      type: queue
      role: tester
      label: To Test
      color: "#5bc0de"
      priority: 2
      on:
        PICKUP: testing
    testing:
      type: active
      role: tester
      label: Testing
      color: "#9b59b6"
      on:
        PASS:
          target: done
          actions: [closeIssue]
        FAIL:
          target: toImprove
          actions: [reopenIssue]
        REFINE: refining
        BLOCKED: refining
    toImprove:
      type: queue
      role: developer
      label: To Improve
      color: "#d9534f"
      priority: 3
      on:
        PICKUP: doing
    refining:
      type: hold
      label: Refining
      color: "#f39c12"
      on:
        APPROVE: todo
    reviewing:
      type: review
      label: In Review
      color: "#c5def5"
      check: prMerged
      on:
        APPROVED:
          target: toTest
          actions: [gitPull]
        BLOCKED: refining
    done:
      type: terminal
      label: Done
      color: "#5cb85c"
    toDesign:
      type: queue
      role: architect
      label: To Design
      color: "#0075ca"
      priority: 1
      on:
        PICKUP: designing
    designing:
      type: active
      role: architect
      label: Designing
      color: "#d4c5f9"
      on:
        COMPLETE: planning
        BLOCKED: refining
 ```
 **State types:**
 | Type | Description |
 |---|---|
 | `queue` | Waiting for pickup. Must have a `role`. Has `priority` for ordering. |
 | `active` | Worker is currently working on it. Must have a `role`. |
 | `hold` | Paused, awaiting human decision. |
 | `review` | Awaiting external check (PR merged/approved). Has `check` field. |
 | `terminal` | Completed. No outgoing transitions. |
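The constraints in the table lend themselves to a simple integrity pass over the state map. The sketch below uses plain TypeScript in place of the Zod schemas the docs mention; the type names and `checkWorkflow` are hypothetical.

```typescript
type StateType = "queue" | "active" | "hold" | "review" | "terminal";
type Transition = string | { target: string; actions?: string[] };

interface WorkflowState {
  type: StateType;
  role?: string;
  on?: Record<string, Transition>;
}

// Returns human-readable problems; an empty array means the workflow passed.
function checkWorkflow(states: Record<string, WorkflowState>): string[] {
  const known = new Set(Object.keys(states));
  const errors: string[] = [];
  for (const [name, state] of Object.entries(states)) {
    const transitions = Object.entries(state.on ?? {});
    if ((state.type === "queue" || state.type === "active") && !state.role) {
      errors.push(`${name}: ${state.type} state needs a role`);
    }
    if (state.type === "terminal" && transitions.length > 0) {
      errors.push(`${name}: terminal state has outgoing transitions`);
    }
    for (const [event, transition] of transitions) {
      // Transitions are either a bare target name or { target, actions }.
      const target = typeof transition === "string" ? transition : transition.target;
      if (!known.has(target)) {
        errors.push(`${name}: ${event} targets unknown state "${target}"`);
      }
    }
  }
  return errors;
}
```

Collecting every problem, not just the first one found, lets a single load report all broken references at once.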
 **Built-in actions:**
 | Action | Description |
 |---|---|
 | `gitPull` | Pull latest from the base branch |
 | `detectPr` | Auto-detect PR URL from the issue |
 | `closeIssue` | Close the issue |
 | `reopenIssue` | Reopen the issue |
 **Review checks:**
 | Check | Description |
 |---|---|
 | `prMerged` | Transition when the issue's PR is merged |
 | `prApproved` | Transition when the issue's PR is approved or merged |
 ### Timeouts
 ```yaml
 timeouts:
  gitPullMs: 30000
  gatewayMs: 120000
  sessionPatchMs: 120000
  dispatchMs: 120000
  staleWorkerHours: 2
 ```
 | Setting | Default | Description |
 |---|---|---|
 | `gitPullMs` | 30000 | Timeout for git pull operations |
 | `gatewayMs` | 120000 | Timeout for gateway RPC calls |
 | `sessionPatchMs` | 120000 | Timeout for session creation |
 | `dispatchMs` | 120000 | Timeout for task dispatch |
 | `staleWorkerHours` | 2 | Hours before a worker is considered stale |
 ---
 ## Plugin Configuration (`openclaw.json`)
 Some settings still live in `openclaw.json` under `plugins.entries.devclaw.config`:
 ### Project Execution Mode
@@ -73,8 +255,6 @@ Controls cross-project parallelism:
 | `"parallel"` (default) | Multiple projects can have active workers simultaneously |
 | `"sequential"` | Only one project's workers active at a time. Useful for single-agent deployments. |
 Enforced in `work_heartbeat` and the heartbeat service before dispatching.
 ### Heartbeat Service
 Token-free interval-based health checks + queue dispatch:
@@ -105,7 +285,7 @@ Token-free interval-based health checks + queue dispatch:
 **Source:** [`lib/services/heartbeat.ts`](../lib/services/heartbeat.ts)
-The heartbeat service runs as a plugin service tied to the gateway lifecycle. Every tick: health pass (auto-fix zombies, stale workers) → tick pass (fill free slots by priority). Zero LLM tokens consumed.
+The heartbeat service runs as a plugin service tied to the gateway lifecycle. Every tick: health pass (auto-fix zombies, stale workers) → review pass (poll PR status for "In Review" issues) → tick pass (fill free slots by priority). Zero LLM tokens consumed.
 ### Notifications
@@ -157,7 +337,8 @@ Restrict DevClaw tools to your orchestrator agent:
            "work_heartbeat",
            "project_register",
            "setup",
-            "onboard"
+            "onboard",
            "design_task"
          ]
        }
      }
@@ -170,7 +351,7 @@ Restrict DevClaw tools to your orchestrator agent:
 ## Project State (`projects.json`)
-All project state lives in `<workspace>/projects/projects.json`, keyed by group ID.
+All project state lives in `<workspace>/devclaw/projects.json`, keyed by group ID.
 **Source:** [`lib/projects.ts`](../lib/projects.ts)
@@ -187,26 +368,40 @@ All project state lives in `<workspace>/projects/projects.json`, keyed by group
      "deployBranch": "development",
      "deployUrl": "https://my-webapp.example.com",
      "channel": "telegram",
      "provider": "github",
      "roleExecution": "parallel",
-      "dev": {
+      "workers": {
-        "active": false,
+        "developer": {
-        "issueId": null,
+          "active": false,
-        "startTime": null,
+          "issueId": null,
-        "level": null,
+          "startTime": null,
-        "sessions": {
+          "level": null,
-          "junior": null,
+          "sessions": {
-          "medior": "agent:orchestrator:subagent:my-webapp-dev-medior",
+            "junior": null,
-          "senior": null
+            "medior": "agent:orchestrator:subagent:my-webapp-developer-medior",
-        }
+            "senior": null
-      },
+          }
-      "qa": {
+        },
-        "active": false,
+        "tester": {
-        "issueId": null,
+          "active": false,
-        "startTime": null,
+          "issueId": null,
-        "level": null,
+          "startTime": null,
-        "sessions": {
+          "level": null,
-          "reviewer": "agent:orchestrator:subagent:my-webapp-qa-reviewer",
+          "sessions": {
-          "tester": null
+            "junior": null,
            "medior": "agent:orchestrator:subagent:my-webapp-tester-medior",
            "senior": null
          }
        },
        "architect": {
          "active": false,
          "issueId": null,
          "startTime": null,
          "level": null,
          "sessions": {
            "junior": null,
            "senior": null
          }
        }
      }
    }
@@ -225,29 +420,28 @@ All project state lives in `<workspace>/projects/projects.json`, keyed by group
 | `deployBranch` | string | Branch that triggers deployment |
 | `deployUrl` | string | Deployment URL |
 | `channel` | string | Messaging channel (`"telegram"`, `"whatsapp"`, etc.) |
-| `roleExecution` | `"parallel"` \| `"sequential"` | DEV/QA parallelism for this project |
+| `provider` | `"github"` \| `"gitlab"` | Issue tracker provider (auto-detected, stored for reuse) |
 | `roleExecution` | `"parallel"` \| `"sequential"` | DEVELOPER/TESTER parallelism for this project |
 ### Worker state fields
-Each project has `dev` and `qa` worker state objects:
+Each role in the `workers` record has a `WorkerState` object:
 | Field | Type | Description |
 |---|---|---|
 | `active` | boolean | Whether this role has an active worker |
 | `issueId` | string \| null | Issue being worked on (as string) |
 | `startTime` | string \| null | ISO timestamp when worker became active |
-| `level` | string \| null | Current level (`junior`, `medior`, `senior`, `reviewer`, `tester`) |
+| `level` | string \| null | Current level (`junior`, `medior`, `senior`) |
 | `sessions` | Record<string, string \| null> | Per-level session keys |
 **DEV session keys:** `junior`, `medior`, `senior`
 **QA session keys:** `reviewer`, `tester`
 ### Key design decisions
 - **Session-per-level** — each level gets its own worker session, accumulating context independently. Level selection maps directly to a session key.
 - **Sessions preserved on completion** — when a worker completes a task, the sessions map is preserved (only `active`, `issueId`, and `startTime` are cleared). This enables session reuse.
-- **Atomic writes** — all writes go through temp-file-then-rename to prevent corruption.
+- **Atomic writes** — all writes go through temp-file-then-rename to prevent corruption. File locking prevents concurrent read-modify-write races.
 - **Sessions persist indefinitely** — no auto-cleanup. The `health` tool handles manual cleanup.
 - **Dynamic workers** — the `workers` record is keyed by role ID (e.g., `developer`, `tester`, `architect`). New roles are created automatically when dispatched.
 ---
@@ -255,37 +449,43 @@ Each project has `dev` and `qa` worker state objects:
 ```
 <workspace>/
-├── projects/
+├── devclaw/
-│   ├── projects.json          ← Project state (auto-managed)
+│   ├── projects.json              ← Project state (auto-managed)
-│   └── roles/
+│   ├── workflow.yaml              ← Workspace-level config overrides
-│       ├── my-webapp/         ← Per-project role instructions (editable)
+│   ├── prompts/
-│       │   ├── dev.md
+│   │   ├── developer.md           ← Default developer instructions
-│       │   └── qa.md
+│   │   ├── tester.md              ← Default tester instructions
-│       ├── another-project/
+│   │   └── architect.md           ← Default architect instructions
-│       │   ├── dev.md
+│   ├── projects/
-│       │   └── qa.md
+│   │   ├── my-webapp/
-│       └── default/           ← Fallback role instructions
+│   │   │   ├── workflow.yaml      ← Project-specific config overrides
-│           ├── dev.md
+│   │   │   └── prompts/
-│           └── qa.md
+│   │   │       ├── developer.md   ← Project-specific developer instructions
-├── log/
+│   │   │       ├── tester.md      ← Project-specific tester instructions
-│   └── audit.log              ← NDJSON event log (auto-managed)
+│   │   │       └── architect.md   ← Project-specific architect instructions
-├── AGENTS.md                  ← Agent identity documentation
+│   │   └── another-project/
-└── HEARTBEAT.md               ← Heartbeat operation guide
+│   │       └── prompts/
 │   │           ├── developer.md
 │   │           └── tester.md
 │   └── log/
 │       └── audit.log              ← NDJSON event log (auto-managed)
 ├── AGENTS.md                      ← Agent identity documentation
 └── HEARTBEAT.md                   ← Heartbeat operation guide
 ```
 ### Role instruction files
-`work_start` loads role instructions from `projects/roles/<project>/<role>.md` at dispatch time, falling back to `projects/roles/default/<role>.md`. These files are appended to the task message sent to worker sessions.
+Role instructions are injected into worker sessions via the `agent:bootstrap` hook at session startup. The hook loads instructions from `devclaw/projects/<project>/prompts/<role>.md`, falling back to `devclaw/prompts/<role>.md`.
 Edit to customize: deployment steps, test commands, acceptance criteria, coding standards.
-**Source:** [`lib/dispatch.ts:loadRoleInstructions`](../lib/dispatch.ts)
+**Source:** [`lib/bootstrap-hook.ts`](../lib/bootstrap-hook.ts)
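The project-then-default fallback for role instructions can be sketched as below. The function name and the injected `exists` predicate are assumptions for the example; the real hook in `lib/bootstrap-hook.ts` may resolve paths differently.

```typescript
// Returns the first instruction file that exists, or null if neither does.
function resolvePromptPath(
  workspace: string,
  project: string,
  role: string,
  exists: (path: string) => boolean,
): string | null {
  const candidates = [
    `${workspace}/devclaw/projects/${project}/prompts/${role}.md`, // project override
    `${workspace}/devclaw/prompts/${role}.md`,                     // workspace default
  ];
  return candidates.find((candidate) => exists(candidate)) ?? null;
}
```

Passing the existence check in as a function keeps the lookup order testable without touching the file system.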
 ---
 ## Audit Log
-Append-only NDJSON at `<workspace>/log/audit.log`. Auto-truncated to 250 lines.
+Append-only NDJSON at `<workspace>/devclaw/log/audit.log`. Auto-truncated to 250 lines.
 **Source:** [`lib/audit.ts`](../lib/audit.ts)
@@ -331,6 +531,8 @@ DevClaw uses an `IssueProvider` interface (`lib/providers/provider.ts`) to abstr
 | GitHub | `gh` | Remote contains `github.com` |
 | GitLab | `glab` | Remote contains `gitlab` |
 **Provider resilience:** All calls are wrapped with cockatiel retry (3 attempts, exponential backoff) + circuit breaker (opens after 5 consecutive failures, half-opens after 30s). See [`lib/providers/resilience.ts`](../lib/providers/resilience.ts).
 **Planned:** Jira (via REST API)
 **Source:** [`lib/providers/index.ts`](../lib/providers/index.ts)
--- a/docs/MANAGEMENT.md
+++ b/docs/MANAGEMENT.md
@@ -19,7 +19,8 @@ DevClaw's level selection does exactly this. When a task comes in, the plugin ro
 | Simple (typos, renames, copy)    | Junior   | The intern — just execute   |
 | Standard (features, bug fixes)   | Medior   | Mid-level — think and build |
 | Complex (architecture, security) | Senior   | The architect — design and reason |
-| Review                           | Reviewer | Independent code reviewer   |
+
 All three roles (DEVELOPER, TESTER, and ARCHITECT) draw on the same junior/medior/senior scheme, although ARCHITECT defines only junior and senior. The orchestrator picks the level per task, and the plugin resolves it to the appropriate model via the role registry and workflow config.
 This isn't just cost optimization. It mirrors what effective managers do instinctively: match the delegation level to the task, not to a fixed assumption about the delegate.
@@ -27,14 +28,15 @@ This isn't just cost optimization. It mirrors what effective managers do instinc
 Classical management theory — later formalized by Bernard Bass in his work on Transformational Leadership — introduced a concept called Management by Exception (MBE). The principle: a manager should only be pulled back into a workstream when something deviates from the expected path.
-DevClaw's task lifecycle is built on this. The orchestrator delegates a task via `work_start`, then steps away. It only re-engages in three scenarios:
+DevClaw's task lifecycle is built on this. The orchestrator delegates a task via `work_start`, then steps away. It only re-engages in specific scenarios:
-1. **DEV completes work** → The label moves to `To Test`. The scheduler dispatches QA on the next tick. No orchestrator involvement needed.
+1. **DEVELOPER completes work** → The label moves to `To Test`. The scheduler dispatches TESTER on the next tick. No orchestrator involvement needed.
-2. **QA passes** → The issue closes. Pipeline complete.
+2. **DEVELOPER requests review** → The label moves to `In Review`. The heartbeat polls PR status. When merged, the scheduler dispatches TESTER. No orchestrator involvement needed.
-3. **QA fails** → The label moves to `To Improve`. The scheduler dispatches DEV on the next tick. The orchestrator may need to adjust the model level.
+3. **TESTER passes** → The issue closes. Pipeline complete.
-4. **QA refines** → The task enters a holding state that _requires human decision_. This is the explicit escalation boundary.
+4. **TESTER fails** → The label moves to `To Improve`. The scheduler dispatches DEVELOPER on the next tick. The orchestrator may need to adjust the level.
 5. **Any role is blocked** → The task enters `Refining` — a holding state that _requires human decision_. This is the explicit escalation boundary.
-The "refine" state is the most interesting from a delegation perspective. It's a conscious architectural decision that says: some judgments should not be automated. When the QA agent determines that a task needs rethinking rather than just fixing, it escalates to the only actor who has the full business context — the human.
+The "Refining" state is the most interesting from a delegation perspective. It's a conscious architectural decision that says: some judgments should not be automated. When a TESTER determines that a task needs rethinking rather than just fixing, or when a DEVELOPER hits an obstacle that requires business context, it escalates to the only actor who has the full picture — the human.
 This is textbook MBE. The person behind the keyboard isn't monitoring every task. They're only pulled in when the system encounters something beyond its delegation authority.
@@ -42,14 +44,17 @@ This is textbook MBE. The person behind the keyboard isn't monitoring every task
 Henry Mintzberg's work on organizational structure identified five coordination mechanisms. The one most relevant to DevClaw is **standardization of work processes** — when coordination happens not through direct supervision but through predetermined procedures that everyone follows.
-DevClaw enforces a single, fixed lifecycle for every task across every project:
+DevClaw enforces a configurable but consistent lifecycle for every task. The default workflow:
 ```
 Planning → To Do → Doing → To Test → Testing → Done
                         ↘ In Review → (PR merged) → To Test
                                    ↘ To Improve → Doing (fix cycle)
                                    ↘ Refining → (human decision)
 ```
 The ARCHITECT role adds a parallel track: `To Design → Designing → Planning`.
 Every label transition, state update, and audit log entry happens atomically inside the plugin. The orchestrator agent cannot skip a step, forget a label, or corrupt session state — because those operations are deterministic code, not instructions an LLM follows imperfectly.
 This is what allows a single orchestrator to manage multiple projects simultaneously. Management research has long debated the ideal span of control — typically cited as 5-9 direct reports for knowledge work. DevClaw sidesteps the constraint entirely by making every project follow identical processes. The orchestrator doesn't need to remember how Project A works versus Project B. They all work the same way.
@@ -60,9 +65,11 @@ One of the most common delegation failures is self-review. You don't ask the per
 DevClaw enforces structural separation between development and review by design:
-- DEV and QA are separate sub-agent sessions with separate state.
+- DEVELOPER and TESTER are separate sub-agent sessions with separate state.
-- QA uses the reviewer level, which can be a different model entirely, introducing genuine independence.
+- TESTER can use a different model entirely (e.g. senior for security reviews, junior for smoke tests), introducing genuine independence.
-- The review happens after a clean label transition — QA picks up from `To Test`, not from watching DEV work in real time.
+- The review happens after a clean label transition — TESTER picks up from `To Test`, not from watching DEVELOPER work in real time.
 For higher-stakes changes, the DEVELOPER can submit a PR for human review (`result: "review"`). The issue enters `In Review` and the heartbeat polls the PR until it's merged — only then does TESTER receive the work. This adds a human checkpoint without breaking the automated flow.
 This mirrors a principle from organizational design: effective controls require independence between execution and verification. It's the same reason companies separate their audit function from their operations.
@@ -72,7 +79,7 @@ Ronald Coase won a Nobel Prize for explaining why firms exist: transaction costs
 DevClaw applies the same logic to AI sessions. Spawning a new sub-agent session costs approximately 50,000 tokens of context loading — the agent needs to read the full codebase before it can do useful work. That's the onboarding cost.
-The plugin tracks session keys across task completions. When a DEV finishes task A and task B is ready on the same project, DevClaw detects the existing session and reuses it instead of spawning a new one. No re-onboarding. No context reload.
+The plugin tracks session keys across task completions. When a DEVELOPER finishes task A and task B is ready on the same project, DevClaw detects the existing session and reuses it instead of spawning a new one. No re-onboarding. No context reload. Each role maintains separate sessions per level, so a "medior developer" session accumulates project context independently from the "senior developer" session.
 In management terms: keep your team stable. Reassigning the same person to the next task on their project is almost always cheaper than bringing in someone new — even if the new person is theoretically better qualified.
@@ -85,15 +92,15 @@ The obvious saving is execution time: AI writes code faster than a human. But th
 Without DevClaw, every task requires a human to make a series of small decisions:
 - Which model should handle this?
-- Is the DEV session still alive, or do I need a new one?
+- Is the DEVELOPER session still alive, or do I need a new one?
 - What label should this issue have now?
 - Did I update the state file?
 - Did I log this transition?
-- Is the QA session free, or is it still working on something?
+- Is the TESTER session free, or is it still working on something?
 None of these decisions are hard. But they accumulate. Each one consumes a small amount of the same cognitive resource you need for the decisions that actually matter — product direction, architecture choices, business priorities.
-DevClaw eliminates entire categories of decisions by making them deterministic. The plugin picks the model. The plugin manages sessions. The plugin transitions labels. The plugin writes audit logs. The person behind the keyboard is left with only the decisions that require human judgment: what to build, what to prioritize, and what to do when QA says "this needs rethinking."
+DevClaw eliminates entire categories of decisions by making them deterministic. The plugin picks the model. The plugin manages sessions. The plugin transitions labels. The plugin writes audit logs. The person behind the keyboard is left with only the decisions that require human judgment: what to build, what to prioritize, and what to do when a worker says "this needs rethinking."
 This is the deepest lesson from delegation theory: **good delegation isn't about getting someone else to do your work. It's about protecting your attention for the work only you can do.**
@@ -101,11 +108,11 @@ This is the deepest lesson from delegation theory: **good delegation isn't about
 Management research points to a few directions that could extend DevClaw's delegation model:
-**Progressive delegation.** Blanchard's model suggests increasing task complexity for delegates as they prove competent. DevClaw could track QA pass rates per model level and automatically promote — if junior consistently passes QA on borderline tasks, start routing more work to it. This is how good managers develop their people, and it reduces cost over time.
+**Progressive delegation.** Blanchard's model suggests increasing task complexity for delegates as they prove competent. DevClaw could track TESTER pass rates per model level and automatically promote: if work done at the junior level consistently passes testing on borderline tasks, start routing more work to it. This is how good managers develop their people, and it reduces cost over time.
-**Delegation authority expansion.** The Vroom-Yetton decision model maps when a leader should decide alone versus consulting the team. Currently, sub-agents have narrow authority — they execute tasks but can't restructure the backlog. Selectively expanding this (e.g., allowing a DEV agent to split a task it judges too large) would reduce orchestrator bottlenecks, mirroring how managers gradually give high-performers more autonomy.
+**Delegation authority expansion.** The Vroom-Yetton decision model maps when a leader should decide alone versus consulting the team. Currently, sub-agents have narrow authority — they execute tasks but can't restructure the backlog. Selectively expanding this (e.g., allowing a DEVELOPER agent to split a task it judges too large) would reduce orchestrator bottlenecks, mirroring how managers gradually give high-performers more autonomy.
-**Outcome-based learning.** Delegation research emphasizes that the _delegator_ learns from outcomes too. Aggregated metrics — QA fail rate by model level, average cycles to Done, time-in-state distributions — would help both the orchestrator agent and the human calibrate their delegation patterns over time.
+**Outcome-based learning.** Delegation research emphasizes that the _delegator_ learns from outcomes too. Aggregated metrics — TESTER fail rate by model level, average cycles to Done, time-in-state distributions — would help both the orchestrator agent and the human calibrate their delegation patterns over time.
 ---
--- a/docs/ONBOARDING.md
+++ b/docs/ONBOARDING.md
@@ -52,13 +52,16 @@ openclaw devclaw setup
 The setup wizard walks you through:
 1. **Agent** — Create a new orchestrator agent or configure an existing one
-2. **Developer team** — Choose which LLM model powers each developer level:
+2. **Developer team** — Choose which LLM model powers each level:
-   - **DEV junior** (fast, cheap tasks) — default: `anthropic/claude-haiku-4-5`
+   - **Developer junior** (fast, cheap tasks) — default: `anthropic/claude-haiku-4-5`
-   - **DEV medior** (standard tasks) — default: `anthropic/claude-sonnet-4-5`
+   - **Developer medior** (standard tasks) — default: `anthropic/claude-sonnet-4-5`
-   - **DEV senior** (complex tasks) — default: `anthropic/claude-opus-4-5`
+   - **Developer senior** (complex tasks) — default: `anthropic/claude-opus-4-6`
-   - **QA reviewer** (code review) — default: `anthropic/claude-sonnet-4-5`
+   - **Tester junior** (quick checks) — default: `anthropic/claude-haiku-4-5`
-   - **QA tester** (manual testing) — default: `anthropic/claude-haiku-4-5`
+   - **Tester medior** (standard review) — default: `anthropic/claude-sonnet-4-5`
-3. **Workspace** — Writes AGENTS.md, HEARTBEAT.md, role templates, and initializes state
+   - **Tester senior** (thorough review) — default: `anthropic/claude-opus-4-6`
   - **Architect junior** (standard design) — default: `anthropic/claude-sonnet-4-5`
   - **Architect senior** (complex architecture) — default: `anthropic/claude-opus-4-6`
 3. **Workspace** — Writes AGENTS.md, HEARTBEAT.md, workflow.yaml, role templates, and initializes state
 Non-interactive mode:
 ```bash
@@ -68,7 +71,7 @@ openclaw devclaw setup --new-agent "My Dev Orchestrator"
 # Configure existing agent with custom models
 openclaw devclaw setup --agent my-orchestrator \
  --junior "anthropic/claude-haiku-4-5" \
-  --senior "anthropic/claude-opus-4-5"
+  --senior "anthropic/claude-opus-4-6"
 ```
 ### Option C: Tool call (agent-driven)
@@ -86,12 +89,12 @@ setup({
  "newAgentName": "My Dev Orchestrator",
  "channelBinding": "telegram",
  "models": {
-    "dev": {
+    "developer": {
      "junior": "anthropic/claude-haiku-4-5",
-      "senior": "anthropic/claude-opus-4-5"
+      "senior": "anthropic/claude-opus-4-6"
    },
-    "qa": {
+    "tester": {
-      "reviewer": "anthropic/claude-sonnet-4-5"
+      "medior": "anthropic/claude-sonnet-4-5"
    }
  }
 })
@@ -151,8 +154,8 @@ Go to the Telegram/WhatsApp group for the project and tell the orchestrator agen
 The agent calls `project_register`, which atomically:
 - Validates the repo and auto-detects GitHub/GitLab from remote
- Creates all 8 state labels (idempotent)
+- Creates all 11 state labels (idempotent)
- Scaffolds role instruction files (`projects/roles/<project>/dev.md` and `qa.md`)
+- Scaffolds role instruction files (`devclaw/projects/<project>/prompts/developer.md`, `tester.md`, `architect.md`)
 - Adds the project entry to `projects.json`
 - Logs the registration event
@@ -168,20 +171,30 @@ The agent calls `project_register`, which atomically:
      "baseBranch": "development",
      "deployBranch": "development",
      "channel": "telegram",
      "provider": "github",
      "roleExecution": "parallel",
-      "dev": {
+      "workers": {
-        "active": false,
+        "developer": {
-        "issueId": null,
+          "active": false,
-        "startTime": null,
+          "issueId": null,
-        "level": null,
+          "startTime": null,
-        "sessions": { "junior": null, "medior": null, "senior": null }
+          "level": null,
-      },
+          "sessions": { "junior": null, "medior": null, "senior": null }
-      "qa": {
+        },
-        "active": false,
+        "tester": {
-        "issueId": null,
+          "active": false,
-        "startTime": null,
+          "issueId": null,
-        "level": null,
+          "startTime": null,
-        "sessions": { "reviewer": null, "tester": null }
+          "level": null,
          "sessions": { "junior": null, "medior": null, "senior": null }
        },
        "architect": {
          "active": false,
          "issueId": null,
          "startTime": null,
          "level": null,
          "sessions": { "junior": null, "senior": null }
        }
      }
    }
  }
@@ -194,7 +207,7 @@ The agent calls `project_register`, which atomically:
 Issues can be created in multiple ways:
 - **Via the agent** — Ask the orchestrator in the Telegram group: "Create an issue for adding a login page" (uses `task_create`)
- **Via workers** — DEV/QA workers can call `task_create` to file follow-up bugs they discover
+- **Via workers** — DEVELOPER/TESTER workers can call `task_create` to file follow-up bugs they discover
 - **Via CLI** — `cd ~/git/my-project && gh issue create --title "My first task" --label "To Do"` (or `glab issue create`)
 - **Via web UI** — Create an issue and add the "To Do" label
@@ -208,9 +221,9 @@ Ask the agent in the Telegram group:
 The agent should call `status` and report the "To Do" issue. Then:
-> "Pick up issue #1 for DEV"
+> "Pick up issue #1 for developer"
-The agent calls `work_start`, which assigns a developer level, transitions the label to "Doing", creates or reuses a worker session, and dispatches the task — all in one call. The agent posts the announcement.
+The agent calls `work_start`, which assigns a level, transitions the label to "Doing", creates or reuses a worker session, and dispatches the task — all in one call. The agent posts the announcement.
 ## Adding more projects
@@ -220,17 +233,20 @@ Each project is fully isolated — separate queue, separate workers, separate st
 ## Developer levels
-DevClaw assigns tasks to developer levels instead of raw model names. This makes the system intuitive — you're assigning a "junior dev" to fix a typo, not configuring model parameters.
+DevClaw assigns tasks to developer levels instead of raw model names. This makes the system intuitive: you're assigning a "junior" to fix a typo, not configuring model parameters. All roles share the same level names, although the architect role defines only junior and senior.
-| Role | Level | Default model | When to assign |
+| Role | Level | Default Model | When to assign |
 |------|-------|---------------|----------------|
-| DEV | **junior** | `anthropic/claude-haiku-4-5` | Typos, single-file fixes, CSS changes |
+| Developer | **junior** | `anthropic/claude-haiku-4-5` | Typos, single-file fixes, CSS changes |
-| DEV | **medior** | `anthropic/claude-sonnet-4-5` | Features, bug fixes, multi-file changes |
+| Developer | **medior** | `anthropic/claude-sonnet-4-5` | Features, bug fixes, multi-file changes |
-| DEV | **senior** | `anthropic/claude-opus-4-5` | Architecture, migrations, system-wide refactoring |
+| Developer | **senior** | `anthropic/claude-opus-4-6` | Architecture, migrations, system-wide refactoring |
-| QA | **reviewer** | `anthropic/claude-sonnet-4-5` | Code review, test validation |
+| Tester | **junior** | `anthropic/claude-haiku-4-5` | Quick smoke tests, basic checks |
-| QA | **tester** | `anthropic/claude-haiku-4-5` | Manual testing, smoke tests |
+| Tester | **medior** | `anthropic/claude-sonnet-4-5` | Standard code review, test validation |
 | Tester | **senior** | `anthropic/claude-opus-4-6` | Thorough security review, complex edge cases |
 | Architect | **junior** | `anthropic/claude-sonnet-4-5` | Standard design investigation |
 | Architect | **senior** | `anthropic/claude-opus-4-6` | Complex architecture decisions |
-Change which model powers each level in `openclaw.json` — see [Configuration](CONFIGURATION.md#model-tiers).
+Change which model powers each level in `workflow.yaml` — see [Configuration](CONFIGURATION.md#role-configuration).
 ## What the plugin handles vs. what you handle
@@ -239,17 +255,19 @@ Change which model powers each level in `openclaw.json` — see [Configuration](
 | Plugin installation | You (once) | `openclaw plugins install @laurentenhoor/devclaw` |
 | Agent + workspace setup | Plugin (`setup`) | Creates agent, configures models, writes workspace files |
 | Channel binding migration | Plugin (`setup` with `migrateFrom`) | Automatically moves channel-wide bindings between agents |
-| Label setup | Plugin (`project_register`) | 8 labels, created idempotently via IssueProvider |
+| Label setup | Plugin (`project_register`) | 11 labels, created idempotently via IssueProvider |
-| Prompt file scaffolding | Plugin (`project_register`) | Creates `projects/roles/<project>/dev.md` and `qa.md` |
+| Prompt file scaffolding | Plugin (`project_register`) | Creates `devclaw/projects/<project>/prompts/<role>.md` for each role |
 | Project registration | Plugin (`project_register`) | Entry in `projects.json` with empty worker state |
 | Telegram group setup | You (once per project) | Add bot to group |
 | Issue creation | Plugin (`task_create`) | Orchestrator or workers create issues from chat |
 | Label transitions | Plugin | Atomic transitions via issue tracker CLI |
 | Developer assignment | Plugin | LLM-selected level by orchestrator, keyword heuristic fallback |
-| State management | Plugin | Atomic read/write to `projects.json` |
+| State management | Plugin | Atomic read/write to `projects.json` with file locking |
 | Session management | Plugin | Creates, reuses, and dispatches to sessions via CLI. Agent never touches session tools. |
 | Task completion | Plugin (`work_finish`) | Workers self-report. Scheduler dispatches next role. |
-| Prompt instructions | Plugin (`work_start`) | Loaded from `projects/roles/<project>/<role>.md`, appended to task message |
+| Role instructions | Plugin (bootstrap hook) | Injected into worker sessions via `agent:bootstrap` hook at session startup |
 | Review polling | Plugin (heartbeat) | Auto-advances "In Review" issues when PR is merged |
 | Config validation | Plugin | Zod schemas validate `workflow.yaml` at load time |
 | Audit logging | Plugin | Automatic NDJSON append per tool call |
 | Zombie detection | Plugin | `health` checks active vs alive |
 | Queue scanning | Plugin | `status` queries issue tracker per project |
--- a/docs/QA_WORKFLOW.md
+++ b/docs/QA_WORKFLOW.md
@@ -20,7 +20,7 @@ task_comment({
  projectGroupId: "<group-id>",
  issueId: <issue-number>,
  body: "## QA Review\n\n**Tested:**\n- [List what you tested]\n\n**Results:**\n- [Pass/fail details]\n\n**Environment:**\n- [Test environment details]",
-  authorRole: "qa"
+  authorRole: "tester"
 })
 ```
@@ -30,21 +30,21 @@ After posting your comment, call `work_finish`:
 ```javascript
 work_finish({
-  role: "qa",
+  role: "tester",
  projectGroupId: "<group-id>",
  result: "pass",  // or "fail", "refine", "blocked"
  summary: "Brief summary of review outcome"
 })
 ```
-## QA Results
+## TESTER Results
 | Result | Label transition | Meaning |
 |---|---|---|
 | `"pass"` | Testing → Done | Approved. Issue closed. |
-| `"fail"` | Testing → To Improve | Issues found. Issue reopened, sent back to DEV. |
+| `"fail"` | Testing → To Improve | Issues found. Issue reopened, sent back to DEVELOPER. |
 | `"refine"` | Testing → Refining | Needs human decision. Pipeline pauses. |
-| `"blocked"` | Testing → To Test | Cannot complete (env issues, etc.). Returns to QA queue. |
+| `"blocked"` | Testing → Refining | Cannot complete (env issues, etc.). Awaits human decision. |
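The result-to-label mapping in the table above can be read as a plain lookup. The sketch below is illustrative only: `TESTER_TRANSITIONS` and `nextLabel` are hypothetical names, not DevClaw's actual implementation, and the entries reflect the updated `"blocked"` behavior.

```typescript
type TesterResult = "pass" | "fail" | "refine" | "blocked";

// Mirrors the results table; the constant name and shape are assumptions.
const TESTER_TRANSITIONS: Record<TesterResult, { from: string; to: string }> = {
  pass: { from: "Testing", to: "Done" },
  fail: { from: "Testing", to: "To Improve" },
  refine: { from: "Testing", to: "Refining" },
  blocked: { from: "Testing", to: "Refining" },
};

// Return the label an issue receives after a tester reports `result`.
function nextLabel(result: TesterResult): string {
  return TESTER_TRANSITIONS[result].to;
}
```

Note that `"refine"` and `"blocked"` now land on the same label, so both pause the pipeline until a human decides.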
 ## Why Comments Are Required
@@ -96,14 +96,14 @@ work_finish({
 ## Enforcement
-QA workers receive instructions via role templates to:
+TESTER workers receive instructions via role templates to:
 - Always call `task_comment` BEFORE `work_finish`
 - Include specific details about what was tested
 - Document results, environment, and any notes
 Prompt templates affected:
- `projects/roles/<project>/qa.md`
+- `devclaw/projects/<project>/prompts/tester.md`
- All project-specific QA templates should follow this pattern
+- `devclaw/prompts/tester.md` (default)
 ## Best Practices
@@ -116,5 +116,5 @@ Prompt templates affected:
 ## Related
 - Tool: [`task_comment`](TOOLS.md#task_comment) — Add comments to issues
- Tool: [`work_finish`](TOOLS.md#work_finish) — Complete QA tasks
+- Tool: [`work_finish`](TOOLS.md#work_finish) — Complete TESTER tasks
- Config: [`projects/roles/<project>/qa.md`](CONFIGURATION.md#role-instruction-files) — QA role instructions
+- Config: [`devclaw/projects/<project>/prompts/tester.md`](CONFIGURATION.md#role-instruction-files) — Tester role instructions
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -1,53 +1,77 @@
 # DevClaw — Roadmap
-## Configurable Roles
+## Recently Completed
-Currently DevClaw has two hardcoded roles: **DEV** and **QA**. Each project gets one worker slot per role. The pipeline is fixed: DEV writes code, QA reviews it.
+### Dynamic Roles and Role Registry
-This works for the common case but breaks down when you want:
+Roles are no longer hardcoded. The `ROLE_REGISTRY` in `lib/roles/registry.ts` defines three built-in roles — **developer**, **tester**, **architect** — each with configurable levels, models, emoji, and completion results. Adding a new role means adding one entry to the registry; everything else (workers, sessions, labels, prompts) derives from it.
 - A **design** role that creates mockups before DEV starts
 - A **devops** role that handles deployment after QA passes
 - A **PM** role that triages and prioritizes the backlog
 - Multiple DEV workers in parallel (e.g. frontend + backend)
 - A project with no QA step at all
-### Planned: role configuration per project
+All roles use a unified junior/medior/senior level scheme (architect uses junior/senior). Per-role model overrides live in `workflow.yaml`.
-Roles become a configurable list instead of a hardcoded pair. Each role defines:
+### Workflow State Machine
 - **Name** — e.g. `design`, `dev`, `qa`, `devops`
 - **Levels** — which developer levels can be assigned (e.g. design only needs `medior`)
 - **Pipeline position** — where it sits in the task lifecycle
 - **Worker count** — how many concurrent workers (default: 1)
-```json
+The issue lifecycle is now a configurable state machine defined in `workflow.yaml`. The default workflow defines 11 states:
-{
+
-  "roles": {
+```
-    "dev": { "levels": ["junior", "medior", "senior"], "workers": 1 },
+Planning → To Do → Doing → To Test → Testing → Done
-    "qa": { "levels": ["reviewer", "tester"], "workers": 1 },
+                         ↘ In Review → (PR merged) → To Test
-    "devops": { "levels": ["medior", "senior"], "workers": 1 }
+                                    ↘ To Improve → Doing
-  },
+                                    ↘ Refining → (human decision)
-  "pipeline": ["dev", "qa", "devops"]
+To Design → Designing → Planning
 }
 ```
-The pipeline definition replaces the hardcoded `Doing → To Test → Testing → Done` flow. Labels and transitions are generated from the pipeline config. The scheduler follows the pipeline order when filling free slots.
+States have types (`queue`, `active`, `hold`, `review`, `terminal`), transitions with actions (`gitPull`, `detectPr`, `closeIssue`, `reopenIssue`), and review checks (`prMerged`, `prApproved`).
-### Open questions
+### Three-Layer Configuration
- How do custom labels map? Generate from role names, or let users define?
+Config resolution follows three layers, each partially overriding the one below:
- Should roles have their own instruction files (`projects/roles/<project>/<role>.md`) — yes, this already works
+
- How to handle parallel roles (e.g. frontend + backend DEV in parallel before QA)?
+1. **Built-in defaults** — `ROLE_REGISTRY` + `DEFAULT_WORKFLOW`
 2. **Workspace** — `<workspace>/devclaw/workflow.yaml`
 3. **Project** — `<workspace>/devclaw/projects/<project>/workflow.yaml`
 Configuration is validated at load time with Zod schemas (`lib/config/schema.ts`). Integrity checks verify that transition targets exist, that queue states have roles, and that terminal states have no outgoing transitions.
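As a rough illustration of the three-layer resolution described above, the sketch below merges layers from lowest to highest priority. It assumes a recursive merge of nested objects, which is one plausible reading of "partially overriding"; `mergeLayers`, `resolveConfig`, and the `roles.developer.models` shape are hypothetical and not taken from DevClaw's code.

```typescript
// Hypothetical layer shape; the real schema lives in lib/config/schema.ts.
type Layer = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Layer {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Merge `override` onto `base`: nested objects merge key by key,
// everything else (scalars, arrays) is replaced wholesale.
function mergeLayers(base: Layer, override: Layer): Layer {
  const merged: Layer = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const existing = merged[key];
    merged[key] =
      isPlainObject(existing) && isPlainObject(value)
        ? mergeLayers(existing, value)
        : value;
  }
  return merged;
}

// Resolve in order: built-in defaults, then workspace, then project.
function resolveConfig(defaults: Layer, workspace: Layer = {}, project: Layer = {}): Layer {
  return mergeLayers(mergeLayers(defaults, workspace), project);
}

const resolved = resolveConfig(
  { roles: { developer: { models: { junior: "anthropic/claude-haiku-4-5", senior: "anthropic/claude-opus-4-6" } } } },
  { roles: { developer: { models: { senior: "anthropic/claude-sonnet-4-5" } } } },
  { roles: { developer: { models: { junior: "anthropic/claude-sonnet-4-5" } } } },
);
```

In this example the workspace layer changes only the senior model and the project layer changes only the junior model; keys that no layer overrides keep their built-in values.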
 ### Provider Resilience
 All issue tracker calls (GitHub via `gh`, GitLab via `glab`) are wrapped with cockatiel retry (3 attempts, exponential backoff) and circuit breaker (opens after 5 consecutive failures, half-opens after 30s). See `lib/providers/resilience.ts`.
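The policy numbers above can be illustrated with a small dependency-free sketch. This is not cockatiel's API and not the code in `lib/providers/resilience.ts`; `ConsecutiveBreaker`, `backoffDelayMs`, and the 100 ms base delay are assumptions made for illustration, and the clock is injected so the behavior can be checked without waiting.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Hypothetical breaker: opens after `threshold` consecutive failures and
// allows a probe call once `halfOpenAfterMs` has elapsed.
class ConsecutiveBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly halfOpenAfterMs: number;
  private readonly now: () => number;

  constructor(threshold = 5, halfOpenAfterMs = 30_000, now: () => number = () => Date.now()) {
    this.threshold = threshold;
    this.halfOpenAfterMs = halfOpenAfterMs;
    this.now = now;
  }

  get state(): BreakerState {
    if (this.openedAt === null) return "closed";
    return this.now() - this.openedAt >= this.halfOpenAfterMs ? "half-open" : "open";
  }

  // Calls are rejected only while the breaker is fully open.
  canCall(): boolean {
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed probe in half-open state re-opens the breaker immediately.
    if (this.openedAt !== null || this.failures >= this.threshold) {
      this.openedAt = this.now();
    }
  }
}

// Exponential backoff between retry attempts; the base delay is an assumption.
function backoffDelayMs(attempt: number, baseMs = 100): number {
  return baseMs * 2 ** (attempt - 1);
}
```

A caller would check `canCall()` before each `gh` or `glab` invocation and report the outcome with `recordSuccess()` or `recordFailure()`.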
 ### Bootstrap Hook for Role Instructions
 Worker sessions receive role-specific instructions via the `agent:bootstrap` hook at session startup; the instructions are no longer appended to the task message. The hook reads from `devclaw/projects/<project>/prompts/<role>.md`, falling back to `devclaw/prompts/<role>.md`. Source tracking is available through `loadRoleInstructions(dir, { withSource: true })`.
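The fallback order for role instructions can be sketched as a simple path lookup. `resolvePromptPath` is a hypothetical helper written for this illustration; it is not `loadRoleInstructions`, and it assumes the two directory layouts named above are relative to the workspace root.

```typescript
import { existsSync, mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: return the first prompt file that exists, preferring
// the project-specific prompt over the workspace-level default.
function resolvePromptPath(workspaceDir: string, project: string, role: string): string | null {
  const candidates = [
    join(workspaceDir, "devclaw", "projects", project, "prompts", `${role}.md`),
    join(workspaceDir, "devclaw", "prompts", `${role}.md`),
  ];
  return candidates.find((candidate) => existsSync(candidate)) ?? null;
}

// Usage: only the workspace-level default exists, so the lookup falls back to it.
const workspace = mkdtempSync(join(tmpdir(), "devclaw-sketch-"));
mkdirSync(join(workspace, "devclaw", "prompts"), { recursive: true });
writeFileSync(join(workspace, "devclaw", "prompts", "tester.md"), "Default tester instructions");
const fallback = resolvePromptPath(workspace, "my-project", "tester");
```

Once a file is written under `devclaw/projects/my-project/prompts/`, the same call would return the project-specific path instead.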
 ### In Review State and PR Polling
 A DEVELOPER worker can submit a PR for human review (`result: "review"`), which transitions the issue to `In Review`. The heartbeat's review pass polls PR status via `getPrStatus()` on the provider. When the PR is merged, the issue auto-transitions to `To Test` for TESTER pickup.
 ### Architect Role
 The architect role enables design investigations. `design_task` creates a `To Design` issue and dispatches an architect worker. The architect completes with `done` (→ Planning) or `blocked` (→ Refining).
 ### Workspace Layout Migration
 Data directory moved from `<workspace>/projects/` to `<workspace>/devclaw/`. Automatic migration on first load — see `lib/setup/migrate-layout.ts`.
 ### E2E Test Infrastructure
 Purpose-built test harness (`lib/testing/`) with:
 - `TestProvider` — in-memory `IssueProvider` with call tracking
 - `createTestHarness()` — scaffolds temp workspace, mock `runCommand`, test provider
 - `simulateBootstrap()` — tests the full bootstrap hook chain without a live gateway
 - `CommandInterceptor` — captures and filters CLI calls
 ---
-## Channel-agnostic Groups
+## Planned
 ### Channel-agnostic Groups
 Currently DevClaw maps projects to **Telegram group IDs**. The `projectGroupId` is a Telegram-specific negative number. This means:
 - WhatsApp groups can't be used as project channels (partially supported now via `channel` field)
 - Discord, Slack, or other channels are excluded
 - The naming (`groupId`, `groupName`) is Telegram-specific
-### Planned: abstract channel binding
+**Planned: abstract channel binding**
 Replace Telegram-specific group IDs with a generic channel identifier that works across any OpenClaw channel.
@@ -57,14 +81,12 @@ Replace Telegram-specific group IDs with a generic channel identifier that works
    "whatsapp:120363140032870788@g.us": {
      "name": "my-project",
      "channel": "whatsapp",
-      "peer": "120363140032870788@g.us",
+      "peer": "120363140032870788@g.us"
      ...
    },
    "telegram:-1234567890": {
      "name": "other-project",
      "channel": "telegram",
-      "peer": "-1234567890",
+      "peer": "-1234567890"
      ...
    }
  }
 }
@@ -79,7 +101,7 @@ Key changes:
 This enables any OpenClaw channel (Telegram, WhatsApp, Discord, Slack, etc.) to host a project.
-### Open questions
+#### Open questions
 - Should one project be bindable to multiple channels? (e.g. Telegram for devs, WhatsApp for stakeholder updates)
 - How does the orchestrator agent handle cross-channel context?
@@ -89,8 +111,9 @@ This enables any OpenClaw channel (Telegram, WhatsApp, Discord, Slack, etc.) to
 ## Other Ideas
 - **Jira provider** — `IssueProvider` interface already abstracts GitHub/GitLab; Jira is the obvious next addition
- **Deployment integration** — `work_finish` QA pass could trigger a deploy step via webhook or CLI
+- **Deployment integration** — `work_finish` TESTER pass could trigger a deploy step via webhook or CLI
 - **Cost tracking** — log token usage per task/level, surface in `status`
 - **Priority scoring** — automatic priority assignment based on labels, age, and dependencies
 - **Session archival** — auto-archive idle sessions after configurable timeout (currently indefinite)
- **Progressive delegation** — track QA pass rates per level and auto-promote (see [Management Theory](MANAGEMENT.md))
+- **Progressive delegation** — track TESTER pass rates per level and auto-promote (see [Management Theory](MANAGEMENT.md))
 - **Custom workflow actions** — user-defined actions in `workflow.yaml` (e.g. deploy scripts, notifications)
--- a/docs/TESTING.md
+++ b/docs/TESTING.md
@@ -1,216 +1,215 @@
-# DevClaw Testing Guide
+# DevClaw — Testing Guide
-Comprehensive automated testing for DevClaw onboarding and setup.
+DevClaw uses the Node.js built-in test runner (`node:test`) with `node:assert/strict` for all tests.
 ## Quick Start
 ```bash
 # Install dependencies
 npm install
 # Run all tests
-npm test
+npx tsx --test lib/**/*.test.ts
-# Run with coverage report
+# Run a specific test file
-npm run test:coverage
+npx tsx --test lib/roles/registry.test.ts
-# Run in watch mode (auto-rerun on changes)
+# Run E2E tests only
-npm run test:watch
+npx tsx --test lib/services/*.e2e.test.ts
-# Run with UI (browser-based test explorer)
+# Build (also type-checks all test files)
-npm run test:ui
+npm run build
 ```
-## Test Coverage
+## Test Files
-### Scenario 1: New User (No Prior DevClaw Setup)
+### Unit Tests
 **File:** `tests/setup/new-user.test.ts`
-**What's tested:**
+| File | What it tests |
- First-time agent creation with default models
+|---|---|
- Channel binding creation (telegram/whatsapp)
+| [lib/roles/registry.test.ts](../lib/roles/registry.test.ts) | Role registry: role lookup, level resolution, model defaults |
- Workspace file generation (AGENTS.md, HEARTBEAT.md, projects/, log/)
+| [lib/projects.test.ts](../lib/projects.test.ts) | Project state: read/write, worker state, atomic file operations |
- Plugin configuration initialization
+| [lib/bootstrap-hook.test.ts](../lib/bootstrap-hook.test.ts) | Bootstrap hook: role instruction loading, source tracking, overloads |
- Error handling: channel not configured
+| [lib/tools/task-update.test.ts](../lib/tools/task-update.test.ts) | Task update tool: label transitions, validation |
- Error handling: channel disabled
+| [lib/tools/design-task.test.ts](../lib/tools/design-task.test.ts) | Design task tool: architect dispatch |
 | [lib/tools/queue-status.test.ts](../lib/tools/queue-status.test.ts) | Queue status formatting |
 | [lib/setup/migrate-layout.test.ts](../lib/setup/migrate-layout.test.ts) | Workspace layout migration: `projects/` → `devclaw/` |
 ### E2E Tests
 | File | What it tests |
 |---|---|
 | [lib/services/pipeline.e2e.test.ts](../lib/services/pipeline.e2e.test.ts) | Full pipeline: completion rules, label transitions, actions |
 | [lib/services/bootstrap.e2e.test.ts](../lib/services/bootstrap.e2e.test.ts) | Bootstrap hook chain: session key → parse → load instructions → inject |
 ## Test Infrastructure
 ### Test Harness (`lib/testing/`)
 The [`lib/testing/`](../lib/testing/) module provides E2E test infrastructure:
 **Example:**
 ```typescript
-// Before: openclaw.json has no DevClaw agents
+import { createTestHarness } from "../testing/index.js";
 {
  "agents": { "list": [{ "id": "main", ... }] },
  "bindings": [],
  "plugins": { "entries": {} }
 }
-// After: New orchestrator created
+const h = await createTestHarness({
-{
+  projectName: "my-project",
-  "agents": {
+  groupId: "-1234567890",
-    "list": [
+  workflow: DEFAULT_WORKFLOW,
-      { "id": "main", ... },
+  workers: {
-      { "id": "my-first-orchestrator", ... }
+    developer: { active: true, issueId: "42", level: "medior" },
    ]
  },
-  "bindings": [
+});
-    { "agentId": "my-first-orchestrator", "match": { "channel": "telegram" } }
+try {
-  ],
+  // ... run tests against h.provider, h.commands, etc.
-  "plugins": {
+} finally {
-    "entries": {
+  await h.cleanup();
      "devclaw": {
        "config": {
          "models": {
            "dev": {
              "junior": "anthropic/claude-haiku-4-5",
              "medior": "anthropic/claude-sonnet-4-5",
              "senior": "anthropic/claude-opus-4-5"
            },
            "qa": {
              "reviewer": "anthropic/claude-sonnet-4-5",
              "tester": "anthropic/claude-haiku-4-5"
            }
          }
        }
      }
    }
  }
 }
 ```
-### Scenario 2: Existing User (Migration)
+**`createTestHarness()`** scaffolds:
-**File:** `tests/setup/existing-user.test.ts`
+- Temporary workspace directory with `devclaw/` data dir and `log/` subdirectory
 - `projects.json` with test project and configurable worker state
 - Mock `runCommand` via `CommandInterceptor` (captures all CLI calls)
 - `TestProvider` — in-memory `IssueProvider` with call tracking
-**What's tested:**
+### TestProvider
- Channel conflict detection (existing channel-wide binding)
+
- Binding migration from old agent to new agent
+In-memory implementation of `IssueProvider` for testing. Tracks all provider method calls and maintains in-memory issue state:
 - Custom model preservation during migration
 - Old agent preservation (not deleted)
 - Error handling: migration source doesn't exist
 - Error handling: migration source has no binding
 **Example:**
 ```typescript
-// Before: Old orchestrator has telegram binding
+const h = await createTestHarness();
-{
+h.provider.seedIssue(42, {
-  "agents": {
+  title: "Fix the bug",
-    "list": [
+  labels: ["Doing"],
-      { "id": "main", ... },
+  state: "open",
-      { "id": "old-orchestrator", ... }
+});
    ]
  },
  "bindings": [
    { "agentId": "old-orchestrator", "match": { "channel": "telegram" } }
  ]
 }
-// After: Binding migrated to new orchestrator
+// After running pipeline code:
-{
+const calls = h.provider.calls;  // All method invocations
  "agents": {
    "list": [
      { "id": "main", ... },
      { "id": "old-orchestrator", ... },
      { "id": "new-orchestrator", ... }
    ]
  },
  "bindings": [
    { "agentId": "new-orchestrator", "match": { "channel": "telegram" } }
  ]
 }
 ```
-### Scenario 3: Power User (Multiple Agents)
+### CommandInterceptor
 **File:** `tests/setup/power-user.test.ts`
-**What's tested:**
+Captures all `runCommand` calls during tests. Provides filtering and extraction helpers:
 - No conflicts with group-specific bindings
 - Channel-wide binding creation alongside group bindings
 - Multiple orchestrators coexisting
 - Routing logic (specific bindings win over channel-wide)
 - WhatsApp support
 - Scale testing (12+ orchestrators)
 **Example:**
 ```typescript
-// Before: Two project orchestrators with group-specific bindings
+// All captured commands
-{
+h.commands.commands;
  "agents": {
    "list": [
      { "id": "project-a-orchestrator", ... },
      { "id": "project-b-orchestrator", ... }
    ]
  },
  "bindings": [
    {
      "agentId": "project-a-orchestrator",
      "match": { "channel": "telegram", "peer": { "kind": "group", "id": "-1001234567890" } }
    },
    {
      "agentId": "project-b-orchestrator",
      "match": { "channel": "telegram", "peer": { "kind": "group", "id": "-1009876543210" } }
    }
  ]
 }
-// After: Channel-wide orchestrator added (no conflicts)
+// Filter by command name
-{
+h.commands.commandsFor("openclaw");
  "agents": {
    "list": [
      { "id": "project-a-orchestrator", ... },
      { "id": "project-b-orchestrator", ... },
      { "id": "global-orchestrator", ... }
    ]
  },
  "bindings": [
    {
      "agentId": "project-a-orchestrator",
      "match": { "channel": "telegram", "peer": { "kind": "group", "id": "-1001234567890" } }
    },
    {
      "agentId": "project-b-orchestrator",
      "match": { "channel": "telegram", "peer": { "kind": "group", "id": "-1009876543210" } }
    },
    {
      "agentId": "global-orchestrator",
      "match": { "channel": "telegram" }  // Channel-wide (no peer)
    }
  ]
 }
-// Routing: Group messages go to specific agents, everything else goes to global
+// Extract task messages dispatched to workers
 h.commands.taskMessages();
 // Extract session creation patches
 h.commands.sessionPatches();
 // Reset between test cases
 h.commands.reset();
 ```
-## Test Architecture
+### simulateBootstrap
-### Mock File System
+Tests the full bootstrap hook chain without a live OpenClaw gateway:
 The tests use an in-memory mock file system (`MockFileSystem`) that simulates:
 - Reading/writing openclaw.json
 - Creating/reading workspace files
 - Tracking command executions (openclaw agents add)
 **Why?** Tests run in isolation without touching the real file system, making them:
 - Fast (no I/O)
 - Reliable (no file conflicts)
 - Repeatable (clean state every test)
 ### Fixtures
 Pre-built configurations for different user types:
 - `createNewUserConfig()` - Empty slate
 - `createCommonUserConfig()` - One orchestrator with binding
 - `createPowerUserConfig()` - Multiple orchestrators with group bindings
 - `createNoChannelConfig()` - Channel not configured
 - `createDisabledChannelConfig()` - Channel disabled
 ### Assertions
 Reusable assertion helpers that make tests readable:
 ```typescript
-assertAgentExists(mockFs, "my-agent", "My Agent");
+// Write a project-specific prompt
-assertChannelBinding(mockFs, "my-agent", "telegram");
+await h.writePrompt("developer", "Custom dev instructions", "my-project");
-assertWorkspaceFilesExist(mockFs, "my-agent");
+
-assertDevClawConfig(mockFs, { junior: "anthropic/claude-haiku-4-5" });
+// Simulate bootstrap for a developer session
 const files = await h.simulateBootstrap(
  "agent:orchestrator:subagent:my-project-developer-medior"
 );
 // Verify injected bootstrap files
 assert.strictEqual(files.length, 1);
 assert.strictEqual(files[0].content, "Custom dev instructions");
 ```
 ## Writing Tests
 ### Pattern: Unit Test
 ```typescript
 import { describe, it } from "node:test";
 import assert from "node:assert/strict";
 describe("my feature", () => {
  it("should do something", () => {
    const result = myFunction("input");
    assert.strictEqual(result, "expected");
  });
 });
 ```
 ### Pattern: E2E Pipeline Test
 ```typescript
 import { describe, it, afterEach } from "node:test";
 import assert from "node:assert/strict";
 import { createTestHarness, type TestHarness } from "../testing/index.js";
 import { executeCompletion } from "./pipeline.js";
 describe("pipeline completion", () => {
  let h: TestHarness;
  afterEach(async () => {
    if (h) await h.cleanup();
  });
  it("developer:done transitions Doing → To Test", async () => {
    h = await createTestHarness({
      workers: {
        developer: { active: true, issueId: "42", level: "medior" },
      },
    });
    h.provider.seedIssue(42, { labels: ["Doing"], state: "open" });
    const result = await executeCompletion({
      workspaceDir: h.workspaceDir,
      groupId: h.groupId,
      project: h.project,
      workflow: h.workflow,
      provider: h.provider,
      role: "developer",
      result: "done",
    });
    assert.strictEqual(result.rule.to, "To Test");
  });
 });
 ```
 ### Pattern: Bootstrap Hook Test
 ```typescript
 import { describe, it, afterEach } from "node:test";
 import assert from "node:assert/strict";
 import { createTestHarness, type TestHarness } from "../testing/index.js";
 describe("bootstrap instructions", () => {
  let h: TestHarness;
  afterEach(async () => {
    if (h) await h.cleanup();
  });
  it("injects project-specific prompt for developer", async () => {
    h = await createTestHarness({ projectName: "webapp" });
    await h.writePrompt("developer", "Build with React", "webapp");
    const files = await h.simulateBootstrap(
      "agent:orchestrator:subagent:webapp-developer-medior"
    );
    assert.strictEqual(files.length, 1);
    assert.ok(files[0].content?.includes("React"));
  });
 });
 ```
 ## CI/CD Integration
 ### GitHub Actions
 ```yaml
 name: Test
 on: [push, pull_request]
@@ -218,122 +217,52 @@ jobs:
  test:
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
-      - uses: actions/setup-node@v3
+      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
-      - run: npm test
+      - run: npm run build
-      - run: npm run test:coverage
+      - run: npx tsx --test lib/**/*.test.ts
      - uses: codecov/codecov-action@v3
        with:
          files: ./coverage/coverage-final.json
 ```
 ### GitLab CI
 ```yaml
 test:
  image: node:20
  script:
    - npm ci
-    - npm test
+    - npm run build
-    - npm run test:coverage
+    - npx tsx --test lib/**/*.test.ts
  coverage: '/Lines\s*:\s*(\d+\.\d+)%/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml
 ```
 ## Debugging Tests
 ### Run specific test
 ```bash
-npm test -- new-user              # Run all new-user tests
+# Run by file
-npm test -- "should create agent" # Run tests matching pattern
+npx tsx --test lib/roles/registry.test.ts
+# Run by name pattern
+npx tsx --test --test-name-pattern "should have all expected roles" lib/**/*.test.ts
 ```
 ### Debug with Node inspector
 ```bash
-node --inspect-brk node_modules/.bin/vitest run
+node --inspect-brk node_modules/.bin/tsx --test lib/roles/registry.test.ts
 ```
-Then open Chrome DevTools at `chrome://inspect`
+Then open Chrome DevTools at `chrome://inspect`.
 ### View coverage report
 ```bash
 npm run test:coverage
 open coverage/index.html
 ```
 ## Adding Tests
 ### 1. Choose the right test file
 - New feature → `tests/setup/new-user.test.ts`
 - Migration feature → `tests/setup/existing-user.test.ts`
 - Multi-agent feature → `tests/setup/power-user.test.ts`
 ### 2. Write the test
 ```typescript
 import { describe, it, expect, beforeEach } from "vitest";
 import { MockFileSystem } from "../helpers/mock-fs.js";
 import { createNewUserConfig } from "../helpers/fixtures.js";
 import { assertAgentExists } from "../helpers/assertions.js";
 describe("My new feature", () => {
  let mockFs: MockFileSystem;
  beforeEach(() => {
    mockFs = new MockFileSystem(createNewUserConfig());
  });
  it("should do something useful", async () => {
    // GIVEN: initial state (via fixture)
    const beforeCount = countAgents(mockFs);
    // WHEN: execute the operation
    const config = mockFs.getConfig();
    config.agents.list.push({
      id: "test-agent",
      name: "Test Agent",
      workspace: "/home/test/.openclaw/workspace-test-agent",
      agentDir: "/home/test/.openclaw/agents/test-agent/agent",
    });
    mockFs.setConfig(config);
    // THEN: verify the outcome
    assertAgentExists(mockFs, "test-agent", "Test Agent");
    expect(countAgents(mockFs)).toBe(beforeCount + 1);
  });
 });
 ```
 ### 3. Run your test
 ```bash
 npm test -- "should do something useful"
 ```
 ## Best Practices
-### ✅ DO
+- **Use `node:test` + `node:assert/strict`** — no test framework dependencies
-- Test one thing per test
+- **Use `createTestHarness()`** for any test that needs workspace state, providers, or command interception
-- Use descriptive test names ("should create agent with telegram binding")
+- **Always call `h.cleanup()`** in `afterEach` to remove temp directories
-- Use fixtures for initial state
+- **Seed provider state** with `h.provider.seedIssue()` before testing pipeline operations
-- Use assertion helpers for readability
+- **Use `h.commands`** to verify what CLI commands were dispatched without actually running them
-- Test error cases
+- **One assertion focus per test** — test one behavior, not the whole pipeline
-
+- **Test error cases** — invalid roles, missing projects, bad state transitions
 ### ❌ DON'T
 - Test implementation details (test behavior, not internals)
 - Share state between tests (use beforeEach)
 - Mock everything (only mock file system and commands)
 - Write brittle tests (avoid hard-coded UUIDs, timestamps)
 ## Test Metrics
 Current coverage:
 - **Lines:** Target 80%+
 - **Functions:** Target 90%+
 - **Branches:** Target 75%+
 Run `npm run test:coverage` to see detailed metrics.
--- a/docs/TOOLS.md
+++ b/docs/TOOLS.md
@@ -1,6 +1,6 @@
 # DevClaw — Tools Reference
-Complete reference for all 11 tools registered by DevClaw. See [`index.ts`](../index.ts) for registration.
+Complete reference for all tools registered by DevClaw. See [`index.ts`](../index.ts) for registration.
 ## Worker Lifecycle
@@ -17,9 +17,9 @@ Pick up a task from the issue queue. Handles level assignment, label transition,
 | Parameter | Type | Required | Description |
 |---|---|---|---|
 | `issueId` | number | No | Issue ID. If omitted, picks next by priority. |
-| `role` | `"dev"` \| `"qa"` | No | Worker role. Auto-detected from issue label if omitted. |
+| `role` | `"developer"` \| `"tester"` \| `"architect"` | No | Worker role. Auto-detected from issue label if omitted. |
 | `projectGroupId` | string | No | Project group ID. Auto-detected from group context. |
-| `level` | string | No | Developer level (`junior`, `medior`, `senior`, `reviewer`). Auto-detected if omitted. |
+| `level` | string | No | Level (`junior`, `medior`, `senior`). Auto-detected if omitted. |
 **What it does atomically:**
@@ -28,15 +28,14 @@ Pick up a task from the issue queue. Handles level assignment, label transition,
 3. Fetches issue from tracker, verifies correct label state
 4. Assigns level (LLM-chosen via `level` param → label detection → keyword heuristic fallback)
 5. Resolves level to model ID via config or defaults
-6. Loads prompt instructions from `projects/roles/<project>/<role>.md`
+6. Looks up existing session for assigned level (session-per-level)
-7. Looks up existing session for assigned level (session-per-level)
+7. Transitions label (e.g. `To Do` → `Doing`)
-8. Transitions label (e.g. `To Do` → `Doing`)
+8. Creates session via Gateway RPC if new (`sessions.patch`)
-9. Creates session via Gateway RPC if new (`sessions.patch`)
+9. Dispatches task to worker session via CLI (`openclaw gateway call agent`)
-10. Dispatches task to worker session via CLI (`openclaw gateway call agent`)
+10. Updates `projects.json` state (active, issueId, level, session key)
-11. Updates `projects.json` state (active, issueId, level, session key)
+11. Writes audit log entries (work_start + model_selection)
-12. Writes audit log entries (work_start + model_selection)
+12. Sends notification
-13. Sends notification
+13. Returns announcement text
-14. Returns announcement text
 **Level selection priority:**
@@ -55,7 +54,7 @@ Pick up a task from the issue queue. Handles level assignment, label transition,
 ### `work_finish`
-Complete a task with a result. Called by workers (DEV/QA sub-agent sessions) directly, or by the orchestrator.
+Complete a task with a result. Called by workers (DEVELOPER/TESTER/ARCHITECT sub-agent sessions) directly, or by the orchestrator.
 **Source:** [`lib/tools/work-finish.ts`](../lib/tools/work-finish.ts)
@@ -63,7 +62,7 @@ Complete a task with a result. Called by workers (DEV/QA sub-agent sessions) dir
 | Parameter | Type | Required | Description |
 |---|---|---|---|
-| `role` | `"dev"` \| `"qa"` | Yes | Worker role |
+| `role` | `"developer"` \| `"tester"` \| `"architect"` | Yes | Worker role |
 | `result` | string | Yes | Completion result (see table below) |
 | `projectGroupId` | string | Yes | Project group ID |
 | `summary` | string | No | Brief summary for the announcement |
@@ -73,12 +72,15 @@ Complete a task with a result. Called by workers (DEV/QA sub-agent sessions) dir
 | Role | Result | Label transition | Side effects |
 |---|---|---|---|
-| DEV | `"done"` | Doing → To Test | git pull, auto-detect PR URL |
+| developer | `"done"` | Doing → To Test | git pull, auto-detect PR URL |
-| DEV | `"blocked"` | Doing → To Do | Task returns to queue |
+| developer | `"review"` | Doing → In Review | auto-detect PR URL, heartbeat polls for merge |
-| QA | `"pass"` | Testing → Done | Issue closed |
+| developer | `"blocked"` | Doing → Refining | Awaits human decision |
-| QA | `"fail"` | Testing → To Improve | Issue reopened |
+| tester | `"pass"` | Testing → Done | Issue closed |
-| QA | `"refine"` | Testing → Refining | Awaits human decision |
+| tester | `"fail"` | Testing → To Improve | Issue reopened |
-| QA | `"blocked"` | Testing → To Test | Task returns to QA queue |
+| tester | `"refine"` | Testing → Refining | Awaits human decision |
+| tester | `"blocked"` | Testing → Refining | Awaits human decision |
+| architect | `"done"` | Designing → Planning | Design complete |
+| architect | `"blocked"` | Designing → Refining | Awaits human decision |
 **What it does atomically:**
@@ -111,7 +113,7 @@ Create a new issue in the project's issue tracker.
 | `description` | string | No | Full issue body (markdown) |
 | `label` | StateLabel | No | State label. Defaults to `"Planning"`. |
 | `assignees` | string[] | No | GitHub/GitLab usernames to assign |
-| `pickup` | boolean | No | If true, immediately pick up for DEV after creation |
+| `pickup` | boolean | No | If true, immediately pick up for DEVELOPER after creation |
 **Use cases:**
@@ -138,7 +140,7 @@ Change an issue's state label manually without going through the full pickup/com
 | `state` | StateLabel | Yes | New state label |
 | `reason` | string | No | Audit log reason for the change |
-**Valid states:** `Planning`, `To Do`, `Doing`, `To Test`, `Testing`, `Done`, `To Improve`, `Refining`
+**Valid states:** `Planning`, `To Do`, `Doing`, `To Test`, `Testing`, `Done`, `To Improve`, `Refining`, `In Review`, `To Design`, `Designing`
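 As a sketch only, the eleven labels can be modelled as a TypeScript union so that a bad `state` value is rejected before any tracker call is made. `STATE_LABELS` and `isStateLabel` are hypothetical names used for illustration; they are not taken from DevClaw's source.
 ```typescript
 import assert from "node:assert/strict";
 // The eleven state labels listed above, kept as a readonly tuple.
 const STATE_LABELS = [
   "Planning", "To Do", "Doing", "To Test", "Testing", "Done",
   "To Improve", "Refining", "In Review", "To Design", "Designing",
 ] as const;
 type StateLabel = (typeof STATE_LABELS)[number];
 // Hypothetical helper: narrows an arbitrary string to StateLabel.
 function isStateLabel(value: string): value is StateLabel {
   return (STATE_LABELS as readonly string[]).includes(value);
 }
 // Usage: labels are matched exactly, including case and spacing.
 assert.ok(isStateLabel("In Review"));
 assert.ok(!isStateLabel("in review"));
 ```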
 **Use cases:**
@@ -161,12 +163,12 @@ Add a comment to an issue for feedback, notes, or discussion.
 | `projectGroupId` | string | Yes | Project group ID |
 | `issueId` | number | Yes | Issue ID to comment on |
 | `body` | string | Yes | Comment body (markdown) |
-| `authorRole` | `"dev"` \| `"qa"` \| `"orchestrator"` | No | Attribution role prefix |
+| `authorRole` | `"developer"` \| `"tester"` \| `"orchestrator"` | No | Attribution role prefix |
 **Use cases:**
-- QA adds review feedback before pass/fail decision
+- TESTER adds review feedback before pass/fail decision
-- DEV posts implementation notes or progress updates
+- DEVELOPER posts implementation notes or progress updates
 - Orchestrator adds summary comments
 When `authorRole` is provided, the comment is prefixed with a role emoji and attribution label.
@@ -191,7 +193,7 @@ Lightweight queue + worker state dashboard.
 **Returns per project:**
-- Worker state: active/idle, current issue, level, start time
+- Worker state per role: active/idle, current issue, level, start time
 - Queue counts: To Do, To Test, To Improve
 - Role execution mode
@@ -226,7 +228,7 @@ Worker health scan with optional auto-fix.
 ### `work_heartbeat`
-Manual trigger for heartbeat: health fix + queue dispatch. Same logic as the background heartbeat service, but invoked on demand.
+Manual trigger for heartbeat: health fix + review polling + queue dispatch. Same logic as the background heartbeat service, but invoked on demand.
 **Source:** [`lib/tools/work-heartbeat.ts`](../lib/tools/work-heartbeat.ts)
@@ -239,15 +241,16 @@ Manual trigger for heartbeat: health fix + queue dispatch. Same logic as the bac
 | `maxPickups` | number | No | Max worker dispatches per tick. |
 | `activeSessions` | string[] | No | Active session IDs for zombie detection. |
-**Two-pass sweep:**
+**Three-pass sweep:**
 1. **Health pass** — Runs `checkWorkerHealth` per project per role. Auto-fixes zombies, stale workers, orphaned state.
-2. **Tick pass** — Calls `projectTick` per project. Fills free worker slots by priority (To Improve > To Test > To Do).
+2. **Review pass** — Polls PR status for issues in "In Review" state. Transitions to "To Test" when PR is merged.
 3. **Tick pass** — Calls `projectTick` per project. Fills free worker slots by priority (To Improve > To Test > To Do).
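 The ordering of the three passes can be sketched as follows. This is an illustration under assumptions, not the plugin's code: `runSweep`, `PassName`, and `PassFn` are hypothetical, the passes are synchronous for brevity, and the sketch assumes each pass finishes for every project before the next pass begins.
 ```typescript
 import assert from "node:assert/strict";
 type PassName = "health" | "review" | "tick";
 // Hypothetical per-project callback; the real checkWorkerHealth and
 // projectTick signatures are not shown in this document.
 type PassFn = (project: string) => void;
 // Runs health, then review, then tick, each across all projects.
 function runSweep(projects: string[], passes: Record<PassName, PassFn>): void {
   const order: PassName[] = ["health", "review", "tick"];
   for (const name of order) {
     for (const project of projects) {
       passes[name](project);
     }
   }
 }
 // Usage: record the call order for two example projects.
 const log: string[] = [];
 runSweep(["webapp", "api"], {
   health: (p) => { log.push(`health:${p}`); },
   review: (p) => { log.push(`review:${p}`); },
   tick: (p) => { log.push(`tick:${p}`); },
 });
 assert.strictEqual(log[0], "health:webapp");
 assert.strictEqual(log[log.length - 1], "tick:api");
 ```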
 **Execution guards:**
 - `projectExecution: "sequential"` — only one project active at a time
-- `roleExecution: "sequential"` — only one role (DEV or QA) active at a time per project (enforced in `projectTick`)
+- `roleExecution: "sequential"` — only one role active at a time per project
 ---
@@ -272,18 +275,16 @@ One-time project setup. Creates state labels, scaffolds prompt files, adds proje
 | `baseBranch` | string | Yes | Base branch for development |
 | `deployBranch` | string | No | Deploy branch. Defaults to baseBranch. |
 | `deployUrl` | string | No | Deployment URL |
-| `roleExecution` | `"parallel"` \| `"sequential"` | No | DEV/QA parallelism. Default: `"parallel"`. |
+| `roleExecution` | `"parallel"` \| `"sequential"` | No | DEVELOPER/TESTER parallelism. Default: `"parallel"`. |
 **What it does atomically:**
 1. Validates project not already registered
 2. Resolves repo path, auto-detects GitHub/GitLab from git remote
 3. Verifies provider health (CLI installed and authenticated)
-4. Creates all 8 state labels (idempotent — safe to run again)
+4. Creates all 11 state labels (idempotent — safe to run again)
-5. Adds project entry to `projects.json` with empty worker state
+5. Adds project entry to `projects.json` with empty worker state for all registered roles
-   - DEV sessions: `{ junior: null, medior: null, senior: null }`
+6. Scaffolds prompt files: `devclaw/projects/<project>/prompts/<role>.md` for each role
-   - QA sessions: `{ reviewer: null, tester: null }`
-6. Scaffolds prompt files: `projects/roles/<project>/dev.md` and `qa.md`
 7. Writes audit log
 ---
@@ -301,7 +302,7 @@ Agent + workspace initialization.
 | `newAgentName` | string | No | Create a new agent. Omit to configure current workspace. |
 | `channelBinding` | `"telegram"` \| `"whatsapp"` | No | Channel to bind (with `newAgentName` only) |
 | `migrateFrom` | string | No | Agent ID to migrate channel binding from |
-| `models` | object | No | Model overrides per role and level (see [Configuration](CONFIGURATION.md#model-tiers)) |
+| `models` | object | No | Model overrides per role and level (see [Configuration](CONFIGURATION.md#role-configuration)) |
 | `projectExecution` | `"parallel"` \| `"sequential"` | No | Project execution mode |
 **What it does:**
@@ -309,8 +310,8 @@ Agent + workspace initialization.
 1. Creates a new agent or configures existing workspace
 2. Optionally binds messaging channel (Telegram/WhatsApp)
 3. Optionally migrates channel binding from another agent
-4. Writes workspace files: AGENTS.md, HEARTBEAT.md, `projects/projects.json`
+4. Writes workspace files: AGENTS.md, HEARTBEAT.md, `devclaw/projects.json`, `devclaw/workflow.yaml`
-5. Configures model tiers in `openclaw.json`
+5. Scaffolds default prompt files for all roles
 ---
@@ -328,34 +329,47 @@ Conversational onboarding guide. Returns step-by-step instructions for the agent
 |---|---|---|---|
 | `mode` | `"first-run"` \| `"reconfigure"` | No | Auto-detected from current state |
-**Flow:**
+---
-1. Call `onboard` — returns QA-style step-by-step instructions
+### `design_task`
-2. Agent walks user through: agent selection, channel binding, model tiers
+
-3. Agent calls `setup` with collected answers
+Spawn an architect for a design investigation. Creates a "To Design" issue and dispatches an architect worker.
-4. User registers projects via `project_register` in group chats
+
+**Source:** [`lib/tools/design-task.ts`](../lib/tools/design-task.ts)
+**Parameters:**
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | Yes | Project group ID |
+| `title` | string | Yes | Design task title |
+| `description` | string | No | Design problem description |
+| `level` | `"junior"` \| `"senior"` | No | Architect level. Default: `"junior"`. |
 ---
 ## Completion Rules Reference
-The pipeline service (`lib/services/pipeline.ts`) defines declarative completion rules:
+The pipeline service (`lib/services/pipeline.ts`) derives completion rules from the workflow config:
 ```
-dev:done    → Doing    → To Test     (git pull, detect PR)
+developer:done    → Doing     → To Test      (git pull, detect PR)
-dev:blocked → Doing    → To Do       (return to queue)
+developer:review  → Doing     → In Review    (detect PR, heartbeat polls for merge)
-qa:pass     → Testing  → Done        (close issue)
+developer:blocked → Doing     → Refining     (awaits human decision)
-qa:fail     → Testing  → To Improve  (reopen issue)
+tester:pass       → Testing   → Done         (close issue)
-qa:refine   → Testing  → Refining    (await human decision)
+tester:fail       → Testing   → To Improve   (reopen issue)
-qa:blocked  → Testing  → To Test     (return to QA queue)
+tester:refine     → Testing   → Refining     (awaits human decision)
+tester:blocked    → Testing   → Refining     (awaits human decision)
+architect:done    → Designing → Planning     (design complete)
+architect:blocked → Designing → Refining     (awaits human decision)
 ```
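 One way to read the rules above is as a lookup table keyed by `role:result`. The sketch below is illustrative only: `COMPLETION_RULES`, `CompletionRule`, and `getRule` are hypothetical names, and the real pipeline derives its rules from the workflow config rather than from a hard-coded map.
 ```typescript
 import assert from "node:assert/strict";
 interface CompletionRule {
   from: string; // label the issue must currently carry
   to: string;   // label applied on completion
 }
 // Keys are "<role>:<result>"; values mirror the rules listed above.
 const COMPLETION_RULES: Record<string, CompletionRule> = {
   "developer:done":    { from: "Doing",     to: "To Test" },
   "developer:review":  { from: "Doing",     to: "In Review" },
   "developer:blocked": { from: "Doing",     to: "Refining" },
   "tester:pass":       { from: "Testing",   to: "Done" },
   "tester:fail":       { from: "Testing",   to: "To Improve" },
   "tester:refine":     { from: "Testing",   to: "Refining" },
   "tester:blocked":    { from: "Testing",   to: "Refining" },
   "architect:done":    { from: "Designing", to: "Planning" },
   "architect:blocked": { from: "Designing", to: "Refining" },
 };
 // Hypothetical lookup: throws for combinations with no rule.
 function getRule(role: string, result: string): CompletionRule {
   const rule = COMPLETION_RULES[`${role}:${result}`];
   if (!rule) throw new Error(`No completion rule for ${role}:${result}`);
   return rule;
 }
 // Usage
 assert.strictEqual(getRule("developer", "done").to, "To Test");
 ```
 In this sketch every `blocked` result lands in `Refining`, so a blocked task always waits for a human decision instead of returning to a queue.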
 ## Issue Priority Order
 When the heartbeat or `work_heartbeat` fills free worker slots, issues are prioritized:
-1. **To Improve** — QA failures get fixed first (highest priority)
+1. **To Improve** — Tester failures get fixed first (highest priority)
-2. **To Test** — Completed DEV work gets reviewed next
+2. **To Test** — Completed developer work gets reviewed next
 3. **To Do** — Fresh tasks are picked up last
 This ensures the pipeline clears its backlog before starting new work.
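 The ordering can be sketched as a sort over queued issues. This is a minimal illustration, not the `projectTick` implementation: `QUEUE_PRIORITY`, `QueuedIssue`, and `sortByPriority` are hypothetical names, and issues with the same label simply keep their input order.
 ```typescript
 import assert from "node:assert/strict";
 // Queue labels from highest to lowest priority, as listed above.
 const QUEUE_PRIORITY = ["To Improve", "To Test", "To Do"] as const;
 type QueueLabel = (typeof QUEUE_PRIORITY)[number];
 interface QueuedIssue {
   id: number;
   label: QueueLabel;
 }
 // Returns a new array ordered by label priority; the input is not mutated.
 function sortByPriority(issues: QueuedIssue[]): QueuedIssue[] {
   return [...issues].sort(
     (a, b) => QUEUE_PRIORITY.indexOf(a.label) - QUEUE_PRIORITY.indexOf(b.label)
   );
 }
 // Usage: the failed issue is picked before untested and fresh work.
 const next = sortByPriority([
   { id: 1, label: "To Do" },
   { id: 2, label: "To Test" },
   { id: 3, label: "To Improve" },
 ])[0];
 assert.strictEqual(next.id, 3);
 ```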