docs: overhaul documentation for consistency with implementation

Complete documentation rewrite to match the current codebase:

- README: add benefits section (process consistency, token savings with estimates, project isolation, continuous planning, feedback loops, role-based prompts, atomic operations, audit trail), task workflow with state diagram, model-to-role mapping tables, installation guide
- New TOOLS.md: complete reference for all 11 tools with parameters, behavior, and execution guards
- New CONFIGURATION.md: full config reference for openclaw.json, projects.json, heartbeat, notifications, workspace layout
- Fix tool names across all docs: task_pickup→work_start, task_complete→work_finish
- Fix tier model: QA has reviewer/tester levels, not flat "qa"
- Fix config schema: nested models.dev.*/models.qa.* structure
- Fix prompt path: projects/roles/ not projects/prompts/
- Fix worker state: uses "level" field not "model"/"tier"
- Fix MANAGEMENT.md: remove incorrect model references
- Fix TESTING.md: update model config example to nested structure
- Remove VERIFICATION.md (one-off checklist, no longer needed)
- Add cross-references between all docs pages

https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
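
For reviewers, the config-schema and tier-model fixes listed in this message can be pictured with a minimal TypeScript sketch. It assumes the nested `models.dev.*` / `models.qa.*` shape described here and the default models from the README level tables in this change. The names `ModelsConfig`, `DEFAULT_MODELS`, and `resolveModel` are hypothetical and are not taken from the plugin source.

```typescript
// Hypothetical sketch: type and function names are illustrative only,
// not the plugin's actual API.
type Role = "dev" | "qa";

// Nested structure: one map of level -> model per role.
type ModelsConfig = Record<Role, Record<string, string>>;

// Defaults as listed in the README "DEV levels" and "QA levels" tables.
const DEFAULT_MODELS: ModelsConfig = {
  dev: {
    junior: "anthropic/claude-haiku-4-5",
    medior: "anthropic/claude-sonnet-4-5",
    senior: "anthropic/claude-opus-4-5",
  },
  qa: {
    reviewer: "anthropic/claude-sonnet-4-5",
    tester: "anthropic/claude-haiku-4-5",
  },
};

// Resolve a level to a model, letting user overrides win over defaults.
function resolveModel(
  role: Role,
  level: string,
  overrides: Partial<ModelsConfig> = {},
): string {
  const model = overrides[role]?.[level] ?? DEFAULT_MODELS[role][level];
  if (model === undefined) {
    // A flat "qa" level (the old schema) ends up here.
    throw new Error(`Unknown ${role} level: ${level}`);
  }
  return model;
}
```

Under these assumptions, `resolveModel("qa", "reviewer")` yields the Sonnet default, while the old flat `"qa"` level is rejected instead of silently mapping to a model.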
2026-02-10 20:13:22 +00:00
parent ead4807797
commit 553efcc146
11 changed files with 1388 additions and 897 deletions
--- a/README.md
+++ b/README.md
@@ -2,38 +2,223 @@
  <img src="assets/DevClaw.png" width="300" alt="DevClaw Logo">
 </p>

-# DevClaw - Development Plugin for OpenClaw
+# DevClaw — Development Plugin for OpenClaw

 **Every group chat becomes an autonomous development team.**

-Add the agent to a Telegram/WhatsApp group, point it at a GitLab/GitHub repo — that group now has an **orchestrator** managing the backlog, a **DEV** worker session writing code, and a **QA** worker session reviewing it. All autonomous. Add another group, get another team. Each project runs in complete isolation with its own task queue, workers, and session state.
+Add an agent to a Telegram/WhatsApp group, point it at a GitHub/GitLab repo — that group now has an **orchestrator** managing the backlog, a **DEV** worker writing code, and a **QA** worker reviewing it. All autonomous. Add another group, get another team. Each project runs in complete isolation with its own task queue, workers, and session state.

 DevClaw is the [OpenClaw](https://openclaw.ai) plugin that makes this work.

-## Why
+## Benefits

-[OpenClaw](https://openclaw.ai) is great at giving AI agents the ability to develop software — spawn worker sessions, manage sessions, work with code. But running a real multi-project development pipeline exposes a gap: the orchestration layer between "agent can write code" and "agent reliably manages multiple projects" is brittle. Every task involves 10+ coordinated steps across GitLab labels, session state, model selection, and audit logging. Agents forget steps, corrupt state, null out session IDs they should preserve, or pick the wrong model for the job.
+### Process consistency

-DevClaw fills that gap with guardrails. It gives the orchestrator atomic tools that make it impossible to forget a label transition, lose a session reference, or skip an audit log entry. The complexity of multi-project orchestration moves from agent instructions (that LLMs follow imperfectly) into deterministic code (that runs the same way every time).
+Every task follows the same fixed pipeline — `Planning → To Do → Doing → To Test → Testing → Done` — across every project. Label transitions, state updates, session dispatch, and audit logging happen atomically inside the plugin. The orchestrator agent **cannot** skip a step, forget a label, or corrupt session state. Hundreds of lines of manual orchestration logic collapse into a single `work_start` call.

-## The idea
+### Token savings

-One orchestrator agent manages all your projects. It reads task backlogs, creates issues, decides priorities, and delegates work. For each task, DevClaw assigns a developer from your **team** — a junior, medior, or senior dev writes the code, then a QA engineer reviews it. Every Telegram/WhatsApp group is a separate project — the orchestrator keeps them completely isolated while managing them all from a single process.
+DevClaw reduces token consumption at three levels:

-DevClaw gives the orchestrator nine tools that replace hundreds of lines of manual orchestration logic. Instead of following a 10-step checklist per task (fetch issue, check labels, pick model, check for existing session, transition label, dispatch task, update state, log audit event...), it calls `task_pickup` and the plugin handles everything atomically — including session dispatch. Workers call `task_complete` themselves for atomic state updates, and can file follow-up issues via `task_create`.
+| Mechanism | How it works | Estimated savings |
+|---|---|---|
+| **Shared sessions** | Each developer level per role maintains one persistent session per project. When a medior dev finishes task A and picks up task B, the plugin reuses the existing session — no codebase re-reading. | **~40-60%** per task (~50K tokens saved per session reuse) |
+| **Level selection** | Junior for typos (Haiku), medior for features (Sonnet), senior for architecture (Opus). Matching the model to the job means you're not burning Opus tokens on a CSS fix. | **~30-50%** on simple tasks vs. always using the largest model |
+| **Token-free heartbeat** | The heartbeat service runs every 60s doing health checks and queue dispatch using pure deterministic code + CLI calls. Zero LLM tokens consumed. Workers only use tokens when they actually process tasks. | **100%** savings on orchestration overhead |

-## Developer tiers
+### Project isolation and parallelization

-DevClaw uses a developer seniority model. Each tier maps to a configurable LLM model:
+Each project is fully isolated — separate task queue, separate worker state, separate sessions. No cross-project contamination. Two execution modes control parallelism:

-| Tier       | Role                | Default model                 | Assigns to                                        |
-| ---------- | ------------------- | ----------------------------- | ------------------------------------------------- |
-| **junior** | Junior developer    | `anthropic/claude-haiku-4-5`  | Typos, single-file fixes, simple changes          |
-| **medior** | Mid-level developer | `anthropic/claude-sonnet-4-5` | Features, bug fixes, multi-file changes           |
-| **senior** | Senior developer    | `anthropic/claude-opus-4-5`   | Architecture, migrations, system-wide refactoring |
-| **qa**     | QA engineer         | `anthropic/claude-sonnet-4-5` | Code review, test validation                      |
+- **Project-level**: DEV and QA can work simultaneously on different tasks (parallel, default) or one role at a time (sequential)
+- **Plugin-level**: Multiple projects can have active workers at once (parallel, default) or only one project active at a time (sequential)

-Configure which model each tier uses during setup or in `openclaw.json` plugin config.
+### Continuous planning
+
+The heartbeat service runs a continuous loop: health check → queue scan → dispatch. It detects stale workers (>2 hours), auto-reverts stuck labels, and fills free worker slots — all without human intervention or agent LLM tokens. The orchestrator agent only gets involved when a decision requires judgment.
+
+### Feedback loops
+
+Three automated feedback loops keep the pipeline self-correcting:
+
+1. **Auto-chaining** — DEV "done" automatically dispatches QA. QA "fail" automatically re-dispatches DEV. No orchestrator action needed.
+2. **Stale worker watchdog** — Workers active >2 hours are auto-detected. Their labels revert to the queue, the workers are deactivated, and the tasks become available for retry.
+3. **Completion enforcement** — Every task message includes a mandatory `work_finish` section. Workers use `"blocked"` if stuck. A three-layer guarantee prevents tasks from getting stuck forever.
+
+### Role-based instruction prompts
+
+Workers receive customizable, project-specific instructions loaded at dispatch time:
+
+```
+workspace/projects/roles/
+├── my-webapp/
+│   ├── dev.md     ← "Run npm test before committing. Deploy URL: ..."
+│   └── qa.md      ← "Check OAuth flow. Verify mobile responsiveness."
+└── default/
+    ├── dev.md     ← Fallback for projects without custom instructions
+    └── qa.md
+```
+
+Edit these files to inject deployment steps, test commands, acceptance criteria, or coding standards — per project, per role.
+
+### Atomic operations with rollback
+
+Every tool call wraps multiple operations (label transition + state update + session dispatch + audit log) into a single atomic action. If session dispatch fails, the label transition is rolled back. No orphaned state. No half-completed operations.
+
+### Full audit trail
+
+Every tool call automatically appends an NDJSON entry to `log/audit.log`. Query with `jq` to trace any task's full history. No manual logging required from the orchestrator.
+
+---
+
+## The model-to-role mapping
+
+DevClaw doesn't expose raw model names. You're assigning a _junior developer_ to fix a typo, not configuring `anthropic/claude-haiku-4-5`. Each developer level maps to a configurable LLM:
+
+### DEV levels
+
+| Level | Who they are | Default model | Assigns to |
+|---|---|---|---|
+| `junior` | The intern | `anthropic/claude-haiku-4-5` | Typos, single-file fixes, CSS changes |
+| `medior` | The reliable mid-level | `anthropic/claude-sonnet-4-5` | Features, bug fixes, multi-file changes |
+| `senior` | The architect | `anthropic/claude-opus-4-5` | Architecture, migrations, system-wide refactoring |
+
+### QA levels
+
+| Level | Who they are | Default model | Assigns to |
+|---|---|---|---|
+| `reviewer` | The code reviewer | `anthropic/claude-sonnet-4-5` | Code review, test validation, PR inspection |
+| `tester` | The QA tester | `anthropic/claude-haiku-4-5` | Manual testing, smoke tests |
+
+The orchestrator LLM evaluates each issue and picks the appropriate level. A keyword-based heuristic in `model-selector.ts` serves as a fallback when the orchestrator omits the level. Override which model powers each level in [`openclaw.json`](docs/CONFIGURATION.md#model-tiers).
+
+---
+
+## Task workflow
+
+Every task (issue) moves through a fixed pipeline of label states. DevClaw tools handle every transition atomically.
+
+```mermaid
+stateDiagram-v2
+    [*] --> Planning
+    Planning --> ToDo: Ready for development
+
+    ToDo --> Doing: work_start (DEV) ⇄ blocked
+    Doing --> ToTest: work_finish (DEV done)
+
+    ToTest --> Testing: work_start (QA) / auto-chain ⇄ blocked
+    Testing --> Done: work_finish (QA pass)
+    Testing --> ToImprove: work_finish (QA fail)
+    Testing --> Refining: work_finish (QA refine)
+
+    ToImprove --> Doing: work_start (DEV fix) or auto-chain
+    Refining --> ToDo: Human decision
+
+    Done --> [*]
+```
+
+### The eight state labels
+
+| Label | Color | Meaning |
+|---|---|---|
+| **Planning** | Blue-grey | Pre-work review — issue exists but not ready for development |
+| **To Do** | Blue | Ready for DEV pickup |
+| **Doing** | Orange | DEV actively working |
+| **To Test** | Cyan | Ready for QA pickup |
+| **Testing** | Purple | QA actively reviewing |
+| **Done** | Green | Complete — issue closed |
+| **To Improve** | Red | QA failed — back to DEV |
+| **Refining** | Yellow | Awaiting human decision |
+
+### Worker self-reporting
+
+Workers call `work_finish` directly when they're done — no orchestrator involvement needed for the state transition. Workers can also call `task_create` to file follow-up issues they discover during work.
+
+### Auto-chaining
+
+When a project has auto-chaining enabled:
+
+- **DEV "done"** → QA is dispatched immediately (using the reviewer level)
+- **QA "fail"** → DEV fix is dispatched immediately (reuses previous DEV level)
+- **QA "pass" / "refine" / "blocked"** → no chaining (pipeline done, needs human input, or returned to queue)
+- **DEV "blocked"** → no chaining (returned to queue for retry)
+
+### Completion enforcement
+
+Three layers guarantee tasks never get stuck:
+
+1. **Completion contract** — Every task message includes a mandatory section requiring `work_finish`, even on failure. Workers use `"blocked"` if stuck.
+2. **Blocked result** — Both DEV and QA can gracefully put a task back in queue (`Doing → To Do`, `Testing → To Test`).
+3. **Stale worker watchdog** — Heartbeat detects workers active >2 hours and auto-reverts labels to queue.
+
+---
+
+## Installation
+
+### Requirements
+
+| Requirement | Why | Verify |
+|---|---|---|
+| [OpenClaw](https://openclaw.ai) | DevClaw is an OpenClaw plugin | `openclaw --version` |
+| Node.js >= 20 | Plugin runtime | `node --version` |
+| [`gh`](https://cli.github.com) or [`glab`](https://gitlab.com/gitlab-org/cli) CLI | Issue tracker provider (auto-detected from git remote) | `gh --version` / `glab --version` |
+| CLI authenticated | Plugin calls gh/glab for every label transition | `gh auth status` / `glab auth status` |
+
+### Install the plugin
+
+```bash
+cp -r devclaw ~/.openclaw/extensions/
+```
+
+Verify:
+
+```bash
+openclaw plugins list
+# Should show: DevClaw | devclaw | loaded
+```
+
+### Run setup
+
+Three options — pick one:
+
+**Option A: Conversational onboarding (recommended)**
+
+Call the `onboard` tool from any agent with DevClaw loaded. It walks through configuration step by step.
+
+**Option B: CLI wizard**
+
+```bash
+openclaw devclaw setup
+```
+
+**Option C: Non-interactive CLI**
+
+```bash
+openclaw devclaw setup --new-agent "My Orchestrator"
+```
+
+Setup creates an agent, configures model tiers, writes workspace files (AGENTS.md, HEARTBEAT.md, role templates), and optionally binds a messaging channel.
+
+### Register a project
+
+In the Telegram/WhatsApp group for the project:
+
+> "Register project my-app at ~/git/my-app with base branch main"
+
+The agent calls `project_register`, which atomically creates all 8 state labels, scaffolds role instruction files, and adds the project to `projects.json`.
+
+### Start working
+
+```
+"Check the queue"           → agent calls status
+"Pick up issue #1 for DEV"  → agent calls work_start
+[DEV works autonomously]    → calls work_finish when done
+[Heartbeat fills next slot] → QA dispatched automatically
+```
+
+See the [Onboarding Guide](docs/ONBOARDING.md) for detailed step-by-step instructions.
+
+---

 ## How it works

@@ -41,429 +226,114 @@ Configure which model each tier uses during setup or in `openclaw.json` plugin c
 graph TB
    subgraph "Group Chat A"
        direction TB
-        A_O["🎯 Orchestrator"]
-        A_GL[GitLab Issues]
-        A_DEV["🔧 DEV (worker session)"]
-        A_QA["🔍 QA (worker session)"]
-        A_O -->|task_pickup| A_GL
-        A_O -->|task_pickup dispatches| A_DEV
-        A_O -->|task_pickup dispatches| A_QA
+        A_O["Orchestrator"]
+        A_GL[GitHub/GitLab Issues]
+        A_DEV["DEV (worker session)"]
+        A_QA["QA (worker session)"]
+        A_O -->|work_start| A_GL
+        A_O -->|dispatches| A_DEV
+        A_O -->|dispatches| A_QA
    end

    subgraph "Group Chat B"
        direction TB
-        B_O["🎯 Orchestrator"]
-        B_GL[GitLab Issues]
-        B_DEV["🔧 DEV (worker session)"]
-        B_QA["🔍 QA (worker session)"]
-        B_O -->|task_pickup| B_GL
-        B_O -->|task_pickup dispatches| B_DEV
-        B_O -->|task_pickup dispatches| B_QA
-    end
-
-    subgraph "Group Chat C"
-        direction TB
-        C_O["🎯 Orchestrator"]
-        C_GL[GitLab Issues]
-        C_DEV["🔧 DEV (worker session)"]
-        C_QA["🔍 QA (worker session)"]
-        C_O -->|task_pickup| C_GL
-        C_O -->|task_pickup dispatches| C_DEV
-        C_O -->|task_pickup dispatches| C_QA
+        B_O["Orchestrator"]
+        B_GL[GitHub/GitLab Issues]
+        B_DEV["DEV (worker session)"]
+        B_QA["QA (worker session)"]
+        B_O -->|work_start| B_GL
+        B_O -->|dispatches| B_DEV
+        B_O -->|dispatches| B_QA
    end

    AGENT["Single OpenClaw Agent"]
    AGENT --- A_O
    AGENT --- B_O
-    AGENT --- C_O
 ```

-It's the same agent process — but each group chat gives it a different project context. The orchestrator role, the workers, the task queue, and all state are fully isolated per group.
+Same agent process — each group chat gives it a different project context. The orchestrator role, the workers, the task queue, and all state are fully isolated per group.

-## Task lifecycle
-
-Every task (GitLab issue) moves through a fixed pipeline of label states. Issues are created by the orchestrator agent or by worker sessions — not manually. DevClaw tools handle every transition atomically — label change, state update, audit log, and session management in a single call.
-
-```mermaid
-stateDiagram-v2
-    [*] --> Planning
-    Planning --> ToDo: Ready for development
-
-    ToDo --> Doing: task_pickup (DEV) ⇄ blocked
-    Doing --> ToTest: task_complete (DEV done)
-
-    ToTest --> Testing: task_pickup (QA) / auto-chain ⇄ blocked
-    Testing --> Done: task_complete (QA pass)
-    Testing --> ToImprove: task_complete (QA fail)
-    Testing --> Refining: task_complete (QA refine)
-
-    ToImprove --> Doing: task_pickup (DEV fix) or auto-chain
-    Refining --> ToDo: Human decision
-
-    Done --> [*]
-```
-
-### Worker self-reporting
-
-Workers (DEV/QA sub-agent sessions) call `task_complete` directly when they finish — no orchestrator involvement needed for the state transition. Workers can also call `task_create` to file follow-up issues they discover during work.
-
-### Completion enforcement
-
-Three layers guarantee that `task_complete` always runs, preventing tasks from getting stuck in "Doing" or "Testing" forever:
-
-1. **Completion contract** — Every task message includes a mandatory section requiring the worker to call `task_complete`, even on failure. Workers use `"blocked"` if stuck.
-2. **Blocked result** — Both DEV and QA can return `"blocked"` to gracefully put a task back in queue (`Doing → To Do`, `Testing → To Test`) instead of silently dying.
-3. **Stale worker watchdog** — The heartbeat health check detects workers active >2 hours and auto-reverts labels to queue, catching sessions that crashed or ran out of context.
-
-### Auto-chaining
-
-When a project has `autoChain: true`, `task_complete` automatically dispatches the next step:
-
- **DEV "done"** → QA is dispatched immediately (using the qa tier)
- **QA "fail"** → DEV fix is dispatched immediately (reuses previous DEV tier)
- **QA "pass" / "refine" / "blocked"** → no chaining (pipeline done, needs human input, or returned to queue)
- **DEV "blocked"** → no chaining (returned to queue for retry)
-
-When `autoChain` is false, `task_complete` returns a `nextAction` hint for the orchestrator to act on.
+---

 ## Session reuse

-Worker sessions are expensive to start — each new spawn requires the session to read the full codebase (~50K tokens). DevClaw maintains **separate sessions per tier per role** (session-per-tier design). When a medior dev finishes task A and picks up task B on the same project, the plugin detects the existing session and sends the task directly — no new session needed.
+Worker sessions are expensive to start — each new spawn reads the full codebase (~50K tokens). DevClaw maintains **separate sessions per level per role** (session-per-level design). When a medior dev finishes task A and picks up task B on the same project, the plugin detects the existing session and sends the task directly.

-The plugin handles session dispatch internally via OpenClaw CLI. The orchestrator agent never calls `sessions_spawn` or `sessions_send` — it just calls `task_pickup` and the plugin does the rest.
+The plugin handles session dispatch internally via OpenClaw CLI. The orchestrator agent never calls `sessions_spawn` or `sessions_send` — it calls `work_start` and the plugin does the rest.

 ```mermaid
 sequenceDiagram
    participant O as Orchestrator
    participant DC as DevClaw Plugin
-    participant GL as GitLab
+    participant IT as Issue Tracker
    participant S as Worker Session

-    O->>DC: task_pickup({ issueId: 42, role: "dev" })
-    DC->>GL: Fetch issue, verify label
-    DC->>DC: Assign tier (junior/medior/senior)
-    DC->>DC: Check existing session for assigned tier
-    DC->>GL: Transition label (To Do → Doing)
+    O->>DC: work_start({ issueId: 42, role: "dev" })
+    DC->>IT: Fetch issue, verify label
+    DC->>DC: Assign level (junior/medior/senior)
+    DC->>DC: Check existing session for assigned level
+    DC->>IT: Transition label (To Do → Doing)
    DC->>S: Dispatch task via CLI (create or reuse session)
    DC->>DC: Update projects.json, write audit log
-    DC-->>O: { success: true, announcement: "🔧 DEV (medior) picking up #42" }
+    DC-->>O: { success: true, announcement: "..." }
 ```

-## Developer assignment
-
-The orchestrator LLM evaluates each issue's title, description, and labels to assign the appropriate developer tier, then passes it to `task_pickup` via the `model` parameter. This gives the LLM full context for the decision — it can weigh factors like codebase familiarity, task dependencies, and recent failure history that keyword matching would miss.
-
-The keyword heuristic in `model-selector.ts` serves as a **fallback only**, used when the orchestrator omits the `model` parameter.
-
-| Tier   | Role                | When                                                        |
-| ------ | ------------------- | ----------------------------------------------------------- |
-| junior | Junior developer    | Typos, CSS, renames, copy changes                           |
-| medior | Mid-level developer | Features, bug fixes, multi-file changes                     |
-| senior | Senior developer    | Architecture, migrations, security, system-wide refactoring |
-| qa     | QA engineer         | All QA tasks (code review, test validation)                 |
-
-## State management
-
-All project state lives in a single `projects/projects.json` file in the orchestrator's workspace, keyed by Telegram group ID:
-
-```json
-{
-  "projects": {
-    "-1234567890": {
-      "name": "my-webapp",
-      "repo": "~/git/my-webapp",
-      "groupName": "Dev - My Webapp",
-      "baseBranch": "development",
-      "autoChain": true,
-      "dev": {
-        "active": false,
-        "issueId": null,
-        "model": "medior",
-        "sessions": {
-          "junior": "agent:orchestrator:subagent:a9e4d078-...",
-          "medior": "agent:orchestrator:subagent:b3f5c912-...",
-          "senior": null
-        }
-      },
-      "qa": {
-        "active": false,
-        "issueId": null,
-        "model": "qa",
-        "sessions": {
-          "qa": "agent:orchestrator:subagent:18707821-..."
-        }
-      }
-    }
-  }
-}
-```
-
-Key design decisions:
-
- **Session-per-tier** — each tier gets its own worker session, accumulating context independently. Tier selection maps directly to a session key.
- **Sessions preserved on completion** — when a worker completes a task, `sessions` map is **preserved** (only `active` and `issueId` are cleared). This enables session reuse on the next pickup.
- **Plugin-controlled dispatch** — the plugin creates and dispatches to sessions via OpenClaw CLI (`sessions.patch` + `openclaw agent`). The orchestrator agent never calls `sessions_spawn` or `sessions_send`.
- **Sessions persist indefinitely** — no auto-cleanup. `session_health` handles manual cleanup when needed.
-
-All writes go through atomic temp-file-then-rename to prevent corruption.
+---

 ## Tools

-### `devclaw_setup`
+DevClaw registers **11 tools**, grouped by function:

-Set up DevClaw in an agent's workspace. Creates AGENTS.md, HEARTBEAT.md, role templates, and configures models. Can optionally create a new agent.
+### Worker lifecycle

-**Parameters:**
+| Tool | Description |
+|---|---|
+| [`work_start`](docs/TOOLS.md#work_start) | Pick up a task — handles level assignment, label transition, session dispatch, audit |
+| [`work_finish`](docs/TOOLS.md#work_finish) | Complete a task — handles label transition, state update, auto-chaining, queue tick |

- `newAgentName` (string, optional) — Create a new agent with this name
- `models` (object, optional) — Model overrides per tier: `{ junior, medior, senior, qa }`
+### Task management

-### `task_pickup`
+| Tool | Description |
+|---|---|
+| [`task_create`](docs/TOOLS.md#task_create) | Create a new issue in the tracker |
+| [`task_update`](docs/TOOLS.md#task_update) | Change an issue's state label manually |
+| [`task_comment`](docs/TOOLS.md#task_comment) | Add a comment to an issue |

-Pick up a task from the issue queue for a DEV or QA worker.
+### Operations

-**Parameters:**
+| Tool | Description |
+|---|---|
+| [`status`](docs/TOOLS.md#status) | Queue counts + worker state dashboard |
+| [`health`](docs/TOOLS.md#health) | Worker health checks + zombie detection |
+| [`work_heartbeat`](docs/TOOLS.md#work_heartbeat) | Manual trigger for health + queue dispatch |

- `issueId` (number, required) — Issue ID
- `role` ("dev" | "qa", required) — Worker role
- `projectGroupId` (string, required) — Telegram group ID
- `model` (string, optional) — Developer tier (junior, medior, senior, qa). The orchestrator should evaluate the task complexity and choose. Falls back to keyword heuristic if omitted.
+### Setup

-**What it does atomically:**
+| Tool | Description |
+|---|---|
+| [`project_register`](docs/TOOLS.md#project_register) | One-time project setup (labels, prompts, state) |
+| [`setup`](docs/TOOLS.md#setup) | Agent + workspace initialization |
+| [`onboard`](docs/TOOLS.md#onboard) | Conversational onboarding guide |

-1. Resolves project from `projects.json`
-2. Validates no active worker for this role
-3. Fetches issue from issue tracker, verifies correct label state
-4. Assigns tier (LLM-chosen via `model` param, keyword heuristic fallback)
-5. Loads prompt instructions from `projects/prompts/<project>/<role>.md`
-6. Looks up existing session for assigned tier (session-per-tier)
-7. Transitions label (e.g. `To Do` → `Doing`)
-8. Creates session via Gateway RPC if new (`sessions.patch`)
-9. Dispatches task to worker session via CLI (`openclaw agent`) with role instructions appended
-10. Updates `projects.json` state (active, issueId, tier, session key)
-11. Writes audit log entry
-12. Returns announcement text for the orchestrator to post
+See the [Tools Reference](docs/TOOLS.md) for full parameters and usage.

-### `task_complete`
+---

-Complete a task with a result. Called by workers (DEV/QA sub-agent sessions) directly, or by the orchestrator.
+## Documentation

-**Parameters:**
+| Document | Description |
+|---|---|
+| [Architecture](docs/ARCHITECTURE.md) | System design, session-per-level model, data flow, component interactions |
+| [Tools Reference](docs/TOOLS.md) | Complete reference for all 11 tools with parameters and examples |
+| [Configuration](docs/CONFIGURATION.md) | Full config reference — `openclaw.json`, `projects.json`, heartbeat, notifications |
+| [Onboarding Guide](docs/ONBOARDING.md) | Step-by-step setup: install, configure, register projects, test the pipeline |
+| [QA Workflow](docs/QA_WORKFLOW.md) | QA process: review documentation, comment templates, enforcement |
+| [Context Awareness](docs/CONTEXT-AWARENESS.md) | How DevClaw adapts behavior based on interaction context |
+| [Testing Guide](docs/TESTING.md) | Automated test suite: scenarios, fixtures, CI/CD integration |
+| [Management Theory](docs/MANAGEMENT.md) | The delegation theory behind DevClaw's design |
+| [Roadmap](docs/ROADMAP.md) | Planned features: configurable roles, channel-agnostic groups, Jira |

- `role` ("dev" | "qa", required)
- `result` ("done" | "pass" | "fail" | "refine" | "blocked", required)
- `projectGroupId` (string, required)
- `summary` (string, optional) — For the Telegram announcement
-
-**Results:**
-
- **DEV "done"** — Pulls latest code, moves label `Doing` → `To Test`, deactivates worker. If `autoChain` enabled, automatically dispatches QA.
- **DEV "blocked"** — Moves label `Doing` → `To Do`, deactivates worker. Task returns to queue for retry.
- **QA "pass"** — Moves label `Testing` → `Done`, closes issue, deactivates worker
- **QA "fail"** — Moves label `Testing` → `To Improve`, reopens issue. If `autoChain` enabled, automatically dispatches DEV fix (reuses previous DEV tier).
- **QA "refine"** — Moves label `Testing` → `Refining`, awaits human decision
- **QA "blocked"** — Moves label `Testing` → `To Test`, deactivates worker. Task returns to QA queue for retry.
-
-### `task_update`
-
-Change an issue's state label programmatically without going through the full pickup/complete flow.
-
-**Parameters:**
-
- `projectGroupId` (string, required) — Telegram/WhatsApp group ID
- `issueId` (number, required) — Issue ID to update
- `state` (string, required) — New state label (Planning, To Do, Doing, To Test, Testing, Done, To Improve, Refining)
- `reason` (string, optional) — Audit log reason for the change
-
-**Use cases:**
- Manual state adjustments (e.g., Planning → To Do after approval)
- Failed auto-transitions that need correction
- Bulk state changes by orchestrator
-
-### `task_comment`
-
-Add a comment to an issue for feedback, notes, or discussion.
-
-**Parameters:**
-
- `projectGroupId` (string, required) — Telegram/WhatsApp group ID
- `issueId` (number, required) — Issue ID to comment on
- `body` (string, required) — Comment body in markdown
- `authorRole` ("dev" | "qa" | "orchestrator", optional) — Attribution role
-
-**Use cases:**
- QA adds review feedback without blocking pass/fail
- DEV posts implementation notes or progress updates
- Orchestrator adds summary comments
-
-### `task_create`
-
-Create a new issue in the project's issue tracker. Used by workers to file follow-up bugs, or by the orchestrator to create tasks from chat.
-
-**Parameters:**
-
- `projectGroupId` (string, required) — Telegram group ID
- `title` (string, required) — Issue title
- `description` (string, optional) — Full issue body in markdown
- `label` (string, optional) — State label (defaults to "Planning")
- `assignees` (string[], optional) — Usernames to assign
- `pickup` (boolean, optional) — If true, immediately pick up for DEV after creation
-
-### `queue_status`
-
-Returns task queue counts and worker status across all projects (or a specific one).
-
-**Parameters:**
-
- `projectGroupId` (string, optional) — Omit for all projects
-
-### `session_health`
-
-Detects and optionally fixes state inconsistencies.
-
-**Parameters:**
-
- `autoFix` (boolean, optional) — Auto-fix zombies and stale state
-
-**What it does:**
-
- Queries live sessions via Gateway RPC (`sessions.list`)
- Cross-references with `projects.json` worker state
-
-**Checks:**
-
- Active worker with no session key (critical, auto-fixable)
- Active worker whose session is dead — zombie (critical, auto-fixable)
- Worker active for >2 hours — stale watchdog (warning, auto-fixable: reverts label to queue)
- Inactive worker with lingering issue ID (warning, auto-fixable)
-
-### `project_register`
-
-Register a new project with DevClaw. Creates all required issue tracker labels (idempotent), scaffolds role instruction files, and adds the project to `projects.json`. One-time setup per project. Auto-detects GitHub/GitLab from git remote.
-
-**Parameters:**
-
- `projectGroupId` (string, required) — Telegram group ID (key in projects.json)
- `name` (string, required) — Short project name
- `repo` (string, required) — Path to git repo (e.g. `~/git/my-project`)
- `groupName` (string, required) — Telegram group display name
- `baseBranch` (string, required) — Base branch for development
- `deployBranch` (string, optional) — Defaults to baseBranch
- `deployUrl` (string, optional) — Deployment URL
-
-**What it does atomically:**
-
-1. Validates project not already registered
-2. Resolves repo path, auto-detects GitHub/GitLab, and verifies access
-3. Creates all 8 state labels (idempotent — safe to run on existing projects)
-4. Adds project entry to `projects.json` with empty worker state and `autoChain: false`
-5. Scaffolds prompt instruction files: `projects/prompts/<project>/dev.md` and `projects/prompts/<project>/qa.md`
-6. Writes audit log entry
-7. Returns announcement text
-
-## Audit logging
-
-Every tool call automatically appends an NDJSON entry to `log/audit.log`. No manual logging required from the orchestrator agent.
-
-```jsonl
-{"ts":"2026-02-08T10:30:00Z","event":"task_pickup","project":"my-webapp","issue":42,"role":"dev","tier":"medior","sessionAction":"send"}
-{"ts":"2026-02-08T10:30:01Z","event":"model_selection","issue":42,"role":"dev","tier":"medior","reason":"Standard dev task"}
-{"ts":"2026-02-08T10:45:00Z","event":"task_complete","project":"my-webapp","issue":42,"role":"dev","result":"done"}
-```
-
-## Quick start
-
-```bash
-# 1. Install the plugin
-cp -r devclaw ~/.openclaw/extensions/
-
-# 2. Run setup (interactive — creates agent, configures models, writes workspace files)
-openclaw devclaw setup
-
-# 3. Add bot to Telegram group, then register a project
-# (via the agent in Telegram)
-```
-
-See the [Onboarding Guide](docs/ONBOARDING.md) for detailed instructions.
-
-## Configuration
-
-Model tier configuration in `openclaw.json`:
-
-```json
-{
-  "plugins": {
-    "entries": {
-      "devclaw": {
-        "config": {
-          "models": {
-            "junior": "anthropic/claude-haiku-4-5",
-            "medior": "anthropic/claude-sonnet-4-5",
-            "senior": "anthropic/claude-opus-4-5",
-            "qa": "anthropic/claude-sonnet-4-5"
-          }
-        }
-      }
-    }
-  }
-}
-```
-
-Restrict tools to your orchestrator agent only:
-
-```json
-{
-  "agents": {
-    "list": [
-      {
-        "id": "my-orchestrator",
-        "tools": {
-          "allow": [
-            "devclaw_setup",
-            "task_pickup",
-            "task_complete",
-            "task_update",
-            "task_comment",
-            "task_create",
-            "queue_status",
-            "session_health",
-            "project_register"
-          ]
-        }
-      }
-    ]
-  }
-}
-```
-
-> DevClaw uses an `IssueProvider` interface to abstract issue tracker operations. GitLab (via `glab` CLI) and GitHub (via `gh` CLI) are supported — the provider is auto-detected from the git remote URL. Jira is planned.
-
-## Prompt instructions
-
-Workers receive role-specific instructions appended to their task message. `project_register` scaffolds editable files:
-
-```
-workspace/
-├── projects/
-│   ├── projects.json     ← project state
-│   └── prompts/
-│       ├── my-webapp/    ← per-project prompts (edit to customize)
-│       │   ├── dev.md
-│       │   └── qa.md
-│       └── another-project/
-│           ├── dev.md
-│           └── qa.md
-├── log/
-│   └── audit.log         ← NDJSON event log
-```
-
-`task_pickup` loads `projects/prompts/<project>/<role>.md`. Edit these files to customize worker behavior per project — for example, adding project-specific deployment steps or test commands.
-
-## Requirements
-
- [OpenClaw](https://openclaw.ai)
- Node.js >= 20
- [`glab`](https://gitlab.com/gitlab-org/cli) CLI installed and authenticated (GitLab provider), or [`gh`](https://cli.github.com) CLI (GitHub provider)
+---

 ## License

--- a/VERIFICATION.md
+++ b/VERIFICATION.md
@@ -1,45 +0,0 @@
-# Verification: task_create Default State
-
-## Issue #115 Request
-Change default state for new tasks from "To Do" to "Planning"
-
-## Current Implementation Status
-**Already implemented** - The default has been "Planning" since initial commit.
-
-### Code Evidence
-File: `lib/tools/task-create.ts` (line 68)
-```typescript
-const label = (params.label as StateLabel) ?? "Planning";
-```
-
-### Documentation Evidence
-File: `README.md` (line 308)
-```
- `label` (string, optional) — State label (defaults to "Planning")
-```
-
-### Tool Description
-The tool description itself states:
-```
-The issue is created with a state label (defaults to "Planning").
-```
-
-## Timeline
- **Feb 9, 2026** (commit 8a79755e): Initial task_create implementation with "Planning" default
- **Feb 10, 2026**: Issue #115 created requesting this change (already done)
-
-## Verification Test
-Default behavior can be verified by calling task_create without specifying a label:
-
-```javascript
-task_create({
-  projectGroupId: "-5239235162",
-  title: "Test Issue"
-  // label parameter omitted - should default to "Planning"
-})
-```
-
-Expected result: Issue created with "Planning" label, NOT "To Do"
-
-## Conclusion
-The requested feature is already fully implemented. No code changes needed.
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -6,59 +6,59 @@ Understanding the OpenClaw model is key to understanding how DevClaw works:

 - **Agent** — A configured entity in `openclaw.json`. Has a workspace, model, identity files (SOUL.md, IDENTITY.md), and tool permissions. Persists across restarts.
 - **Session** — A runtime conversation instance. Each session has its own context window and conversation history, stored as a `.jsonl` transcript file.
- **Sub-agent session** — A session created under the orchestrator agent for a specific worker role. NOT a separate agent — it's a child session running under the same agent, with its own isolated context. Format: `agent:<parent>:subagent:<uuid>`.
+- **Sub-agent session** — A session created under the orchestrator agent for a specific worker role. NOT a separate agent — it's a child session running under the same agent, with its own isolated context. Format: `agent:<parent>:subagent:<project>-<role>-<level>`.

-### Session-per-tier design
+### Session-per-level design

-Each project maintains **separate sessions per developer tier per role**. A project's DEV might have a junior session, a medior session, and a senior session — each accumulating its own codebase context over time.
+Each project maintains **separate sessions per developer level per role**. A project's DEV might have a junior session, a medior session, and a senior session — each accumulating its own codebase context over time.

 ```
 Orchestrator Agent (configured in openclaw.json)
  └─ Main session (long-lived, handles all projects)
       │
       ├─ Project A
-       │    ├─ DEV sessions: { junior: <uuid>, medior: <uuid>, senior: null }
-       │    └─ QA sessions:  { qa: <uuid> }
+       │    ├─ DEV sessions: { junior: <key>, medior: <key>, senior: null }
+       │    └─ QA sessions:  { reviewer: <key>, tester: null }
       │
       └─ Project B
-            ├─ DEV sessions: { junior: null, medior: <uuid>, senior: null }
-            └─ QA sessions:  { qa: <uuid> }
+            ├─ DEV sessions: { junior: null, medior: <key>, senior: null }
+            └─ QA sessions:  { reviewer: <key>, tester: null }
 ```

-Why per-tier instead of switching models on one session:
+Why per-level instead of switching models on one session:
 - **No model switching overhead** — each session always uses the same model
 - **Accumulated context** — a junior session that's done 20 typo fixes knows the project well; a medior session that's done 5 features knows it differently
 - **No cross-model confusion** — conversation history stays with the model that generated it
- **Deterministic reuse** — tier selection directly maps to a session key, no patching needed
+- **Deterministic reuse** — level selection directly maps to a session key, no patching needed

 ### Plugin-controlled session lifecycle

 DevClaw controls the **full** session lifecycle end-to-end. The orchestrator agent never calls `sessions_spawn` or `sessions_send` — the plugin handles session creation and task dispatch internally using the OpenClaw CLI:

 ```
-Plugin dispatch (inside task_pickup):
-  1. Assign tier, look up session, decide spawn vs send
+Plugin dispatch (inside work_start):
+  1. Assign level, look up session, decide spawn vs send
  2. New session:  openclaw gateway call sessions.patch → create entry + set model
-                   openclaw agent --session-id <key> --message "task..."
-  3. Existing:     openclaw agent --session-id <key> --message "task..."
+                   openclaw gateway call agent → dispatch task
+  3. Existing:     openclaw gateway call agent → dispatch task to existing session
  4. Return result to orchestrator (announcement text, no session instructions)
 ```

-The agent's only job after `task_pickup` returns is to post the announcement to Telegram. Everything else — tier assignment, session creation, task dispatch, state update, audit logging — is deterministic plugin code.
+The agent's only job after `work_start` returns is to post the announcement to Telegram. Everything else — level assignment, session creation, task dispatch, state update, audit logging — is deterministic plugin code.

 **Why this matters:** Previously the plugin returned instructions like `{ sessionAction: "spawn", model: "sonnet" }` and the agent had to correctly call `sessions_spawn` with the right params. This was the fragile handoff point where agents would forget `cleanup: "keep"`, use wrong models, or corrupt session state. Moving dispatch into the plugin eliminates that entire class of errors.

-**Session persistence:** Sessions created via `sessions.patch` persist indefinitely (no auto-cleanup). The plugin manages lifecycle explicitly through `session_health`.
+**Session persistence:** Sessions created via `sessions.patch` persist indefinitely (no auto-cleanup). The plugin manages lifecycle explicitly through the `health` tool.

 **What we trade off vs. registered sub-agents:**

 | Feature | Sub-agent system | Plugin-controlled | DevClaw equivalent |
 |---|---|---|---|
 | Auto-reporting | Sub-agent reports to parent | No | Heartbeat polls for completion |
-| Concurrency control | `maxConcurrent` | No | `task_pickup` checks `active` flag |
+| Concurrency control | `maxConcurrent` | No | `work_start` checks `active` flag |
 | Lifecycle tracking | Parent-child registry | No | `projects.json` tracks all sessions |
-| Timeout detection | `runTimeoutSeconds` | No | `session_health` flags stale >2h |
-| Cleanup | Auto-archive | No | `session_health` manual cleanup |
+| Timeout detection | `runTimeoutSeconds` | No | `health` flags stale >2h |
+| Cleanup | Auto-archive | No | `health` manual cleanup |

 DevClaw provides equivalent guardrails for everything except auto-reporting, which the heartbeat handles.

@@ -74,22 +74,22 @@ graph TB
    subgraph "OpenClaw Runtime"
        MS[Main Session<br/>orchestrator agent]
        GW[Gateway RPC<br/>sessions.patch / sessions.list]
-        CLI[openclaw agent CLI]
+        CLI[openclaw gateway call agent]
        DEV_J[DEV session<br/>junior]
        DEV_M[DEV session<br/>medior]
        DEV_S[DEV session<br/>senior]
-        QA_E[QA session<br/>qa]
+        QA_R[QA session<br/>reviewer]
    end

    subgraph "DevClaw Plugin"
-        TP[task_pickup]
-        TC[task_complete]
+        WS[work_start]
+        WF[work_finish]
        TCR[task_create]
-        QS[queue_status]
-        SH[session_health]
+        ST[status]
+        SH[health]
        PR[project_register]
-        DS[devclaw_setup]
-        TIER[Tier Resolver]
+        DS[setup]
+        TIER[Level Resolver]
        PJ[projects.json]
        AL[audit.log]
    end
@@ -103,34 +103,34 @@ graph TB
    TG -->|delivers| MS
    MS -->|announces| TG

-    MS -->|calls| TP
-    MS -->|calls| TC
+    MS -->|calls| WS
+    MS -->|calls| WF
    MS -->|calls| TCR
-    MS -->|calls| QS
+    MS -->|calls| ST
    MS -->|calls| SH
    MS -->|calls| PR
    MS -->|calls| DS

-    TP -->|resolves tier| TIER
-    TP -->|transitions labels| GL
-    TP -->|reads/writes| PJ
-    TP -->|appends| AL
-    TP -->|creates session| GW
-    TP -->|dispatches task| CLI
+    WS -->|resolves level| TIER
+    WS -->|transitions labels| GL
+    WS -->|reads/writes| PJ
+    WS -->|appends| AL
+    WS -->|creates session| GW
+    WS -->|dispatches task| CLI

-    TC -->|transitions labels| GL
-    TC -->|closes/reopens| GL
-    TC -->|reads/writes| PJ
-    TC -->|git pull| REPO
-    TC -->|auto-chain dispatch| CLI
-    TC -->|appends| AL
+    WF -->|transitions labels| GL
+    WF -->|closes/reopens| GL
+    WF -->|reads/writes| PJ
+    WF -->|git pull| REPO
+    WF -->|auto-chain dispatch| CLI
+    WF -->|appends| AL

    TCR -->|creates issue| GL
    TCR -->|appends| AL

-    QS -->|lists issues by label| GL
-    QS -->|reads| PJ
-    QS -->|appends| AL
+    ST -->|lists issues by label| GL
+    ST -->|reads| PJ
+    ST -->|appends| AL

    SH -->|reads/writes| PJ
    SH -->|checks sessions| GW
@@ -144,12 +144,12 @@ graph TB
    CLI -->|sends task| DEV_J
    CLI -->|sends task| DEV_M
    CLI -->|sends task| DEV_S
-    CLI -->|sends task| QA_E
+    CLI -->|sends task| QA_R

    DEV_J -->|writes code, creates MRs| REPO
    DEV_M -->|writes code, creates MRs| REPO
    DEV_S -->|writes code, creates MRs| REPO
-    QA_E -->|reviews code, tests| REPO
+    QA_R -->|reviews code, tests| REPO
 ```

 ## End-to-end flow: human to sub-agent
@@ -163,7 +163,7 @@ sequenceDiagram
    participant MS as Main Session<br/>(orchestrator)
    participant DC as DevClaw Plugin
    participant GW as Gateway RPC
-    participant CLI as openclaw agent CLI
+    participant CLI as openclaw gateway call agent
    participant DEV as DEV Session<br/>(medior)
    participant GL as Issue Tracker

@@ -171,34 +171,34 @@ sequenceDiagram

    H->>TG: "check status" (or heartbeat triggers)
    TG->>MS: delivers message
-    MS->>DC: queue_status()
-    DC->>GL: glab issue list --label "To Do"
+    MS->>DC: status()
+    DC->>GL: list issues by label "To Do"
    DC-->>MS: { toDo: [#42], dev: idle }

    Note over MS: Decides to pick up #42 for DEV as medior

-    MS->>DC: task_pickup({ issueId: 42, role: "dev", model: "medior", ... })
-    DC->>DC: resolve tier "medior" → model ID
+    MS->>DC: work_start({ issueId: 42, role: "dev", level: "medior", ... })
+    DC->>DC: resolve level "medior" → model ID
    DC->>DC: lookup dev.sessions.medior → null (first time)
-    DC->>GL: glab issue update 42 --unlabel "To Do" --label "Doing"
+    DC->>GL: transition label "To Do" → "Doing"
    DC->>GW: sessions.patch({ key: new-session-key, model: "anthropic/claude-sonnet-4-5" })
-    DC->>CLI: openclaw agent --session-id <key> --message "Build login page for #42..."
+    DC->>CLI: openclaw gateway call agent --params { sessionKey, message }
    CLI->>DEV: creates session, delivers task
    DC->>DC: store session key in projects.json + append audit.log
-    DC-->>MS: { success: true, announcement: "🔧 DEV (medior) picking up #42" }
+    DC-->>MS: { success: true, announcement: "🔧 Spawning DEV (medior) for #42" }

-    MS->>TG: "🔧 DEV (medior) picking up #42: Add login page"
+    MS->>TG: "🔧 Spawning DEV (medior) for #42: Add login page"
    TG->>H: sees announcement

    Note over DEV: Works autonomously — reads code, writes code, creates MR
-    Note over DEV: Calls task_complete when done
+    Note over DEV: Calls work_finish when done

-    DEV->>DC: task_complete({ role: "dev", result: "done", ... })
-    DC->>GL: glab issue update 42 --unlabel "Doing" --label "To Test"
+    DEV->>DC: work_finish({ role: "dev", result: "done", ... })
+    DC->>GL: transition label "Doing" → "To Test"
    DC->>DC: deactivate worker (sessions preserved)
-    DC-->>DEV: { announcement: "✅ DEV done #42" }
+    DC-->>DEV: { announcement: "✅ DEV DONE #42" }

-    MS->>TG: "✅ DEV done #42 — moved to QA queue"
+    MS->>TG: "✅ DEV DONE #42 — moved to QA queue"
    TG->>H: sees announcement
 ```

@@ -208,16 +208,16 @@ On the **next DEV task** for this project that also assigns medior:
 sequenceDiagram
    participant MS as Main Session
    participant DC as DevClaw Plugin
-    participant CLI as openclaw agent CLI
+    participant CLI as openclaw gateway call agent
    participant DEV as DEV Session<br/>(medior, existing)

-    MS->>DC: task_pickup({ issueId: 57, role: "dev", model: "medior", ... })
-    DC->>DC: resolve tier "medior" → model ID
+    MS->>DC: work_start({ issueId: 57, role: "dev", level: "medior", ... })
+    DC->>DC: resolve level "medior" → model ID
    DC->>DC: lookup dev.sessions.medior → existing key!
    Note over DC: No sessions.patch needed — session already exists
-    DC->>CLI: openclaw agent --session-id <key> --message "Fix validation for #57..."
+    DC->>CLI: openclaw gateway call agent --params { sessionKey, message }
    CLI->>DEV: delivers task to existing session (has full codebase context)
-    DC-->>MS: { success: true, announcement: "⚡ DEV (medior) picking up #57" }
+    DC-->>MS: { success: true, announcement: "⚡ Sending DEV (medior) for #57" }
 ```

 Session reuse saves ~50K tokens per task by not re-reading the codebase.
@@ -228,118 +228,118 @@ This traces a single issue from creation to completion, showing every component

 ### Phase 1: Issue created

-Issues are created by the orchestrator agent or by sub-agent sessions via `glab`. The orchestrator can create issues based on user requests in Telegram, backlog planning, or QA feedback. Sub-agents can also create issues when they discover bugs or related work during development.
+Issues are created by the orchestrator agent or by sub-agent sessions, either through `task_create` or directly with `gh`/`glab`. The orchestrator can create issues based on user requests in Telegram, backlog planning, or QA feedback. Sub-agents can also create issues when they discover bugs during development.

 ```
-Orchestrator Agent → Issue Tracker: creates issue #42 with label "To Do"
+Orchestrator Agent → Issue Tracker: creates issue #42 with label "Planning"
 ```

-**State:** Issue tracker has issue #42 labeled "To Do". Nothing in DevClaw yet.
+**State:** Issue tracker has issue #42 labeled "Planning". Nothing in DevClaw yet.

 ### Phase 2: Heartbeat detects work

 ```
-Heartbeat triggers → Orchestrator calls queue_status()
+Heartbeat triggers → Orchestrator calls status()
 ```

 ```mermaid
 sequenceDiagram
    participant A as Orchestrator
-    participant QS as queue_status
+    participant QS as status
    participant GL as Issue Tracker
    participant PJ as projects.json
    participant AL as audit.log

-    A->>QS: queue_status({ projectGroupId: "-123" })
+    A->>QS: status({ projectGroupId: "-123" })
    QS->>PJ: readProjects()
    PJ-->>QS: { dev: idle, qa: idle }
-    QS->>GL: glab issue list --label "To Do"
+    QS->>GL: list issues by label "To Do"
    GL-->>QS: [{ id: 42, title: "Add login page" }]
-    QS->>GL: glab issue list --label "To Test"
+    QS->>GL: list issues by label "To Test"
    GL-->>QS: []
-    QS->>GL: glab issue list --label "To Improve"
+    QS->>GL: list issues by label "To Improve"
    GL-->>QS: []
-    QS->>AL: append { event: "queue_status", ... }
+    QS->>AL: append { event: "status", ... }
    QS-->>A: { dev: idle, queue: { toDo: [#42] } }
 ```

-**Orchestrator decides:** DEV is idle, issue #42 is in To Do → pick it up. Evaluates complexity → assigns medior tier.
+**Orchestrator decides:** DEV is idle, issue #42 is in To Do → pick it up. Evaluates complexity → assigns medior level.

 ### Phase 3: DEV pickup

-The plugin handles everything end-to-end — tier resolution, session lookup, label transition, state update, **and** task dispatch to the worker session. The agent's only job after is to post the announcement.
+The plugin handles everything end-to-end: level resolution, session lookup, label transition, state update, **and** task dispatch to the worker session. The agent's only job afterward is to post the announcement.

 ```mermaid
 sequenceDiagram
    participant A as Orchestrator
-    participant TP as task_pickup
+    participant WS as work_start
    participant GL as Issue Tracker
-    participant TIER as Tier Resolver
+    participant TIER as Level Resolver
    participant GW as Gateway RPC
-    participant CLI as openclaw agent CLI
+    participant CLI as openclaw gateway call agent
    participant PJ as projects.json
    participant AL as audit.log

-    A->>TP: task_pickup({ issueId: 42, role: "dev", projectGroupId: "-123", model: "medior" })
-    TP->>PJ: readProjects()
-    TP->>GL: glab issue view 42 --output json
-    GL-->>TP: { title: "Add login page", labels: ["To Do"] }
-    TP->>TP: Verify label is "To Do" ✓
-    TP->>TIER: resolve "medior" → "anthropic/claude-sonnet-4-5"
-    TP->>PJ: lookup dev.sessions.medior
-    TP->>GL: glab issue update 42 --unlabel "To Do" --label "Doing"
+    A->>WS: work_start({ issueId: 42, role: "dev", projectGroupId: "-123", level: "medior" })
+    WS->>PJ: readProjects()
+    WS->>GL: getIssue(42)
+    GL-->>WS: { title: "Add login page", labels: ["To Do"] }
+    WS->>WS: Verify label is "To Do"
+    WS->>TIER: resolve "medior" → "anthropic/claude-sonnet-4-5"
+    WS->>PJ: lookup dev.sessions.medior
+    WS->>GL: transitionLabel(42, "To Do", "Doing")
    alt New session
-        TP->>GW: sessions.patch({ key: new-key, model: "anthropic/claude-sonnet-4-5" })
+        WS->>GW: sessions.patch({ key: new-key, model: "anthropic/claude-sonnet-4-5" })
    end
-    TP->>CLI: openclaw agent --session-id <key> --message "task..."
-    TP->>PJ: activateWorker + store session key
-    TP->>AL: append task_pickup + model_selection
-    TP-->>A: { success: true, announcement: "🔧 ..." }
+    WS->>CLI: openclaw gateway call agent --params { sessionKey, message }
+    WS->>PJ: activateWorker + store session key
+    WS->>AL: append work_start + model_selection
+    WS-->>A: { success: true, announcement: "🔧 ..." }
 ```

 **Writes:**
 - `Issue Tracker`: label "To Do" → "Doing"
- `projects.json`: dev.active=true, dev.issueId="42", dev.model="medior", dev.sessions.medior=key
- `audit.log`: 2 entries (task_pickup, model_selection)
+- `projects.json`: dev.active=true, dev.issueId="42", dev.level="medior", dev.sessions.medior=key
+- `audit.log`: 2 entries (work_start, model_selection)
 - `Session`: task message delivered to worker session via CLI

 ### Phase 4: DEV works

 ```
 DEV sub-agent session → reads codebase, writes code, creates MR
-DEV sub-agent session → calls task_complete({ role: "dev", result: "done", ... })
+DEV sub-agent session → calls work_finish({ role: "dev", result: "done", ... })
 ```

-This happens inside the OpenClaw session. The worker calls `task_complete` directly for atomic state updates. If the worker discovers unrelated bugs, it calls `task_create` to file them.
+This happens inside the OpenClaw session. The worker calls `work_finish` directly for atomic state updates. If the worker discovers unrelated bugs, it calls `task_create` to file them.

 ### Phase 5: DEV complete (worker self-reports)

 ```mermaid
 sequenceDiagram
    participant DEV as DEV Session
-    participant TC as task_complete
+    participant WF as work_finish
    participant GL as Issue Tracker
    participant PJ as projects.json
    participant AL as audit.log
    participant REPO as Git Repo
    participant QA as QA Session (auto-chain)

-    DEV->>TC: task_complete({ role: "dev", result: "done", projectGroupId: "-123", summary: "Login page with OAuth" })
-    TC->>PJ: readProjects()
-    PJ-->>TC: { dev: { active: true, issueId: "42" } }
-    TC->>REPO: git pull
-    TC->>PJ: deactivateWorker(-123, dev)
+    DEV->>WF: work_finish({ role: "dev", result: "done", projectGroupId: "-123", summary: "Login page with OAuth" })
+    WF->>PJ: readProjects()
+    PJ-->>WF: { dev: { active: true, issueId: "42" } }
+    WF->>REPO: git pull
+    WF->>PJ: deactivateWorker(-123, dev)
    Note over PJ: active→false, issueId→null<br/>sessions map PRESERVED
-    TC->>GL: transition label "Doing" → "To Test"
-    TC->>AL: append { event: "task_complete", role: "dev", result: "done" }
+    WF->>GL: transitionLabel(42, "Doing", "To Test")
+    WF->>AL: append { event: "work_finish", role: "dev", result: "done" }

    alt autoChain enabled
-        TC->>GL: transition label "To Test" → "Testing"
-        TC->>QA: dispatchTask(role: "qa", tier: "qa")
-        TC->>PJ: activateWorker(-123, qa)
-        TC-->>DEV: { announcement: "✅ DEV done #42", autoChain: { dispatched: true, role: "qa" } }
+        WF->>GL: transitionLabel(42, "To Test", "Testing")
+        WF->>QA: dispatchTask(role: "qa", level: "reviewer")
+        WF->>PJ: activateWorker(-123, qa)
+        WF-->>DEV: { announcement: "✅ DEV DONE #42", autoChain: { dispatched: true, role: "qa" } }
    else autoChain disabled
-        TC-->>DEV: { announcement: "✅ DEV done #42", nextAction: "qa_pickup" }
+        WF-->>DEV: { announcement: "✅ DEV DONE #42", nextAction: "qa_pickup" }
    end
 ```

@@ -347,30 +347,30 @@ sequenceDiagram
 - `Git repo`: pulled latest (has DEV's merged code)
 - `projects.json`: dev.active=false, dev.issueId=null (sessions map preserved for reuse)
 - `Issue Tracker`: label "Doing" → "To Test" (+ "To Test" → "Testing" if auto-chain)
- `audit.log`: 1 entry (task_complete) + optional auto-chain entries
+- `audit.log`: 1 entry (work_finish) + optional auto-chain entries

 ### Phase 6: QA pickup

-Same as Phase 3, but with `role: "qa"`. Label transitions "To Test" → "Testing". Uses the qa tier.
+Same as Phase 3, but with `role: "qa"`. Label transitions "To Test" → "Testing". Uses the reviewer level.

-### Phase 7: QA result (3 possible outcomes)
+### Phase 7: QA result (4 possible outcomes)

 #### 7a. QA Pass

 ```mermaid
 sequenceDiagram
-    participant A as Orchestrator
-    participant TC as task_complete
+    participant QA as QA Session
+    participant WF as work_finish
    participant GL as Issue Tracker
    participant PJ as projects.json
    participant AL as audit.log

-    A->>TC: task_complete({ role: "qa", result: "pass", projectGroupId: "-123" })
-    TC->>PJ: deactivateWorker(-123, qa)
-    TC->>GL: glab issue update 42 --unlabel "Testing" --label "Done"
-    TC->>GL: glab issue close 42
-    TC->>AL: append { event: "task_complete", role: "qa", result: "pass" }
-    TC-->>A: { announcement: "🎉 QA PASS #42. Issue closed." }
+    QA->>WF: work_finish({ role: "qa", result: "pass", projectGroupId: "-123" })
+    WF->>PJ: deactivateWorker(-123, qa)
+    WF->>GL: transitionLabel(42, "Testing", "Done")
+    WF->>GL: closeIssue(42)
+    WF->>AL: append { event: "work_finish", role: "qa", result: "pass" }
+    WF-->>QA: { announcement: "🎉 QA PASS #42. Issue closed." }
 ```

 **Ticket complete.** Issue closed, label "Done".
@@ -379,18 +379,18 @@ sequenceDiagram

 ```mermaid
 sequenceDiagram
-    participant A as Orchestrator
-    participant TC as task_complete
+    participant QA as QA Session
+    participant WF as work_finish
    participant GL as Issue Tracker
    participant PJ as projects.json
    participant AL as audit.log

-    A->>TC: task_complete({ role: "qa", result: "fail", projectGroupId: "-123", summary: "OAuth redirect broken" })
-    TC->>PJ: deactivateWorker(-123, qa)
-    TC->>GL: glab issue update 42 --unlabel "Testing" --label "To Improve"
-    TC->>GL: glab issue reopen 42
-    TC->>AL: append { event: "task_complete", role: "qa", result: "fail" }
-    TC-->>A: { announcement: "❌ QA FAIL #42 — OAuth redirect broken. Sent back to DEV." }
+    QA->>WF: work_finish({ role: "qa", result: "fail", projectGroupId: "-123", summary: "OAuth redirect broken" })
+    WF->>PJ: deactivateWorker(-123, qa)
+    WF->>GL: transitionLabel(42, "Testing", "To Improve")
+    WF->>GL: reopenIssue(42)
+    WF->>AL: append { event: "work_finish", role: "qa", result: "fail" }
+    WF-->>QA: { announcement: "❌ QA FAIL #42 — OAuth redirect broken. Sent back to DEV." }
 ```

 **Cycle restarts:** Issue goes to "To Improve". Next heartbeat, DEV picks it up again (Phase 3, but from "To Improve" instead of "To Do").
@@ -414,39 +414,35 @@ Worker cannot complete (missing info, environment errors, etc.). Issue returns t

 ### Completion enforcement

-Three layers guarantee that `task_complete` always runs:
+Three layers guarantee that `work_finish` always runs:

-1. **Completion contract** — Every task message sent to a worker session includes a mandatory `## MANDATORY: Task Completion` section listing available results and requiring `task_complete` even on failure. Workers are instructed to use `"blocked"` if stuck.
+1. **Completion contract** — Every task message sent to a worker session includes a mandatory `## MANDATORY: Task Completion` section listing available results and requiring `work_finish` even on failure. Workers are instructed to use `"blocked"` if stuck.

 2. **Blocked result** — Both DEV and QA can use `"blocked"` to gracefully return a task to queue without losing work. DEV blocked: `Doing → To Do`. QA blocked: `Testing → To Test`. This gives workers an escape hatch instead of silently dying.

-3. **Stale worker watchdog** — The heartbeat's health check detects workers active for >2 hours. With `autoFix=true`, it deactivates the worker and reverts the label back to queue. This catches sessions that crashed, ran out of context, or otherwise failed without calling `task_complete`. The `session_health` tool provides the same check for manual invocation.
+3. **Stale worker watchdog** — The heartbeat's health check detects workers active for >2 hours. With `fix=true`, it deactivates the worker and reverts the label back to queue. This catches sessions that crashed, ran out of context, or otherwise failed without calling `work_finish`. The `health` tool provides the same check for manual invocation.

 ### Phase 8: Heartbeat (continuous)

-The heartbeat runs periodically (triggered by the agent or a scheduled message). It combines health check + queue scan:
+The heartbeat runs periodically (via the background service or a manual `work_heartbeat` trigger). Each tick combines a health check with a queue scan:

 ```mermaid
 sequenceDiagram
-    participant A as Orchestrator
-    participant SH as session_health
-    participant QS as queue_status
-    participant TP as task_pickup
-    Note over A: Heartbeat triggered
+    participant HB as Heartbeat Service
+    participant SH as health check
+    participant TK as projectTick
+    participant WS as work_start (dispatch)
+    Note over HB: Tick triggered (every 60s)

-    A->>SH: session_health({ autoFix: true })
-    Note over SH: Checks sessions via Gateway RPC (sessions.list)
-    SH-->>A: { healthy: true }
+    HB->>SH: checkWorkerHealth per project per role
+    Note over SH: Checks for zombies, stale workers
+    SH-->>HB: { fixes applied }

-    A->>QS: queue_status()
-    QS-->>A: { projects: [{ dev: idle, queue: { toDo: [#43], toTest: [#44] } }] }
-
-    Note over A: DEV idle + To Do #43 → assign medior
-    A->>TP: task_pickup({ issueId: 43, role: "dev", model: "medior", ... })
-    Note over TP: Plugin handles everything:<br/>tier resolve → session lookup →<br/>label transition → dispatch task →<br/>state update → audit log
-
-    Note over A: QA idle + To Test #44 → assign qa
-    A->>TP: task_pickup({ issueId: 44, role: "qa", model: "qa", ... })
+    HB->>TK: projectTick per project
+    Note over TK: Scans queue: To Improve > To Test > To Do
+    TK->>WS: dispatchTask (fill free slots)
+    WS-->>TK: { dispatched }
+    TK-->>HB: { pickups, skipped }
 ```

 ## Data flow map
@@ -458,21 +454,23 @@ Every piece of data and where it lives:
 │ Issue Tracker (source of truth for tasks)                       │
 │                                                                 │
 │  Issue #42: "Add login page"                                    │
-│  Labels: [To Do | Doing | To Test | Testing | Done | ...]       │
+│  Labels: [Planning | To Do | Doing | To Test | Testing | ...]   │
 │  State: open / closed                                           │
 │  MRs/PRs: linked merge/pull requests                            │
 │  Created by: orchestrator (task_create), workers, or humans     │
 └─────────────────────────────────────────────────────────────────┘
-        ↕ glab/gh CLI (read/write, auto-detected)
+        ↕ gh/glab CLI (read/write, auto-detected)
 ┌─────────────────────────────────────────────────────────────────┐
 │ DevClaw Plugin (orchestration logic)                            │
 │                                                                 │
-│  devclaw_setup  → agent creation + workspace + model config    │
-│  task_pickup    → tier + label + dispatch + role instr (e2e)   │
-│  task_complete  → label + state + git pull + auto-chain        │
+│  setup          → agent creation + workspace + model config     │
+│  work_start     → level + label + dispatch + role instr (e2e)   │
+│  work_finish    → label + state + git pull + auto-chain         │
 │  task_create    → create issue in tracker                       │
-│  queue_status   → read labels + read state                     │
-│  session_health → check sessions + fix zombies                 │
+│  task_update    → manual label state change                     │
+│  task_comment   → add comment to issue                          │
+│  status         → read labels + read state                      │
+│  health         → check sessions + fix zombies                  │
 │  project_register → labels + prompts + state init (one-time)    │
 └─────────────────────────────────────────────────────────────────┘
        ↕ atomic file I/O          ↕ OpenClaw CLI (plugin shells out)
@@ -481,39 +479,40 @@ Every piece of data and where it lives:
 │                                │ │ (called by plugin, not agent)│
 │  Per project:                  │ │                              │
 │    dev:                        │ │  openclaw gateway call       │
-│      active, issueId, model    │ │    sessions.patch → create   │
+│      active, issueId, level    │ │    sessions.patch → create   │
 │      sessions:                 │ │    sessions.list  → health   │
 │        junior: <key>           │ │    sessions.delete → cleanup │
 │        medior: <key>           │ │                              │
-│        senior: <key>           │ │  openclaw agent              │
-│    qa:                         │ │    --session-id <key>        │
-│      active, issueId, model    │ │    --message "task..."       │
+│        senior: <key>           │ │  openclaw gateway call agent │
+│    qa:                         │ │    --params { sessionKey,    │
+│      active, issueId, level    │ │      message, agentId }      │
 │      sessions:                 │ │    → dispatches to session   │
-│        qa: <key>               │ │                              │
+│        reviewer: <key>         │ │                              │
+│        tester: <key>           │ │                              │
 └────────────────────────────────┘ └──────────────────────────────┘
        ↕ append-only
 ┌─────────────────────────────────────────────────────────────────┐
 │ log/audit.log (observability)                                   │
 │                                                                 │
 │  NDJSON, one line per event:                                    │
-│  task_pickup, task_complete, model_selection,                   │
-│  queue_status, health_check, session_spawn, session_reuse,     │
-│  project_register, devclaw_setup                                │
+│  work_start, work_finish, model_selection,                      │
+│  status, health, task_create, task_update,                      │
+│  task_comment, project_register, setup, heartbeat_tick          │
 │                                                                 │
-│  Query with: cat audit.log | jq 'select(.event=="task_pickup")' │
+│  Query: cat audit.log | jq 'select(.event=="work_start")'      │
 └─────────────────────────────────────────────────────────────────┘

 ┌─────────────────────────────────────────────────────────────────┐
-│ Telegram (user-facing messages)                                 │
+│ Telegram / WhatsApp (user-facing messages)                      │
 │                                                                 │
 │  Per group chat:                                                │
 │    "🔧 Spawning DEV (medior) for #42: Add login page"          │
 │    "⚡ Sending DEV (medior) for #57: Fix validation"            │
-│    "✅ DEV done #42 — Login page with OAuth. Moved to QA queue."│
+│    "✅ DEV DONE #42 — Login page with OAuth."                   │
 │    "🎉 QA PASS #42. Issue closed."                              │
-│    "❌ QA FAIL #42 — OAuth redirect broken. Sent back to DEV."  │
-│    "🚫 DEV BLOCKED #42 — Missing dependencies. Returned to queue."│
-│    "🚫 QA BLOCKED #42 — Env not available. Returned to QA queue."│
+│    "❌ QA FAIL #42 — OAuth redirect broken."                    │
+│    "🚫 DEV BLOCKED #42 — Missing dependencies."                │
+│    "🚫 QA BLOCKED #42 — Env not available."                    │
 └─────────────────────────────────────────────────────────────────┘

 ┌─────────────────────────────────────────────────────────────────┐
@@ -521,7 +520,7 @@ Every piece of data and where it lives:
 │                                                                 │
 │  DEV sub-agent sessions: read code, write code, create MRs      │
 │  QA sub-agent sessions: read code, run tests, review MRs        │
-│  task_complete (DEV done): git pull to sync latest               │
+│  work_finish (DEV done): git pull to sync latest                │
 └─────────────────────────────────────────────────────────────────┘
 ```

@@ -553,7 +552,7 @@ graph LR
    subgraph "Sub-agent sessions handle"
        CR[Code writing]
        MR[MR creation/review]
-        TC_W[Task completion<br/>via task_complete]
+        WF_W[Task completion<br/>via work_finish]
        BUG[Bug filing<br/>via task_create]
    end

@@ -565,20 +564,22 @@ graph LR

 ## IssueProvider abstraction

-All issue tracker operations go through the `IssueProvider` interface, defined in `lib/issue-provider.ts`. This abstraction allows DevClaw to support multiple issue trackers without changing tool logic.
+All issue tracker operations go through the `IssueProvider` interface, defined in `lib/providers/provider.ts`. This abstraction allows DevClaw to support multiple issue trackers without changing tool logic.

 **Interface methods:**
 - `ensureLabel` / `ensureAllStateLabels` — idempotent label creation
+- `createIssue` — create issue with label and assignees
 - `listIssuesByLabel` / `getIssue` — issue queries
 - `transitionLabel` — atomic label state transition (unlabel + label)
 - `closeIssue` / `reopenIssue` — issue lifecycle
 - `hasStateLabel` / `getCurrentStateLabel` — label inspection
-- `hasMergedMR` — MR/PR verification
+- `hasMergedMR` / `getMergedMRUrl` — MR/PR verification
+- `addComment` — add comment to issue
 - `healthCheck` — verify provider connectivity

 **Current providers:**
-- **GitLab** (`lib/providers/gitlab.ts`) — wraps `glab` CLI
 - **GitHub** (`lib/providers/github.ts`) — wraps `gh` CLI
+- **GitLab** (`lib/providers/gitlab.ts`) — wraps `glab` CLI

 **Planned providers:**
 - **Jira** — via REST API
@@ -589,16 +590,16 @@ Provider selection is handled by `createProvider()` in `lib/providers/index.ts`.

 | Failure | Detection | Recovery |
 |---|---|---|
-| Session dies mid-task | `session_health` checks via `sessions.list` Gateway RPC | `autoFix`: reverts label, clears active state, removes dead session from sessions map. Next heartbeat picks up task again (creates fresh session for that tier). |
-| glab command fails | Plugin tool throws error, returns to agent | Agent retries or reports to Telegram group |
-| `openclaw agent` CLI fails | Plugin catches error during dispatch | Plugin rolls back: reverts label, clears active state. Returns error to agent for reporting. |
-| `sessions.patch` fails | Plugin catches error during session creation | Plugin rolls back label transition. Returns error. No orphaned state. |
+| Session dies mid-task | `health` checks via `sessions.list` Gateway RPC | `fix=true`: reverts label, clears active state. Next heartbeat picks up task again (creates fresh session for that level). |
+| gh/glab command fails | Plugin tool throws error, returns to agent | Agent retries or reports to Telegram group |
+| `openclaw gateway call agent` fails | Plugin catches error during dispatch | Plugin rolls back: reverts label, clears active state. Returns error. No orphaned state. |
+| `sessions.patch` fails | Plugin catches error during session creation | Plugin rolls back label transition. Returns error. |
 | projects.json corrupted | Tool can't parse JSON | Manual fix needed. Atomic writes (temp+rename) prevent partial writes. |
-| Label out of sync | `task_pickup` verifies label before transitioning | Throws error if label doesn't match expected state. Agent reports mismatch. |
-| Worker already active | `task_pickup` checks `active` flag | Throws error: "DEV worker already active on project". Must complete current task first. |
-| Stale worker (>2h) | `session_health` and heartbeat health check | `autoFix`: deactivates worker, reverts label to queue (To Do / To Test). Task available for next pickup. |
-| Worker stuck/blocked | Worker calls `task_complete` with `"blocked"` | Deactivates worker, reverts label to queue. Issue available for retry. |
-| `project_register` fails | Plugin catches error during label creation or state write | Clean error returned. No partial state — labels are idempotent, projects.json not written until all labels succeed. |
+| Label out of sync | `work_start` verifies label before transitioning | Throws error if label doesn't match expected state. |
+| Worker already active | `work_start` checks `active` flag | Throws error: "DEV already active on project". Must complete current task first. |
+| Stale worker (>2h) | `health` and heartbeat health check | `fix=true`: deactivates worker, reverts label to queue. Task available for next pickup. |
+| Worker stuck/blocked | Worker calls `work_finish` with `"blocked"` | Deactivates worker, reverts label to queue. Issue available for retry. |
+| `project_register` fails | Plugin catches error during label creation or state write | Clean error returned. Labels are idempotent, projects.json not written until all labels succeed. |

 ## File locations

@@ -606,8 +607,9 @@ Provider selection is handled by `createProvider()` in `lib/providers/index.ts`.
 |---|---|---|
 | Plugin source | `~/.openclaw/extensions/devclaw/` | Plugin code |
 | Plugin manifest | `~/.openclaw/extensions/devclaw/openclaw.plugin.json` | Plugin registration |
-| Agent config | `~/.openclaw/openclaw.json` | Agent definition + tool permissions + tier config |
+| Agent config | `~/.openclaw/openclaw.json` | Agent definition + tool permissions + model config |
 | Worker state | `~/.openclaw/workspace-<agent>/projects/projects.json` | Per-project DEV/QA state |
+| Role instructions | `~/.openclaw/workspace-<agent>/projects/roles/<project>/` | Per-project `dev.md` and `qa.md` |
 | Audit log | `~/.openclaw/workspace-<agent>/log/audit.log` | NDJSON event log |
 | Session transcripts | `~/.openclaw/agents/<agent>/sessions/<uuid>.jsonl` | Conversation history per session |
 | Git repos | `~/git/<project>/` | Project source code |
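Reviewer note: the "Stale worker (>2h)" row in the failure table can be sketched as a pure check over the worker state stored in `projects.json`. This is a minimal illustration under stated assumptions, not the plugin's actual health check. The `WorkerState` shape mirrors the documented `active`, `issueId`, `startTime`, and `level` fields; `isStaleWorker` and `STALE_AFTER_MS` are hypothetical names, and treating an unparseable timestamp as "not stale" is an assumption.

```typescript
// Hypothetical sketch of the ">2h stale worker" check; names are illustrative.
interface WorkerState {
  active: boolean;
  issueId: string | null;
  startTime: string | null; // ISO timestamp set when the worker became active
  level: string | null;
}

// Threshold taken from the failure table (stale after more than 2 hours).
const STALE_AFTER_MS = 2 * 60 * 60 * 1000;

function isStaleWorker(worker: WorkerState, now: Date): boolean {
  // Idle workers, or workers without a start time, are never stale.
  if (!worker.active || worker.startTime === null) return false;
  const started = Date.parse(worker.startTime);
  // Assumption: an unparseable timestamp is reported as "not stale".
  if (Number.isNaN(started)) return false;
  return now.getTime() - started > STALE_AFTER_MS;
}
```

A health pass would presumably run a check like this per project and per role, then revert the label and clear `active` when it returns `true`.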
--- a/docs/CONFIGURATION.md
+++ b/docs/CONFIGURATION.md
@@ -0,0 +1,354 @@
+# DevClaw — Configuration Reference
+
+All DevClaw configuration lives in two places: `openclaw.json` (plugin-level settings) and `projects.json` (per-project state).
+
+## Plugin Configuration (`openclaw.json`)
+
+DevClaw is configured under `plugins.entries.devclaw.config` in `openclaw.json`.
+
+### Model Tiers
+
+Override which model powers each DEV and QA level:
+
+```json
+{
+  "plugins": {
+    "entries": {
+      "devclaw": {
+        "config": {
+          "models": {
+            "dev": {
+              "junior": "anthropic/claude-haiku-4-5",
+              "medior": "anthropic/claude-sonnet-4-5",
+              "senior": "anthropic/claude-opus-4-5"
+            },
+            "qa": {
+              "reviewer": "anthropic/claude-sonnet-4-5",
+              "tester": "anthropic/claude-haiku-4-5"
+            }
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+**Resolution order** (per `lib/tiers.ts:resolveModel`):
+
+1. Plugin config `models.<role>.<level>` — explicit override
+2. `DEFAULT_MODELS[role][level]` — built-in defaults (table below)
+3. Passthrough — treat the level string as a raw model ID
+
+**Default models:**
+
+| Role | Level | Default model |
+|---|---|---|
+| dev | junior | `anthropic/claude-haiku-4-5` |
+| dev | medior | `anthropic/claude-sonnet-4-5` |
+| dev | senior | `anthropic/claude-opus-4-5` |
+| qa | reviewer | `anthropic/claude-sonnet-4-5` |
+| qa | tester | `anthropic/claude-haiku-4-5` |
+
+### Project Execution Mode
+
+Controls cross-project parallelism:
+
+```json
+{
+  "plugins": {
+    "entries": {
+      "devclaw": {
+        "config": {
+          "projectExecution": "parallel"
+        }
+      }
+    }
+  }
+}
+```
+
+| Value | Behavior |
+|---|---|
+| `"parallel"` (default) | Multiple projects can have active workers simultaneously |
+| `"sequential"` | Only one project's workers active at a time. Useful for single-agent deployments. |
+
+Enforced in `work_heartbeat` and the heartbeat service before dispatching.
+
+### Heartbeat Service
+
+Interval-based health checks and queue dispatch that consume no LLM tokens:
+
+```json
+{
+  "plugins": {
+    "entries": {
+      "devclaw": {
+        "config": {
+          "work_heartbeat": {
+            "enabled": true,
+            "intervalSeconds": 60,
+            "maxPickupsPerTick": 4
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+| Setting | Type | Default | Description |
+|---|---|---|---|
+| `enabled` | boolean | `true` | Enable the heartbeat service |
+| `intervalSeconds` | number | `60` | Seconds between ticks |
+| `maxPickupsPerTick` | number | `4` | Maximum worker dispatches per tick (budget control) |
+
+**Source:** [`lib/services/heartbeat.ts`](../lib/services/heartbeat.ts)
+
+The heartbeat service runs as a plugin service tied to the gateway lifecycle. Every tick: health pass (auto-fix zombies, stale workers) → tick pass (fill free slots by priority). Zero LLM tokens consumed.
+
+### Notifications
+
+Control which lifecycle events send notifications:
+
+```json
+{
+  "plugins": {
+    "entries": {
+      "devclaw": {
+        "config": {
+          "notifications": {
+            "heartbeatDm": true,
+            "workerStart": true,
+            "workerComplete": true
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+| Setting | Default | Description |
+|---|---|---|
+| `heartbeatDm` | `true` | Send heartbeat summary to orchestrator DM |
+| `workerStart` | `true` | Announce when a worker picks up a task |
+| `workerComplete` | `true` | Announce when a worker finishes a task |
+
+### DevClaw Agent IDs
+
+List which agents are recognized as DevClaw orchestrators (used for context detection):
+
+```json
+{
+  "plugins": {
+    "entries": {
+      "devclaw": {
+        "config": {
+          "devClawAgentIds": ["my-orchestrator"]
+        }
+      }
+    }
+  }
+}
+```
+
+### Agent Tool Permissions
+
+Restrict DevClaw tools to your orchestrator agent:
+
+```json
+{
+  "agents": {
+    "list": [
+      {
+        "id": "my-orchestrator",
+        "tools": {
+          "allow": [
+            "work_start",
+            "work_finish",
+            "task_create",
+            "task_update",
+            "task_comment",
+            "status",
+            "health",
+            "work_heartbeat",
+            "project_register",
+            "setup",
+            "onboard"
+          ]
+        }
+      }
+    ]
+  }
+}
+```
+
+---
+
+## Project State (`projects.json`)
+
+All project state lives in `<workspace>/projects/projects.json`, keyed by group ID.
+
+**Source:** [`lib/projects.ts`](../lib/projects.ts)
+
+### Schema
+
+```json
+{
+  "projects": {
+    "<groupId>": {
+      "name": "my-webapp",
+      "repo": "~/git/my-webapp",
+      "groupName": "Dev - My Webapp",
+      "baseBranch": "development",
+      "deployBranch": "development",
+      "deployUrl": "https://my-webapp.example.com",
+      "channel": "telegram",
+      "roleExecution": "parallel",
+      "dev": {
+        "active": false,
+        "issueId": null,
+        "startTime": null,
+        "level": null,
+        "sessions": {
+          "junior": null,
+          "medior": "agent:orchestrator:subagent:my-webapp-dev-medior",
+          "senior": null
+        }
+      },
+      "qa": {
+        "active": false,
+        "issueId": null,
+        "startTime": null,
+        "level": null,
+        "sessions": {
+          "reviewer": "agent:orchestrator:subagent:my-webapp-qa-reviewer",
+          "tester": null
+        }
+      }
+    }
+  }
+}
+```
+
+### Project fields
+
+| Field | Type | Description |
+|---|---|---|
+| `name` | string | Short project name |
+| `repo` | string | Path to git repo (supports `~/` expansion) |
+| `groupName` | string | Group display name |
+| `baseBranch` | string | Base branch for development |
+| `deployBranch` | string | Branch that triggers deployment |
+| `deployUrl` | string | Deployment URL |
+| `channel` | string | Messaging channel (`"telegram"`, `"whatsapp"`, etc.) |
+| `roleExecution` | `"parallel"` \| `"sequential"` | DEV/QA parallelism for this project |
+
+### Worker state fields
+
+Each project has `dev` and `qa` worker state objects:
+
+| Field | Type | Description |
+|---|---|---|
+| `active` | boolean | Whether this role has an active worker |
+| `issueId` | string \| null | Issue being worked on (as string) |
+| `startTime` | string \| null | ISO timestamp when worker became active |
+| `level` | string \| null | Current level (`junior`, `medior`, `senior`, `reviewer`, `tester`) |
+| `sessions` | Record<string, string \| null> | Per-level session keys |
+
+**DEV session keys:** `junior`, `medior`, `senior`
+**QA session keys:** `reviewer`, `tester`
+
+### Key design decisions
+
+- **Session-per-level** — each level gets its own worker session, accumulating context independently. Level selection maps directly to a session key.
+- **Sessions preserved on completion** — when a worker completes a task, the sessions map is preserved (only `active`, `issueId`, and `startTime` are cleared). This enables session reuse.
+- **Atomic writes** — all writes go through temp-file-then-rename to prevent corruption.
+- **Sessions persist indefinitely** — no auto-cleanup. The `health` tool handles manual cleanup.
+
+---
+
+## Workspace File Layout
+
+```
+<workspace>/
+├── projects/
+│   ├── projects.json          ← Project state (auto-managed)
+│   └── roles/
+│       ├── my-webapp/         ← Per-project role instructions (editable)
+│       │   ├── dev.md
+│       │   └── qa.md
+│       ├── another-project/
+│       │   ├── dev.md
+│       │   └── qa.md
+│       └── default/           ← Fallback role instructions
+│           ├── dev.md
+│           └── qa.md
+├── log/
+│   └── audit.log              ← NDJSON event log (auto-managed)
+├── AGENTS.md                  ← Agent identity documentation
+└── HEARTBEAT.md               ← Heartbeat operation guide
+```
+
+### Role instruction files
+
+`work_start` loads role instructions from `projects/roles/<project>/<role>.md` at dispatch time, falling back to `projects/roles/default/<role>.md`. These files are appended to the task message sent to worker sessions.
+
+Edit to customize: deployment steps, test commands, acceptance criteria, coding standards.
+
+**Source:** [`lib/dispatch.ts:loadRoleInstructions`](../lib/dispatch.ts)
+
+---
+
+## Audit Log
+
+Append-only NDJSON at `<workspace>/log/audit.log`. Auto-truncated to 250 lines.
+
+**Source:** [`lib/audit.ts`](../lib/audit.ts)
+
+### Event types
+
+| Event | Trigger |
+|---|---|
+| `work_start` | Task dispatched to worker |
+| `model_selection` | Level resolved to model ID |
+| `work_finish` | Task completed |
+| `work_heartbeat` | Heartbeat tick completed |
+| `task_create` | Issue created |
+| `task_update` | Issue state changed |
+| `task_comment` | Comment added to issue |
+| `status` | Queue status queried |
+| `health` | Health scan completed |
+| `heartbeat_tick` | Heartbeat service tick (background) |
+| `project_register` | Project registered |
+
+### Querying
+
+```bash
+# All task dispatches
+cat audit.log | jq 'select(.event=="work_start")'
+
+# All completions for a project
+cat audit.log | jq 'select(.event=="work_finish" and .project=="my-webapp")'
+
+# Model selections
+cat audit.log | jq 'select(.event=="model_selection")'
+```
+
+---
+
+## Issue Provider
+
+DevClaw uses an `IssueProvider` interface (`lib/providers/provider.ts`) to abstract issue tracker operations. The provider is auto-detected from the git remote URL.
+
+**Supported providers:**
+
+| Provider | CLI | Detection |
+|---|---|---|
+| GitHub | `gh` | Remote contains `github.com` |
+| GitLab | `glab` | Remote contains `gitlab` |
+
+**Planned:** Jira (via REST API)
+
+**Source:** [`lib/providers/index.ts`](../lib/providers/index.ts)
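Reviewer note: the detection rules in the Issue Provider table ("remote contains `github.com`", "remote contains `gitlab`") can be sketched as a small pure function. This is an illustration only; the real `createProvider()` in `lib/providers/index.ts` may differ. `detectProvider` and `ProviderKind` are hypothetical names, and lowercasing the URL before matching is an assumption.

```typescript
// Hypothetical sketch of remote-URL based provider detection.
type ProviderKind = "github" | "gitlab";

function detectProvider(remoteUrl: string): ProviderKind | null {
  // Assumption: matching is case-insensitive.
  const url = remoteUrl.toLowerCase();
  if (url.includes("github.com")) return "github";
  // Matches gitlab.com as well as self-hosted hosts such as gitlab.example.com.
  if (url.includes("gitlab")) return "gitlab";
  // Unknown remotes are left for the caller to handle.
  return null;
}
```

Returning `null` for an unrecognized remote keeps the decision about the error message with the caller, which fits the "clean error returned" behavior the docs describe for registration failures.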
--- a/docs/CONTEXT-AWARENESS.md
+++ b/docs/CONTEXT-AWARENESS.md
@@ -1,6 +1,6 @@
-# Context-Aware DevClaw
+# DevClaw — Context Awareness

-DevClaw now adapts its behavior based on how you interact with it.
+DevClaw adapts its behavior based on how you interact with it.

 ## Design Philosophy

@@ -12,170 +12,122 @@ DevClaw enforces strict boundaries between projects:
 - Project work happens **inside that project's group**
 - Setup and configuration happen **outside project groups**

-This design prevents:
-- ❌ Cross-project contamination (workers picking up wrong project's tasks)
-- ❌ Confusion about which project you're working on
-- ❌ Accidental registration of wrong groups
-- ❌ Setup discussions cluttering project work channels
+This prevents:
+- Cross-project contamination (workers picking up wrong project's tasks)
+- Confusion about which project you're working on
+- Accidental registration of wrong groups
+- Setup discussions cluttering project work channels

 This enables:
-- ✅ Clear mental model: "This group = this project"
-- ✅ Isolated work streams: Each project progresses independently
-- ✅ Dedicated teams: Workers focus on one project at a time
-- ✅ Clean separation: Setup vs. operational work
+- Clear mental model: "This group = this project"
+- Isolated work streams: Each project progresses independently
+- Dedicated teams: Workers focus on one project at a time
+- Clean separation: Setup vs. operational work

 ## Three Interaction Contexts

-### 1. **Via Another Agent** (Setup Mode)
-When you talk to your main agent (like Henk) about DevClaw:
-- ✅ Use: `devclaw_onboard`, `devclaw_setup`
-- ❌ Avoid: `task_pickup`, `queue_status` (operational tools)
+### 1. Via Another Agent (Setup Mode)
+
+When you talk to your main agent about DevClaw:
+- Use: `onboard`, `setup`
+- Avoid: `work_start`, `status` (operational tools)

 **Example:**
 ```
-User → Henk: "Can you help me set up DevClaw?"
-Henk → Calls devclaw_onboard
+User → Main Agent: "Can you help me set up DevClaw?"
+Main Agent → Calls onboard
 ```

-### 2. **Direct Message to DevClaw Agent**
+### 2. Direct Message to DevClaw Agent
+
 When you DM the DevClaw agent directly on Telegram/WhatsApp:
-- ✅ Use: `queue_status` (all projects), `session_health` (system overview)
-- ❌ Avoid: `task_pickup` (project-specific work), setup tools
+- Use: `status` (all projects), `health` (system overview)
+- Avoid: `work_start` (project-specific work), setup tools

 **Example:**
 ```
 User → DevClaw DM: "Show me the status of all projects"
-DevClaw → Calls queue_status (shows all projects)
+DevClaw → Calls status (shows all projects)
 ```

-### 3. **Project Group Chat**
+### 3. Project Group Chat
+
 When you message in a Telegram/WhatsApp group bound to a project:
-- ✅ Use: `task_pickup`, `task_complete`, `task_create`, `queue_status` (auto-filtered)
-- ❌ Avoid: Setup tools, system-wide queries
+- Use: `work_start`, `work_finish`, `task_create`, `status` (auto-filtered)
+- Avoid: Setup tools, system-wide queries

 **Example:**
 ```
-User → OpenClaw Dev Group: "@henk pick up issue #42"
-DevClaw → Calls task_pickup (only works in groups)
+User → Project Group: "pick up issue #42"
+DevClaw → Calls work_start (only works in groups)
 ```

 ## How It Works

 ### Context Detection
+
 Each tool automatically detects:
-- **Agent ID** - Is this the DevClaw agent or another agent?
-- **Message Channel** - Telegram, WhatsApp, or CLI?
-- **Session Key** - Is this a group chat or direct message?
+- **Agent ID** — Is this the DevClaw agent or another agent?
+- **Message Channel** — Telegram, WhatsApp, or CLI?
+- **Session Key** — Is this a group chat or direct message?
  - Format: `agent:{agentId}:{channel}:{type}:{id}`
  - Telegram group: `agent:devclaw:telegram:group:-5266044536`
  - WhatsApp group: `agent:devclaw:whatsapp:group:120363123@g.us`
  - DM: `agent:devclaw:telegram:user:657120585`
-- **Project Binding** - Which project is this group bound to?
+- **Project Binding** — Which project is this group bound to?

 ### Guardrails
+
 Tools include context-aware guidance in their responses:
 ```json
 {
-  "contextGuidance": "🛡️ Context: Project Group Chat (telegram)\n
-    You're in a Telegram group for project 'openclaw-core'.\n
-    Use task_pickup, task_complete for project work.",
+  "contextGuidance": "Context: Project Group Chat (telegram)\n    You're in a Telegram group for project 'my-webapp'.\n    Use work_start, work_finish for project work.",
  ...
 }
 ```

-## Integrated Tools
+## Tool Context Requirements

-### ✅ `devclaw_onboard`
-- **Works best:** Via another agent or direct DM
-- **Blocks:** Group chats (setup shouldn't happen in project groups)
+| Tool | Group chat | Direct DM | Via agent |
+|---|---|---|---|
+| `onboard` | Blocked | Works | Works |
+| `setup` | Works | Works | Works |
+| `work_start` | Works | Blocked | Blocked |
+| `work_finish` | Works | Works | Works |
+| `task_create` | Works | Works | Works |
+| `task_update` | Works | Works | Works |
+| `task_comment` | Works | Works | Works |
+| `status` | Auto-filtered | All projects | Suggests onboard |
+| `health` | Auto-filtered | All projects | Works |
+| `work_heartbeat` | Single project | All projects | Works |
+| `project_register` | Works (required) | Blocked | Blocked |

-### ✅ `queue_status`
-- **Group context:** Auto-filters to that project
-- **Direct context:** Shows all projects
-- **Via-agent context:** Suggests using devclaw_onboard instead
-
-### ✅ `task_pickup`
-- **ONLY works:** In project group chats
-- **Blocks:** Direct DMs and setup conversations
-
-### ✅ `project_register`
-- **ONLY works:** In the Telegram/WhatsApp group you're registering
-- **Blocks:** Direct DMs and via-agent conversations
-- **Auto-detects:** Group ID from current chat (projectGroupId parameter now optional)
-
-**Why this matters:**
-- **Project Isolation**: Each group = one project = one dedicated team
-- **Clear Boundaries**: Forces deliberate project registration from within the project's space
-- **Team Clarity**: You're physically in the group when binding it, making the connection explicit
-- **No Mistakes**: Impossible to accidentally register the wrong group when you're in it
-- **Natural Workflow**: "This group is for Project X" → register Project X here
-
-## Testing
-
-### Debug Tool
-Use `context_test` to see what context is detected:
-```
-# In any context:
-context_test
-
-# Returns:
-{
-  "detectedContext": { "type": "group", "projectName": "openclaw-core" },
-  "guardrails": "🛡️ Context: Project Group Chat..."
-}
-```
-
-### Manual Testing
-1. **Setup Mode:** Message your main agent → "Help me configure DevClaw"
-2. **Status Check:** DM DevClaw agent (Telegram/WhatsApp) → "Show me the queue"
-3. **Project Work:** Post in project group (Telegram/WhatsApp) → "@henk pick up #42"
-
-Each context should trigger different guardrails.
-
-## Configuration
-
-Add to `~/.openclaw/openclaw.json`:
-```json
-"plugins": {
-  "entries": {
-    "devclaw": {
-      "config": {
-        "devClawAgentIds": ["henk-development", "devclaw-test"],
-        "models": { ... }
-      }
-    }
-  }
-}
-```
-
-The `devClawAgentIds` array lists which agents are DevClaw orchestrators.
-
-## Implementation Details
-
-- **Module:** [lib/context-guard.ts](../lib/context-guard.ts)
-- **Tests:** [tests/unit/context-guard.test.ts](../tests/unit/context-guard.test.ts) (15 passing)
-- **Integrated tools:** 4 key tools (`devclaw_onboard`, `queue_status`, `task_pickup`, `project_register`)
-- **Detection logic:** Checks agentId, messageChannel, sessionKey pattern matching
+**Why `project_register` requires group context:**
+- Forces deliberate project registration from within the project's space
+- You're physically in the group when binding it, making the connection explicit
+- Impossible to accidentally register the wrong group

 ## WhatsApp Support

-DevClaw **fully supports WhatsApp** groups with the same architecture as Telegram:
+DevClaw fully supports WhatsApp groups with the same architecture as Telegram:

-- ✅ WhatsApp group detection via `sessionKey.includes("@g.us")`
-- ✅ Projects keyed by WhatsApp group ID (e.g., `"120363123@g.us"`)
-- ✅ Context-aware tools work identically for both channels
-- ✅ One project = one group (Telegram OR WhatsApp)
+- WhatsApp group detection via `sessionKey.includes("@g.us")`
+- Projects keyed by WhatsApp group ID (e.g., `"120363123@g.us"`)
+- Context-aware tools work identically for both channels
+- One project = one group (Telegram OR WhatsApp)

 **To register a WhatsApp project:**
 1. Go to the WhatsApp group chat
 2. Call `project_register` from within the group
 3. Group ID auto-detected from context

-The architecture treats Telegram and WhatsApp identically - the only difference is the group ID format.
+## Implementation

-## Future Enhancements
+- **Module:** [`lib/context-guard.ts`](../lib/context-guard.ts)
+- **Detection logic:** Checks agentId, messageChannel, sessionKey pattern matching
+- **Configuration:** `devClawAgentIds` in plugin config lists which agents are DevClaw orchestrators

-- [ ] Integrate into remaining tools (`task_complete`, `session_health`, `task_create`, `devclaw_setup`)
-- [ ] System prompt injection (requires OpenClaw core support)
-- [ ] Context-based tool filtering (hide irrelevant tools)
-- [ ] Per-project context overrides
+## Related
+
+- [Configuration — devClawAgentIds](CONFIGURATION.md#devclaw-agent-ids)
+- [Architecture — Scope boundaries](ARCHITECTURE.md#scope-boundaries)
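Reviewer note: the session key format documented under Context Detection (`agent:{agentId}:{channel}:{type}:{id}`) can be sketched as a small parser. This is an illustration under stated assumptions, not the code in `lib/context-guard.ts`. `parseSessionKey`, `isGroupChat`, and `SessionKeyParts` are hypothetical names, and rejoining extra `:` segments into the ID is an assumption. Sub-agent worker keys with fewer segments are deliberately not handled.

```typescript
// Hypothetical parser for keys shaped like agent:{agentId}:{channel}:{type}:{id}.
interface SessionKeyParts {
  agentId: string;
  channel: string;
  type: string;
  id: string;
}

function parseSessionKey(key: string): SessionKeyParts | null {
  const [prefix, agentId, channel, type, ...rest] = key.split(":");
  // Reject anything that does not have all five segments.
  if (prefix !== "agent" || !agentId || !channel || !type || rest.length === 0) {
    return null;
  }
  // Assumption: an ID containing ":" is rejoined rather than truncated.
  return { agentId, channel, type, id: rest.join(":") };
}

function isGroupChat(key: string): boolean {
  const parsed = parseSessionKey(key);
  if (parsed === null) return false;
  // WhatsApp group IDs end in "@g.us", as noted in the WhatsApp Support section.
  return parsed.type === "group" || parsed.id.includes("@g.us");
}
```

With the example keys from the document, the Telegram and WhatsApp group keys are classified as group chats, while the `telegram:user` key is treated as a direct message.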
--- a/docs/MANAGEMENT.md
+++ b/docs/MANAGEMENT.md
@@ -12,14 +12,14 @@ DevClaw exists because of a gap that management theorists identified decades ago

 In 1969, Paul Hersey and Ken Blanchard published what would become Situational Leadership Theory. The central idea is deceptively simple: the way you delegate should match the capability and reliability of the person doing the work. You don't hand an intern the system architecture redesign. You don't ask your principal engineer to rename a CSS class.

-DevClaw's model selection does exactly this. When a task comes in, the plugin evaluates complexity from the issue title and description, then routes it to the cheapest model that can handle it:
+DevClaw's level selection does exactly this. When a task comes in, the plugin routes it to the cheapest model that can handle it:

-| Complexity                       | Model  | Analogy                     |
-| -------------------------------- | ------ | --------------------------- |
-| Simple (typos, renames, copy)    | Haiku  | Junior dev — just execute   |
-| Standard (features, bug fixes)   | Sonnet | Mid-level — think and build |
-| Complex (architecture, security) | Opus   | Senior — design and reason  |
-| Review                           | Grok   | Independent reviewer        |
+| Complexity                       | Level    | Analogy                     |
+| -------------------------------- | -------- | --------------------------- |
+| Simple (typos, renames, copy)    | Junior   | The intern — just execute   |
+| Standard (features, bug fixes)   | Medior   | Mid-level — think and build |
+| Complex (architecture, security) | Senior   | The architect — design and reason |
+| Review                           | Reviewer | Independent code reviewer   |

 This isn't just cost optimization. It mirrors what effective managers do instinctively: match the delegation level to the task, not to a fixed assumption about the delegate.

@@ -27,11 +27,11 @@ This isn't just cost optimization. It mirrors what effective managers do instinc

 Classical management theory — later formalized by Bernard Bass in his work on Transformational Leadership — introduced a concept called Management by Exception (MBE). The principle: a manager should only be pulled back into a workstream when something deviates from the expected path.

-DevClaw's task lifecycle is built on this. The orchestrator delegates a task via `task_pickup`, then steps away. It only re-engages in three scenarios:
+DevClaw's task lifecycle is built on this. The orchestrator delegates a task via `work_start`, then steps away. The task then ends in one of four outcomes, and only some of them pull the orchestrator back in:

 1. **DEV completes work** → The task moves to QA automatically. No orchestrator involvement needed.
 2. **QA passes** → The issue closes. Pipeline complete.
-3. **QA fails** → The task cycles back to DEV with a fix request. The orchestrator may need to adjust the model tier.
+3. **QA fails** → The task cycles back to DEV with a fix request. The orchestrator may need to adjust the model level.
 4. **QA refines** → The task enters a holding state that _requires human decision_. This is the explicit escalation boundary.

 The "refine" state is the most interesting from a delegation perspective. It's a conscious architectural decision that says: some judgments should not be automated. When the QA agent determines that a task needs rethinking rather than just fixing, it escalates to the only actor who has the full business context — the human.
@@ -61,7 +61,7 @@ One of the most common delegation failures is self-review. You don't ask the per
 DevClaw enforces structural separation between development and review by design:

 - DEV and QA are separate sub-agent sessions with separate state.
-- QA uses a different model entirely (Grok), introducing genuine independence.
+- QA uses the reviewer level, which can be a different model entirely, introducing genuine independence.
 - The review happens after a clean label transition — QA picks up from `To Test`, not from watching DEV work in real time.

 This mirrors a principle from organizational design: effective controls require independence between execution and verification. It's the same reason companies separate their audit function from their operations.
@@ -72,7 +72,7 @@ Ronald Coase won a Nobel Prize for explaining why firms exist: transaction costs

 DevClaw applies the same logic to AI sessions. Spawning a new sub-agent session costs approximately 50,000 tokens of context loading — the agent needs to read the full codebase before it can do useful work. That's the onboarding cost.

-The plugin tracks session IDs across task completions. When a DEV finishes task A and task B is ready on the same project, DevClaw detects the existing session and returns `"sessionAction": "send"` instead of `"spawn"`. The orchestrator routes the new task to the running session. No re-onboarding. No context reload.
+The plugin tracks session keys across task completions. When a DEV finishes task A and task B is ready on the same project, DevClaw detects the existing session and reuses it instead of spawning a new one. No re-onboarding. No context reload.

 In management terms: keep your team stable. Reassigning the same person to the next task on their project is almost always cheaper than bringing in someone new — even if the new person is theoretically better qualified.

@@ -101,11 +101,11 @@ This is the deepest lesson from delegation theory: **good delegation isn't about

 Management research points to a few directions that could extend DevClaw's delegation model:

-**Progressive delegation.** Blanchard's model suggests increasing task complexity for delegates as they prove competent. DevClaw could track QA pass rates per model tier and automatically promote — if Haiku consistently passes QA on borderline tasks, start routing more work to it. This is how good managers develop their people, and it reduces cost over time.
+**Progressive delegation.** Blanchard's model suggests increasing task complexity for delegates as they prove competent. DevClaw could track QA pass rates per model level and automatically promote — if junior consistently passes QA on borderline tasks, start routing more work to it. This is how good managers develop their people, and it reduces cost over time.

 **Delegation authority expansion.** The Vroom-Yetton decision model maps when a leader should decide alone versus consulting the team. Currently, sub-agents have narrow authority — they execute tasks but can't restructure the backlog. Selectively expanding this (e.g., allowing a DEV agent to split a task it judges too large) would reduce orchestrator bottlenecks, mirroring how managers gradually give high-performers more autonomy.

-**Outcome-based learning.** Delegation research emphasizes that the _delegator_ learns from outcomes too. Aggregated metrics — QA fail rate by model tier, average cycles to Done, time-in-state distributions — would help both the orchestrator agent and the human calibrate their delegation patterns over time.
+**Outcome-based learning.** Delegation research emphasizes that the _delegator_ learns from outcomes too. Aggregated metrics — QA fail rate by model level, average cycles to Done, time-in-state distributions — would help both the orchestrator agent and the human calibrate their delegation patterns over time.

 ---

--- a/docs/ONBOARDING.md
+++ b/docs/ONBOARDING.md
@@ -1,18 +1,18 @@
 # DevClaw — Onboarding Guide

-## What you need before starting
+Step-by-step setup: install the plugin, configure an agent, register projects, and run your first task.
+
+## Prerequisites

 | Requirement | Why | How to check |
 |---|---|---|
 | [OpenClaw](https://openclaw.ai) installed | DevClaw is an OpenClaw plugin | `openclaw --version` |
 | Node.js >= 20 | Runtime for plugin | `node --version` |
-| [`glab`](https://gitlab.com/gitlab-org/cli) or [`gh`](https://cli.github.com) CLI | Issue tracker provider (auto-detected from remote) | `glab --version` or `gh --version` |
-| CLI authenticated | Plugin calls glab/gh for every label transition | `glab auth status` or `gh auth status` |
-| A GitLab/GitHub repo with issues | The task backlog lives in the issue tracker | `glab issue list` or `gh issue list` from your repo |
+| [`gh`](https://cli.github.com) or [`glab`](https://gitlab.com/gitlab-org/cli) CLI | Issue tracker provider (auto-detected from git remote) | `gh --version` or `glab --version` |
+| CLI authenticated | Plugin calls gh/glab for every label transition | `gh auth status` or `glab auth status` |
+| A GitHub/GitLab repo with issues | The task backlog lives in the issue tracker | `gh issue list` or `glab issue list` from your repo |

-## Setup
-
-### 1. Install the plugin
+## Step 1: Install the plugin

 ```bash
 # Copy to extensions directory (auto-discovered on next restart)
@@ -25,21 +25,21 @@ openclaw plugins list
 # Should show: DevClaw | devclaw | loaded
 ```

-### 2. Run setup
+## Step 2: Run setup

 There are three ways to set up DevClaw:

-#### Option A: Conversational onboarding (recommended)
+### Option A: Conversational onboarding (recommended)

-Call the `devclaw_onboard` tool from any agent that has the DevClaw plugin loaded. The agent will walk you through configuration step by step — asking about:
+Call the `onboard` tool from any agent that has the DevClaw plugin loaded. The agent walks you through configuration step by step — asking about:
 - Agent selection (current or create new)
 - Channel binding (telegram/whatsapp/none) — for new agents only
-- Model tiers (accept defaults or customize)
+- Model levels (accept defaults or customize)
 - Optional project registration

 The tool returns instructions that guide the agent through the QA-style setup conversation.

-#### Option B: CLI wizard
+### Option B: CLI wizard

 ```bash
 openclaw devclaw setup
@@ -48,12 +48,13 @@ openclaw devclaw setup
 The setup wizard walks you through:

 1. **Agent** — Create a new orchestrator agent or configure an existing one
-2. **Developer team** — Choose which LLM model powers each developer tier:
-   - **Junior** (fast, cheap tasks) — default: `anthropic/claude-haiku-4-5`
-   - **Medior** (standard tasks) — default: `anthropic/claude-sonnet-4-5`
-   - **Senior** (complex tasks) — default: `anthropic/claude-opus-4-5`
-   - **QA** (code review) — default: `anthropic/claude-sonnet-4-5`
-3. **Workspace** — Writes AGENTS.md, HEARTBEAT.md, role templates, and initializes memory
+2. **Developer team** — Choose which LLM model powers each developer level:
+   - **DEV junior** (fast, cheap tasks) — default: `anthropic/claude-haiku-4-5`
+   - **DEV medior** (standard tasks) — default: `anthropic/claude-sonnet-4-5`
+   - **DEV senior** (complex tasks) — default: `anthropic/claude-opus-4-5`
+   - **QA reviewer** (code review) — default: `anthropic/claude-sonnet-4-5`
+   - **QA tester** (manual testing) — default: `anthropic/claude-haiku-4-5`
+3. **Workspace** — Writes AGENTS.md, HEARTBEAT.md, role templates, and initializes state

 Non-interactive mode:
 ```bash
@@ -66,45 +67,45 @@ openclaw devclaw setup --agent my-orchestrator \
  --senior "anthropic/claude-opus-4-5"
 ```

-#### Option C: Tool call (agent-driven)
+### Option C: Tool call (agent-driven)

 **Conversational onboarding via tool:**
 ```json
-devclaw_onboard({ mode: "first-run" })
+onboard({ "mode": "first-run" })
 ```

-The tool returns step-by-step instructions that guide the agent through the QA-style setup conversation.
+The tool returns step-by-step instructions that guide the agent through the setup conversation.

 **Direct setup (skip conversation):**
 ```json
-{
+setup({
  "newAgentName": "My Dev Orchestrator",
  "channelBinding": "telegram",
  "models": {
+    "dev": {
      "junior": "anthropic/claude-haiku-4-5",
      "senior": "anthropic/claude-opus-4-5"
+    },
+    "qa": {
+      "reviewer": "anthropic/claude-sonnet-4-5"
    }
  }
+})
 ```

-This calls `devclaw_setup` directly without conversational prompts.
+## Step 3: Channel binding (optional, for new agents)

-### 3. Channel binding (optional, for new agents)
-
-If you created a new agent during conversational onboarding and selected a channel binding (telegram/whatsapp), the agent is automatically bound and will receive messages from that channel. **Skip to step 4.**
+If you created a new agent during conversational onboarding and selected a channel binding (telegram/whatsapp), the agent is automatically bound. **Skip to step 4.**

 **Smart Migration**: If an existing agent already has a channel-wide binding (e.g., the old orchestrator receives all telegram messages), the onboarding agent will:
-1. Call `analyze_channel_bindings` to detect the conflict
+1. Detect the conflict
 2. Ask if you want to migrate the binding from the old agent to the new one
 3. If you confirm, the binding is automatically moved — no manual config edit needed

-This is useful when you're replacing an old orchestrator with a new one.
+If you didn't bind a channel during setup:

-If you didn't bind a channel during setup, you have two options:
+**Option A: Manually edit `openclaw.json`**

-**Option A: Manually edit `openclaw.json`** (for existing agents or post-creation binding)
-
-Add an entry to the `bindings` array:
 ```json
 {
  "bindings": [
@@ -136,131 +137,115 @@ Restart OpenClaw after editing.

 **Option B: Add bot to Telegram/WhatsApp group**

-If using a channel-wide binding (no peer filter), the agent will receive all messages from that channel. Add your orchestrator bot to the relevant Telegram group for the project.
+If using a channel-wide binding (no peer filter), the agent receives all messages from that channel. Add your orchestrator bot to the relevant Telegram group.

-### 4. Register your project
+## Step 4: Register your project

-Tell the orchestrator agent to register a new project:
+Go to the Telegram/WhatsApp group for the project and tell the orchestrator agent:

-> "Register project my-project at ~/git/my-project for group -1234567890 with base branch development"
+> "Register project my-project at ~/git/my-project with base branch development"

 The agent calls `project_register`, which atomically:
 - Validates the repo and auto-detects GitHub/GitLab from remote
 - Creates all 8 state labels (idempotent)
-- Scaffolds prompt instruction files (`projects/prompts/<project>/dev.md` and `qa.md`)
-- Adds the project entry to `projects.json` with `autoChain: false`
+- Scaffolds role instruction files (`projects/roles/<project>/dev.md` and `qa.md`)
+- Adds the project entry to `projects.json`
 - Logs the registration event

+**Initial state in `projects.json`:**
+
 ```json
 {
  "projects": {
    "-1234567890": {
      "name": "my-project",
      "repo": "~/git/my-project",
-      "groupName": "Dev - My Project",
-      "deployUrl": "",
+      "groupName": "Project: my-project",
      "baseBranch": "development",
      "deployBranch": "development",
-      "autoChain": false,
+      "channel": "telegram",
+      "roleExecution": "parallel",
      "dev": {
        "active": false,
        "issueId": null,
        "startTime": null,
-        "model": null,
+        "level": null,
        "sessions": { "junior": null, "medior": null, "senior": null }
      },
      "qa": {
        "active": false,
        "issueId": null,
        "startTime": null,
-        "model": null,
-        "sessions": { "qa": null }
+        "level": null,
+        "sessions": { "reviewer": null, "tester": null }
      }
    }
  }
 }
 ```

-**Manual fallback:** If you prefer CLI control, you can still create labels manually with `glab label create` and edit `projects.json` directly. See the [Architecture docs](ARCHITECTURE.md) for label names and colors.
+**Finding the Telegram group ID:** The group ID is the numeric ID of your Telegram supergroup (a negative number like `-1234567890`). When you call `project_register` from within the group, the ID is auto-detected from context.

-**Finding the Telegram group ID:** The group ID is the numeric ID of your Telegram supergroup (a negative number like `-1234567890`). You can find it via the Telegram bot API or from message metadata in OpenClaw logs.
-
-### 5. Create your first issue
+## Step 5: Create your first issue

 Issues can be created in multiple ways:
 - **Via the agent** — Ask the orchestrator in the Telegram group: "Create an issue for adding a login page" (uses `task_create`)
 - **Via workers** — DEV/QA workers can call `task_create` to file follow-up bugs they discover
-- **Via CLI** — `cd ~/git/my-project && glab issue create --title "My first task" --label "To Do"` (or `gh issue create`)
+- **Via CLI** — `cd ~/git/my-project && gh issue create --title "My first task" --label "To Do"` (or `glab issue create`)
 - **Via web UI** — Create an issue and add the "To Do" label

-### 6. Test the pipeline
+Note: `task_create` defaults to the "Planning" label. Use "To Do" explicitly when the task is ready for immediate work.
+
+## Step 6: Test the pipeline

 Ask the agent in the Telegram group:

 > "Check the queue status"

-The agent should call `queue_status` and report the "To Do" issue. Then:
+The agent should call `status` and report the "To Do" issue. Then:

 > "Pick up issue #1 for DEV"

-The agent calls `task_pickup`, which assigns a developer tier, transitions the label to "Doing", creates or reuses a worker session, and dispatches the task — all in one call. The agent just posts the announcement.
+The agent calls `work_start`, which assigns a developer level, transitions the label to "Doing", creates or reuses a worker session, and dispatches the task — all in one call. The agent posts the announcement.

 ## Adding more projects

-Tell the agent to register a new project (step 3) and add the bot to the new Telegram group (step 4). That's it — `project_register` handles labels and state setup.
+Tell the agent to register a new project (step 4) from within the new project's Telegram group. That's it — `project_register` handles labels and state setup.

 Each project is fully isolated — separate queue, separate workers, separate state.

-## Developer tiers
+## Developer levels

-DevClaw assigns tasks to developer tiers instead of raw model names. This makes the system intuitive — you're assigning a "junior dev" to fix a typo, not configuring model parameters.
+DevClaw assigns tasks to developer levels instead of raw model names. This makes the system intuitive — you're assigning a "junior dev" to fix a typo, not configuring model parameters.

-| Tier | Role | Default model | When to assign |
-|------|------|---------------|----------------|
-| **junior** | Junior developer | `anthropic/claude-haiku-4-5` | Typos, single-file fixes, CSS changes |
-| **medior** | Mid-level developer | `anthropic/claude-sonnet-4-5` | Features, bug fixes, multi-file changes |
-| **senior** | Senior developer | `anthropic/claude-opus-4-5` | Architecture, migrations, system-wide refactoring |
-| **qa** | QA engineer | `anthropic/claude-sonnet-4-5` | Code review, test validation |
+| Role | Level | Default model | When to assign |
+|------|-------|---------------|----------------|
+| DEV | **junior** | `anthropic/claude-haiku-4-5` | Typos, single-file fixes, CSS changes |
+| DEV | **medior** | `anthropic/claude-sonnet-4-5` | Features, bug fixes, multi-file changes |
+| DEV | **senior** | `anthropic/claude-opus-4-5` | Architecture, migrations, system-wide refactoring |
+| QA | **reviewer** | `anthropic/claude-sonnet-4-5` | Code review, test validation |
+| QA | **tester** | `anthropic/claude-haiku-4-5` | Manual testing, smoke tests |

-Change which model powers each tier in `openclaw.json`:
-```json
-{
-  "plugins": {
-    "entries": {
-      "devclaw": {
-        "config": {
-          "models": {
-            "junior": "anthropic/claude-haiku-4-5",
-            "medior": "anthropic/claude-sonnet-4-5",
-            "senior": "anthropic/claude-opus-4-5",
-            "qa": "anthropic/claude-sonnet-4-5"
-          }
-        }
-      }
-    }
-  }
-}
-```
+Change which model powers each level in `openclaw.json` — see [Configuration](CONFIGURATION.md#model-tiers).
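+
+As a rough illustration of that mapping, the sketch below resolves a role and level to a model ID, preferring a configured value and falling back to the defaults in the table above. The names `resolveModel`, `DEFAULT_MODELS`, and `ModelConfig` are assumptions made for this example, not the plugin's actual API.
+
+```typescript
+type Role = "dev" | "qa";
+type ModelConfig = Partial<Record<Role, Record<string, string>>>;
+
+// Defaults copied from the table above.
+const DEFAULT_MODELS: Record<Role, Record<string, string>> = {
+  dev: {
+    junior: "anthropic/claude-haiku-4-5",
+    medior: "anthropic/claude-sonnet-4-5",
+    senior: "anthropic/claude-opus-4-5",
+  },
+  qa: {
+    reviewer: "anthropic/claude-sonnet-4-5",
+    tester: "anthropic/claude-haiku-4-5",
+  },
+};
+
+// Look up only own keys so names like "constructor" never match.
+function pick(table: Record<string, string> | undefined, key: string): string | undefined {
+  return table && Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
+}
+
+// Hypothetical helper: configured model first, then the default for that level.
+function resolveModel(role: Role, level: string, config: ModelConfig = {}): string {
+  const model = pick(config[role], level) ?? pick(DEFAULT_MODELS[role], level);
+  if (model === undefined) {
+    throw new Error(`Unknown level "${level}" for role "${role}"`);
+  }
+  return model;
+}
+```
+
+Throwing on an unknown level, rather than silently picking a default, is a design choice of this sketch; the plugin may behave differently.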

 ## What the plugin handles vs. what you handle

 | Responsibility | Who | Details |
 |---|---|---|
 | Plugin installation | You (once) | `cp -r devclaw ~/.openclaw/extensions/` |
-| Agent + workspace setup | Plugin (`devclaw_setup`) | Creates agent, configures models, writes workspace files |
-| Channel binding analysis | Plugin (`analyze_channel_bindings`) | Detects channel conflicts, validates channel configuration |
-| Channel binding migration | Plugin (`devclaw_setup` with `migrateFrom`) | Automatically moves channel-wide bindings between agents |
-| Label setup | Plugin (`project_register`) | 8 labels, created idempotently via `IssueProvider` |
-| Prompt file scaffolding | Plugin (`project_register`) | Creates `projects/prompts/<project>/dev.md` and `qa.md` |
+| Agent + workspace setup | Plugin (`setup`) | Creates agent, configures models, writes workspace files |
+| Channel binding migration | Plugin (`setup` with `migrateFrom`) | Automatically moves channel-wide bindings between agents |
+| Label setup | Plugin (`project_register`) | 8 labels, created idempotently via IssueProvider |
+| Prompt file scaffolding | Plugin (`project_register`) | Creates `projects/roles/<project>/dev.md` and `qa.md` |
 | Project registration | Plugin (`project_register`) | Entry in `projects.json` with empty worker state |
 | Telegram group setup | You (once per project) | Add bot to group |
 | Issue creation | Plugin (`task_create`) | Orchestrator or workers create issues from chat |
-| Label transitions | Plugin | Atomic label transitions via issue tracker CLI |
-| Developer assignment | Plugin | LLM-selected tier by orchestrator, keyword heuristic fallback |
+| Label transitions | Plugin | Atomic transitions via issue tracker CLI |
+| Developer assignment | Plugin | LLM-selected level by orchestrator, keyword heuristic fallback |
 | State management | Plugin | Atomic read/write to `projects.json` |
 | Session management | Plugin | Creates, reuses, and dispatches to sessions via CLI. Agent never touches session tools. |
-| Task completion | Plugin (`task_complete`) | Workers self-report. Auto-chains if enabled. |
-| Prompt instructions | Plugin (`task_pickup`) | Loaded from `projects/prompts/<project>/<role>.md`, appended to task message |
+| Task completion | Plugin (`work_finish`) | Workers self-report. Auto-chains if enabled. |
+| Prompt instructions | Plugin (`work_start`) | Loaded from `projects/roles/<project>/<role>.md`, appended to task message |
 | Audit logging | Plugin | Automatic NDJSON append per tool call |
-| Zombie detection | Plugin | `session_health` checks active vs alive |
-| Queue scanning | Plugin | `queue_status` queries issue tracker per project |
+| Zombie detection | Plugin | `health` checks active vs alive |
+| Queue scanning | Plugin | `status` queries issue tracker per project |
--- a/docs/QA_WORKFLOW.md
+++ b/docs/QA_WORKFLOW.md
@@ -1,8 +1,6 @@
-# QA Workflow
+# DevClaw — QA Workflow

-## Overview
-
-Quality Assurance (QA) in DevClaw follows a structured workflow that ensures every review is documented and traceable.
+Quality Assurance in DevClaw follows a structured workflow that ensures every review is documented and traceable.

 ## Required Steps

@@ -28,10 +26,10 @@ task_comment({

 ### 3. Complete the Task

-After posting your comment, call `task_complete`:
+After posting your comment, call `work_finish`:

 ```javascript
-task_complete({
+work_finish({
  role: "qa",
  projectGroupId: "<group-id>",
  result: "pass",  // or "fail", "refine", "blocked"
@@ -39,15 +37,24 @@ task_complete({
 })
 ```

+## QA Results
+
+| Result | Label transition | Meaning |
+|---|---|---|
+| `"pass"` | Testing → Done | Approved. Issue closed. |
+| `"fail"` | Testing → To Improve | Issues found. Issue reopened, sent back to DEV. |
+| `"refine"` | Testing → Refining | Needs human decision. Pipeline pauses. |
+| `"blocked"` | Testing → To Test | Cannot complete (env issues, etc.). Returns to QA queue. |
+
 ## Why Comments Are Required

-1. **Audit Trail**: Every review decision is documented
-2. **Knowledge Sharing**: Future reviewers understand what was tested
-3. **Quality Metrics**: Enables tracking of test coverage
-4. **Debugging**: When issues arise later, we know what was checked
-5. **Compliance**: Some projects require documented QA evidence
+1. **Audit Trail** — Every review decision is documented in the issue tracker
+2. **Knowledge Sharing** — Future reviewers understand what was tested
+3. **Quality Metrics** — Enables tracking of test coverage
+4. **Debugging** — When issues arise later, we know what was checked
+5. **Compliance** — Some projects require documented QA evidence

-## Comment Template
+## Comment Templates

 ### For Passing Reviews

@@ -72,15 +79,14 @@ task_complete({
 ### For Failing Reviews

 ```markdown
-## QA Review - Issues Found
+## QA Review — Issues Found

 **Tested:**
 - [What you tested]

 **Issues Found:**
 1. [Issue description with steps to reproduce]
-2. [Issue description with steps to reproduce]
-3. [Issue description with expected vs actual behavior]
+2. [Issue description with expected vs actual behavior]

 **Environment:**
 - [Test environment details]
@@ -90,25 +96,25 @@ task_complete({

 ## Enforcement

-As of [current date], QA workers are instructed via role templates to:
-- Always call `task_comment` BEFORE `task_complete`
+QA workers receive instructions via role templates to:
+- Always call `task_comment` BEFORE `work_finish`
 - Include specific details about what was tested
 - Document results, environment, and any notes

 Prompt templates affected:
-- `projects/prompts/<project>/qa.md`
+- `projects/roles/<project>/qa.md`
 - All project-specific QA templates should follow this pattern

 ## Best Practices

-1. **Be Specific**: Don't just say "tested the feature" - list what you tested
-2. **Include Environment**: Version numbers, browser, OS can matter
-3. **Document Edge Cases**: If you tested special scenarios, note them
-4. **Use Screenshots**: For UI issues, screenshots help (link in comment)
-5. **Reference Requirements**: Link back to acceptance criteria from the issue
+1. **Be Specific** — Don't just say "tested the feature" — list what you tested
+2. **Include Environment** — Version numbers, browser, OS can matter
+3. **Document Edge Cases** — If you tested special scenarios, note them
+4. **Reference Requirements** — Link back to acceptance criteria from the issue
+5. **Use Screenshots** — For UI issues, screenshots help (link in comment)

 ## Related

- Issue #103: Enforce QA comment on every review (pass or fail)
-- Tool: `task_comment` - Add comments to issues
-- Tool: `task_complete` - Complete QA tasks
+- Tool: [`task_comment`](TOOLS.md#task_comment) — Add comments to issues
+- Tool: [`work_finish`](TOOLS.md#work_finish) — Complete QA tasks
+- Config: [`projects/roles/<project>/qa.md`](CONFIGURATION.md#role-instruction-files) — QA role instructions
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -15,16 +15,16 @@ This works for the common case but breaks down when you want:

 Roles become a configurable list instead of a hardcoded pair. Each role defines:
 - **Name** — e.g. `design`, `dev`, `qa`, `devops`
-- **Tiers** — which developer tiers can be assigned (e.g. design only needs `medior`)
+- **Levels** — which developer levels can be assigned (e.g. design only needs `medior`)
 - **Pipeline position** — where it sits in the task lifecycle
 - **Worker count** — how many concurrent workers (default: 1)

 ```json
 {
  "roles": {
-    "dev": { "tiers": ["junior", "medior", "senior"], "workers": 1 },
-    "qa": { "tiers": ["qa"], "workers": 1 },
-    "devops": { "tiers": ["medior", "senior"], "workers": 1 }
+    "dev": { "levels": ["junior", "medior", "senior"], "workers": 1 },
+    "qa": { "levels": ["reviewer", "tester"], "workers": 1 },
+    "devops": { "levels": ["medior", "senior"], "workers": 1 }
  },
  "pipeline": ["dev", "qa", "devops"]
 }
@@ -35,15 +35,15 @@ The pipeline definition replaces the hardcoded `Doing → To Test → Testing
 ### Open questions

 - How do custom labels map? Generate from role names, or let users define?
-- Should roles have their own instruction files (`projects/prompts/<project>/<role>.md`) — yes, this already works
+- Should roles have their own instruction files (`projects/roles/<project>/<role>.md`)? Yes, this already works.
 - How to handle parallel roles (e.g. frontend + backend DEV in parallel before QA)?

 ---

-## Channel-agnostic groups
+## Channel-agnostic Groups

 Currently DevClaw maps projects to **Telegram group IDs**. The `projectGroupId` is a Telegram-specific negative number. This means:
-- WhatsApp groups can't be used as project channels
+- WhatsApp groups are only partially supported as project channels (via the `channel` field)
 - Discord, Slack, or other channels are excluded
 - The naming (`groupId`, `groupName`) is Telegram-specific

@@ -77,19 +77,20 @@ Key changes:
 - All tool params, state keys, and docs updated accordingly
 - Backward compatible: existing Telegram-only keys migrated on read

-This enables any OpenClaw channel (Telegram, WhatsApp, Discord, Slack, etc.) to host a project — each group chat becomes an autonomous dev team regardless of platform.
+This enables any OpenClaw channel (Telegram, WhatsApp, Discord, Slack, etc.) to host a project.

 ### Open questions

 - Should one project be bindable to multiple channels? (e.g. Telegram for devs, WhatsApp for stakeholder updates)
-- How does the orchestrator agent handle cross-channel context? (OpenClaw bindings already route by channel)
+- How does the orchestrator agent handle cross-channel context?

 ---

-## Other ideas
+## Other Ideas

 - **Jira provider** — `IssueProvider` interface already abstracts GitHub/GitLab; Jira is the obvious next addition
-- **Deployment integration** — `task_complete` QA pass could trigger a deploy step via webhook or CLI
-- **Cost tracking** — log token usage per task/tier, surface in `queue_status`
+- **Deployment integration** — `work_finish` QA pass could trigger a deploy step via webhook or CLI
+- **Cost tracking** — log token usage per task/level, surface in `status`
 - **Priority scoring** — automatic priority assignment based on labels, age, and dependencies
 - **Session archival** — auto-archive idle sessions after configurable timeout (currently indefinite)
+- **Progressive delegation** — track QA pass rates per level and auto-promote (see [Management Theory](MANAGEMENT.md))
--- a/docs/TESTING.md
+++ b/docs/TESTING.md
@@ -59,10 +59,15 @@ npm run test:ui
      "devclaw": {
        "config": {
          "models": {
+            "dev": {
              "junior": "anthropic/claude-haiku-4-5",
              "medior": "anthropic/claude-sonnet-4-5",
-            "senior": "anthropic/claude-opus-4-5",
-            "qa": "anthropic/claude-sonnet-4-5"
+              "senior": "anthropic/claude-opus-4-5"
+            },
+            "qa": {
+              "reviewer": "anthropic/claude-sonnet-4-5",
+              "tester": "anthropic/claude-haiku-4-5"
+            }
          }
        }
      }
--- a/docs/TOOLS.md
+++ b/docs/TOOLS.md
@@ -0,0 +1,361 @@
+# DevClaw — Tools Reference
+
+Complete reference for all 11 tools registered by DevClaw. See [`index.ts`](../index.ts) for registration.
+
+## Worker Lifecycle
+
+### `work_start`
+
+Pick up a task from the issue queue. Handles level assignment, label transition, session creation/reuse, task dispatch, and audit logging — all in one call.
+
+**Source:** [`lib/tools/work-start.ts`](../lib/tools/work-start.ts)
+
+**Context:** Only works in project group chats.
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `issueId` | number | No | Issue ID. If omitted, picks next by priority. |
+| `role` | `"dev"` \| `"qa"` | No | Worker role. Auto-detected from issue label if omitted. |
+| `projectGroupId` | string | No | Project group ID. Auto-detected from group context. |
+| `level` | string | No | Worker level (`junior`, `medior`, `senior` for DEV; `reviewer`, `tester` for QA). Auto-detected if omitted. |
+
+**What it does atomically:**
+
+1. Resolves project from `projects.json`
+2. Validates no active worker for this role
+3. Fetches issue from tracker, verifies correct label state
+4. Assigns level (LLM-chosen via `level` param → label detection → keyword heuristic fallback)
+5. Resolves level to model ID via config or defaults
+6. Loads prompt instructions from `projects/roles/<project>/<role>.md`
+7. Looks up existing session for assigned level (session-per-level)
+8. Transitions label (e.g. `To Do` → `Doing`)
+9. Creates session via Gateway RPC if new (`sessions.patch`)
+10. Dispatches task to worker session via CLI (`openclaw gateway call agent`)
+11. Updates `projects.json` state (active, issueId, level, session key)
+12. Writes audit log entries (work_start + model_selection)
+13. Sends notification
+14. Returns announcement text
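+
+Steps 7 and 9 describe a session-per-level lookup. A minimal sketch of that decision, assuming the `sessions` map from `projects.json` (level name to session key, or `null`); the function name `planSession` and its return shape are hypothetical:
+
+```typescript
+type Sessions = Record<string, string | null>;
+
+type SessionPlan =
+  | { action: "reuse"; key: string } // a session already exists for this level
+  | { action: "create" };            // no session yet, one must be created
+
+// Pure decision: reuse the stored session key for the level if there is one.
+function planSession(sessions: Sessions, level: string): SessionPlan {
+  const existing = Object.prototype.hasOwnProperty.call(sessions, level)
+    ? sessions[level]
+    : null;
+  return existing !== null ? { action: "reuse", key: existing } : { action: "create" };
+}
+```
+
+Keeping the decision pure leaves the actual session creation and the `projects.json` update to the caller, which matches the ordering of the steps above.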
+
+**Level selection priority:**
+
+1. `level` parameter (LLM-selected) — highest priority
+2. Issue label (e.g. a label named "junior" or "senior")
+3. Keyword heuristic from `model-selector.ts` — fallback
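+
+The priority order above can be sketched as a simple fall-through. This example covers DEV levels only, and its keyword rules are a stand-in based on the examples in these docs; the real heuristic lives in `model-selector.ts` and may differ. `selectLevel` and `keywordHeuristic` are hypothetical names.
+
+```typescript
+type Level = "junior" | "medior" | "senior";
+const LEVELS: readonly Level[] = ["junior", "medior", "senior"];
+
+// Stand-in heuristic (assumption): keywords taken from the examples in the docs.
+function keywordHeuristic(title: string): Level {
+  const t = title.toLowerCase();
+  if (/\b(architecture|migrations?|security)\b/.test(t)) return "senior";
+  if (/\b(typos?|renames?|copy)\b/.test(t)) return "junior";
+  return "medior";
+}
+
+function selectLevel(opts: { level?: string; labels: string[]; title: string }): Level {
+  // 1. An explicit, valid `level` parameter wins.
+  const explicit = LEVELS.find((l) => l === opts.level);
+  if (explicit) return explicit;
+  // 2. Next, an issue label that names a level.
+  const fromLabel = LEVELS.find((l) => opts.labels.some((x) => x.toLowerCase() === l));
+  if (fromLabel) return fromLabel;
+  // 3. Otherwise fall back to the keyword heuristic.
+  return keywordHeuristic(opts.title);
+}
+```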
+
+**Execution guards:**
+
+- Rejects if role already has an active worker
+- Respects `roleExecution` (sequential: rejects if other role is active)
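
A minimal sketch of how these two guards might combine, assuming a simplified worker shape with only an `active` flag per role. The function name `checkGuards` and the returned messages are hypothetical.

```typescript
// Illustrative sketch; the shapes below are assumptions, not DevClaw's real types.
type Role = "dev" | "qa";
type RoleExecution = "parallel" | "sequential";

interface WorkerFlags {
  dev: { active: boolean };
  qa: { active: boolean };
}

// Returns a rejection reason, or null when the dispatch may proceed.
function checkGuards(
  role: Role,
  workers: WorkerFlags,
  roleExecution: RoleExecution,
): string | null {
  // Guard 1: the requested role must not already be busy.
  if (workers[role].active) {
    return `${role} already has an active worker`;
  }
  // Guard 2: in sequential mode the other role must be idle as well.
  const other: Role = role === "dev" ? "qa" : "dev";
  if (roleExecution === "sequential" && workers[other].active) {
    return `roleExecution is sequential and ${other} is active`;
  }
  return null;
}
```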
+
+**On failure:** Rolls back label transition. No orphaned state.
+
+---
+
+### `work_finish`
+
+Complete a task with a result. Called by workers (DEV/QA sub-agent sessions) directly, or by the orchestrator.
+
+**Source:** [`lib/tools/work-finish.ts`](../lib/tools/work-finish.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `role` | `"dev"` \| `"qa"` | Yes | Worker role |
+| `result` | string | Yes | Completion result (see table below) |
+| `projectGroupId` | string | Yes | Project group ID |
+| `summary` | string | No | Brief summary for the announcement |
+| `prUrl` | string | No | PR/MR URL (auto-detected if omitted) |
+
+**Valid results by role:**
+
+| Role | Result | Label transition | Side effects |
+|---|---|---|---|
+| DEV | `"done"` | Doing → To Test | git pull, auto-detect PR URL |
+| DEV | `"blocked"` | Doing → To Do | Task returns to queue |
+| QA | `"pass"` | Testing → Done | Issue closed |
+| QA | `"fail"` | Testing → To Improve | Issue reopened |
+| QA | `"refine"` | Testing → Refining | Awaits human decision |
+| QA | `"blocked"` | Testing → To Test | Task returns to QA queue |
+
+**What it does atomically:**
+
+1. Validates role:result combination
+2. Resolves project and active worker
+3. Executes completion via pipeline service (label transition + side effects)
+4. Deactivates worker (sessions map preserved for reuse)
+5. Sends notification
+6. Ticks queue to fill free worker slots
+7. Writes audit log
+
+**Auto-chaining** (when enabled on the project): `dev:done` dispatches QA automatically. `qa:fail` re-dispatches DEV using the previous level.
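
The chaining decision can be pictured as a small pure function. The sketch below is an assumption about the shape of that logic, not the actual implementation: `nextChainStep` and `ChainStep` are hypothetical names, and a `level` of `null` is used here to mean "let `work_start` choose".

```typescript
// Illustrative sketch; names and return shape are assumptions.
type Role = "dev" | "qa";

interface ChainStep {
  role: Role;
  level: string | null; // null: leave level selection to work_start
}

function nextChainStep(
  role: Role,
  result: string,
  autoChain: boolean,
  previousDevLevel: string | null,
): ChainStep | null {
  if (!autoChain) return null;
  // dev:done hands the issue to QA.
  if (role === "dev" && result === "done") return { role: "qa", level: null };
  // qa:fail sends the issue back to DEV at the level used before.
  if (role === "qa" && result === "fail") return { role: "dev", level: previousDevLevel };
  // Every other outcome ends the chain.
  return null;
}
```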
+
+---
+
+## Task Management
+
+### `task_create`
+
+Create a new issue in the project's issue tracker.
+
+**Source:** [`lib/tools/task-create.ts`](../lib/tools/task-create.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | Yes | Project group ID |
+| `title` | string | Yes | Issue title |
+| `description` | string | No | Full issue body (markdown) |
+| `label` | StateLabel | No | State label. Defaults to `"Planning"`. |
+| `assignees` | string[] | No | GitHub/GitLab usernames to assign |
+| `pickup` | boolean | No | If true, immediately pick up for DEV after creation |
+
+**Use cases:**
+
+- Orchestrator creates tasks from chat messages
+- Workers file follow-up bugs discovered during development
+- Breaking down epics into smaller tasks
+
+**Default behavior:** Creates issues in `"Planning"` state. Only use `"To Do"` when the user explicitly requests immediate work.
+
+---
+
+### `task_update`
+
+Change an issue's state label manually, without going through the full `work_start`/`work_finish` flow.
+
+**Source:** [`lib/tools/task-update.ts`](../lib/tools/task-update.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | Yes | Project group ID |
+| `issueId` | number | Yes | Issue ID to update |
+| `state` | StateLabel | Yes | New state label |
+| `reason` | string | No | Audit log reason for the change |
+
+**Valid states:** `Planning`, `To Do`, `Doing`, `To Test`, `Testing`, `Done`, `To Improve`, `Refining`
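
One way to model the eight labels is a literal union with a runtime guard. The `StateLabel` name comes from the parameter table; the definition below is an assumed sketch, and `isStateLabel` is a hypothetical helper.

```typescript
// Illustrative sketch; the real StateLabel definition may differ.
const STATE_LABELS = [
  "Planning",
  "To Do",
  "Doing",
  "To Test",
  "Testing",
  "Done",
  "To Improve",
  "Refining",
] as const;

type StateLabel = (typeof STATE_LABELS)[number];

// Runtime check for untrusted input such as a tool parameter.
function isStateLabel(value: string): value is StateLabel {
  return (STATE_LABELS as readonly string[]).indexOf(value) !== -1;
}
```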
+
+**Use cases:**
+
+- Manual state adjustments (e.g. `Planning → To Do` after approval)
+- Failed auto-transitions that need correction
+- Bulk state changes by orchestrator
+
+---
+
+### `task_comment`
+
+Add a comment to an issue for feedback, notes, or discussion.
+
+**Source:** [`lib/tools/task-comment.ts`](../lib/tools/task-comment.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | Yes | Project group ID |
+| `issueId` | number | Yes | Issue ID to comment on |
+| `body` | string | Yes | Comment body (markdown) |
+| `authorRole` | `"dev"` \| `"qa"` \| `"orchestrator"` | No | Attribution role prefix |
+
+**Use cases:**
+
+- QA adds review feedback before pass/fail decision
+- DEV posts implementation notes or progress updates
+- Orchestrator adds summary comments
+
+When `authorRole` is provided, the comment is prefixed with a role emoji and attribution label.
+
+---
+
+## Operations
+
+### `status`
+
+Lightweight queue + worker state dashboard.
+
+**Source:** [`lib/tools/status.ts`](../lib/tools/status.ts)
+
+**Context:** Auto-filters to project in group chats. Shows all projects in DMs.
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | No | Filter to specific project. Omit for all. |
+
+**Returns per project:**
+
+- Worker state: active/idle, current issue, level, start time
+- Queue counts: To Do, To Test, To Improve
+- Role execution mode
+
+---
+
+### `health`
+
+Worker health scan with optional auto-fix.
+
+**Source:** [`lib/tools/health.ts`](../lib/tools/health.ts)
+
+**Context:** Auto-filters to project in group chats.
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | No | Filter to specific project. Omit for all. |
+| `fix` | boolean | No | Apply fixes for detected issues. Default: `false` (read-only). |
+| `activeSessions` | string[] | No | Active session IDs for zombie detection. |
+
+**Health checks:**
+
+| Issue | Severity | Detection | Auto-fix |
+|---|---|---|---|
+| Active worker with no session key | Critical | `active=true` but no session in map | Deactivate worker |
+| Active worker whose session is dead | Critical | Session key not in active sessions list | Deactivate worker, revert label |
+| Worker active >2 hours | Warning | `startTime` older than 2h | Deactivate worker, revert label to queue |
+| Inactive worker with lingering issue ID | Warning | `active=false` but `issueId` still set | Clear issueId |
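
The detection column could be expressed as a pure function along these lines. This is a simplified sketch: the real check is `checkWorkerHealth`, whose signature is not shown here, and the flat `WorkerState` shape (a single `sessionKey` rather than a per-level sessions map), the `Finding` names, and `checkWorker` itself are assumptions.

```typescript
// Illustrative sketch; field names and finding codes are assumptions.
interface WorkerState {
  active: boolean;
  issueId: number | null;
  sessionKey: string | null;
  startTime: string | null; // ISO 8601 timestamp
}

type Finding = "no_session_key" | "dead_session" | "stale" | "lingering_issue";

const STALE_MS = 2 * 60 * 60 * 1000; // 2 hours

function checkWorker(worker: WorkerState, activeSessions: string[], now: Date): Finding[] {
  const findings: Finding[] = [];
  if (worker.active) {
    if (worker.sessionKey === null) {
      findings.push("no_session_key"); // critical
    } else if (activeSessions.indexOf(worker.sessionKey) === -1) {
      findings.push("dead_session"); // critical
    }
    if (worker.startTime !== null && now.getTime() - Date.parse(worker.startTime) > STALE_MS) {
      findings.push("stale"); // warning
    }
  } else if (worker.issueId !== null) {
    findings.push("lingering_issue"); // warning
  }
  return findings;
}
```

Keeping detection separate from the fix step mirrors the `fix` flag: the same findings can be reported read-only or acted on.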
+
+---
+
+### `work_heartbeat`
+
+Manual trigger for heartbeat: health fix + queue dispatch. Same logic as the background heartbeat service, but invoked on demand.
+
+**Source:** [`lib/tools/work-heartbeat.ts`](../lib/tools/work-heartbeat.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | No | Target single project. Omit for all. |
+| `dryRun` | boolean | No | Report only, don't dispatch. Default: `false`. |
+| `maxPickups` | number | No | Max worker dispatches per tick. |
+| `activeSessions` | string[] | No | Active session IDs for zombie detection. |
+
+**Two-pass sweep:**
+
+1. **Health pass** — Runs `checkWorkerHealth` per project per role. Auto-fixes zombies, stale workers, orphaned state.
+2. **Tick pass** — Calls `projectTick` per project. Fills free worker slots by priority (To Improve > To Test > To Do).
+
+**Execution guards:**
+
+- `projectExecution: "sequential"` — only one project active at a time
+- `roleExecution: "sequential"` — only one role (DEV or QA) active at a time per project (enforced in `projectTick`)
+
+---
+
+## Setup
+
+### `project_register`
+
+One-time project setup. Creates state labels, scaffolds prompt files, adds project to state.
+
+**Source:** [`lib/tools/project-register.ts`](../lib/tools/project-register.ts)
+
+**Context:** Only works in the Telegram/WhatsApp group being registered.
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | No | Auto-detected from current group if omitted |
+| `name` | string | Yes | Short project name (e.g. `my-webapp`) |
+| `repo` | string | Yes | Path to git repo (e.g. `~/git/my-project`) |
+| `groupName` | string | No | Display name. Defaults to `Project: {name}`. |
+| `baseBranch` | string | Yes | Base branch for development |
+| `deployBranch` | string | No | Deploy branch. Defaults to `baseBranch`. |
+| `deployUrl` | string | No | Deployment URL |
+| `roleExecution` | `"parallel"` \| `"sequential"` | No | DEV/QA parallelism. Default: `"parallel"`. |
+
+**What it does atomically:**
+
+1. Validates project not already registered
+2. Resolves repo path, auto-detects GitHub/GitLab from git remote
+3. Verifies provider health (CLI installed and authenticated)
+4. Creates all 8 state labels (idempotent — safe to run again)
+5. Adds project entry to `projects.json` with empty worker state
+   - DEV sessions: `{ junior: null, medior: null, senior: null }`
+   - QA sessions: `{ reviewer: null, tester: null }`
+6. Scaffolds prompt files: `projects/roles/<project>/dev.md` and `qa.md`
+7. Writes audit log
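
For orientation, the empty worker state from step 5 might look like the sketch below. Only the session maps are taken from the list above; the other field names (`active`, `issueId`, `level`) follow the state fields mentioned for `work_start`, and the exact shape stored in `projects.json` is an assumption.

```typescript
// Illustrative sketch; the persisted shape in projects.json may differ.
interface WorkerState<L extends string> {
  active: boolean;
  issueId: number | null;
  level: L | null;
  sessions: Record<L, string | null>; // one reusable session key per level
}

type DevLevel = "junior" | "medior" | "senior";
type QaLevel = "reviewer" | "tester";

interface ProjectWorkers {
  dev: WorkerState<DevLevel>;
  qa: WorkerState<QaLevel>;
}

// Hypothetical helper that builds the empty state for a new project.
function emptyWorkers(): ProjectWorkers {
  return {
    dev: {
      active: false,
      issueId: null,
      level: null,
      sessions: { junior: null, medior: null, senior: null },
    },
    qa: {
      active: false,
      issueId: null,
      level: null,
      sessions: { reviewer: null, tester: null },
    },
  };
}
```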
+
+---
+
+### `setup`
+
+Agent + workspace initialization.
+
+**Source:** [`lib/tools/setup.ts`](../lib/tools/setup.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `newAgentName` | string | No | Create a new agent. Omit to configure current workspace. |
+| `channelBinding` | `"telegram"` \| `"whatsapp"` | No | Channel to bind (with `newAgentName` only) |
+| `migrateFrom` | string | No | Agent ID to migrate channel binding from |
+| `models` | object | No | Model overrides per role and level (see [Configuration](CONFIGURATION.md#model-tiers)) |
+| `projectExecution` | `"parallel"` \| `"sequential"` | No | Project execution mode |
+
+**What it does:**
+
+1. Creates a new agent or configures existing workspace
+2. Optionally binds messaging channel (Telegram/WhatsApp)
+3. Optionally migrates channel binding from another agent
+4. Writes workspace files: AGENTS.md, HEARTBEAT.md, `projects/projects.json`
+5. Configures model tiers in `openclaw.json`
+
+---
+
+### `onboard`
+
+Conversational onboarding guide. Returns step-by-step instructions for the agent to walk the user through setup.
+
+**Source:** [`lib/tools/onboard.ts`](../lib/tools/onboard.ts)
+
+**Context:** Works in DMs and via-agent contexts. Rejected in group chats, since setup should not happen in project groups.
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `mode` | `"first-run"` \| `"reconfigure"` | No | Auto-detected from current state |
+
+**Flow:**
+
+1. Call `onboard`, which returns Q&A-style step-by-step instructions
+2. Agent walks user through: agent selection, channel binding, model tiers
+3. Agent calls `setup` with collected answers
+4. User registers projects via `project_register` in group chats
+
+---
+
+## Completion Rules Reference
+
+The pipeline service (`lib/services/pipeline.ts`) defines declarative completion rules:
+
+```
+dev:done    → Doing    → To Test     (git pull, detect PR)
+dev:blocked → Doing    → To Do       (return to queue)
+qa:pass     → Testing  → Done        (close issue)
+qa:fail     → Testing  → To Improve  (reopen issue)
+qa:refine   → Testing  → Refining    (await human decision)
+qa:blocked  → Testing  → To Test     (return to QA queue)
+```
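
As a rough illustration of what "declarative" could mean here, the rules above can be written as a lookup table keyed by `role:result`. The actual structure in `lib/services/pipeline.ts` is not shown in this document, so `CompletionRule`, `COMPLETION_RULES`, and `lookupRule` are hypothetical names.

```typescript
// Illustrative sketch; the real rule objects may be shaped differently.
interface CompletionRule {
  from: string;
  to: string;
  sideEffect: string;
}

const COMPLETION_RULES: Record<string, CompletionRule> = {
  "dev:done": { from: "Doing", to: "To Test", sideEffect: "git pull, detect PR" },
  "dev:blocked": { from: "Doing", to: "To Do", sideEffect: "return to queue" },
  "qa:pass": { from: "Testing", to: "Done", sideEffect: "close issue" },
  "qa:fail": { from: "Testing", to: "To Improve", sideEffect: "reopen issue" },
  "qa:refine": { from: "Testing", to: "Refining", sideEffect: "await human decision" },
  "qa:blocked": { from: "Testing", to: "To Test", sideEffect: "return to QA queue" },
};

// Validates the role:result combination and returns its rule.
function lookupRule(role: string, result: string): CompletionRule {
  const key = `${role}:${result}`;
  const rule = COMPLETION_RULES[key];
  if (rule === undefined) {
    throw new Error(`Invalid completion: ${key}`);
  }
  return rule;
}
```

A table like this makes invalid combinations such as `dev:pass` fail at lookup time rather than partway through a label transition.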
+
+## Issue Priority Order
+
+When the heartbeat or `work_heartbeat` fills free worker slots, issues are prioritized:
+
+1. **To Improve** — QA failures get fixed first (highest priority)
+2. **To Test** — Completed DEV work gets reviewed next
+3. **To Do** — Fresh tasks are picked up last
+
+This ensures the pipeline clears its backlog before starting new work.
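
The ordering can be sketched as a scan over the queue labels in priority order. This is an assumed illustration, not the `projectTick` code: `nextIssue` and `QueuedIssue` are hypothetical, and taking the first issue within a label is an assumption about tie-breaking.

```typescript
// Illustrative sketch; names and tie-breaking are assumptions.
const QUEUE_PRIORITY = ["To Improve", "To Test", "To Do"] as const;

type QueueLabel = (typeof QUEUE_PRIORITY)[number];

interface QueuedIssue {
  id: number;
  label: QueueLabel;
}

// Returns the first issue found under the highest-priority non-empty label.
function nextIssue(queue: QueuedIssue[]): QueuedIssue | undefined {
  for (const label of QUEUE_PRIORITY) {
    for (const issue of queue) {
      if (issue.label === label) return issue;
    }
  }
  return undefined;
}
```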