Merge pull request #129 from laurentenhoor/claude/update-docs-benefits-BhoG1

2026-02-11 15:48:12 +08:00
parent 163ac6ed3d b8ea37189b
commit 1e15c42657
12 changed files with 1501 additions and 896 deletions
--- a/README.md
+++ b/README.md
@@ -2,393 +2,235 @@
  <img src="assets/DevClaw.png" width="300" alt="DevClaw Logo">
 </p>

-# DevClaw - Development Plugin for OpenClaw
+# DevClaw — Development Plugin for OpenClaw

-**Every group chat becomes an autonomous development team.**
+**Turn any group chat into a dev team that ships.**

-Add the agent to a Telegram/WhatsApp group, point it at a GitLab/GitHub repo — that group now has an **orchestrator** managing the backlog, a **DEV** worker session writing code, and a **QA** worker session reviewing it. All autonomous. Add another group, get another team. Each project runs in complete isolation with its own task queue, workers, and session state.
+DevClaw is a plugin for [OpenClaw](https://openclaw.ai) that turns your orchestrator agent into a development manager. It hires developers, assigns tasks, reviews code, and keeps the pipeline moving — across as many projects as you have group chats. [Get started &rarr;](#getting-started)

-DevClaw is the [OpenClaw](https://openclaw.ai) plugin that makes this work.
+---

-## Why
+## What it looks like

-[OpenClaw](https://openclaw.ai) is great at giving AI agents the ability to develop software — spawn worker sessions, manage sessions, work with code. But running a real multi-project development pipeline exposes a gap: the orchestration layer between "agent can write code" and "agent reliably manages multiple projects" is brittle. Every task involves 10+ coordinated steps across GitLab labels, session state, model selection, and audit logging. Agents forget steps, corrupt state, null out session IDs they should preserve, or pick the wrong model for the job.
+You have two projects in two Telegram groups. You go to bed. You wake up:

-DevClaw fills that gap with guardrails. It gives the orchestrator atomic tools that make it impossible to forget a label transition, lose a session reference, or skip an audit log entry. The complexity of multi-project orchestration moves from agent instructions (that LLMs follow imperfectly) into deterministic code (that runs the same way every time).
+```
+── Group: "Dev - My Webapp" ──────────────────────────────

-## The idea
+Agent:  "⚡ Sending DEV (medior) for #42: Add login page"
+Agent:  "✅ DEV DONE #42 — Login page with OAuth. Moved to QA."
+Agent:  "🔍 Sending QA (reviewer) for #42: Add login page"
+Agent:  "🎉 QA PASS #42. Issue closed."
+Agent:  "⚡ Sending DEV (junior) for #43: Fix button color on /settings"
+Agent:  "✅ DEV DONE #43 — Updated to brand blue. Moved to QA."
+Agent:  "❌ QA FAIL #43 — Color doesn't match dark mode. Back to DEV."
+Agent:  "⚡ Sending DEV (junior) for #43: Fix button color on /settings"

-One orchestrator agent manages all your projects. It reads task backlogs, creates issues, decides priorities, and delegates work. For each task, DevClaw assigns a developer from your **team** — a junior, medior, or senior dev writes the code, then a QA engineer reviews it. Every Telegram/WhatsApp group is a separate project — the orchestrator keeps them completely isolated while managing them all from a single process.
+  You:  "Create an issue for refactoring the profile page, pick it up."

-DevClaw gives the orchestrator nine tools that replace hundreds of lines of manual orchestration logic. Instead of following a 10-step checklist per task (fetch issue, check labels, pick model, check for existing session, transition label, dispatch task, update state, log audit event...), it calls `task_pickup` and the plugin handles everything atomically — including session dispatch. Workers call `task_complete` themselves for atomic state updates, and can file follow-up issues via `task_create`.
+Agent:  created #44 "Refactor user profile page" on GitHub — To Do
+Agent:  "⚡ Sending DEV (medior) for #44: Refactor user profile page"

-## Developer tiers
+Agent:  "✅ DEV DONE #43 — Fixed dark-mode color. Back to QA."
+Agent:  "🎉 QA PASS #43. Issue closed."

-DevClaw uses a developer seniority model. Each tier maps to a configurable LLM model:
+── Group: "Dev - My API" ─────────────────────────────────

-| Tier       | Role                | Default model                 | Assigns to                                        |
-| ---------- | ------------------- | ----------------------------- | ------------------------------------------------- |
-| **junior** | Junior developer    | `anthropic/claude-haiku-4-5`  | Typos, single-file fixes, simple changes          |
-| **medior** | Mid-level developer | `anthropic/claude-sonnet-4-5` | Features, bug fixes, multi-file changes           |
-| **senior** | Senior developer    | `anthropic/claude-opus-4-5`   | Architecture, migrations, system-wide refactoring |
-| **qa**     | QA engineer         | `anthropic/claude-sonnet-4-5` | Code review, test validation                      |
-
-Configure which model each tier uses during setup or in `openclaw.json` plugin config.
-
-## How it works
-
-```mermaid
-graph TB
-    subgraph "Group Chat A"
-        direction TB
-        A_O["🎯 Orchestrator"]
-        A_GL[GitLab Issues]
-        A_DEV["🔧 DEV (worker session)"]
-        A_QA["🔍 QA (worker session)"]
-        A_O -->|task_pickup| A_GL
-        A_O -->|task_pickup dispatches| A_DEV
-        A_O -->|task_pickup dispatches| A_QA
-    end
-
-    subgraph "Group Chat B"
-        direction TB
-        B_O["🎯 Orchestrator"]
-        B_GL[GitLab Issues]
-        B_DEV["🔧 DEV (worker session)"]
-        B_QA["🔍 QA (worker session)"]
-        B_O -->|task_pickup| B_GL
-        B_O -->|task_pickup dispatches| B_DEV
-        B_O -->|task_pickup dispatches| B_QA
-    end
-
-    subgraph "Group Chat C"
-        direction TB
-        C_O["🎯 Orchestrator"]
-        C_GL[GitLab Issues]
-        C_DEV["🔧 DEV (worker session)"]
-        C_QA["🔍 QA (worker session)"]
-        C_O -->|task_pickup| C_GL
-        C_O -->|task_pickup dispatches| C_DEV
-        C_O -->|task_pickup dispatches| C_QA
-    end
-
-    AGENT["Single OpenClaw Agent"]
-    AGENT --- A_O
-    AGENT --- B_O
-    AGENT --- C_O
+Agent:  "🧠 Spawning DEV (senior) for #18: Migrate auth to OAuth2"
+Agent:  "✅ DEV DONE #18 — OAuth2 provider with refresh tokens. Moved to QA."
+Agent:  "🎉 QA PASS #18. Issue closed."
+Agent:  "⚡ Sending DEV (medior) for #19: Add rate limiting to /api/search"
 ```

-It's the same agent process — but each group chat gives it a different project context. The orchestrator role, the workers, the task queue, and all state are fully isolated per group.
+Multiple issues shipped, a QA failure automatically retried, and a second project's migration completed — all while you slept. When you dropped in mid-stream to create an issue, the scheduler kept going before, during, and after.

-## Task lifecycle
+---

-Every task (GitLab issue) moves through a fixed pipeline of label states. Issues are created by the orchestrator agent or by worker sessions — not manually. DevClaw tools handle every transition atomically — label change, state update, audit log, and session management in a single call.
+## Why DevClaw
+
+### Autonomous multi-project development
+
+Each project is fully isolated — own queue, workers, sessions, and state. DEV and QA execute in parallel within each project, and multiple projects run simultaneously. A token-free scheduling engine drives it all autonomously:
+
+- **[Scheduling engine](#automatic-scheduling)** — `work_heartbeat` continuously scans queues, dispatches workers, and drives DEV → QA → DEV [feedback loops](#how-tasks-flow-between-roles)
+- **[Project isolation](#execution-modes)** — parallel workers per project, parallel projects across the system
+- **[Role instructions](#custom-instructions-per-project)** — per-project, per-role prompts injected at dispatch time
+
+### Process enforcement
+
+GitHub/GitLab issues are the single source of truth — not an internal database. Every tool call wraps the full operation into deterministic code with rollback on failure:
+
+- **[External task state](#your-issues-stay-in-your-tracker)** — labels, transitions, and status queries go through your issue tracker
+- **[Atomic operations](#what-atomic-means-here)** — label transition + state update + session dispatch + audit log in one call
+- **[Tool-based guardrails](#the-toolbox)** — 11 tools enforce the process; the agent provides intent, the plugin handles mechanics
+
+### ~60-80% token savings
+
+Three mechanisms compound to cut token usage by roughly 60-80% versus running one large model with fresh context each time:
+
+- **[Tier selection](#meet-your-team)** — Haiku for typos, Sonnet for features, Opus for architecture (~30-50% on simple tasks)
+- **[Session reuse](#sessions-accumulate-context)** — workers accumulate codebase knowledge across tasks (~40-60% per task)
+- **[Token-free scheduling](#automatic-scheduling)** — `work_heartbeat` runs on pure CLI calls, zero LLM tokens for orchestration
+
+---
+
+## The problem DevClaw solves
+
+OpenClaw is a great multi-agent runtime. It handles sessions, tools, channels, gateway RPC — everything you need to run AI agents. But it's a general-purpose platform. It has no opinion about how software gets built.
+
+Without DevClaw, your orchestrator agent has to figure out on its own how to:
+- Pick the right model for the task complexity
+- Create or reuse the right worker session
+- Transition issue labels in the right order
+- Track which worker is doing what across projects
+- Schedule QA after DEV completes, and re-schedule DEV after QA fails
+- Detect crashed workers and recover
+- Log everything for auditability
+
+That's a lot of reasoning per task. LLMs do it imperfectly — they forget steps, corrupt state, pick the wrong model, lose session references. You end up babysitting the thing you built to avoid babysitting.
+
+DevClaw moves all of that into deterministic plugin code. The agent says "pick up issue #42." The plugin handles every remaining step atomically. Every time, the same way, zero reasoning tokens spent on orchestration.
+
+---
+
+## Meet your team
+
+DevClaw doesn't think in model IDs. It thinks in people.
+
+When a task comes in, you don't configure `anthropic/claude-sonnet-4-5` — you assign a **medior developer**. The orchestrator evaluates task complexity and picks the right person for the job:
+
+### Developers
+
+| Level | Assigns to | Model |
+|---|---|---|
+| **Junior** | Typos, CSS fixes, renames, single-file changes | Haiku |
+| **Medior** | Features, bug fixes, multi-file changes | Sonnet |
+| **Senior** | Architecture, migrations, system-wide refactoring | Opus |
+
+### QA
+
+| Level | Assigns to | Model |
+|---|---|---|
+| **Reviewer** | Code review, test validation, PR inspection | Sonnet |
+| **Tester** | Manual testing, smoke tests | Haiku |
+
+A CSS typo goes to the junior. A database migration goes to the senior. You're not burning Opus tokens on a color change, and you're not sending Haiku to redesign your auth system.
+
+Every mapping is [configurable](docs/CONFIGURATION.md#model-tiers) — swap in any model you want per level.
+
+---
+
+## How a task moves through the pipeline
+
+Every issue follows the same path, no exceptions. DevClaw enforces it:
+
+```
+Planning → To Do → Doing → To Test → Testing → Done
+```

 ```mermaid
 stateDiagram-v2
    [*] --> Planning
    Planning --> ToDo: Ready for development

-    ToDo --> Doing: task_pickup (DEV) ⇄ blocked
-    Doing --> ToTest: task_complete (DEV done)
+    ToDo --> Doing: DEV picks up
+    Doing --> ToTest: DEV done

-    ToTest --> Testing: task_pickup (QA) / auto-chain ⇄ blocked
-    Testing --> Done: task_complete (QA pass)
-    Testing --> ToImprove: task_complete (QA fail)
-    Testing --> Refining: task_complete (QA refine)
+    ToTest --> Testing: Scheduler picks up QA
+    Testing --> Done: QA pass (issue closed)
+    Testing --> ToImprove: QA fail (back to DEV)
+    Testing --> Refining: QA needs human input

-    ToImprove --> Doing: task_pickup (DEV fix) or auto-chain
-    Refining --> ToDo: Human decision
+    ToImprove --> Doing: Scheduler picks up DEV fix
+    Refining --> ToDo: Human decides

    Done --> [*]
 ```

-### Worker self-reporting
+These labels live on your actual GitHub/GitLab issues. Not in some internal database — in the tool you already use. Filter by `Doing` in GitHub to see what's in progress. Set up a webhook on `Done` to trigger deploys. The issue tracker is the source of truth.

-Workers (DEV/QA sub-agent sessions) call `task_complete` directly when they finish — no orchestrator involvement needed for the state transition. Workers can also call `task_create` to file follow-up issues they discover during work.
+### What "atomic" means here

-### Completion enforcement
+When you say "pick up #42 for DEV", the plugin does all of this in one operation:
+1. Verifies the issue is in the right state
+2. Picks the developer level (or uses what you specified)
+3. Transitions the label (`To Do` → `Doing`)
+4. Creates or reuses the right worker session
+5. Dispatches the task with project-specific instructions
+6. Updates internal state
+7. Logs an audit entry

-Three layers guarantee that `task_complete` always runs, preventing tasks from getting stuck in "Doing" or "Testing" forever:
+If step 4 fails, step 3 is rolled back. No half-states, no orphaned labels, no "the issue says Doing but nobody's working on it."
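
The rollback behavior described above can be sketched as a list of steps, each paired with an undo action. This is a minimal illustration under stated assumptions, not DevClaw's implementation: `run_atomically`, `set_label`, and the in-memory `issue` dict are hypothetical stand-ins for the plugin's real tracker and session calls.

```python
from typing import Callable, List, Tuple

# Each step is (name, do, undo). All names here are hypothetical.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_atomically(steps: List[Step]) -> List[str]:
    """Run steps in order; if one raises, undo the completed ones in reverse."""
    done: List[Step] = []
    log: List[str] = []
    try:
        for step in steps:
            name, do, _ = step
            do()
            done.append(step)
            log.append(f"did {name}")
    except Exception as exc:
        log.append(f"failed: {exc}")
        for name, _, undo in reversed(done):
            undo()
            log.append(f"undid {name}")
    return log


# In-memory stand-in for an issue's label in the tracker.
issue = {"label": "To Do"}


def set_label(value: str) -> Callable[[], None]:
    def apply() -> None:
        issue["label"] = value
    return apply


def failing_session() -> None:
    # Simulates session creation failing after the label was already moved.
    raise RuntimeError("session could not be created")


log = run_atomically([
    ("transition label", set_label("Doing"), set_label("To Do")),
    ("create session", failing_session, lambda: None),
])
```

After the simulated failure, `issue["label"]` is back to `To Do`, which is the "no orphaned labels" guarantee the text describes.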

-1. **Completion contract** — Every task message includes a mandatory section requiring the worker to call `task_complete`, even on failure. Workers use `"blocked"` if stuck.
-2. **Blocked result** — Both DEV and QA can return `"blocked"` to gracefully put a task back in queue (`Doing → To Do`, `Testing → To Test`) instead of silently dying.
-3. **Stale worker watchdog** — The heartbeat health check detects workers active >2 hours and auto-reverts labels to queue, catching sessions that crashed or ran out of context.
+---

-### Auto-chaining
+## What happens behind the scenes

-When a project has `autoChain: true`, `task_complete` automatically dispatches the next step:
+### Workers report back themselves

- **DEV "done"** → QA is dispatched immediately (using the qa tier)
- **QA "fail"** → DEV fix is dispatched immediately (reuses previous DEV tier)
- **QA "pass" / "refine" / "blocked"** → no chaining (pipeline done, needs human input, or returned to queue)
- **DEV "blocked"** → no chaining (returned to queue for retry)
+When a developer finishes, they call `work_finish` directly — no orchestrator involved:

-When `autoChain` is false, `task_complete` returns a `nextAction` hint for the orchestrator to act on.
+- **DEV "done"** → label moves to `To Test`, scheduler picks up QA on next tick
+- **DEV "blocked"** → label moves back to `To Do`, task returns to queue
+- **QA "pass"** → label moves to `Done`, issue closes
+- **QA "fail"** → label moves to `To Improve`, scheduler picks up DEV on next tick

-## Session reuse
+The orchestrator doesn't need to poll, check, or coordinate. Workers are self-reporting.

-Worker sessions are expensive to start — each new spawn requires the session to read the full codebase (~50K tokens). DevClaw maintains **separate sessions per tier per role** (session-per-tier design). When a medior dev finishes task A and picks up task B on the same project, the plugin detects the existing session and sends the task directly — no new session needed.
+### Sessions accumulate context

-The plugin handles session dispatch internally via OpenClaw CLI. The orchestrator agent never calls `sessions_spawn` or `sessions_send` — it just calls `task_pickup` and the plugin does the rest.
+Each developer level gets its own persistent session per project. Your medior dev that's done 5 features on `my-app` already knows the codebase — it doesn't re-read 50K tokens of source code every time it picks up a new task.

-```mermaid
-sequenceDiagram
-    participant O as Orchestrator
-    participant DC as DevClaw Plugin
-    participant GL as GitLab
-    participant S as Worker Session
+That's a **~40-60% token saving per task** from session reuse alone.

-    O->>DC: task_pickup({ issueId: 42, role: "dev" })
-    DC->>GL: Fetch issue, verify label
-    DC->>DC: Assign tier (junior/medior/senior)
-    DC->>DC: Check existing session for assigned tier
-    DC->>GL: Transition label (To Do → Doing)
-    DC->>S: Dispatch task via CLI (create or reuse session)
-    DC->>DC: Update projects.json, write audit log
-    DC-->>O: { success: true, announcement: "🔧 DEV (medior) picking up #42" }
-```
+Combined with tier selection (not using Opus when Haiku will do) and the token-free heartbeat (more on that next), DevClaw significantly reduces your token bill versus running everything through one large model.

-## Developer assignment
+### Everything is logged

-The orchestrator LLM evaluates each issue's title, description, and labels to assign the appropriate developer tier, then passes it to `task_pickup` via the `model` parameter. This gives the LLM full context for the decision — it can weigh factors like codebase familiarity, task dependencies, and recent failure history that keyword matching would miss.
-
-The keyword heuristic in `model-selector.ts` serves as a **fallback only**, used when the orchestrator omits the `model` parameter.
-
-| Tier   | Role                | When                                                        |
-| ------ | ------------------- | ----------------------------------------------------------- |
-| junior | Junior developer    | Typos, CSS, renames, copy changes                           |
-| medior | Mid-level developer | Features, bug fixes, multi-file changes                     |
-| senior | Senior developer    | Architecture, migrations, security, system-wide refactoring |
-| qa     | QA engineer         | All QA tasks (code review, test validation)                 |
-
-## State management
-
-All project state lives in a single `projects/projects.json` file in the orchestrator's workspace, keyed by Telegram group ID:
-
-```json
-{
-  "projects": {
-    "-1234567890": {
-      "name": "my-webapp",
-      "repo": "~/git/my-webapp",
-      "groupName": "Dev - My Webapp",
-      "baseBranch": "development",
-      "autoChain": true,
-      "dev": {
-        "active": false,
-        "issueId": null,
-        "model": "medior",
-        "sessions": {
-          "junior": "agent:orchestrator:subagent:a9e4d078-...",
-          "medior": "agent:orchestrator:subagent:b3f5c912-...",
-          "senior": null
-        }
-      },
-      "qa": {
-        "active": false,
-        "issueId": null,
-        "model": "qa",
-        "sessions": {
-          "qa": "agent:orchestrator:subagent:18707821-..."
-        }
-      }
-    }
-  }
-}
-```
-
-Key design decisions:
-
- **Session-per-tier** — each tier gets its own worker session, accumulating context independently. Tier selection maps directly to a session key.
- **Sessions preserved on completion** — when a worker completes a task, `sessions` map is **preserved** (only `active` and `issueId` are cleared). This enables session reuse on the next pickup.
- **Plugin-controlled dispatch** — the plugin creates and dispatches to sessions via OpenClaw CLI (`sessions.patch` + `openclaw agent`). The orchestrator agent never calls `sessions_spawn` or `sessions_send`.
- **Sessions persist indefinitely** — no auto-cleanup. `session_health` handles manual cleanup when needed.
-
-All writes go through atomic temp-file-then-rename to prevent corruption.
-
-## Tools
-
-### `devclaw_setup`
-
-Set up DevClaw in an agent's workspace. Creates AGENTS.md, HEARTBEAT.md, role templates, and configures models. Can optionally create a new agent.
-
-**Parameters:**
-
- `newAgentName` (string, optional) — Create a new agent with this name
- `models` (object, optional) — Model overrides per tier: `{ junior, medior, senior, qa }`
-
-### `task_pickup`
-
-Pick up a task from the issue queue for a DEV or QA worker.
-
-**Parameters:**
-
- `issueId` (number, required) — Issue ID
- `role` ("dev" | "qa", required) — Worker role
- `projectGroupId` (string, required) — Telegram group ID
- `model` (string, optional) — Developer tier (junior, medior, senior, qa). The orchestrator should evaluate the task complexity and choose. Falls back to keyword heuristic if omitted.
-
-**What it does atomically:**
-
-1. Resolves project from `projects.json`
-2. Validates no active worker for this role
-3. Fetches issue from issue tracker, verifies correct label state
-4. Assigns tier (LLM-chosen via `model` param, keyword heuristic fallback)
-5. Loads prompt instructions from `projects/prompts/<project>/<role>.md`
-6. Looks up existing session for assigned tier (session-per-tier)
-7. Transitions label (e.g. `To Do` → `Doing`)
-8. Creates session via Gateway RPC if new (`sessions.patch`)
-9. Dispatches task to worker session via CLI (`openclaw agent`) with role instructions appended
-10. Updates `projects.json` state (active, issueId, tier, session key)
-11. Writes audit log entry
-12. Returns announcement text for the orchestrator to post
-
-### `task_complete`
-
-Complete a task with a result. Called by workers (DEV/QA sub-agent sessions) directly, or by the orchestrator.
-
-**Parameters:**
-
- `role` ("dev" | "qa", required)
- `result` ("done" | "pass" | "fail" | "refine" | "blocked", required)
- `projectGroupId` (string, required)
- `summary` (string, optional) — For the Telegram announcement
-
-**Results:**
-
- **DEV "done"** — Pulls latest code, moves label `Doing` → `To Test`, deactivates worker. If `autoChain` enabled, automatically dispatches QA.
- **DEV "blocked"** — Moves label `Doing` → `To Do`, deactivates worker. Task returns to queue for retry.
- **QA "pass"** — Moves label `Testing` → `Done`, closes issue, deactivates worker
- **QA "fail"** — Moves label `Testing` → `To Improve`, reopens issue. If `autoChain` enabled, automatically dispatches DEV fix (reuses previous DEV tier).
- **QA "refine"** — Moves label `Testing` → `Refining`, awaits human decision
- **QA "blocked"** — Moves label `Testing` → `To Test`, deactivates worker. Task returns to QA queue for retry.
-
-### `task_update`
-
-Change an issue's state label programmatically without going through the full pickup/complete flow.
-
-**Parameters:**
-
- `projectGroupId` (string, required) — Telegram/WhatsApp group ID
- `issueId` (number, required) — Issue ID to update
- `state` (string, required) — New state label (Planning, To Do, Doing, To Test, Testing, Done, To Improve, Refining)
- `reason` (string, optional) — Audit log reason for the change
-
-**Use cases:**
- Manual state adjustments (e.g., Planning → To Do after approval)
- Failed auto-transitions that need correction
- Bulk state changes by orchestrator
-
-### `task_comment`
-
-Add a comment to an issue for feedback, notes, or discussion.
-
-**Parameters:**
-
- `projectGroupId` (string, required) — Telegram/WhatsApp group ID
- `issueId` (number, required) — Issue ID to comment on
- `body` (string, required) — Comment body in markdown
- `authorRole` ("dev" | "qa" | "orchestrator", optional) — Attribution role
-
-**Use cases:**
- QA adds review feedback without blocking pass/fail
- DEV posts implementation notes or progress updates
- Orchestrator adds summary comments
-
-### `task_create`
-
-Create a new issue in the project's issue tracker. Used by workers to file follow-up bugs, or by the orchestrator to create tasks from chat.
-
-**Parameters:**
-
- `projectGroupId` (string, required) — Telegram group ID
- `title` (string, required) — Issue title
- `description` (string, optional) — Full issue body in markdown
- `label` (string, optional) — State label (defaults to "Planning")
- `assignees` (string[], optional) — Usernames to assign
- `pickup` (boolean, optional) — If true, immediately pick up for DEV after creation
-
-### `queue_status`
-
-Returns task queue counts and worker status across all projects (or a specific one).
-
-**Parameters:**
-
- `projectGroupId` (string, optional) — Omit for all projects
-
-### `session_health`
-
-Detects and optionally fixes state inconsistencies.
-
-**Parameters:**
-
- `autoFix` (boolean, optional) — Auto-fix zombies and stale state
-
-**What it does:**
-
- Queries live sessions via Gateway RPC (`sessions.list`)
- Cross-references with `projects.json` worker state
-
-**Checks:**
-
- Active worker with no session key (critical, auto-fixable)
- Active worker whose session is dead — zombie (critical, auto-fixable)
- Worker active for >2 hours — stale watchdog (warning, auto-fixable: reverts label to queue)
- Inactive worker with lingering issue ID (warning, auto-fixable)
-
-### `project_register`
-
-Register a new project with DevClaw. Creates all required issue tracker labels (idempotent), scaffolds role instruction files, and adds the project to `projects.json`. One-time setup per project. Auto-detects GitHub/GitLab from git remote.
-
-**Parameters:**
-
- `projectGroupId` (string, required) — Telegram group ID (key in projects.json)
- `name` (string, required) — Short project name
- `repo` (string, required) — Path to git repo (e.g. `~/git/my-project`)
- `groupName` (string, required) — Telegram group display name
- `baseBranch` (string, required) — Base branch for development
- `deployBranch` (string, optional) — Defaults to baseBranch
- `deployUrl` (string, optional) — Deployment URL
-
-**What it does atomically:**
-
-1. Validates project not already registered
-2. Resolves repo path, auto-detects GitHub/GitLab, and verifies access
-3. Creates all 8 state labels (idempotent — safe to run on existing projects)
-4. Adds project entry to `projects.json` with empty worker state and `autoChain: false`
-5. Scaffolds prompt instruction files: `projects/prompts/<project>/dev.md` and `projects/prompts/<project>/qa.md`
-6. Writes audit log entry
-7. Returns announcement text
-
-## Audit logging
-
-Every tool call automatically appends an NDJSON entry to `log/audit.log`. No manual logging required from the orchestrator agent.
-
-```jsonl
-{"ts":"2026-02-08T10:30:00Z","event":"task_pickup","project":"my-webapp","issue":42,"role":"dev","tier":"medior","sessionAction":"send"}
-{"ts":"2026-02-08T10:30:01Z","event":"model_selection","issue":42,"role":"dev","tier":"medior","reason":"Standard dev task"}
-{"ts":"2026-02-08T10:45:00Z","event":"task_complete","project":"my-webapp","issue":42,"role":"dev","result":"done"}
-```
-
-## Quick start
+Every tool call writes an NDJSON line to `audit.log`:

 ```bash
-# 1. Install the plugin
-cp -r devclaw ~/.openclaw/extensions/
-
-# 2. Run setup (interactive — creates agent, configures models, writes workspace files)
-openclaw devclaw setup
-
-# 3. Add bot to Telegram group, then register a project
-# (via the agent in Telegram)
+jq 'select(.event=="work_start")' audit.log
 ```

-See the [Onboarding Guide](docs/ONBOARDING.md) for detailed instructions.
+Full trace of every task, every level selection, every label transition, every health fix. No manual logging needed.
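
For readers who prefer a script over `jq`, the same filter can be sketched in Python. Only the `event` field and the `work_start` value come from the text above; `select_events` and the `issue` field in the sample lines are assumptions made for illustration.

```python
import json
from typing import Iterable, List


def select_events(lines: Iterable[str], event: str) -> List[dict]:
    """Parse NDJSON lines and keep the entries whose "event" field matches."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between entries
        entry = json.loads(line)
        if entry.get("event") == event:
            entries.append(entry)
    return entries


# Hypothetical sample lines; a real log would be read with open("audit.log").
sample = [
    '{"event": "work_start", "issue": 42}',
    '{"event": "work_finish", "issue": 42}',
    "",
    '{"event": "work_start", "issue": 43}',
]
started = select_events(sample, "work_start")
```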

-## Configuration
+---

-Model tier configuration in `openclaw.json`:
+## Automatic scheduling
+
+DevClaw doesn't wait for you to tell it what to do next. A background scheduling system continuously scans for available work and dispatches workers — zero LLM tokens, pure deterministic code. This is the engine that keeps the pipeline moving: when DEV finishes, the scheduler sees a `To Test` issue and dispatches QA. When QA fails, the scheduler sees a `To Improve` issue and dispatches DEV. No hand-offs, no orchestrator reasoning — just label-driven scheduling.
+
+### The `work_heartbeat`
+
+Every tick (default: 60 seconds), the scheduler runs two passes:
+
+1. **Health pass** — detects workers stuck for >2 hours, reverts their labels back to queue, deactivates them. Catches crashed sessions, context overflows, or workers that died without reporting back.
+2. **Queue pass** — scans for available tasks by priority (`To Improve` > `To Test` > `To Do`), fills free worker slots. DEV and QA slots are filled independently.
+
+Both passes use only CLI calls and JSON reads, so the scheduler itself spends no LLM tokens. Workers consume tokens only when they actually start coding or reviewing. The scheduler also runs a tick immediately after every `work_finish`, so transitions happen without waiting for the next interval.
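
The queue pass can be pictured as a priority scan over one project's issues. The sketch below assumes one worker slot per role per project and treats `To Test` as the QA queue and the other two labels as DEV queues. `queue_pass`, `Issue`, and `busy_roles` are hypothetical names, not the plugin's API.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# Priority order from the text: To Improve > To Test > To Do.
PRIORITY = ["To Improve", "To Test", "To Do"]
# Assumed mapping of queue label to the role that works it.
ROLE_FOR_LABEL = {"To Improve": "dev", "To Test": "qa", "To Do": "dev"}


@dataclass
class Issue:
    number: int
    label: str


def queue_pass(
    issues: List[Issue], busy_roles: Set[str], max_pickups: int = 4
) -> List[Tuple[str, int]]:
    """Return (role, issue number) pairs to dispatch for one project."""
    picked: List[Tuple[str, int]] = []
    free = {"dev", "qa"} - busy_roles
    for label in PRIORITY:
        role = ROLE_FOR_LABEL[label]
        if role not in free or len(picked) >= max_pickups:
            continue
        # Take the first issue waiting in this queue, if any.
        candidate = next((i for i in issues if i.label == label), None)
        if candidate is not None:
            picked.append((role, candidate.number))
            free.discard(role)  # DEV and QA slots fill independently
    return picked


issues = [Issue(42, "To Do"), Issue(43, "To Improve"), Issue(44, "To Test")]
dispatch = queue_pass(issues, busy_roles=set())
```

Here the `To Improve` fix takes the DEV slot ahead of the `To Do` issue, and the QA slot is filled separately from the `To Test` queue.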
+
+### How tasks flow between roles
+
+When a worker calls `work_finish`, the plugin transitions the label. The scheduler picks up the rest:
+
+- **DEV "done"** → label moves to `To Test` → next tick dispatches QA
+- **QA "fail"** → label moves to `To Improve` → next tick dispatches DEV (reuses previous level)
+- **QA "pass"** → label moves to `Done`, issue closes
+- **"blocked"** → label reverts to queue (`To Do` or `To Test`) for retry
+
+No orchestrator involvement. Workers self-report, the scheduler fills free slots.
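
The label moves listed above amount to a small lookup table keyed by role and result. This is an illustrative sketch rather than DevClaw's code: `TRANSITIONS` and `next_label` are hypothetical names, and the `refine` result name is an assumption (the pipeline diagram only says that QA can ask for human input, which moves the issue to `Refining`).

```python
# (role, result) -> label the issue moves to after work_finish.
TRANSITIONS = {
    ("dev", "done"): "To Test",
    ("dev", "blocked"): "To Do",
    ("qa", "pass"): "Done",
    ("qa", "fail"): "To Improve",
    ("qa", "blocked"): "To Test",
    ("qa", "refine"): "Refining",  # result name assumed for illustration
}


def next_label(role: str, result: str) -> str:
    """Look up the target label; reject combinations the pipeline does not define."""
    try:
        return TRANSITIONS[(role, result)]
    except KeyError:
        raise ValueError(f"no transition for role={role!r}, result={result!r}")


after_dev = next_label("dev", "done")
after_qa_fail = next_label("qa", "fail")
```

Keeping the table explicit means an undefined combination such as DEV reporting "pass" fails loudly instead of leaving the issue in an unexpected state.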
+
+### Execution modes
+
+Each project is fully isolated — its own queue, workers, sessions, state. No cross-project contamination. Two levels of parallelism control how work gets scheduled:
+
+- **Project-level (`roleExecution`)** — DEV and QA work simultaneously on different tasks (default: `parallel`) or take turns (`sequential`)
+- **Plugin-level (`projectExecution`)** — all registered projects dispatch workers independently (default: `parallel`) or only one project runs at a time (`sequential`)
+
+### Configuration
+
+All scheduling behavior is configurable in `openclaw.json`:

 ```json
 {
@@ -396,12 +238,12 @@ Model tier configuration in `openclaw.json`:
    "entries": {
      "devclaw": {
        "config": {
-          "models": {
-            "junior": "anthropic/claude-haiku-4-5",
-            "medior": "anthropic/claude-sonnet-4-5",
-            "senior": "anthropic/claude-opus-4-5",
-            "qa": "anthropic/claude-sonnet-4-5"
-          }
+          "work_heartbeat": {
+            "enabled": true,
+            "intervalSeconds": 60,
+            "maxPickupsPerTick": 4
+          },
+          "projectExecution": "parallel"
        }
      }
    }
@@ -409,61 +251,156 @@ Model tier configuration in `openclaw.json`:
 }
 ```

-Restrict tools to your orchestrator agent only:
+Per-project settings live in `projects.json`:

 ```json
 {
-  "agents": {
-    "list": [
-      {
-        "id": "my-orchestrator",
-        "tools": {
-          "allow": [
-            "devclaw_setup",
-            "task_pickup",
-            "task_complete",
-            "task_update",
-            "task_comment",
-            "task_create",
-            "queue_status",
-            "session_health",
-            "project_register"
-          ]
-        }
-      }
-    ]
+  "-1234567890": {
+    "name": "my-app",
+    "roleExecution": "parallel"
  }
 }
 ```

-> DevClaw uses an `IssueProvider` interface to abstract issue tracker operations. GitLab (via `glab` CLI) and GitHub (via `gh` CLI) are supported — the provider is auto-detected from the git remote URL. Jira is planned.
+| Setting | Where | Default | What it controls |
+|---|---|---|---|
+| `work_heartbeat.enabled` | `openclaw.json` | `true` | Turn the heartbeat on/off |
+| `work_heartbeat.intervalSeconds` | `openclaw.json` | `60` | Seconds between ticks |
+| `work_heartbeat.maxPickupsPerTick` | `openclaw.json` | `4` | Max workers dispatched per tick |
+| `projectExecution` | `openclaw.json` | `"parallel"` | All projects at once, or one at a time |
+| `roleExecution` | `projects.json` | `"parallel"` | DEV+QA at once, or one role at a time |

-## Prompt instructions
+See the [Configuration reference](docs/CONFIGURATION.md) for the full schema.

-Workers receive role-specific instructions appended to their task message. `project_register` scaffolds editable files:
+---
+
+## Task management
+
+### Your issues stay in your tracker
+
+DevClaw doesn't have its own task database. All task state lives in **GitHub Issues** or **GitLab Issues** — auto-detected from your git remote. The eight pipeline labels are created on your repo when you register a project. Your project manager sees progress in GitHub without knowing DevClaw exists. Your CI/CD can trigger on label changes. If you stop using DevClaw, your issues and labels stay exactly where they are.
+
+The provider is pluggable (`IssueProvider` interface). GitHub and GitLab work today. Jira, Linear, or anything else just needs to implement the same interface.
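To make the pluggable provider idea concrete, here is a minimal sketch of what such an interface might look like. The method names mirror calls that appear in the architecture sequence diagrams (`getIssue`, `transitionLabel`, `closeIssue`, `reopenIssue`), but the exact signatures, the `Issue` shape, and the `InMemoryProvider` class are assumptions for illustration. The sketch is synchronous for brevity; a real provider shelling out to `gh` or `glab` would likely be asynchronous.

```typescript
// Hypothetical shape: the real IssueProvider interface may differ.
interface Issue {
  id: number;
  title: string;
  labels: string[];
  open: boolean;
}

interface IssueProvider {
  getIssue(id: number): Issue;
  transitionLabel(id: number, from: string, to: string): void;
  closeIssue(id: number): void;
  reopenIssue(id: number): void;
}

// In-memory provider, useful for tests. A Jira or Linear provider would
// implement the same four methods against its own API.
class InMemoryProvider implements IssueProvider {
  private issues = new Map<number, Issue>();

  add(issue: Issue): void {
    this.issues.set(issue.id, { ...issue, labels: [...issue.labels] });
  }

  getIssue(id: number): Issue {
    const issue = this.issues.get(id);
    if (!issue) throw new Error(`Issue #${id} not found`);
    return issue;
  }

  // Refuse the transition unless the issue currently carries the `from` label.
  transitionLabel(id: number, from: string, to: string): void {
    const issue = this.getIssue(id);
    if (!issue.labels.includes(from)) {
      throw new Error(`Issue #${id} is not labeled "${from}"`);
    }
    issue.labels = issue.labels.filter((l) => l !== from).concat(to);
  }

  closeIssue(id: number): void {
    this.getIssue(id).open = false;
  }

  reopenIssue(id: number): void {
    this.getIssue(id).open = true;
  }
}
```

Because callers depend only on the interface, swapping trackers would not require changes to the tools that call it.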
+
+### Creating, updating, and commenting
+
+Tasks can come from anywhere — the orchestrator creates them from chat, workers file bugs they discover mid-task, or you create them directly in GitHub/GitLab:

 ```
-workspace/
-├── projects/
-│   ├── projects.json     ← project state
-│   └── prompts/
-│       ├── my-webapp/    ← per-project prompts (edit to customize)
-│       │   ├── dev.md
-│       │   └── qa.md
-│       └── another-project/
-│           ├── dev.md
-│           └── qa.md
-├── log/
-│   └── audit.log         ← NDJSON event log
+You:    "Create an issue: fix the broken OAuth redirect"
+Agent:  creates issue #43 with label "Planning"
+
+You:    "Move #43 to To Do"
+Agent:  transitions label Planning → To Do
+
+You:    "Add a comment on #42: needs to handle the edge case for expired tokens"
+Agent:  adds comment attributed to "orchestrator"
 ```

-`task_pickup` loads `projects/prompts/<project>/<role>.md`. Edit these files to customize worker behavior per project — for example, adding project-specific deployment steps or test commands.
+Workers can also comment during work — QA leaves review feedback, DEV posts implementation notes. Every comment carries role attribution so you know who said what.

-## Requirements
+### Custom instructions per project

- [OpenClaw](https://openclaw.ai)
+Each project gets instruction files that workers receive with every task they pick up:
+
+```
+workspace/projects/roles/
+├── my-webapp/
+│   ├── dev.md     "Run npm test before committing. Deploy URL: staging.example.com"
+│   └── qa.md      "Check OAuth flow. Verify mobile responsiveness."
+├── my-api/
+│   ├── dev.md     "Run cargo test. Follow REST conventions in CONTRIBUTING.md"
+│   └── qa.md      "Verify all endpoints return correct status codes."
+└── default/
+    ├── dev.md     (fallback for projects without custom instructions)
+    └── qa.md
+```
+
+Deployment steps, test commands, coding standards, acceptance criteria — all injected at dispatch time, per project, per role.
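The fallback to `default/` shown in the tree can be sketched as a small lookup. This is an illustration under assumptions, not the plugin's code: `resolveInstructionPath` is a hypothetical helper, and the `exists` check is injected so the example has no filesystem dependency.

```typescript
type InstructionRole = "dev" | "qa";

// Return the instruction file for a project and role, falling back to the
// shared default when the project has no custom file.
function resolveInstructionPath(
  project: string,
  role: InstructionRole,
  exists: (path: string) => boolean,
  root = "workspace/projects/roles",
): string | null {
  const candidates = [
    `${root}/${project}/${role}.md`, // per-project instructions win
    `${root}/default/${role}.md`, // fallback for projects without custom files
  ];
  return candidates.find((p) => exists(p)) ?? null;
}
```

For example, a project with a custom `dev.md` but no `qa.md` would get its own DEV instructions and the default QA instructions.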
+
+---
+
+## Getting started
+
+### Prerequisites
+
+- [OpenClaw](https://openclaw.ai) installed (`openclaw --version`)
 - Node.js >= 20
- [`glab`](https://gitlab.com/gitlab-org/cli) CLI installed and authenticated (GitLab provider), or [`gh`](https://cli.github.com) CLI (GitHub provider)
+- `gh` CLI ([GitHub](https://cli.github.com)) or `glab` CLI ([GitLab](https://gitlab.com/gitlab-org/cli)), authenticated
+
+### Install
+
+```bash
+cp -r devclaw ~/.openclaw/extensions/
+```
+
+### Set up through conversation
+
+The easiest way to configure DevClaw is to just talk to your agent:
+
+```
+You:   "Help me set up DevClaw"
+Agent: "I'll walk you through it. Should I use this agent as the
+        orchestrator, or create a new one?"
+You:   "Use this one"
+
+Agent: "Want to bind a messaging channel?"
+You:   "Telegram"
+
+Agent: "Here are the default developer assignments:
+        Junior → Haiku, Medior → Sonnet, Senior → Opus
+        Reviewer → Sonnet, Tester → Haiku
+        Keep these or customize?"
+You:   "Keep them"
+
+Agent: "Done. Want to register a project?"
+You:   "Yes — my-app at ~/git/my-app, main branch"
+
+Agent: "Project registered. 8 labels created on your repo.
+        Role instructions scaffolded. Try: 'check the queue'"
+```
+
+You can also use the [CLI wizard or non-interactive setup](docs/ONBOARDING.md#step-2-run-setup) for scripted environments.
+
+---
+
+## The toolbox
+
+DevClaw gives the orchestrator 11 tools. These aren't just convenience wrappers; they're **guardrails**. Each tool encodes a complex multi-step operation into a single atomic call. The agent provides intent, the plugin handles mechanics. The agent cannot skip a label transition, forget to update state, or dispatch to the wrong session, because those steps are carried out by deterministic code, not LLM reasoning.
+
+| Tool | What it does |
+|---|---|
+| `work_start` | Pick up a task — resolves level, transitions label, dispatches session, logs audit |
+| `work_finish` | Complete a task — transitions label, updates state, ticks queue for next dispatch |
+| `task_create` | Create a new issue (used by workers to file bugs they discover) |
+| `task_update` | Manually change an issue's state label |
+| `task_comment` | Add a comment to an issue (with role attribution) |
+| `status` | Dashboard: queue counts + who's working on what |
+| `health` | Detect zombie workers, stale sessions, state inconsistencies |
+| `work_heartbeat` | Manually trigger a health check + queue dispatch cycle |
+| `project_register` | One-time project setup: creates labels, scaffolds instructions, initializes state |
+| `setup` | Agent + workspace initialization |
+| `onboard` | Conversational setup guide |
+
+Full parameters and usage in the [Tools Reference](docs/TOOLS.md).
+
+---
+
+## Documentation
+
+| | |
+|---|---|
+| **[Architecture](docs/ARCHITECTURE.md)** | System design, session model, data flow, end-to-end diagrams |
+| **[Tools Reference](docs/TOOLS.md)** | Complete reference for all 11 tools |
+| **[Configuration](docs/CONFIGURATION.md)** | `openclaw.json`, `projects.json`, heartbeat, notifications |
+| **[Onboarding Guide](docs/ONBOARDING.md)** | Full step-by-step setup |
+| **[QA Workflow](docs/QA_WORKFLOW.md)** | QA process and review templates |
+| **[Context Awareness](docs/CONTEXT-AWARENESS.md)** | How tools adapt to group vs. DM vs. agent context |
+| **[Testing](docs/TESTING.md)** | Test suite, fixtures, CI/CD |
+| **[Management Theory](docs/MANAGEMENT.md)** | The delegation model behind the design |
+| **[Roadmap](docs/ROADMAP.md)** | What's coming next |
+
+---

 ## License

--- a/VERIFICATION.md
+++ b/VERIFICATION.md
@@ -1,45 +0,0 @@
-# Verification: task_create Default State
-
-## Issue #115 Request
-Change default state for new tasks from "To Do" to "Planning"
-
-## Current Implementation Status
-**Already implemented** - The default has been "Planning" since initial commit.
-
-### Code Evidence
-File: `lib/tools/task-create.ts` (line 68)
-```typescript
-const label = (params.label as StateLabel) ?? "Planning";
-```
-
-### Documentation Evidence
-File: `README.md` (line 308)
-```
- `label` (string, optional) — State label (defaults to "Planning")
-```
-
-### Tool Description
-The tool description itself states:
-```
-The issue is created with a state label (defaults to "Planning").
-```
-
-## Timeline
- **Feb 9, 2026** (commit 8a79755e): Initial task_create implementation with "Planning" default
- **Feb 10, 2026**: Issue #115 created requesting this change (already done)
-
-## Verification Test
-Default behavior can be verified by calling task_create without specifying a label:
-
-```javascript
-task_create({
-  projectGroupId: "-5239235162",
-  title: "Test Issue"
-  // label parameter omitted - should default to "Planning"
-})
-```
-
-Expected result: Issue created with "Planning" label, NOT "To Do"
-
-## Conclusion
-The requested feature is already fully implemented. No code changes needed.
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -1,64 +1,116 @@
 # DevClaw — Architecture & Component Interaction

+## How it works
+
+One OpenClaw agent process serves multiple group chats — each group gives it a different project context. The orchestrator role, the workers, the task queue, and all state are fully isolated per group.
+
+```mermaid
+graph TB
+    subgraph "Group Chat A"
+        direction TB
+        A_O["Orchestrator"]
+        A_GL[GitHub/GitLab Issues]
+        A_DEV["DEV (worker session)"]
+        A_QA["QA (worker session)"]
+        A_O -->|work_start| A_GL
+        A_O -->|dispatches| A_DEV
+        A_O -->|dispatches| A_QA
+    end
+
+    subgraph "Group Chat B"
+        direction TB
+        B_O["Orchestrator"]
+        B_GL[GitHub/GitLab Issues]
+        B_DEV["DEV (worker session)"]
+        B_QA["QA (worker session)"]
+        B_O -->|work_start| B_GL
+        B_O -->|dispatches| B_DEV
+        B_O -->|dispatches| B_QA
+    end
+
+    AGENT["Single OpenClaw Agent"]
+    AGENT --- A_O
+    AGENT --- B_O
+```
+
+Worker sessions are expensive to start — each new spawn reads the full codebase (~50K tokens). DevClaw maintains **separate sessions per level per role** ([session-per-level design](#session-per-level-design)). When a medior dev finishes task A and picks up task B on the same project, the accumulated context carries over — no re-reading the repo. The plugin handles all session dispatch internally via OpenClaw CLI; the orchestrator agent never calls `sessions_spawn` or `sessions_send`.
+
+```mermaid
+sequenceDiagram
+    participant O as Orchestrator
+    participant DC as DevClaw Plugin
+    participant IT as Issue Tracker
+    participant S as Worker Session
+
+    O->>DC: work_start({ issueId: 42, role: "dev" })
+    DC->>IT: Fetch issue, verify label
+    DC->>DC: Assign level (junior/medior/senior)
+    DC->>DC: Check existing session for assigned level
+    DC->>IT: Transition label (To Do → Doing)
+    DC->>S: Dispatch task via CLI (create or reuse session)
+    DC->>DC: Update projects.json, write audit log
+    DC-->>O: { success: true, announcement: "..." }
+```
+
 ## Agents vs Sessions

 Understanding the OpenClaw model is key to understanding how DevClaw works:

 - **Agent** — A configured entity in `openclaw.json`. Has a workspace, model, identity files (SOUL.md, IDENTITY.md), and tool permissions. Persists across restarts.
 - **Session** — A runtime conversation instance. Each session has its own context window and conversation history, stored as a `.jsonl` transcript file.
- **Sub-agent session** — A session created under the orchestrator agent for a specific worker role. NOT a separate agent — it's a child session running under the same agent, with its own isolated context. Format: `agent:<parent>:subagent:<uuid>`.
+- **Sub-agent session** — A session created under the orchestrator agent for a specific worker role. NOT a separate agent — it's a child session running under the same agent, with its own isolated context. Format: `agent:<parent>:subagent:<project>-<role>-<level>`.

-### Session-per-tier design
+### Session-per-level design

-Each project maintains **separate sessions per developer tier per role**. A project's DEV might have a junior session, a medior session, and a senior session — each accumulating its own codebase context over time.
+Each project maintains **separate sessions per developer level per role**. A project's DEV might have a junior session, a medior session, and a senior session — each accumulating its own codebase context over time.

 ```
 Orchestrator Agent (configured in openclaw.json)
  └─ Main session (long-lived, handles all projects)
       │
       ├─ Project A
-       │    ├─ DEV sessions: { junior: <uuid>, medior: <uuid>, senior: null }
-       │    └─ QA sessions:  { qa: <uuid> }
+       │    ├─ DEV sessions: { junior: <key>, medior: <key>, senior: null }
+       │    └─ QA sessions:  { reviewer: <key>, tester: null }
       │
       └─ Project B
-            ├─ DEV sessions: { junior: null, medior: <uuid>, senior: null }
-            └─ QA sessions:  { qa: <uuid> }
+            ├─ DEV sessions: { junior: null, medior: <key>, senior: null }
+            └─ QA sessions:  { reviewer: <key>, tester: null }
 ```

-Why per-tier instead of switching models on one session:
+Why per-level instead of switching models on one session:
 - **No model switching overhead** — each session always uses the same model
 - **Accumulated context** — a junior session that's done 20 typo fixes knows the project well; a medior session that's done 5 features knows it differently
 - **No cross-model confusion** — conversation history stays with the model that generated it
- **Deterministic reuse** — tier selection directly maps to a session key, no patching needed
+- **Deterministic reuse** — level selection directly maps to a session key, no patching needed

 ### Plugin-controlled session lifecycle

 DevClaw controls the **full** session lifecycle end-to-end. The orchestrator agent never calls `sessions_spawn` or `sessions_send` — the plugin handles session creation and task dispatch internally using the OpenClaw CLI:

 ```
-Plugin dispatch (inside task_pickup):
-  1. Assign tier, look up session, decide spawn vs send
+Plugin dispatch (inside work_start):
+  1. Assign level, look up session, decide spawn vs send
  2. New session:  openclaw gateway call sessions.patch → create entry + set model
-                   openclaw agent --session-id <key> --message "task..."
-  3. Existing:     openclaw agent --session-id <key> --message "task..."
+                   openclaw gateway call agent → dispatch task
+  3. Existing:     openclaw gateway call agent → dispatch task to existing session
  4. Return result to orchestrator (announcement text, no session instructions)
 ```
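The spawn-versus-send decision in the steps above can be sketched as a pure function. This is a hedged illustration: `planDispatch`, `SessionMap`, and `DispatchPlan` are hypothetical names, while the session key format and the gateway method names (`sessions.patch`, `agent`) are taken from this document.

```typescript
// Hypothetical types for illustration only.
type SessionMap = Record<string, string | null>; // level -> session key

interface DispatchPlan {
  sessionKey: string;
  gatewayCalls: string[]; // gateway methods to invoke, in order
  reused: boolean;
}

function planDispatch(
  parentAgent: string,
  project: string,
  role: "dev" | "qa",
  level: string,
  sessions: SessionMap,
): DispatchPlan {
  const existing = sessions[level] ?? null;
  if (existing !== null) {
    // Existing session: send the task directly, no sessions.patch needed.
    return { sessionKey: existing, gatewayCalls: ["agent"], reused: true };
  }
  // New session: create the entry and set the model first, then dispatch.
  const sessionKey = `agent:${parentAgent}:subagent:${project}-${role}-${level}`;
  return { sessionKey, gatewayCalls: ["sessions.patch", "agent"], reused: false };
}
```

Because the level maps directly to a session key, the same inputs always yield the same plan, which is the "deterministic reuse" property described earlier.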

-The agent's only job after `task_pickup` returns is to post the announcement to Telegram. Everything else — tier assignment, session creation, task dispatch, state update, audit logging — is deterministic plugin code.
+The agent's only job after `work_start` returns is to post the announcement to Telegram. Everything else — level assignment, session creation, task dispatch, state update, audit logging — is deterministic plugin code.

 **Why this matters:** Previously the plugin returned instructions like `{ sessionAction: "spawn", model: "sonnet" }` and the agent had to correctly call `sessions_spawn` with the right params. This was the fragile handoff point where agents would forget `cleanup: "keep"`, use wrong models, or corrupt session state. Moving dispatch into the plugin eliminates that entire class of errors.

-**Session persistence:** Sessions created via `sessions.patch` persist indefinitely (no auto-cleanup). The plugin manages lifecycle explicitly through `session_health`.
+**Session persistence:** Sessions created via `sessions.patch` persist indefinitely (no auto-cleanup). The plugin manages lifecycle explicitly through the `health` tool.

 **What we trade off vs. registered sub-agents:**

 | Feature | Sub-agent system | Plugin-controlled | DevClaw equivalent |
 |---|---|---|---|
 | Auto-reporting | Sub-agent reports to parent | No | Heartbeat polls for completion |
-| Concurrency control | `maxConcurrent` | No | `task_pickup` checks `active` flag |
+| Concurrency control | `maxConcurrent` | No | `work_start` checks `active` flag |
 | Lifecycle tracking | Parent-child registry | No | `projects.json` tracks all sessions |
-| Timeout detection | `runTimeoutSeconds` | No | `session_health` flags stale >2h |
-| Cleanup | Auto-archive | No | `session_health` manual cleanup |
+| Timeout detection | `runTimeoutSeconds` | No | `health` flags stale >2h |
+| Cleanup | Auto-archive | No | `health` manual cleanup |

 DevClaw provides equivalent guardrails for everything except auto-reporting, which the heartbeat handles.

@@ -74,22 +126,22 @@ graph TB
    subgraph "OpenClaw Runtime"
        MS[Main Session<br/>orchestrator agent]
        GW[Gateway RPC<br/>sessions.patch / sessions.list]
-        CLI[openclaw agent CLI]
+        CLI[openclaw gateway call agent]
        DEV_J[DEV session<br/>junior]
        DEV_M[DEV session<br/>medior]
        DEV_S[DEV session<br/>senior]
-        QA_E[QA session<br/>qa]
+        QA_R[QA session<br/>reviewer]
    end

    subgraph "DevClaw Plugin"
-        TP[task_pickup]
-        TC[task_complete]
+        WS[work_start]
+        WF[work_finish]
        TCR[task_create]
-        QS[queue_status]
-        SH[session_health]
+        ST[status]
+        SH[health]
        PR[project_register]
-        DS[devclaw_setup]
-        TIER[Tier Resolver]
+        DS[setup]
+        TIER[Level Resolver]
        PJ[projects.json]
        AL[audit.log]
    end
@@ -103,34 +155,34 @@ graph TB
    TG -->|delivers| MS
    MS -->|announces| TG

-    MS -->|calls| TP
-    MS -->|calls| TC
+    MS -->|calls| WS
+    MS -->|calls| WF
    MS -->|calls| TCR
-    MS -->|calls| QS
+    MS -->|calls| ST
    MS -->|calls| SH
    MS -->|calls| PR
    MS -->|calls| DS

-    TP -->|resolves tier| TIER
-    TP -->|transitions labels| GL
-    TP -->|reads/writes| PJ
-    TP -->|appends| AL
-    TP -->|creates session| GW
-    TP -->|dispatches task| CLI
+    WS -->|resolves level| TIER
+    WS -->|transitions labels| GL
+    WS -->|reads/writes| PJ
+    WS -->|appends| AL
+    WS -->|creates session| GW
+    WS -->|dispatches task| CLI

-    TC -->|transitions labels| GL
-    TC -->|closes/reopens| GL
-    TC -->|reads/writes| PJ
-    TC -->|git pull| REPO
-    TC -->|auto-chain dispatch| CLI
-    TC -->|appends| AL
+    WF -->|transitions labels| GL
+    WF -->|closes/reopens| GL
+    WF -->|reads/writes| PJ
+    WF -->|git pull| REPO
+    WF -->|tick dispatch| CLI
+    WF -->|appends| AL

    TCR -->|creates issue| GL
    TCR -->|appends| AL

-    QS -->|lists issues by label| GL
-    QS -->|reads| PJ
-    QS -->|appends| AL
+    ST -->|lists issues by label| GL
+    ST -->|reads| PJ
+    ST -->|appends| AL

    SH -->|reads/writes| PJ
    SH -->|checks sessions| GW
@@ -144,12 +196,12 @@ graph TB
    CLI -->|sends task| DEV_J
    CLI -->|sends task| DEV_M
    CLI -->|sends task| DEV_S
-    CLI -->|sends task| QA_E
+    CLI -->|sends task| QA_R

    DEV_J -->|writes code, creates MRs| REPO
    DEV_M -->|writes code, creates MRs| REPO
    DEV_S -->|writes code, creates MRs| REPO
-    QA_E -->|reviews code, tests| REPO
+    QA_R -->|reviews code, tests| REPO
 ```

 ## End-to-end flow: human to sub-agent
@@ -163,7 +215,7 @@ sequenceDiagram
    participant MS as Main Session<br/>(orchestrator)
    participant DC as DevClaw Plugin
    participant GW as Gateway RPC
-    participant CLI as openclaw agent CLI
+    participant CLI as openclaw gateway call agent
    participant DEV as DEV Session<br/>(medior)
    participant GL as Issue Tracker

@@ -171,34 +223,34 @@ sequenceDiagram

    H->>TG: "check status" (or heartbeat triggers)
    TG->>MS: delivers message
-    MS->>DC: queue_status()
-    DC->>GL: glab issue list --label "To Do"
+    MS->>DC: status()
+    DC->>GL: list issues by label "To Do"
    DC-->>MS: { toDo: [#42], dev: idle }

    Note over MS: Decides to pick up #42 for DEV as medior

-    MS->>DC: task_pickup({ issueId: 42, role: "dev", model: "medior", ... })
-    DC->>DC: resolve tier "medior" → model ID
+    MS->>DC: work_start({ issueId: 42, role: "dev", level: "medior", ... })
+    DC->>DC: resolve level "medior" → model ID
    DC->>DC: lookup dev.sessions.medior → null (first time)
-    DC->>GL: glab issue update 42 --unlabel "To Do" --label "Doing"
+    DC->>GL: transition label "To Do" → "Doing"
    DC->>GW: sessions.patch({ key: new-session-key, model: "anthropic/claude-sonnet-4-5" })
-    DC->>CLI: openclaw agent --session-id <key> --message "Build login page for #42..."
+    DC->>CLI: openclaw gateway call agent --params { sessionKey, message }
    CLI->>DEV: creates session, delivers task
    DC->>DC: store session key in projects.json + append audit.log
-    DC-->>MS: { success: true, announcement: "🔧 DEV (medior) picking up #42" }
+    DC-->>MS: { success: true, announcement: "🔧 Spawning DEV (medior) for #42" }

-    MS->>TG: "🔧 DEV (medior) picking up #42: Add login page"
+    MS->>TG: "🔧 Spawning DEV (medior) for #42: Add login page"
    TG->>H: sees announcement

    Note over DEV: Works autonomously — reads code, writes code, creates MR
-    Note over DEV: Calls task_complete when done
+    Note over DEV: Calls work_finish when done

-    DEV->>DC: task_complete({ role: "dev", result: "done", ... })
-    DC->>GL: glab issue update 42 --unlabel "Doing" --label "To Test"
+    DEV->>DC: work_finish({ role: "dev", result: "done", ... })
+    DC->>GL: transition label "Doing" → "To Test"
    DC->>DC: deactivate worker (sessions preserved)
-    DC-->>DEV: { announcement: "✅ DEV done #42" }
+    DC-->>DEV: { announcement: "✅ DEV DONE #42" }

-    MS->>TG: "✅ DEV done #42 — moved to QA queue"
+    MS->>TG: "✅ DEV DONE #42 — moved to QA queue"
    TG->>H: sees announcement
 ```

@@ -208,16 +260,16 @@ On the **next DEV task** for this project that also assigns medior:
 sequenceDiagram
    participant MS as Main Session
    participant DC as DevClaw Plugin
-    participant CLI as openclaw agent CLI
+    participant CLI as openclaw gateway call agent
    participant DEV as DEV Session<br/>(medior, existing)

-    MS->>DC: task_pickup({ issueId: 57, role: "dev", model: "medior", ... })
-    DC->>DC: resolve tier "medior" → model ID
+    MS->>DC: work_start({ issueId: 57, role: "dev", level: "medior", ... })
+    DC->>DC: resolve level "medior" → model ID
    DC->>DC: lookup dev.sessions.medior → existing key!
    Note over DC: No sessions.patch needed — session already exists
-    DC->>CLI: openclaw agent --session-id <key> --message "Fix validation for #57..."
+    DC->>CLI: openclaw gateway call agent --params { sessionKey, message }
    CLI->>DEV: delivers task to existing session (has full codebase context)
-    DC-->>MS: { success: true, announcement: "⚡ DEV (medior) picking up #57" }
+    DC-->>MS: { success: true, announcement: "⚡ Sending DEV (medior) for #57" }
 ```

 Session reuse saves ~50K tokens per task by not re-reading the codebase.
@@ -228,149 +280,144 @@ This traces a single issue from creation to completion, showing every component

 ### Phase 1: Issue created

-Issues are created by the orchestrator agent or by sub-agent sessions via `glab`. The orchestrator can create issues based on user requests in Telegram, backlog planning, or QA feedback. Sub-agents can also create issues when they discover bugs or related work during development.
+Issues are created by the orchestrator agent or by sub-agent sessions via `task_create` or directly via `gh`/`glab`. The orchestrator can create issues based on user requests in Telegram, backlog planning, or QA feedback. Sub-agents can also create issues when they discover bugs during development.

 ```
-Orchestrator Agent → Issue Tracker: creates issue #42 with label "To Do"
+Orchestrator Agent → Issue Tracker: creates issue #42 with label "Planning"
 ```

-**State:** Issue tracker has issue #42 labeled "To Do". Nothing in DevClaw yet.
+**State:** Issue tracker has issue #42 labeled "Planning". Nothing in DevClaw yet.

 ### Phase 2: Heartbeat detects work

 ```
-Heartbeat triggers → Orchestrator calls queue_status()
+Heartbeat triggers → Orchestrator calls status()
 ```

 ```mermaid
 sequenceDiagram
    participant A as Orchestrator
-    participant QS as queue_status
+    participant QS as status
    participant GL as Issue Tracker
    participant PJ as projects.json
    participant AL as audit.log

-    A->>QS: queue_status({ projectGroupId: "-123" })
+    A->>QS: status({ projectGroupId: "-123" })
    QS->>PJ: readProjects()
    PJ-->>QS: { dev: idle, qa: idle }
-    QS->>GL: glab issue list --label "To Do"
+    QS->>GL: list issues by label "To Do"
    GL-->>QS: [{ id: 42, title: "Add login page" }]
-    QS->>GL: glab issue list --label "To Test"
+    QS->>GL: list issues by label "To Test"
    GL-->>QS: []
-    QS->>GL: glab issue list --label "To Improve"
+    QS->>GL: list issues by label "To Improve"
    GL-->>QS: []
-    QS->>AL: append { event: "queue_status", ... }
+    QS->>AL: append { event: "status", ... }
    QS-->>A: { dev: idle, queue: { toDo: [#42] } }
 ```

-**Orchestrator decides:** DEV is idle, issue #42 is in To Do → pick it up. Evaluates complexity → assigns medior tier.
+**Orchestrator decides:** DEV is idle, issue #42 is in To Do → pick it up. Evaluates complexity → assigns medior level.

 ### Phase 3: DEV pickup

-The plugin handles everything end-to-end — tier resolution, session lookup, label transition, state update, **and** task dispatch to the worker session. The agent's only job after is to post the announcement.
+The plugin handles everything end-to-end: level resolution, session lookup, label transition, state update, **and** task dispatch to the worker session. The agent's only job afterward is to post the announcement.

 ```mermaid
 sequenceDiagram
    participant A as Orchestrator
-    participant TP as task_pickup
+    participant WS as work_start
    participant GL as Issue Tracker
-    participant TIER as Tier Resolver
+    participant TIER as Level Resolver
    participant GW as Gateway RPC
-    participant CLI as openclaw agent CLI
+    participant CLI as openclaw gateway call agent
    participant PJ as projects.json
    participant AL as audit.log

-    A->>TP: task_pickup({ issueId: 42, role: "dev", projectGroupId: "-123", model: "medior" })
-    TP->>PJ: readProjects()
-    TP->>GL: glab issue view 42 --output json
-    GL-->>TP: { title: "Add login page", labels: ["To Do"] }
-    TP->>TP: Verify label is "To Do" ✓
-    TP->>TIER: resolve "medior" → "anthropic/claude-sonnet-4-5"
-    TP->>PJ: lookup dev.sessions.medior
-    TP->>GL: glab issue update 42 --unlabel "To Do" --label "Doing"
+    A->>WS: work_start({ issueId: 42, role: "dev", projectGroupId: "-123", level: "medior" })
+    WS->>PJ: readProjects()
+    WS->>GL: getIssue(42)
+    GL-->>WS: { title: "Add login page", labels: ["To Do"] }
+    WS->>WS: Verify label is "To Do"
+    WS->>TIER: resolve "medior" → "anthropic/claude-sonnet-4-5"
+    WS->>PJ: lookup dev.sessions.medior
+    WS->>GL: transitionLabel(42, "To Do", "Doing")
    alt New session
-        TP->>GW: sessions.patch({ key: new-key, model: "anthropic/claude-sonnet-4-5" })
+        WS->>GW: sessions.patch({ key: new-key, model: "anthropic/claude-sonnet-4-5" })
    end
-    TP->>CLI: openclaw agent --session-id <key> --message "task..."
-    TP->>PJ: activateWorker + store session key
-    TP->>AL: append task_pickup + model_selection
-    TP-->>A: { success: true, announcement: "🔧 ..." }
+    WS->>CLI: openclaw gateway call agent --params { sessionKey, message }
+    WS->>PJ: activateWorker + store session key
+    WS->>AL: append work_start + model_selection
+    WS-->>A: { success: true, announcement: "🔧 ..." }
 ```

 **Writes:**
 - `Issue Tracker`: label "To Do" → "Doing"
- `projects.json`: dev.active=true, dev.issueId="42", dev.model="medior", dev.sessions.medior=key
- `audit.log`: 2 entries (task_pickup, model_selection)
+- `projects.json`: dev.active=true, dev.issueId="42", dev.level="medior", dev.sessions.medior=key
+- `audit.log`: 2 entries (work_start, model_selection)
 - `Session`: task message delivered to worker session via CLI

 ### Phase 4: DEV works

 ```
 DEV sub-agent session → reads codebase, writes code, creates MR
-DEV sub-agent session → calls task_complete({ role: "dev", result: "done", ... })
+DEV sub-agent session → calls work_finish({ role: "dev", result: "done", ... })
 ```

-This happens inside the OpenClaw session. The worker calls `task_complete` directly for atomic state updates. If the worker discovers unrelated bugs, it calls `task_create` to file them.
+This happens inside the OpenClaw session. The worker calls `work_finish` directly for atomic state updates. If the worker discovers unrelated bugs, it calls `task_create` to file them.

 ### Phase 5: DEV complete (worker self-reports)

 ```mermaid
 sequenceDiagram
    participant DEV as DEV Session
-    participant TC as task_complete
+    participant WF as work_finish
    participant GL as Issue Tracker
    participant PJ as projects.json
    participant AL as audit.log
    participant REPO as Git Repo
-    participant QA as QA Session (auto-chain)
+    participant QA as QA Session

-    DEV->>TC: task_complete({ role: "dev", result: "done", projectGroupId: "-123", summary: "Login page with OAuth" })
-    TC->>PJ: readProjects()
-    PJ-->>TC: { dev: { active: true, issueId: "42" } }
-    TC->>REPO: git pull
-    TC->>PJ: deactivateWorker(-123, dev)
+    DEV->>WF: work_finish({ role: "dev", result: "done", projectGroupId: "-123", summary: "Login page with OAuth" })
+    WF->>PJ: readProjects()
+    PJ-->>WF: { dev: { active: true, issueId: "42" } }
+    WF->>REPO: git pull
+    WF->>PJ: deactivateWorker(-123, dev)
    Note over PJ: active→false, issueId→null<br/>sessions map PRESERVED
-    TC->>GL: transition label "Doing" → "To Test"
-    TC->>AL: append { event: "task_complete", role: "dev", result: "done" }
+    WF->>GL: transitionLabel "Doing" → "To Test"
+    WF->>AL: append { event: "work_finish", role: "dev", result: "done" }

-    alt autoChain enabled
-        TC->>GL: transition label "To Test" → "Testing"
-        TC->>QA: dispatchTask(role: "qa", tier: "qa")
-        TC->>PJ: activateWorker(-123, qa)
-        TC-->>DEV: { announcement: "✅ DEV done #42", autoChain: { dispatched: true, role: "qa" } }
-    else autoChain disabled
-        TC-->>DEV: { announcement: "✅ DEV done #42", nextAction: "qa_pickup" }
-    end
+    WF->>WF: tick queue (fill free slots)
+    Note over WF: Scheduler sees "To Test" issue, QA slot free → dispatches QA
+    WF-->>DEV: { announcement: "✅ DEV DONE #42", tickPickups: [...] }
 ```

 **Writes:**
 - `Git repo`: pulled latest (has DEV's merged code)
 - `projects.json`: dev.active=false, dev.issueId=null (sessions map preserved for reuse)
- `Issue Tracker`: label "Doing" → "To Test" (+ "To Test" → "Testing" if auto-chain)
- `audit.log`: 1 entry (task_complete) + optional auto-chain entries
+- `Issue Tracker`: label "Doing" → "To Test"
+- `audit.log`: 1 entry (work_finish) + tick entries if workers dispatched

 ### Phase 6: QA pickup

-Same as Phase 3, but with `role: "qa"`. Label transitions "To Test" → "Testing". Uses the qa tier.
+Same as Phase 3, but with `role: "qa"`. Label transitions "To Test" → "Testing". Uses the reviewer level.

-### Phase 7: QA result (3 possible outcomes)
+### Phase 7: QA result (4 possible outcomes)

 #### 7a. QA Pass

 ```mermaid
 sequenceDiagram
-    participant A as Orchestrator
-    participant TC as task_complete
+    participant QA as QA Session
+    participant WF as work_finish
    participant GL as Issue Tracker
    participant PJ as projects.json
    participant AL as audit.log

-    A->>TC: task_complete({ role: "qa", result: "pass", projectGroupId: "-123" })
-    TC->>PJ: deactivateWorker(-123, qa)
-    TC->>GL: glab issue update 42 --unlabel "Testing" --label "Done"
-    TC->>GL: glab issue close 42
-    TC->>AL: append { event: "task_complete", role: "qa", result: "pass" }
-    TC-->>A: { announcement: "🎉 QA PASS #42. Issue closed." }
+    QA->>WF: work_finish({ role: "qa", result: "pass", projectGroupId: "-123" })
+    WF->>PJ: deactivateWorker(-123, qa)
+    WF->>GL: transitionLabel(42, "Testing", "Done")
+    WF->>GL: closeIssue(42)
+    WF->>AL: append { event: "work_finish", role: "qa", result: "pass" }
+    WF-->>QA: { announcement: "🎉 QA PASS #42. Issue closed." }
 ```

 **Ticket complete.** Issue closed, label "Done".
@@ -379,18 +426,18 @@ sequenceDiagram

 ```mermaid
 sequenceDiagram
-    participant A as Orchestrator
-    participant TC as task_complete
+    participant QA as QA Session
+    participant WF as work_finish
    participant GL as Issue Tracker
    participant PJ as projects.json
    participant AL as audit.log

-    A->>TC: task_complete({ role: "qa", result: "fail", projectGroupId: "-123", summary: "OAuth redirect broken" })
-    TC->>PJ: deactivateWorker(-123, qa)
-    TC->>GL: glab issue update 42 --unlabel "Testing" --label "To Improve"
-    TC->>GL: glab issue reopen 42
-    TC->>AL: append { event: "task_complete", role: "qa", result: "fail" }
-    TC-->>A: { announcement: "❌ QA FAIL #42 — OAuth redirect broken. Sent back to DEV." }
+    QA->>WF: work_finish({ role: "qa", result: "fail", projectGroupId: "-123", summary: "OAuth redirect broken" })
+    WF->>PJ: deactivateWorker(-123, qa)
+    WF->>GL: transitionLabel(42, "Testing", "To Improve")
+    WF->>GL: reopenIssue(42)
+    WF->>AL: append { event: "work_finish", role: "qa", result: "fail" }
+    WF-->>QA: { announcement: "❌ QA FAIL #42 — OAuth redirect broken. Sent back to DEV." }
 ```

 **Cycle restarts:** Issue goes to "To Improve". Next heartbeat, DEV picks it up again (Phase 3, but from "To Improve" instead of "To Do").
@@ -410,43 +457,39 @@ DEV Blocked: "Doing" → "To Do"
 QA Blocked:  "Testing" → "To Test"
 ```

-Worker cannot complete (missing info, environment errors, etc.). Issue returns to queue for retry. No auto-chain — the task is available for the next heartbeat pickup.
+When a worker cannot complete a task (missing info, environment errors, etc.), the issue returns to the queue for retry and is available for the next heartbeat pickup.

 ### Completion enforcement

-Three layers guarantee that `task_complete` always runs:
+Three layers ensure that every task reaches a defined state, even when a worker never calls `work_finish`:

-1. **Completion contract** — Every task message sent to a worker session includes a mandatory `## MANDATORY: Task Completion` section listing available results and requiring `task_complete` even on failure. Workers are instructed to use `"blocked"` if stuck.
+1. **Completion contract** — Every task message sent to a worker session includes a mandatory `## MANDATORY: Task Completion` section listing available results and requiring `work_finish` even on failure. Workers are instructed to use `"blocked"` if stuck.

 2. **Blocked result** — Both DEV and QA can use `"blocked"` to gracefully return a task to queue without losing work. DEV blocked: `Doing → To Do`. QA blocked: `Testing → To Test`. This gives workers an escape hatch instead of silently dying.

-3. **Stale worker watchdog** — The heartbeat's health check detects workers active for >2 hours. With `autoFix=true`, it deactivates the worker and reverts the label back to queue. This catches sessions that crashed, ran out of context, or otherwise failed without calling `task_complete`. The `session_health` tool provides the same check for manual invocation.
+3. **Stale worker watchdog** — The heartbeat's health check detects workers active for >2 hours. With `fix=true`, it deactivates the worker and reverts the label back to queue. This catches sessions that crashed, ran out of context, or otherwise failed without calling `work_finish`. The `health` tool provides the same check for manual invocation.
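A minimal sketch of the stale-worker test behind layer 3, assuming the `active` and `startTime` fields of the worker state stored in `projects.json`. The name `isStale` and the handling of unparseable timestamps are illustrative assumptions, not the plugin's actual health check.

```typescript
// The ">2 hours" threshold from the text, in milliseconds.
const STALE_AFTER_MS = 2 * 60 * 60 * 1000;

interface WorkerState {
  active: boolean;
  startTime: string | null; // ISO timestamp set when the worker became active
}

// Hypothetical helper: true when an active worker has run past the threshold.
function isStale(worker: WorkerState, now: Date = new Date()): boolean {
  if (!worker.active || worker.startTime === null) return false;
  const started = Date.parse(worker.startTime);
  // Assumption: an unparseable timestamp is left for manual inspection.
  if (Number.isNaN(started)) return false;
  return now.getTime() - started > STALE_AFTER_MS;
}
```

A caller running with `fix=true` would then deactivate the worker and revert the label, as described in the list above.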

 ### Phase 8: Heartbeat (continuous)

-The heartbeat runs periodically (triggered by the agent or a scheduled message). It combines health check + queue scan:
+The heartbeat runs periodically (via background service or manual `work_heartbeat` trigger). It combines health check + queue scan:

 ```mermaid
 sequenceDiagram
-    participant A as Orchestrator
-    participant SH as session_health
-    participant QS as queue_status
-    participant TP as task_pickup
-    Note over A: Heartbeat triggered
+    participant HB as Heartbeat Service
+    participant SH as health check
+    participant TK as projectTick
+    participant WS as work_start (dispatch)
+    Note over HB: Tick triggered (every 60s)

-    A->>SH: session_health({ autoFix: true })
-    Note over SH: Checks sessions via Gateway RPC (sessions.list)
-    SH-->>A: { healthy: true }
+    HB->>SH: checkWorkerHealth per project per role
+    Note over SH: Checks for zombies, stale workers
+    SH-->>HB: { fixes applied }

-    A->>QS: queue_status()
-    QS-->>A: { projects: [{ dev: idle, queue: { toDo: [#43], toTest: [#44] } }] }
-
-    Note over A: DEV idle + To Do #43 → assign medior
-    A->>TP: task_pickup({ issueId: 43, role: "dev", model: "medior", ... })
-    Note over TP: Plugin handles everything:<br/>tier resolve → session lookup →<br/>label transition → dispatch task →<br/>state update → audit log
-
-    Note over A: QA idle + To Test #44 → assign qa
-    A->>TP: task_pickup({ issueId: 44, role: "qa", model: "qa", ... })
+    HB->>TK: projectTick per project
+    Note over TK: Scans queue: To Improve > To Test > To Do
+    TK->>WS: dispatchTask (fill free slots)
+    WS-->>TK: { dispatched }
+    TK-->>HB: { pickups, skipped }
 ```

 ## Data flow map
@@ -455,25 +498,27 @@ Every piece of data and where it lives:

 ```
 ┌─────────────────────────────────────────────────────────────────┐
-│ Issue Tracker (source of truth for tasks)                        │
+│ Issue Tracker (source of truth for tasks)                       │
 │                                                                 │
 │  Issue #42: "Add login page"                                    │
-│  Labels: [To Do | Doing | To Test | Testing | Done | ...]       │
+│  Labels: [Planning | To Do | Doing | To Test | Testing | ...]   │
 │  State: open / closed                                           │
 │  MRs/PRs: linked merge/pull requests                            │
 │  Created by: orchestrator (task_create), workers, or humans     │
 └─────────────────────────────────────────────────────────────────┘
-        ↕ glab/gh CLI (read/write, auto-detected)
+        ↕ gh/glab CLI (read/write, auto-detected)
 ┌─────────────────────────────────────────────────────────────────┐
 │ DevClaw Plugin (orchestration logic)                            │
 │                                                                 │
-│  devclaw_setup  → agent creation + workspace + model config    │
-│  task_pickup    → tier + label + dispatch + role instr (e2e)   │
-│  task_complete  → label + state + git pull + auto-chain        │
-│  task_create    → create issue in tracker                      │
-│  queue_status   → read labels + read state                     │
-│  session_health → check sessions + fix zombies                 │
-│  project_register → labels + prompts + state init (one-time)   │
+│  setup          → agent creation + workspace + model config     │
+│  work_start     → level + label + dispatch + role instr (e2e)   │
+│  work_finish    → label + state + git pull + tick queue         │
+│  task_create    → create issue in tracker                       │
+│  task_update    → manual label state change                     │
+│  task_comment   → add comment to issue                          │
+│  status         → read labels + read state                      │
+│  health         → check sessions + fix zombies                  │
+│  project_register → labels + prompts + state init (one-time)    │
 └─────────────────────────────────────────────────────────────────┘
        ↕ atomic file I/O          ↕ OpenClaw CLI (plugin shells out)
 ┌────────────────────────────────┐ ┌──────────────────────────────┐
@@ -481,39 +526,40 @@ Every piece of data and where it lives:
 │                                │ │ (called by plugin, not agent)│
 │  Per project:                  │ │                              │
 │    dev:                        │ │  openclaw gateway call       │
-│      active, issueId, model    │ │    sessions.patch → create   │
+│      active, issueId, level    │ │    sessions.patch → create   │
 │      sessions:                 │ │    sessions.list  → health   │
 │        junior: <key>           │ │    sessions.delete → cleanup │
 │        medior: <key>           │ │                              │
-│        senior: <key>           │ │  openclaw agent              │
-│    qa:                         │ │    --session-id <key>        │
-│      active, issueId, model    │ │    --message "task..."       │
+│        senior: <key>           │ │  openclaw gateway call agent │
+│    qa:                         │ │    --params { sessionKey,    │
+│      active, issueId, level    │ │      message, agentId }      │
 │      sessions:                 │ │    → dispatches to session   │
-│        qa: <key>               │ │                              │
+│        reviewer: <key>         │ │                              │
+│        tester: <key>           │ │                              │
 └────────────────────────────────┘ └──────────────────────────────┘
        ↕ append-only
 ┌─────────────────────────────────────────────────────────────────┐
 │ log/audit.log (observability)                                   │
 │                                                                 │
 │  NDJSON, one line per event:                                    │
-│  task_pickup, task_complete, model_selection,                   │
-│  queue_status, health_check, session_spawn, session_reuse,     │
-│  project_register, devclaw_setup                                │
+│  work_start, work_finish, model_selection,                      │
+│  status, health, task_create, task_update,                      │
+│  task_comment, project_register, setup, heartbeat_tick          │
 │                                                                 │
-│  Query with: cat audit.log | jq 'select(.event=="task_pickup")' │
+│  Query: cat audit.log | jq 'select(.event=="work_start")'       │
 └─────────────────────────────────────────────────────────────────┘

 ┌─────────────────────────────────────────────────────────────────┐
-│ Telegram (user-facing messages)                                 │
+│ Telegram / WhatsApp (user-facing messages)                      │
 │                                                                 │
 │  Per group chat:                                                │
-│    "🔧 Spawning DEV (medior) for #42: Add login page"           │
+│    "🔧 Spawning DEV (medior) for #42: Add login page"          │
 │    "⚡ Sending DEV (medior) for #57: Fix validation"            │
-│    "✅ DEV done #42 — Login page with OAuth. Moved to QA queue."│
+│    "✅ DEV DONE #42 — Login page with OAuth."                   │
 │    "🎉 QA PASS #42. Issue closed."                              │
-│    "❌ QA FAIL #42 — OAuth redirect broken. Sent back to DEV."  │
-│    "🚫 DEV BLOCKED #42 — Missing dependencies. Returned to queue."│
-│    "🚫 QA BLOCKED #42 — Env not available. Returned to QA queue."│
+│    "❌ QA FAIL #42 — OAuth redirect broken."                    │
+│    "🚫 DEV BLOCKED #42 — Missing dependencies."                │
+│    "🚫 QA BLOCKED #42 — Env not available."                    │
 └─────────────────────────────────────────────────────────────────┘

 ┌─────────────────────────────────────────────────────────────────┐
@@ -521,7 +567,7 @@ Every piece of data and where it lives:
 │                                                                 │
 │  DEV sub-agent sessions: read code, write code, create MRs      │
 │  QA sub-agent sessions: read code, run tests, review MRs        │
-│  task_complete (DEV done): git pull to sync latest               │
+│  work_finish (DEV done): git pull to sync latest                │
 └─────────────────────────────────────────────────────────────────┘
 ```

@@ -537,7 +583,7 @@ graph LR
        PR[Project registration]
        SETUP[Agent + workspace setup]
        SD[Session dispatch<br/>create + send via CLI]
-        AC[Auto-chaining<br/>DEV→QA, QA fail→DEV]
+        AC[Scheduling<br/>tick queue after work_finish]
        RI[Role instructions<br/>loaded per project]
        A[Audit logging]
        Z[Zombie cleanup]
@@ -553,7 +599,7 @@ graph LR
    subgraph "Sub-agent sessions handle"
        CR[Code writing]
        MR[MR creation/review]
-        TC_W[Task completion<br/>via task_complete]
+        WF_W[Task completion<br/>via work_finish]
        BUG[Bug filing<br/>via task_create]
    end

@@ -565,20 +611,22 @@ graph LR

 ## IssueProvider abstraction

-All issue tracker operations go through the `IssueProvider` interface, defined in `lib/issue-provider.ts`. This abstraction allows DevClaw to support multiple issue trackers without changing tool logic.
+All issue tracker operations go through the `IssueProvider` interface, defined in `lib/providers/provider.ts`. This abstraction allows DevClaw to support multiple issue trackers without changing tool logic.

 **Interface methods:**
 - `ensureLabel` / `ensureAllStateLabels` — idempotent label creation
+- `createIssue` — create issue with label and assignees
 - `listIssuesByLabel` / `getIssue` — issue queries
 - `transitionLabel` — atomic label state transition (unlabel + label)
 - `closeIssue` / `reopenIssue` — issue lifecycle
 - `hasStateLabel` / `getCurrentStateLabel` — label inspection
-- `hasMergedMR` — MR/PR verification
+- `hasMergedMR` / `getMergedMRUrl` — MR/PR verification
+- `addComment` — add comment to issue
 - `healthCheck` — verify provider connectivity
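To make the "unlabel + label" semantics of `transitionLabel` concrete, here is a small in-memory sketch. It is an illustration only: the real providers wrap the `gh`/`glab` CLIs, and both the function name `transitionLabelSketch` and the choice to throw when the expected label is missing are assumptions.

```typescript
// In-memory illustration; operates on a plain list of label names.
function transitionLabelSketch(
  labels: readonly string[],
  from: string,
  to: string,
): string[] {
  // Assumption: refuse to transition when the issue is not in the expected state.
  if (!labels.includes(from)) {
    throw new Error(`Expected label "${from}" but found [${labels.join(", ")}]`);
  }
  // Drop the old state label and add the new one in a single step,
  // leaving unrelated labels untouched.
  return [...labels.filter((l) => l !== from && l !== to), to];
}
```

For example, a QA pass on an issue labelled `["bug", "Testing"]` would yield `["bug", "Done"]`.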

 **Current providers:**
-- **GitLab** (`lib/providers/gitlab.ts`) — wraps `glab` CLI
 - **GitHub** (`lib/providers/github.ts`) — wraps `gh` CLI
+- **GitLab** (`lib/providers/gitlab.ts`) — wraps `glab` CLI

 **Planned providers:**
 - **Jira** — via REST API
@@ -589,16 +637,16 @@ Provider selection is handled by `createProvider()` in `lib/providers/index.ts`.

 | Failure | Detection | Recovery |
 |---|---|---|
-| Session dies mid-task | `session_health` checks via `sessions.list` Gateway RPC | `autoFix`: reverts label, clears active state, removes dead session from sessions map. Next heartbeat picks up task again (creates fresh session for that tier). |
-| glab command fails | Plugin tool throws error, returns to agent | Agent retries or reports to Telegram group |
-| `openclaw agent` CLI fails | Plugin catches error during dispatch | Plugin rolls back: reverts label, clears active state. Returns error to agent for reporting. |
-| `sessions.patch` fails | Plugin catches error during session creation | Plugin rolls back label transition. Returns error. No orphaned state. |
+| Session dies mid-task | `health` checks via `sessions.list` Gateway RPC | `fix=true`: reverts label, clears active state. Next heartbeat picks up task again (creates fresh session for that level). |
+| gh/glab command fails | Plugin tool throws error, returns to agent | Agent retries or reports to Telegram group |
+| `openclaw gateway call agent` fails | Plugin catches error during dispatch | Plugin rolls back: reverts label, clears active state. Returns error. No orphaned state. |
+| `sessions.patch` fails | Plugin catches error during session creation | Plugin rolls back label transition. Returns error. |
 | projects.json corrupted | Tool can't parse JSON | Manual fix needed. Atomic writes (temp+rename) prevent partial writes. |
-| Label out of sync | `task_pickup` verifies label before transitioning | Throws error if label doesn't match expected state. Agent reports mismatch. |
-| Worker already active | `task_pickup` checks `active` flag | Throws error: "DEV worker already active on project". Must complete current task first. |
-| Stale worker (>2h) | `session_health` and heartbeat health check | `autoFix`: deactivates worker, reverts label to queue (To Do / To Test). Task available for next pickup. |
-| Worker stuck/blocked | Worker calls `task_complete` with `"blocked"` | Deactivates worker, reverts label to queue. Issue available for retry. |
-| `project_register` fails | Plugin catches error during label creation or state write | Clean error returned. No partial state — labels are idempotent, projects.json not written until all labels succeed. |
+| Label out of sync | `work_start` verifies label before transitioning | Throws error if label doesn't match expected state. |
+| Worker already active | `work_start` checks `active` flag | Throws error: "DEV already active on project". Must complete current task first. |
+| Stale worker (>2h) | `health` and heartbeat health check | `fix=true`: deactivates worker, reverts label to queue. Task available for next pickup. |
+| Worker stuck/blocked | Worker calls `work_finish` with `"blocked"` | Deactivates worker, reverts label to queue. Issue available for retry. |
+| `project_register` fails | Plugin catches error during label creation or state write | Clean error returned. Labels are idempotent, projects.json not written until all labels succeed. |

 ## File locations

@@ -606,8 +654,9 @@ Provider selection is handled by `createProvider()` in `lib/providers/index.ts`.
 |---|---|---|
 | Plugin source | `~/.openclaw/extensions/devclaw/` | Plugin code |
 | Plugin manifest | `~/.openclaw/extensions/devclaw/openclaw.plugin.json` | Plugin registration |
-| Agent config | `~/.openclaw/openclaw.json` | Agent definition + tool permissions + tier config |
+| Agent config | `~/.openclaw/openclaw.json` | Agent definition + tool permissions + model config |
 | Worker state | `~/.openclaw/workspace-<agent>/projects/projects.json` | Per-project DEV/QA state |
+| Role instructions | `~/.openclaw/workspace-<agent>/projects/roles/<project>/` | Per-project `dev.md` and `qa.md` |
 | Audit log | `~/.openclaw/workspace-<agent>/log/audit.log` | NDJSON event log |
 | Session transcripts | `~/.openclaw/agents/<agent>/sessions/<uuid>.jsonl` | Conversation history per session |
 | Git repos | `~/git/<project>/` | Project source code |
--- a/docs/CONFIGURATION.md
+++ b/docs/CONFIGURATION.md
@@ -0,0 +1,354 @@
+# DevClaw — Configuration Reference
+
+All DevClaw configuration lives in two places: `openclaw.json` (plugin-level settings) and `projects.json` (per-project state).
+
+## Plugin Configuration (`openclaw.json`)
+
+DevClaw is configured under `plugins.entries.devclaw.config` in `openclaw.json`.
+
+### Model Tiers
+
+Override which model powers each DEV and QA level:
+
+```json
+{
+  "plugins": {
+    "entries": {
+      "devclaw": {
+        "config": {
+          "models": {
+            "dev": {
+              "junior": "anthropic/claude-haiku-4-5",
+              "medior": "anthropic/claude-sonnet-4-5",
+              "senior": "anthropic/claude-opus-4-5"
+            },
+            "qa": {
+              "reviewer": "anthropic/claude-sonnet-4-5",
+              "tester": "anthropic/claude-haiku-4-5"
+            }
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+**Resolution order** (per `lib/tiers.ts:resolveModel`):
+
+1. Plugin config `models.<role>.<level>` — explicit override
+2. `DEFAULT_MODELS[role][level]` — built-in defaults (table below)
+3. Passthrough — treat the level string as a raw model ID
+
+**Default models:**
+
+| Role | Level | Default model |
+|---|---|---|
+| dev | junior | `anthropic/claude-haiku-4-5` |
+| dev | medior | `anthropic/claude-sonnet-4-5` |
+| dev | senior | `anthropic/claude-opus-4-5` |
+| qa | reviewer | `anthropic/claude-sonnet-4-5` |
+| qa | tester | `anthropic/claude-haiku-4-5` |
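The resolution order and the defaults table above can be sketched as a single lookup. This is a hedged illustration, not the code in `lib/tiers.ts`: the name `resolveModelSketch`, the `LevelMap` type, and the shape of the `overrides` argument are assumptions.

```typescript
type Role = "dev" | "qa";
type LevelMap = { [level: string]: string | undefined };

// Built-in defaults, copied from the table above.
const DEFAULT_MODELS: Record<Role, LevelMap> = {
  dev: {
    junior: "anthropic/claude-haiku-4-5",
    medior: "anthropic/claude-sonnet-4-5",
    senior: "anthropic/claude-opus-4-5",
  },
  qa: {
    reviewer: "anthropic/claude-sonnet-4-5",
    tester: "anthropic/claude-haiku-4-5",
  },
};

// Hypothetical helper mirroring the three documented steps.
function resolveModelSketch(
  role: Role,
  level: string,
  overrides: Partial<Record<Role, LevelMap>> = {},
): string {
  return (
    overrides[role]?.[level] ?? // 1. explicit plugin config override
    DEFAULT_MODELS[role][level] ?? // 2. built-in default
    level // 3. passthrough: treat the level string as a raw model ID
  );
}
```

Under this sketch, passing an unknown level such as a full model ID falls through both tables and is returned unchanged.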
+
+### Project Execution Mode
+
+Controls cross-project parallelism:
+
+```json
+{
+  "plugins": {
+    "entries": {
+      "devclaw": {
+        "config": {
+          "projectExecution": "parallel"
+        }
+      }
+    }
+  }
+}
+```
+
+| Value | Behavior |
+|---|---|
+| `"parallel"` (default) | Multiple projects can have active workers simultaneously |
+| `"sequential"` | Only one project's workers active at a time. Useful for single-agent deployments. |
+
+Enforced in `work_heartbeat` and the heartbeat service before dispatching.
+
+### Heartbeat Service
+
+Interval-based health checks and queue dispatch that consume no LLM tokens:
+
+```json
+{
+  "plugins": {
+    "entries": {
+      "devclaw": {
+        "config": {
+          "work_heartbeat": {
+            "enabled": true,
+            "intervalSeconds": 60,
+            "maxPickupsPerTick": 4
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+| Setting | Type | Default | Description |
+|---|---|---|---|
+| `enabled` | boolean | `true` | Enable the heartbeat service |
+| `intervalSeconds` | number | `60` | Seconds between ticks |
+| `maxPickupsPerTick` | number | `4` | Maximum worker dispatches per tick (budget control) |
+
+**Source:** [`lib/services/heartbeat.ts`](../lib/services/heartbeat.ts)
+
+The heartbeat service runs as a plugin service tied to the gateway lifecycle. Every tick: health pass (auto-fix zombies, stale workers) → tick pass (fill free slots by priority). Zero LLM tokens consumed.
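The tick pass can be pictured as a small planning function that fills free slots in priority order while respecting `maxPickupsPerTick`. The sketch below is an assumption-laden illustration: the priority `To Improve > To Test > To Do` comes from the heartbeat diagram in the architecture doc, the label-to-role mapping is inferred from the workflow description, and `planTick`, `ProjectSnapshot`, and `Pickup` are hypothetical names. It ignores the `projectExecution` and `roleExecution` settings.

```typescript
type Role = "dev" | "qa";
type QueueLabel = "To Improve" | "To Test" | "To Do";

interface ProjectSnapshot {
  name: string;
  active: Record<Role, boolean>; // whether each role already has a worker
  queue: Record<QueueLabel, number[]>; // issue IDs waiting under each label
}

interface Pickup {
  project: string;
  role: Role;
  issueId: number;
  from: QueueLabel;
}

// Assumed mapping: "To Improve" and "To Do" feed DEV, "To Test" feeds QA.
const PRIORITY: Array<[QueueLabel, Role]> = [
  ["To Improve", "dev"],
  ["To Test", "qa"],
  ["To Do", "dev"],
];

function planTick(projects: ProjectSnapshot[], maxPickupsPerTick = 4): Pickup[] {
  const pickups: Pickup[] = [];
  for (const p of projects) {
    const busy: Record<Role, boolean> = { ...p.active };
    for (const [label, role] of PRIORITY) {
      if (pickups.length >= maxPickupsPerTick) return pickups; // budget spent
      const issueId: number | undefined = p.queue[label][0];
      if (busy[role] || issueId === undefined) continue;
      pickups.push({ project: p.name, role, issueId, from: label });
      busy[role] = true; // one worker per role per project
    }
  }
  return pickups;
}
```

Keeping the planning step pure like this makes the budget and priority rules easy to test without touching the issue tracker.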
+
+### Notifications
+
+Control which lifecycle events send notifications:
+
+```json
+{
+  "plugins": {
+    "entries": {
+      "devclaw": {
+        "config": {
+          "notifications": {
+            "heartbeatDm": true,
+            "workerStart": true,
+            "workerComplete": true
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+| Setting | Default | Description |
+|---|---|---|
+| `heartbeatDm` | `true` | Send heartbeat summary to orchestrator DM |
+| `workerStart` | `true` | Announce when a worker picks up a task |
+| `workerComplete` | `true` | Announce when a worker finishes a task |
+
+### DevClaw Agent IDs
+
+List which agents are recognized as DevClaw orchestrators (used for context detection):
+
+```json
+{
+  "plugins": {
+    "entries": {
+      "devclaw": {
+        "config": {
+          "devClawAgentIds": ["my-orchestrator"]
+        }
+      }
+    }
+  }
+}
+```
+
+### Agent Tool Permissions
+
+Restrict DevClaw tools to your orchestrator agent:
+
+```json
+{
+  "agents": {
+    "list": [
+      {
+        "id": "my-orchestrator",
+        "tools": {
+          "allow": [
+            "work_start",
+            "work_finish",
+            "task_create",
+            "task_update",
+            "task_comment",
+            "status",
+            "health",
+            "work_heartbeat",
+            "project_register",
+            "setup",
+            "onboard"
+          ]
+        }
+      }
+    ]
+  }
+}
+```
+
+---
+
+## Project State (`projects.json`)
+
+All project state lives in `<workspace>/projects/projects.json`, keyed by group ID.
+
+**Source:** [`lib/projects.ts`](../lib/projects.ts)
+
+### Schema
+
+```json
+{
+  "projects": {
+    "<groupId>": {
+      "name": "my-webapp",
+      "repo": "~/git/my-webapp",
+      "groupName": "Dev - My Webapp",
+      "baseBranch": "development",
+      "deployBranch": "development",
+      "deployUrl": "https://my-webapp.example.com",
+      "channel": "telegram",
+      "roleExecution": "parallel",
+      "dev": {
+        "active": false,
+        "issueId": null,
+        "startTime": null,
+        "level": null,
+        "sessions": {
+          "junior": null,
+          "medior": "agent:orchestrator:subagent:my-webapp-dev-medior",
+          "senior": null
+        }
+      },
+      "qa": {
+        "active": false,
+        "issueId": null,
+        "startTime": null,
+        "level": null,
+        "sessions": {
+          "reviewer": "agent:orchestrator:subagent:my-webapp-qa-reviewer",
+          "tester": null
+        }
+      }
+    }
+  }
+}
+```
+
+### Project fields
+
+| Field | Type | Description |
+|---|---|---|
+| `name` | string | Short project name |
+| `repo` | string | Path to git repo (supports `~/` expansion) |
+| `groupName` | string | Group display name |
+| `baseBranch` | string | Base branch for development |
+| `deployBranch` | string | Branch that triggers deployment |
+| `deployUrl` | string | Deployment URL |
+| `channel` | string | Messaging channel (`"telegram"`, `"whatsapp"`, etc.) |
+| `roleExecution` | `"parallel"` \| `"sequential"` | DEV/QA parallelism for this project |
+
+### Worker state fields
+
+Each project has `dev` and `qa` worker state objects:
+
+| Field | Type | Description |
+|---|---|---|
+| `active` | boolean | Whether this role has an active worker |
+| `issueId` | string \| null | Issue being worked on (as string) |
+| `startTime` | string \| null | ISO timestamp when worker became active |
+| `level` | string \| null | Current level (`junior`, `medior`, `senior`, `reviewer`, `tester`) |
+| `sessions` | `Record<string, string \| null>` | Per-level session keys |
+
+- **DEV session keys:** `junior`, `medior`, `senior`
+- **QA session keys:** `reviewer`, `tester`
+
+### Key design decisions
+
+- **Session-per-level** — each level gets its own worker session, accumulating context independently. Level selection maps directly to a session key.
+- **Sessions preserved on completion** — when a worker completes a task, the sessions map is preserved (only `active`, `issueId`, and `startTime` are cleared). This enables session reuse.
+- **Atomic writes** — all writes go through temp-file-then-rename to prevent corruption.
+- **Sessions persist indefinitely** — no auto-cleanup. The `health` tool handles manual cleanup.
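The "sessions preserved on completion" decision can be illustrated with a pure update over the worker state shown in the schema. This is a sketch under assumptions: `deactivateWorkerSketch` is a hypothetical name, and leaving `level` untouched simply follows the wording above, which lists only `active`, `issueId`, and `startTime` as cleared.

```typescript
interface WorkerState {
  active: boolean;
  issueId: string | null;
  startTime: string | null;
  level: string | null;
  sessions: Record<string, string | null>;
}

// Clears the task-specific fields and keeps the sessions map for reuse.
function deactivateWorkerSketch(worker: WorkerState): WorkerState {
  return { ...worker, active: false, issueId: null, startTime: null };
}
```

Because the session keys survive, the next dispatch at the same level can reuse the existing session instead of creating a new one.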
+
+---
+
+## Workspace File Layout
+
+```
+<workspace>/
+├── projects/
+│   ├── projects.json          ← Project state (auto-managed)
+│   └── roles/
+│       ├── my-webapp/         ← Per-project role instructions (editable)
+│       │   ├── dev.md
+│       │   └── qa.md
+│       ├── another-project/
+│       │   ├── dev.md
+│       │   └── qa.md
+│       └── default/           ← Fallback role instructions
+│           ├── dev.md
+│           └── qa.md
+├── log/
+│   └── audit.log              ← NDJSON event log (auto-managed)
+├── AGENTS.md                  ← Agent identity documentation
+└── HEARTBEAT.md               ← Heartbeat operation guide
+```
+
+### Role instruction files
+
+`work_start` loads role instructions from `projects/roles/<project>/<role>.md` at dispatch time, falling back to `projects/roles/default/<role>.md`. These files are appended to the task message sent to worker sessions.
+
+Edit these files to customize deployment steps, test commands, acceptance criteria, and coding standards.
+
+**Source:** [`lib/dispatch.ts:loadRoleInstructions`](../lib/dispatch.ts)
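The project-then-default fallback described above can be sketched as a path lookup. This is not the body of `loadRoleInstructions`: the function name `resolveRoleInstructionPath` and the injected `exists` predicate are assumptions, chosen so the sketch stays independent of the file system.

```typescript
type Role = "dev" | "qa";

// Returns the first instruction file that exists, relative to the workspace,
// or null when neither the project-specific nor the default file is present.
function resolveRoleInstructionPath(
  project: string,
  role: Role,
  exists: (relativePath: string) => boolean,
): string | null {
  const candidates = [
    `projects/roles/${project}/${role}.md`, // per-project instructions
    `projects/roles/default/${role}.md`, // fallback instructions
  ];
  return candidates.find((path) => exists(path)) ?? null;
}
```

In real use the predicate would check the workspace directory; in tests it can be backed by a simple set of known paths.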
+
+---
+
+## Audit Log
+
+Append-only NDJSON at `<workspace>/log/audit.log`. Auto-truncated to 250 lines.
+
+**Source:** [`lib/audit.ts`](../lib/audit.ts)
+
+### Event types
+
+| Event | Trigger |
+|---|---|
+| `work_start` | Task dispatched to worker |
+| `model_selection` | Level resolved to model ID |
+| `work_finish` | Task completed |
+| `work_heartbeat` | Heartbeat tick completed |
+| `task_create` | Issue created |
+| `task_update` | Issue state changed |
+| `task_comment` | Comment added to issue |
+| `status` | Queue status queried |
+| `health` | Health scan completed |
+| `heartbeat_tick` | Heartbeat service tick (background) |
+| `project_register` | Project registered |
+
+### Querying
+
+```bash
+# All task dispatches
+cat audit.log | jq 'select(.event=="work_start")'
+
+# All completions for a project
+cat audit.log | jq 'select(.event=="work_finish" and .project=="my-webapp")'
+
+# Model selections
+cat audit.log | jq 'select(.event=="model_selection")'
+```
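The same queries can be run programmatically. The sketch below parses NDJSON text into events and filters them, assuming only the `event` and `project` fields that appear in the `jq` examples; `AuditEvent` and `parseAuditLog` are hypothetical names, and any other fields are treated as opaque.

```typescript
interface AuditEvent {
  event: string;
  project?: string;
  [key: string]: unknown; // remaining fields are not specified here
}

// Parses NDJSON (one JSON object per line), skipping blank lines.
function parseAuditLog(ndjson: string): AuditEvent[] {
  return ndjson
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as AuditEvent);
}

// Equivalent of: jq 'select(.event=="work_finish" and .project=="my-webapp")'
function completionsFor(events: AuditEvent[], project: string): AuditEvent[] {
  return events.filter((e) => e.event === "work_finish" && e.project === project);
}
```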
+
+---
+
+## Issue Provider
+
+DevClaw uses an `IssueProvider` interface (`lib/providers/provider.ts`) to abstract issue tracker operations. The provider is auto-detected from the git remote URL.
+
+**Supported providers:**
+
+| Provider | CLI | Detection |
+|---|---|---|
+| GitHub | `gh` | Remote contains `github.com` |
+| GitLab | `glab` | Remote contains `gitlab` |
+
+**Planned:** Jira (via REST API)
+
+**Source:** [`lib/providers/index.ts`](../lib/providers/index.ts)
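The detection rules in the table above amount to a substring check on the git remote URL. A minimal sketch, assuming case-insensitive matching and a `null` result for unrecognized remotes (neither is confirmed by the source); `detectProviderSketch` is a hypothetical name and not the signature of `createProvider()`.

```typescript
type ProviderKind = "github" | "gitlab";

function detectProviderSketch(remoteUrl: string): ProviderKind | null {
  const url = remoteUrl.toLowerCase(); // assumption: match case-insensitively
  if (url.includes("github.com")) return "github";
  if (url.includes("gitlab")) return "gitlab"; // also covers self-hosted GitLab hosts
  return null; // assumption: unknown remotes are reported, not guessed
}
```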
--- a/docs/CONTEXT-AWARENESS.md
+++ b/docs/CONTEXT-AWARENESS.md
@@ -1,6 +1,6 @@
-# Context-Aware DevClaw
+# DevClaw — Context Awareness

-DevClaw now adapts its behavior based on how you interact with it.
+DevClaw adapts its behavior based on how you interact with it.

 ## Design Philosophy

@@ -12,170 +12,122 @@ DevClaw enforces strict boundaries between projects:
 - Project work happens **inside that project's group**
 - Setup and configuration happen **outside project groups**

-This design prevents:
-- ❌ Cross-project contamination (workers picking up wrong project's tasks)
-- ❌ Confusion about which project you're working on
-- ❌ Accidental registration of wrong groups
-- ❌ Setup discussions cluttering project work channels
+This prevents:
+- Cross-project contamination (workers picking up wrong project's tasks)
+- Confusion about which project you're working on
+- Accidental registration of wrong groups
+- Setup discussions cluttering project work channels

 This enables:
-- ✅ Clear mental model: "This group = this project"
-- ✅ Isolated work streams: Each project progresses independently
-- ✅ Dedicated teams: Workers focus on one project at a time
-- ✅ Clean separation: Setup vs. operational work
+- Clear mental model: "This group = this project"
+- Isolated work streams: Each project progresses independently
+- Dedicated teams: Workers focus on one project at a time
+- Clean separation: Setup vs. operational work

 ## Three Interaction Contexts

-### 1. **Via Another Agent** (Setup Mode)
-When you talk to your main agent (like Henk) about DevClaw:
-- ✅ Use: `devclaw_onboard`, `devclaw_setup`
-- ❌ Avoid: `task_pickup`, `queue_status` (operational tools)
+### 1. Via Another Agent (Setup Mode)
+
+When you talk to your main agent about DevClaw:
+- Use: `onboard`, `setup`
+- Avoid: `work_start`, `status` (operational tools)

 **Example:**
 ```
-User → Henk: "Can you help me set up DevClaw?"
-Henk → Calls devclaw_onboard
+User → Main Agent: "Can you help me set up DevClaw?"
+Main Agent → Calls onboard
 ```

-### 2. **Direct Message to DevClaw Agent**
+### 2. Direct Message to DevClaw Agent
+
 When you DM the DevClaw agent directly on Telegram/WhatsApp:
-- ✅ Use: `queue_status` (all projects), `session_health` (system overview)
-- ❌ Avoid: `task_pickup` (project-specific work), setup tools
+- Use: `status` (all projects), `health` (system overview)
+- Avoid: `work_start` (project-specific work), setup tools

 **Example:**
 ```
 User → DevClaw DM: "Show me the status of all projects"
-DevClaw → Calls queue_status (shows all projects)
+DevClaw → Calls status (shows all projects)
 ```

-### 3. **Project Group Chat**
+### 3. Project Group Chat
+
 When you message in a Telegram/WhatsApp group bound to a project:
-- ✅ Use: `task_pickup`, `task_complete`, `task_create`, `queue_status` (auto-filtered)
-- ❌ Avoid: Setup tools, system-wide queries
+- Use: `work_start`, `work_finish`, `task_create`, `status` (auto-filtered)
+- Avoid: Setup tools, system-wide queries

 **Example:**
 ```
-User → OpenClaw Dev Group: "@henk pick up issue #42"
-DevClaw → Calls task_pickup (only works in groups)
+User → Project Group: "pick up issue #42"
+DevClaw → Calls work_start (only works in groups)
 ```

 ## How It Works

 ### Context Detection
+
 Each tool automatically detects:
-- **Agent ID** - Is this the DevClaw agent or another agent?
-- **Message Channel** - Telegram, WhatsApp, or CLI?
-- **Session Key** - Is this a group chat or direct message?
+- **Agent ID** — Is this the DevClaw agent or another agent?
+- **Message Channel** — Telegram, WhatsApp, or CLI?
+- **Session Key** — Is this a group chat or direct message?
  - Format: `agent:{agentId}:{channel}:{type}:{id}`
  - Telegram group: `agent:devclaw:telegram:group:-5266044536`
  - WhatsApp group: `agent:devclaw:whatsapp:group:120363123@g.us`
  - DM: `agent:devclaw:telegram:user:657120585`
-- **Project Binding** - Which project is this group bound to?
+- **Project Binding** — Which project is this group bound to?
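The session key format listed above, `agent:{agentId}:{channel}:{type}:{id}`, can be parsed with a few lines of code. This is a sketch, not the logic in the plugin's context guard: `parseChatSessionKey` and `ChatSessionKey` are hypothetical names, and treating any key with fewer than five segments as "not a chat session" is an assumption.

```typescript
interface ChatSessionKey {
  agentId: string;
  channel: string; // e.g. "telegram" or "whatsapp"
  type: string; // e.g. "group" or "user"
  id: string; // group or user identifier
}

function parseChatSessionKey(key: string): ChatSessionKey | null {
  const parts = key.split(":");
  if (parts.length < 5 || parts[0] !== "agent") return null;
  const [, agentId, channel, type, ...rest] = parts;
  if (!agentId || !channel || !type) return null; // reject empty segments
  // Re-join the tail in case an identifier itself contains a colon.
  return { agentId, channel, type, id: rest.join(":") };
}
```

With this in hand, "is this a group chat?" reduces to checking `type === "group"` on the parsed key.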

 ### Guardrails
+
 Tools include context-aware guidance in their responses:
 ```json
 {
-  "contextGuidance": "🛡️ Context: Project Group Chat (telegram)\n
-    You're in a Telegram group for project 'openclaw-core'.\n
-    Use task_pickup, task_complete for project work.",
+  "contextGuidance": "Context: Project Group Chat (telegram)\n    You're in a Telegram group for project 'my-webapp'.\n    Use work_start, work_finish for project work.",
  ...
 }
 ```

-## Integrated Tools
+## Tool Context Requirements

-### ✅ `devclaw_onboard`
-- **Works best:** Via another agent or direct DM
-- **Blocks:** Group chats (setup shouldn't happen in project groups)
+| Tool | Group chat | Direct DM | Via agent |
+|---|---|---|---|
+| `onboard` | Blocked | Works | Works |
+| `setup` | Works | Works | Works |
+| `work_start` | Works | Blocked | Blocked |
+| `work_finish` | Works | Works | Works |
+| `task_create` | Works | Works | Works |
+| `task_update` | Works | Works | Works |
+| `task_comment` | Works | Works | Works |
+| `status` | Auto-filtered | All projects | Suggests onboard |
+| `health` | Auto-filtered | All projects | Works |
+| `work_heartbeat` | Single project | All projects | Works |
+| `project_register` | Works (required) | Blocked | Blocked |

-### ✅ `queue_status`
-- **Group context:** Auto-filters to that project
-- **Direct context:** Shows all projects
-- **Via-agent context:** Suggests using devclaw_onboard instead
-
-### ✅ `task_pickup`
- **ONLY works:** In project group chats
- **Blocks:** Direct DMs and setup conversations
-
-### ✅ `project_register`
- **ONLY works:** In the Telegram/WhatsApp group you're registering
- **Blocks:** Direct DMs and via-agent conversations
- **Auto-detects:** Group ID from current chat (projectGroupId parameter now optional)
-
-**Why this matters:**
- **Project Isolation**: Each group = one project = one dedicated team
- **Clear Boundaries**: Forces deliberate project registration from within the project's space
- **Team Clarity**: You're physically in the group when binding it, making the connection explicit
- **No Mistakes**: Impossible to accidentally register the wrong group when you're in it
- **Natural Workflow**: "This group is for Project X" → register Project X here
-
-## Testing
-
-### Debug Tool
-Use `context_test` to see what context is detected:
-```
-# In any context:
-context_test
-
-# Returns:
-{
-  "detectedContext": { "type": "group", "projectName": "openclaw-core" },
-  "guardrails": "🛡️ Context: Project Group Chat..."
-}
-```
-
-### Manual Testing
-1. **Setup Mode:** Message your main agent → "Help me configure DevClaw"
-2. **Status Check:** DM DevClaw agent (Telegram/WhatsApp) → "Show me the queue"
-3. **Project Work:** Post in project group (Telegram/WhatsApp) → "@henk pick up #42"
-
-Each context should trigger different guardrails.
-
-## Configuration
-
-Add to `~/.openclaw/openclaw.json`:
-```json
-"plugins": {
-  "entries": {
-    "devclaw": {
-      "config": {
-        "devClawAgentIds": ["henk-development", "devclaw-test"],
-        "models": { ... }
-      }
-    }
-  }
-}
-```
-
-The `devClawAgentIds` array lists which agents are DevClaw orchestrators.
-
-## Implementation Details
-
- **Module:** [lib/context-guard.ts](../lib/context-guard.ts)
- **Tests:** [tests/unit/context-guard.test.ts](../tests/unit/context-guard.test.ts) (15 passing)
- **Integrated tools:** 4 key tools (`devclaw_onboard`, `queue_status`, `task_pickup`, `project_register`)
- **Detection logic:** Checks agentId, messageChannel, sessionKey pattern matching
+**Why `project_register` requires group context:**
+- Forces deliberate project registration from within the project's space
+- You're physically in the group when binding it, making the connection explicit
+- Impossible to accidentally register the wrong group

 ## WhatsApp Support

-DevClaw **fully supports WhatsApp** groups with the same architecture as Telegram:
+DevClaw fully supports WhatsApp groups with the same architecture as Telegram:

- ✅ WhatsApp group detection via `sessionKey.includes("@g.us")`
- ✅ Projects keyed by WhatsApp group ID (e.g., `"120363123@g.us"`)
- ✅ Context-aware tools work identically for both channels
- ✅ One project = one group (Telegram OR WhatsApp)
+- WhatsApp group detection via `sessionKey.includes("@g.us")`
+- Projects keyed by WhatsApp group ID (e.g., `"120363123@g.us"`)
+- Context-aware tools work identically for both channels
+- One project = one group (Telegram OR WhatsApp)

 **To register a WhatsApp project:**
 1. Go to the WhatsApp group chat
 2. Call `project_register` from within the group
 3. Group ID auto-detected from context

-The architecture treats Telegram and WhatsApp identically - the only difference is the group ID format.
+## Implementation

-## Future Enhancements
+- **Module:** [`lib/context-guard.ts`](../lib/context-guard.ts)
+- **Detection logic:** Checks agentId, messageChannel, sessionKey pattern matching
+- **Configuration:** `devClawAgentIds` in plugin config lists which agents are DevClaw orchestrators

- [ ] Integrate into remaining tools (`task_complete`, `session_health`, `task_create`, `devclaw_setup`)
- [ ] System prompt injection (requires OpenClaw core support)
- [ ] Context-based tool filtering (hide irrelevant tools)
- [ ] Per-project context overrides
+## Related
+
+- [Configuration — devClawAgentIds](CONFIGURATION.md#devclaw-agent-ids)
+- [Architecture — Scope boundaries](ARCHITECTURE.md#scope-boundaries)
--- a/docs/MANAGEMENT.md
+++ b/docs/MANAGEMENT.md
@@ -12,14 +12,14 @@ DevClaw exists because of a gap that management theorists identified decades ago

 In 1969, Paul Hersey and Ken Blanchard published what would become Situational Leadership Theory. The central idea is deceptively simple: the way you delegate should match the capability and reliability of the person doing the work. You don't hand an intern the system architecture redesign. You don't ask your principal engineer to rename a CSS class.

-DevClaw's model selection does exactly this. When a task comes in, the plugin evaluates complexity from the issue title and description, then routes it to the cheapest model that can handle it:
+DevClaw's level selection does exactly this. When a task comes in, the plugin routes it to the cheapest model that can handle it:

-| Complexity                       | Model  | Analogy                     |
-| -------------------------------- | ------ | --------------------------- |
-| Simple (typos, renames, copy)    | Haiku  | Junior dev — just execute   |
-| Standard (features, bug fixes)   | Sonnet | Mid-level — think and build |
-| Complex (architecture, security) | Opus   | Senior — design and reason  |
-| Review                           | Grok   | Independent reviewer        |
+| Complexity                       | Level    | Analogy                     |
+| -------------------------------- | -------- | --------------------------- |
+| Simple (typos, renames, copy)    | Junior   | The intern — just execute   |
+| Standard (features, bug fixes)   | Medior   | Mid-level — think and build |
+| Complex (architecture, security) | Senior   | The architect — design and reason |
+| Review                           | Reviewer | Independent code reviewer   |

 This isn't just cost optimization. It mirrors what effective managers do instinctively: match the delegation level to the task, not to a fixed assumption about the delegate.

@@ -27,11 +27,11 @@ This isn't just cost optimization. It mirrors what effective managers do instinc

 Classical management theory — later formalized by Bernard Bass in his work on Transformational Leadership — introduced a concept called Management by Exception (MBE). The principle: a manager should only be pulled back into a workstream when something deviates from the expected path.

-DevClaw's task lifecycle is built on this. The orchestrator delegates a task via `task_pickup`, then steps away. It only re-engages in three scenarios:
+DevClaw's task lifecycle is built on this. The orchestrator delegates a task via `work_start`, then steps away. It only re-engages in four scenarios:

-1. **DEV completes work** → The task moves to QA automatically. No orchestrator involvement needed.
+1. **DEV completes work** → The label moves to `To Test`. The scheduler dispatches QA on the next tick. No orchestrator involvement needed.
 2. **QA passes** → The issue closes. Pipeline complete.
-3. **QA fails** → The task cycles back to DEV with a fix request. The orchestrator may need to adjust the model tier.
+3. **QA fails** → The label moves to `To Improve`. The scheduler dispatches DEV on the next tick. The orchestrator may need to adjust the model level.
 4. **QA refines** → The task enters a holding state that _requires human decision_. This is the explicit escalation boundary.

 The "refine" state is the most interesting from a delegation perspective. It's a conscious architectural decision that says: some judgments should not be automated. When the QA agent determines that a task needs rethinking rather than just fixing, it escalates to the only actor who has the full business context — the human.
@@ -61,7 +61,7 @@ One of the most common delegation failures is self-review. You don't ask the per
 DevClaw enforces structural separation between development and review by design:

 - DEV and QA are separate sub-agent sessions with separate state.
- QA uses a different model entirely (Grok), introducing genuine independence.
+- QA uses the reviewer level, which can be a different model entirely, introducing genuine independence.
 - The review happens after a clean label transition — QA picks up from `To Test`, not from watching DEV work in real time.

 This mirrors a principle from organizational design: effective controls require independence between execution and verification. It's the same reason companies separate their audit function from their operations.
@@ -72,7 +72,7 @@ Ronald Coase won a Nobel Prize for explaining why firms exist: transaction costs

 DevClaw applies the same logic to AI sessions. Spawning a new sub-agent session costs approximately 50,000 tokens of context loading — the agent needs to read the full codebase before it can do useful work. That's the onboarding cost.

-The plugin tracks session IDs across task completions. When a DEV finishes task A and task B is ready on the same project, DevClaw detects the existing session and returns `"sessionAction": "send"` instead of `"spawn"`. The orchestrator routes the new task to the running session. No re-onboarding. No context reload.
+The plugin tracks session keys across task completions. When a DEV finishes task A and task B is ready on the same project, DevClaw detects the existing session and reuses it instead of spawning a new one. No re-onboarding. No context reload.
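
The reuse decision can be sketched as a lookup against the stored session keys. This is a hedged illustration, not the plugin's code: `DevSessions`, `DispatchDecision`, and `decideDispatch` are hypothetical names, the per-level layout is assumed from the `sessions` object in `projects.json`, and the action names `send` and `spawn` follow the earlier wording of this paragraph.

```typescript
type DevLevel = "junior" | "medior" | "senior";

// Session keys per level; null means no session has been created yet.
type DevSessions = Record<DevLevel, string | null>;

interface DispatchDecision {
  action: "send" | "spawn"; // reuse an existing session vs. create a new one
  sessionKey: string | null;
}

// Reuse the stored session for this level if there is one; otherwise a new
// session has to be spawned and pays the context-loading cost.
function decideDispatch(sessions: DevSessions, level: DevLevel): DispatchDecision {
  const existing = sessions[level];
  return existing !== null
    ? { action: "send", sessionKey: existing }
    : { action: "spawn", sessionKey: null };
}

// Example: a medior session already exists, so task B is sent to it.
const storedSessions: DevSessions = { junior: null, medior: "session-abc", senior: null };
const reuseDecision = decideDispatch(storedSessions, "medior");
const spawnDecision = decideDispatch(storedSessions, "senior");
```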

 In management terms: keep your team stable. Reassigning the same person to the next task on their project is almost always cheaper than bringing in someone new — even if the new person is theoretically better qualified.

@@ -101,11 +101,11 @@ This is the deepest lesson from delegation theory: **good delegation isn't about

 Management research points to a few directions that could extend DevClaw's delegation model:

-**Progressive delegation.** Blanchard's model suggests increasing task complexity for delegates as they prove competent. DevClaw could track QA pass rates per model tier and automatically promote — if Haiku consistently passes QA on borderline tasks, start routing more work to it. This is how good managers develop their people, and it reduces cost over time.
+**Progressive delegation.** Blanchard's model suggests increasing task complexity for delegates as they prove competent. DevClaw could track QA pass rates per model level and automatically promote — if junior consistently passes QA on borderline tasks, start routing more work to it. This is how good managers develop their people, and it reduces cost over time.

 **Delegation authority expansion.** The Vroom-Yetton decision model maps when a leader should decide alone versus consulting the team. Currently, sub-agents have narrow authority — they execute tasks but can't restructure the backlog. Selectively expanding this (e.g., allowing a DEV agent to split a task it judges too large) would reduce orchestrator bottlenecks, mirroring how managers gradually give high-performers more autonomy.

-**Outcome-based learning.** Delegation research emphasizes that the _delegator_ learns from outcomes too. Aggregated metrics — QA fail rate by model tier, average cycles to Done, time-in-state distributions — would help both the orchestrator agent and the human calibrate their delegation patterns over time.
+**Outcome-based learning.** Delegation research emphasizes that the _delegator_ learns from outcomes too. Aggregated metrics — QA fail rate by model level, average cycles to Done, time-in-state distributions — would help both the orchestrator agent and the human calibrate their delegation patterns over time.

 ---

--- a/docs/ONBOARDING.md
+++ b/docs/ONBOARDING.md
@@ -1,18 +1,18 @@
 # DevClaw — Onboarding Guide

-## What you need before starting
+Step-by-step setup: install the plugin, configure an agent, register projects, and run your first task.
+
+## Prerequisites

 | Requirement | Why | How to check |
 |---|---|---|
 | [OpenClaw](https://openclaw.ai) installed | DevClaw is an OpenClaw plugin | `openclaw --version` |
 | Node.js >= 20 | Runtime for plugin | `node --version` |
-| [`glab`](https://gitlab.com/gitlab-org/cli) or [`gh`](https://cli.github.com) CLI | Issue tracker provider (auto-detected from remote) | `glab --version` or `gh --version` |
-| CLI authenticated | Plugin calls glab/gh for every label transition | `glab auth status` or `gh auth status` |
-| A GitLab/GitHub repo with issues | The task backlog lives in the issue tracker | `glab issue list` or `gh issue list` from your repo |
+| [`gh`](https://cli.github.com) or [`glab`](https://gitlab.com/gitlab-org/cli) CLI | Issue tracker provider (auto-detected from git remote) | `gh --version` or `glab --version` |
+| CLI authenticated | Plugin calls gh/glab for every label transition | `gh auth status` or `glab auth status` |
+| A GitHub/GitLab repo with issues | The task backlog lives in the issue tracker | `gh issue list` or `glab issue list` from your repo |

-## Setup
-
-### 1. Install the plugin
+## Step 1: Install the plugin

 ```bash
 # Copy to extensions directory (auto-discovered on next restart)
@@ -25,21 +25,21 @@ openclaw plugins list
 # Should show: DevClaw | devclaw | loaded
 ```

-### 2. Run setup
+## Step 2: Run setup

 There are three ways to set up DevClaw:

-#### Option A: Conversational onboarding (recommended)
+### Option A: Conversational onboarding (recommended)

-Call the `devclaw_onboard` tool from any agent that has the DevClaw plugin loaded. The agent will walk you through configuration step by step — asking about:
+Call the `onboard` tool from any agent that has the DevClaw plugin loaded. The agent walks you through configuration step by step — asking about:
 - Agent selection (current or create new)
 - Channel binding (telegram/whatsapp/none) — for new agents only
- Model tiers (accept defaults or customize)
+- Model levels (accept defaults or customize)
 - Optional project registration

 The tool returns instructions that guide the agent through the question-and-answer setup conversation.

-#### Option B: CLI wizard
+### Option B: CLI wizard

 ```bash
 openclaw devclaw setup
@@ -48,12 +48,13 @@ openclaw devclaw setup
 The setup wizard walks you through:

 1. **Agent** — Create a new orchestrator agent or configure an existing one
-2. **Developer team** — Choose which LLM model powers each developer tier:
-   - **Junior** (fast, cheap tasks) — default: `anthropic/claude-haiku-4-5`
-   - **Medior** (standard tasks) — default: `anthropic/claude-sonnet-4-5`
-   - **Senior** (complex tasks) — default: `anthropic/claude-opus-4-5`
-   - **QA** (code review) — default: `anthropic/claude-sonnet-4-5`
-3. **Workspace** — Writes AGENTS.md, HEARTBEAT.md, role templates, and initializes memory
+2. **Developer team** — Choose which LLM model powers each developer level:
+   - **DEV junior** (fast, cheap tasks) — default: `anthropic/claude-haiku-4-5`
+   - **DEV medior** (standard tasks) — default: `anthropic/claude-sonnet-4-5`
+   - **DEV senior** (complex tasks) — default: `anthropic/claude-opus-4-5`
+   - **QA reviewer** (code review) — default: `anthropic/claude-sonnet-4-5`
+   - **QA tester** (manual testing) — default: `anthropic/claude-haiku-4-5`
+3. **Workspace** — Writes AGENTS.md, HEARTBEAT.md, role templates, and initializes state

 Non-interactive mode:
 ```bash
@@ -66,45 +67,45 @@ openclaw devclaw setup --agent my-orchestrator \
  --senior "anthropic/claude-opus-4-5"
 ```

-#### Option C: Tool call (agent-driven)
+### Option C: Tool call (agent-driven)

 **Conversational onboarding via tool:**
 ```json
-devclaw_onboard({ mode: "first-run" })
+onboard({ "mode": "first-run" })
 ```

-The tool returns step-by-step instructions that guide the agent through the QA-style setup conversation.
+The tool returns step-by-step instructions that guide the agent through the setup conversation.

 **Direct setup (skip conversation):**
 ```json
-{
+setup({
  "newAgentName": "My Dev Orchestrator",
  "channelBinding": "telegram",
  "models": {
-    "junior": "anthropic/claude-haiku-4-5",
-    "senior": "anthropic/claude-opus-4-5"
+    "dev": {
+      "junior": "anthropic/claude-haiku-4-5",
+      "senior": "anthropic/claude-opus-4-5"
+    },
+    "qa": {
+      "reviewer": "anthropic/claude-sonnet-4-5"
+    }
  }
-}
+})
 ```

-This calls `devclaw_setup` directly without conversational prompts.
+## Step 3: Channel binding (optional, for new agents)

-### 3. Channel binding (optional, for new agents)
-
-If you created a new agent during conversational onboarding and selected a channel binding (telegram/whatsapp), the agent is automatically bound and will receive messages from that channel. **Skip to step 4.**
+If you created a new agent during conversational onboarding and selected a channel binding (telegram/whatsapp), the agent is automatically bound. **Skip to step 4.**

 **Smart Migration**: If an existing agent already has a channel-wide binding (e.g., the old orchestrator receives all telegram messages), the onboarding agent will:
-1. Call `analyze_channel_bindings` to detect the conflict
+1. Detect the conflict
 2. Ask if you want to migrate the binding from the old agent to the new one
 3. If you confirm, the binding is automatically moved — no manual config edit needed

-This is useful when you're replacing an old orchestrator with a new one.
+If you didn't bind a channel during setup:

-If you didn't bind a channel during setup, you have two options:
+**Option A: Manually edit `openclaw.json`**

-**Option A: Manually edit `openclaw.json`** (for existing agents or post-creation binding)
-
-Add an entry to the `bindings` array:
 ```json
 {
  "bindings": [
@@ -136,131 +137,115 @@ Restart OpenClaw after editing.

 **Option B: Add bot to Telegram/WhatsApp group**

-If using a channel-wide binding (no peer filter), the agent will receive all messages from that channel. Add your orchestrator bot to the relevant Telegram group for the project.
+If using a channel-wide binding (no peer filter), the agent receives all messages from that channel. Add your orchestrator bot to the relevant Telegram group.

-### 4. Register your project
+## Step 4: Register your project

-Tell the orchestrator agent to register a new project:
+Go to the Telegram/WhatsApp group for the project and tell the orchestrator agent:

-> "Register project my-project at ~/git/my-project for group -1234567890 with base branch development"
+> "Register project my-project at ~/git/my-project with base branch development"

 The agent calls `project_register`, which atomically:
 - Validates the repo and auto-detects GitHub/GitLab from remote
 - Creates all 8 state labels (idempotent)
- Scaffolds prompt instruction files (`projects/prompts/<project>/dev.md` and `qa.md`)
- Adds the project entry to `projects.json` with `autoChain: false`
+- Scaffolds role instruction files (`projects/roles/<project>/dev.md` and `qa.md`)
+- Adds the project entry to `projects.json`
 - Logs the registration event

+**Initial state in `projects.json`:**
+
 ```json
 {
  "projects": {
    "-1234567890": {
      "name": "my-project",
      "repo": "~/git/my-project",
-      "groupName": "Dev - My Project",
-      "deployUrl": "",
+      "groupName": "Project: my-project",
      "baseBranch": "development",
      "deployBranch": "development",
-      "autoChain": false,
+      "channel": "telegram",
+      "roleExecution": "parallel",
      "dev": {
        "active": false,
        "issueId": null,
        "startTime": null,
-        "model": null,
+        "level": null,
        "sessions": { "junior": null, "medior": null, "senior": null }
      },
      "qa": {
        "active": false,
        "issueId": null,
        "startTime": null,
-        "model": null,
-        "sessions": { "qa": null }
+        "level": null,
+        "sessions": { "reviewer": null, "tester": null }
      }
    }
  }
 }
 ```
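
For readers who prefer types to JSON, the entry above can be described roughly as follows. This is a sketch derived only from the example: the type names and the `newProjectEntry` helper are hypothetical, the field types for `issueId` and `startTime` are guesses, and setting `deployBranch` equal to `baseBranch` simply mirrors the example rather than a documented rule.

```typescript
// Worker slot for one role; mirrors the "dev" and "qa" objects in the JSON above.
interface WorkerState<Level extends string> {
  active: boolean;
  issueId: number | null;   // assumption: issue IDs are numeric
  startTime: string | null; // assumption: stored as a timestamp string
  level: Level | null;
  sessions: Record<Level, string | null>;
}

type DevLevel = "junior" | "medior" | "senior";
type QaLevel = "reviewer" | "tester";

interface ProjectEntry {
  name: string;
  repo: string;
  groupName: string;
  baseBranch: string;
  deployBranch: string;
  channel: string;
  roleExecution: string;
  dev: WorkerState<DevLevel>;
  qa: WorkerState<QaLevel>;
}

// Builds an entry with idle workers, matching the initial state shown above.
function newProjectEntry(
  projectName: string,
  repo: string,
  baseBranch: string,
  channel: string,
): ProjectEntry {
  return {
    name: projectName,
    repo,
    groupName: `Project: ${projectName}`,
    baseBranch,
    deployBranch: baseBranch, // mirrors the example, not a documented default
    channel,
    roleExecution: "parallel",
    dev: {
      active: false, issueId: null, startTime: null, level: null,
      sessions: { junior: null, medior: null, senior: null },
    },
    qa: {
      active: false, issueId: null, startTime: null, level: null,
      sessions: { reviewer: null, tester: null },
    },
  };
}

const exampleEntry = newProjectEntry("my-project", "~/git/my-project", "development", "telegram");
```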

-**Manual fallback:** If you prefer CLI control, you can still create labels manually with `glab label create` and edit `projects.json` directly. See the [Architecture docs](ARCHITECTURE.md) for label names and colors.
+**Finding the Telegram group ID:** The group ID is the numeric ID of your Telegram supergroup (a negative number like `-1234567890`). When you call `project_register` from within the group, the ID is auto-detected from context.

-**Finding the Telegram group ID:** The group ID is the numeric ID of your Telegram supergroup (a negative number like `-1234567890`). You can find it via the Telegram bot API or from message metadata in OpenClaw logs.
-
-### 5. Create your first issue
+## Step 5: Create your first issue

 Issues can be created in multiple ways:
 - **Via the agent** — Ask the orchestrator in the Telegram group: "Create an issue for adding a login page" (uses `task_create`)
 - **Via workers** — DEV/QA workers can call `task_create` to file follow-up bugs they discover
- **Via CLI** — `cd ~/git/my-project && glab issue create --title "My first task" --label "To Do"` (or `gh issue create`)
+- **Via CLI** — `cd ~/git/my-project && gh issue create --title "My first task" --label "To Do"` (or `glab issue create`)
 - **Via web UI** — Create an issue and add the "To Do" label

-### 6. Test the pipeline
+Note: `task_create` defaults to the "Planning" label. Use "To Do" explicitly when the task is ready for immediate work.
+
+## Step 6: Test the pipeline

 Ask the agent in the Telegram group:

 > "Check the queue status"

-The agent should call `queue_status` and report the "To Do" issue. Then:
+The agent should call `status` and report the "To Do" issue. Then:

 > "Pick up issue #1 for DEV"

-The agent calls `task_pickup`, which assigns a developer tier, transitions the label to "Doing", creates or reuses a worker session, and dispatches the task — all in one call. The agent just posts the announcement.
+The agent calls `work_start`, which assigns a developer level, transitions the label to "Doing", creates or reuses a worker session, and dispatches the task — all in one call. The agent posts the announcement.

 ## Adding more projects

-Tell the agent to register a new project (step 3) and add the bot to the new Telegram group (step 4). That's it — `project_register` handles labels and state setup.
+Tell the agent to register a new project (step 4) from within the new project's Telegram group. That's it — `project_register` handles labels and state setup.

 Each project is fully isolated — separate queue, separate workers, separate state.

-## Developer tiers
+## Developer levels

-DevClaw assigns tasks to developer tiers instead of raw model names. This makes the system intuitive — you're assigning a "junior dev" to fix a typo, not configuring model parameters.
+DevClaw assigns tasks to developer levels instead of raw model names. This makes the system intuitive — you're assigning a "junior dev" to fix a typo, not configuring model parameters.

-| Tier | Role | Default model | When to assign |
-|------|------|---------------|----------------|
-| **junior** | Junior developer | `anthropic/claude-haiku-4-5` | Typos, single-file fixes, CSS changes |
-| **medior** | Mid-level developer | `anthropic/claude-sonnet-4-5` | Features, bug fixes, multi-file changes |
-| **senior** | Senior developer | `anthropic/claude-opus-4-5` | Architecture, migrations, system-wide refactoring |
-| **qa** | QA engineer | `anthropic/claude-sonnet-4-5` | Code review, test validation |
+| Role | Level | Default model | When to assign |
+|------|-------|---------------|----------------|
+| DEV | **junior** | `anthropic/claude-haiku-4-5` | Typos, single-file fixes, CSS changes |
+| DEV | **medior** | `anthropic/claude-sonnet-4-5` | Features, bug fixes, multi-file changes |
+| DEV | **senior** | `anthropic/claude-opus-4-5` | Architecture, migrations, system-wide refactoring |
+| QA | **reviewer** | `anthropic/claude-sonnet-4-5` | Code review, test validation |
+| QA | **tester** | `anthropic/claude-haiku-4-5` | Manual testing, smoke tests |

-Change which model powers each tier in `openclaw.json`:
-```json
-{
-  "plugins": {
-    "entries": {
-      "devclaw": {
-        "config": {
-          "models": {
-            "junior": "anthropic/claude-haiku-4-5",
-            "medior": "anthropic/claude-sonnet-4-5",
-            "senior": "anthropic/claude-opus-4-5",
-            "qa": "anthropic/claude-sonnet-4-5"
-          }
-        }
-      }
-    }
-  }
-}
-```
+Change which model powers each level in `openclaw.json` — see [Configuration](CONFIGURATION.md#model-tiers).

 ## What the plugin handles vs. what you handle

 | Responsibility | Who | Details |
 |---|---|---|
 | Plugin installation | You (once) | `cp -r devclaw ~/.openclaw/extensions/` |
-| Agent + workspace setup | Plugin (`devclaw_setup`) | Creates agent, configures models, writes workspace files |
-| Channel binding analysis | Plugin (`analyze_channel_bindings`) | Detects channel conflicts, validates channel configuration |
-| Channel binding migration | Plugin (`devclaw_setup` with `migrateFrom`) | Automatically moves channel-wide bindings between agents |
-| Label setup | Plugin (`project_register`) | 8 labels, created idempotently via `IssueProvider` |
-| Prompt file scaffolding | Plugin (`project_register`) | Creates `projects/prompts/<project>/dev.md` and `qa.md` |
+| Agent + workspace setup | Plugin (`setup`) | Creates agent, configures models, writes workspace files |
+| Channel binding migration | Plugin (`setup` with `migrateFrom`) | Automatically moves channel-wide bindings between agents |
+| Label setup | Plugin (`project_register`) | 8 labels, created idempotently via IssueProvider |
+| Role file scaffolding | Plugin (`project_register`) | Creates `projects/roles/<project>/dev.md` and `qa.md` |
 | Project registration | Plugin (`project_register`) | Entry in `projects.json` with empty worker state |
 | Telegram group setup | You (once per project) | Add bot to group |
 | Issue creation | Plugin (`task_create`) | Orchestrator or workers create issues from chat |
-| Label transitions | Plugin | Atomic label transitions via issue tracker CLI |
-| Developer assignment | Plugin | LLM-selected tier by orchestrator, keyword heuristic fallback |
+| Label transitions | Plugin | Atomic transitions via issue tracker CLI |
+| Developer assignment | Plugin | LLM-selected level by orchestrator, keyword heuristic fallback |
 | State management | Plugin | Atomic read/write to `projects.json` |
 | Session management | Plugin | Creates, reuses, and dispatches to sessions via CLI. Agent never touches session tools. |
-| Task completion | Plugin (`task_complete`) | Workers self-report. Auto-chains if enabled. |
-| Prompt instructions | Plugin (`task_pickup`) | Loaded from `projects/prompts/<project>/<role>.md`, appended to task message |
+| Task completion | Plugin (`work_finish`) | Workers self-report. Scheduler dispatches next role. |
+| Prompt instructions | Plugin (`work_start`) | Loaded from `projects/roles/<project>/<role>.md`, appended to task message |
 | Audit logging | Plugin | Automatic NDJSON append per tool call |
-| Zombie detection | Plugin | `session_health` checks active vs alive |
-| Queue scanning | Plugin | `queue_status` queries issue tracker per project |
+| Zombie detection | Plugin | `health` checks active vs alive |
+| Queue scanning | Plugin | `status` queries issue tracker per project |
--- a/docs/QA_WORKFLOW.md
+++ b/docs/QA_WORKFLOW.md
@@ -1,8 +1,6 @@
-# QA Workflow
+# DevClaw — QA Workflow

-## Overview
-
-Quality Assurance (QA) in DevClaw follows a structured workflow that ensures every review is documented and traceable.
+Quality Assurance in DevClaw follows a structured workflow that ensures every review is documented and traceable.

 ## Required Steps

@@ -28,10 +26,10 @@ task_comment({

 ### 3. Complete the Task

-After posting your comment, call `task_complete`:
+After posting your comment, call `work_finish`:

 ```javascript
-task_complete({
+work_finish({
  role: "qa",
  projectGroupId: "<group-id>",
  result: "pass",  // or "fail", "refine", "blocked"
@@ -39,15 +37,24 @@ task_complete({
 })
 ```

+## QA Results
+
+| Result | Label transition | Meaning |
+|---|---|---|
+| `"pass"` | Testing → Done | Approved. Issue closed. |
+| `"fail"` | Testing → To Improve | Issues found. Issue reopened, sent back to DEV. |
+| `"refine"` | Testing → Refining | Needs human decision. Pipeline pauses. |
+| `"blocked"` | Testing → To Test | Cannot complete (env issues, etc.). Returns to QA queue. |
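
The table above maps directly onto a small lookup. The sketch below is an illustration only: `QaResult`, `qaTransitions`, `needsHumanDecision`, and `describeTransition` are hypothetical names, and the label strings are copied from the table rather than from the plugin source.

```typescript
type QaResult = "pass" | "fail" | "refine" | "blocked";

// Target label for each QA result, copied from the table above.
// Every transition starts from the "Testing" label.
const qaTransitions: Record<QaResult, string> = {
  pass: "Done",
  fail: "To Improve",
  refine: "Refining",
  blocked: "To Test",
};

// Only "refine" pauses the pipeline and waits for a human decision.
function needsHumanDecision(result: QaResult): boolean {
  return result === "refine";
}

// Describes the label transition, for example for a log line.
function describeTransition(result: QaResult): string {
  return `Testing -> ${qaTransitions[result]}`;
}

const failTransition = describeTransition("fail");
```

Typing the map as `Record<QaResult, string>` means the compiler flags a missing entry if a fifth result is ever added to the union.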
+
 ## Why Comments Are Required

-1. **Audit Trail**: Every review decision is documented
-2. **Knowledge Sharing**: Future reviewers understand what was tested
-3. **Quality Metrics**: Enables tracking of test coverage
-4. **Debugging**: When issues arise later, we know what was checked
-5. **Compliance**: Some projects require documented QA evidence
+1. **Audit Trail** — Every review decision is documented in the issue tracker
+2. **Knowledge Sharing** — Future reviewers understand what was tested
+3. **Quality Metrics** — Enables tracking of test coverage
+4. **Debugging** — When issues arise later, we know what was checked
+5. **Compliance** — Some projects require documented QA evidence

-## Comment Template
+## Comment Templates

 ### For Passing Reviews

@@ -61,7 +68,7 @@ task_complete({

 **Results:** All tests passed. No regressions found.

-**Environment:** 
+**Environment:**
 - Browser/Platform: [details]
 - Version: [details]
 - Test data: [if relevant]
@@ -72,15 +79,14 @@ task_complete({
 ### For Failing Reviews

 ```markdown
-## QA Review - Issues Found
+## QA Review — Issues Found

 **Tested:**
 - [What you tested]

 **Issues Found:**
 1. [Issue description with steps to reproduce]
-2. [Issue description with steps to reproduce]
-3. [Issue description with expected vs actual behavior]
+2. [Issue description with expected vs actual behavior]

 **Environment:**
 - [Test environment details]
@@ -90,25 +96,25 @@ task_complete({

 ## Enforcement

-As of [current date], QA workers are instructed via role templates to:
- Always call `task_comment` BEFORE `task_complete`
+QA workers receive instructions via role templates to:
+- Always call `task_comment` BEFORE `work_finish`
 - Include specific details about what was tested
 - Document results, environment, and any notes

 Prompt templates affected:
- `projects/prompts/<project>/qa.md`
+- `projects/roles/<project>/qa.md`
 - All project-specific QA templates should follow this pattern

 ## Best Practices

-1. **Be Specific**: Don't just say "tested the feature" - list what you tested
-2. **Include Environment**: Version numbers, browser, OS can matter
-3. **Document Edge Cases**: If you tested special scenarios, note them
-4. **Use Screenshots**: For UI issues, screenshots help (link in comment)
-5. **Reference Requirements**: Link back to acceptance criteria from the issue
+1. **Be Specific** — Don't just say "tested the feature" — list what you tested
+2. **Include Environment** — Version numbers, browser, OS can matter
+3. **Document Edge Cases** — If you tested special scenarios, note them
+4. **Reference Requirements** — Link back to acceptance criteria from the issue
+5. **Use Screenshots** — For UI issues, screenshots help (link in comment)

 ## Related

- Issue #103: Enforce QA comment on every review (pass or fail)
- Tool: `task_comment` - Add comments to issues
- Tool: `task_complete` - Complete QA tasks
+- Tool: [`task_comment`](TOOLS.md#task_comment) — Add comments to issues
+- Tool: [`work_finish`](TOOLS.md#work_finish) — Complete QA tasks
+- Config: [`projects/roles/<project>/qa.md`](CONFIGURATION.md#role-instruction-files) — QA role instructions
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -15,35 +15,35 @@ This works for the common case but breaks down when you want:

 Roles become a configurable list instead of a hardcoded pair. Each role defines:
 - **Name** — e.g. `design`, `dev`, `qa`, `devops`
- **Tiers** — which developer tiers can be assigned (e.g. design only needs `medior`)
+- **Levels** — which developer levels can be assigned (e.g. design only needs `medior`)
 - **Pipeline position** — where it sits in the task lifecycle
 - **Worker count** — how many concurrent workers (default: 1)

 ```json
 {
  "roles": {
-    "dev": { "tiers": ["junior", "medior", "senior"], "workers": 1 },
-    "qa": { "tiers": ["qa"], "workers": 1 },
-    "devops": { "tiers": ["medior", "senior"], "workers": 1 }
+    "dev": { "levels": ["junior", "medior", "senior"], "workers": 1 },
+    "qa": { "levels": ["reviewer", "tester"], "workers": 1 },
+    "devops": { "levels": ["medior", "senior"], "workers": 1 }
  },
  "pipeline": ["dev", "qa", "devops"]
 }
 ```

-The pipeline definition replaces the hardcoded `Doing → To Test → Testing → Done` flow. Labels and transitions are generated from the pipeline config. Auto-chaining follows the pipeline order.
+The pipeline definition replaces the hardcoded `Doing → To Test → Testing → Done` flow. Labels and transitions are generated from the pipeline config. The scheduler follows the pipeline order when filling free slots.

 ### Open questions

 - How do custom labels map? Generate from role names, or let users define?
- Should roles have their own instruction files (`projects/prompts/<project>/<role>.md`) — yes, this already works
+- Should roles have their own instruction files (`projects/roles/<project>/<role>.md`) — yes, this already works
 - How to handle parallel roles (e.g. frontend + backend DEV in parallel before QA)?

 ---

-## Channel-agnostic groups
+## Channel-agnostic Groups

 Currently DevClaw maps projects to **Telegram group IDs**. The `projectGroupId` is a Telegram-specific negative number. This means:
- WhatsApp groups can't be used as project channels
+- WhatsApp groups are only partially supported as project channels (via the `channel` field)
 - Discord, Slack, or other channels are excluded
 - The naming (`groupId`, `groupName`) is Telegram-specific

@@ -77,19 +77,20 @@ Key changes:
 - All tool params, state keys, and docs updated accordingly
 - Backward compatible: existing Telegram-only keys migrated on read

-This enables any OpenClaw channel (Telegram, WhatsApp, Discord, Slack, etc.) to host a project — each group chat becomes an autonomous dev team regardless of platform.
+This enables any OpenClaw channel (Telegram, WhatsApp, Discord, Slack, etc.) to host a project.

 ### Open questions

 - Should one project be bindable to multiple channels? (e.g. Telegram for devs, WhatsApp for stakeholder updates)
- How does the orchestrator agent handle cross-channel context? (OpenClaw bindings already route by channel)
+- How does the orchestrator agent handle cross-channel context?

 ---

-## Other ideas
+## Other Ideas

 - **Jira provider** — `IssueProvider` interface already abstracts GitHub/GitLab; Jira is the obvious next addition
- **Deployment integration** — `task_complete` QA pass could trigger a deploy step via webhook or CLI
- **Cost tracking** — log token usage per task/tier, surface in `queue_status`
+- **Deployment integration** — `work_finish` QA pass could trigger a deploy step via webhook or CLI
+- **Cost tracking** — log token usage per task/level, surface in `status`
 - **Priority scoring** — automatic priority assignment based on labels, age, and dependencies
 - **Session archival** — auto-archive idle sessions after configurable timeout (currently indefinite)
+- **Progressive delegation** — track QA pass rates per level and auto-promote (see [Management Theory](MANAGEMENT.md))
--- a/docs/TESTING.md
+++ b/docs/TESTING.md
@@ -59,10 +59,15 @@ npm run test:ui
      "devclaw": {
        "config": {
          "models": {
-            "junior": "anthropic/claude-haiku-4-5",
-            "medior": "anthropic/claude-sonnet-4-5",
-            "senior": "anthropic/claude-opus-4-5",
-            "qa": "anthropic/claude-sonnet-4-5"
+            "dev": {
+              "junior": "anthropic/claude-haiku-4-5",
+              "medior": "anthropic/claude-sonnet-4-5",
+              "senior": "anthropic/claude-opus-4-5"
+            },
+            "qa": {
+              "reviewer": "anthropic/claude-sonnet-4-5",
+              "tester": "anthropic/claude-haiku-4-5"
+            }
          }
        }
      }
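The hunk above nests `models` by role and then by level, so a model lookup needs both keys. A minimal sketch of that lookup follows. The `ModelsConfig` type and `resolveModel` helper are hypothetical names used for illustration and are not taken from the DevClaw source.

```typescript
// Hypothetical type: role name -> level name -> model ID.
type ModelsConfig = Record<string, Record<string, string>>;

// Same values as the nested config shown in the diff above.
const models: ModelsConfig = {
  dev: {
    junior: "anthropic/claude-haiku-4-5",
    medior: "anthropic/claude-sonnet-4-5",
    senior: "anthropic/claude-opus-4-5",
  },
  qa: {
    reviewer: "anthropic/claude-sonnet-4-5",
    tester: "anthropic/claude-haiku-4-5",
  },
};

// Hypothetical helper: returns undefined when the role or level is not configured,
// leaving any fallback to defaults up to the caller.
function resolveModel(
  config: ModelsConfig,
  role: string,
  level: string,
): string | undefined {
  return config[role]?.[level];
}

const testerModel = resolveModel(models, "qa", "tester");
const mismatched = resolveModel(models, "dev", "reviewer");
```

With the flat layout that this hunk removes, a single `qa` key could only hold one model. The nested layout lets `reviewer` and `tester` map to different models.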
--- a/docs/TOOLS.md
+++ b/docs/TOOLS.md
@@ -0,0 +1,361 @@
+# DevClaw — Tools Reference
+
+Complete reference for all 11 tools registered by DevClaw. See [`index.ts`](../index.ts) for registration.
+
+## Worker Lifecycle
+
+### `work_start`
+
+Pick up a task from the issue queue. Handles level assignment, label transition, session creation/reuse, task dispatch, and audit logging — all in one call.
+
+**Source:** [`lib/tools/work-start.ts`](../lib/tools/work-start.ts)
+
+**Context:** Only works in project group chats.
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `issueId` | number | No | Issue ID. If omitted, picks next by priority. |
+| `role` | `"dev"` \| `"qa"` | No | Worker role. Auto-detected from issue label if omitted. |
+| `projectGroupId` | string | No | Project group ID. Auto-detected from group context. |
+| `level` | string | No | Worker level (`junior`, `medior`, `senior` for DEV; `reviewer`, `tester` for QA). Auto-detected if omitted. |
+
+**What it does atomically:**
+
+1. Resolves project from `projects.json`
+2. Validates no active worker for this role
+3. Fetches issue from tracker, verifies correct label state
+4. Assigns level (LLM-chosen via `level` param → label detection → keyword heuristic fallback)
+5. Resolves level to model ID via config or defaults
+6. Loads prompt instructions from `projects/roles/<project>/<role>.md`
+7. Looks up existing session for assigned level (session-per-level)
+8. Transitions label (e.g. `To Do` → `Doing`)
+9. Creates session via Gateway RPC if new (`sessions.patch`)
+10. Dispatches task to worker session via CLI (`openclaw gateway call agent`)
+11. Updates `projects.json` state (active, issueId, level, session key)
+12. Writes audit log entries (work_start + model_selection)
+13. Sends notification
+14. Returns announcement text
+
+**Level selection priority:**
+
+1. `level` parameter (LLM-selected) — highest priority
+2. Issue label (e.g. a label named "junior" or "senior")
+3. Keyword heuristic from `model-selector.ts` — fallback
+
+**Execution guards:**
+
+- Rejects if role already has an active worker
+- Respects `roleExecution` (in `"sequential"` mode, rejects if the other role is active)
+
+**On failure:** Rolls back label transition. No orphaned state.
+
+---
+
+### `work_finish`
+
+Complete a task with a result. Called by workers (DEV/QA sub-agent sessions) directly, or by the orchestrator.
+
+**Source:** [`lib/tools/work-finish.ts`](../lib/tools/work-finish.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `role` | `"dev"` \| `"qa"` | Yes | Worker role |
+| `result` | string | Yes | Completion result (see table below) |
+| `projectGroupId` | string | Yes | Project group ID |
+| `summary` | string | No | Brief summary for the announcement |
+| `prUrl` | string | No | PR/MR URL (auto-detected if omitted) |
+
+**Valid results by role:**
+
+| Role | Result | Label transition | Side effects |
+|---|---|---|---|
+| DEV | `"done"` | Doing → To Test | git pull, auto-detect PR URL |
+| DEV | `"blocked"` | Doing → To Do | Task returns to queue |
+| QA | `"pass"` | Testing → Done | Issue closed |
+| QA | `"fail"` | Testing → To Improve | Issue reopened |
+| QA | `"refine"` | Testing → Refining | Awaits human decision |
+| QA | `"blocked"` | Testing → To Test | Task returns to QA queue |
+
+**What it does atomically:**
+
+1. Validates role:result combination
+2. Resolves project and active worker
+3. Executes completion via pipeline service (label transition + side effects)
+4. Deactivates worker (sessions map preserved for reuse)
+5. Sends notification
+6. Ticks queue to fill free worker slots
+7. Writes audit log
+
+**Scheduling:** After completion, `work_finish` ticks the queue. The scheduler sees the new label (`To Test` or `To Improve`) and dispatches the next worker if a slot is free.
+
+---
+
+## Task Management
+
+### `task_create`
+
+Create a new issue in the project's issue tracker.
+
+**Source:** [`lib/tools/task-create.ts`](../lib/tools/task-create.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | Yes | Project group ID |
+| `title` | string | Yes | Issue title |
+| `description` | string | No | Full issue body (markdown) |
+| `label` | StateLabel | No | State label. Defaults to `"Planning"`. |
+| `assignees` | string[] | No | GitHub/GitLab usernames to assign |
+| `pickup` | boolean | No | If true, immediately pick up for DEV after creation |
+
+**Use cases:**
+
+- Orchestrator creates tasks from chat messages
+- Workers file follow-up bugs discovered during development
+- Breaking down epics into smaller tasks
+
+**Default behavior:** Creates issues in `"Planning"` state. Only use `"To Do"` when the user explicitly requests immediate work.
+
+---
+
+### `task_update`
+
+Change an issue's state label manually, without going through the full `work_start`/`work_finish` flow.
+
+**Source:** [`lib/tools/task-update.ts`](../lib/tools/task-update.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | Yes | Project group ID |
+| `issueId` | number | Yes | Issue ID to update |
+| `state` | StateLabel | Yes | New state label |
+| `reason` | string | No | Audit log reason for the change |
+
+**Valid states:** `Planning`, `To Do`, `Doing`, `To Test`, `Testing`, `Done`, `To Improve`, `Refining`
+
+**Use cases:**
+
+- Manual state adjustments (e.g. `Planning → To Do` after approval)
+- Failed auto-transitions that need correction
+- Bulk state changes by orchestrator
+
+---
+
+### `task_comment`
+
+Add a comment to an issue for feedback, notes, or discussion.
+
+**Source:** [`lib/tools/task-comment.ts`](../lib/tools/task-comment.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | Yes | Project group ID |
+| `issueId` | number | Yes | Issue ID to comment on |
+| `body` | string | Yes | Comment body (markdown) |
+| `authorRole` | `"dev"` \| `"qa"` \| `"orchestrator"` | No | Attribution role prefix |
+
+**Use cases:**
+
+- QA adds review feedback before pass/fail decision
+- DEV posts implementation notes or progress updates
+- Orchestrator adds summary comments
+
+When `authorRole` is provided, the comment is prefixed with a role emoji and attribution label.
+
+---
+
+## Operations
+
+### `status`
+
+Lightweight queue + worker state dashboard.
+
+**Source:** [`lib/tools/status.ts`](../lib/tools/status.ts)
+
+**Context:** Auto-filters to the current project in group chats. Shows all projects in DMs.
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | No | Filter to specific project. Omit for all. |
+
+**Returns per project:**
+
+- Worker state: active/idle, current issue, level, start time
+- Queue counts: To Do, To Test, To Improve
+- Role execution mode
+
+---
+
+### `health`
+
+Worker health scan with optional auto-fix.
+
+**Source:** [`lib/tools/health.ts`](../lib/tools/health.ts)
+
+**Context:** Auto-filters to the current project in group chats.
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | No | Filter to specific project. Omit for all. |
+| `fix` | boolean | No | Apply fixes for detected issues. Default: `false` (read-only). |
+| `activeSessions` | string[] | No | Active session IDs for zombie detection. |
+
+**Health checks:**
+
+| Issue | Severity | Detection | Auto-fix |
+|---|---|---|---|
+| Active worker with no session key | Critical | `active=true` but no session in map | Deactivate worker |
+| Active worker whose session is dead | Critical | Session key not in active sessions list | Deactivate worker, revert label |
+| Worker active >2 hours | Warning | `startTime` older than 2h | Deactivate worker, revert label to queue |
+| Inactive worker with lingering issue ID | Warning | `active=false` but `issueId` still set | Clear issueId |
+
+---
+
+### `work_heartbeat`
+
+Manual trigger for heartbeat: health fix + queue dispatch. Same logic as the background heartbeat service, but invoked on demand.
+
+**Source:** [`lib/tools/work-heartbeat.ts`](../lib/tools/work-heartbeat.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | No | Target single project. Omit for all. |
+| `dryRun` | boolean | No | Report only, don't dispatch. Default: `false`. |
+| `maxPickups` | number | No | Max worker dispatches per tick. |
+| `activeSessions` | string[] | No | Active session IDs for zombie detection. |
+
+**Two-pass sweep:**
+
+1. **Health pass** — Runs `checkWorkerHealth` per project per role. Auto-fixes zombies, stale workers, orphaned state.
+2. **Tick pass** — Calls `projectTick` per project. Fills free worker slots by priority (To Improve > To Test > To Do).
+
+**Execution guards:**
+
+- `projectExecution: "sequential"` — only one project active at a time
+- `roleExecution: "sequential"` — only one role (DEV or QA) active at a time per project (enforced in `projectTick`)
+
+---
+
+## Setup
+
+### `project_register`
+
+One-time project setup. Creates state labels, scaffolds prompt files, adds project to state.
+
+**Source:** [`lib/tools/project-register.ts`](../lib/tools/project-register.ts)
+
+**Context:** Only works in the Telegram/WhatsApp group being registered.
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | No | Auto-detected from current group if omitted |
+| `name` | string | Yes | Short project name (e.g. `my-webapp`) |
+| `repo` | string | Yes | Path to git repo (e.g. `~/git/my-project`) |
+| `groupName` | string | No | Display name. Defaults to `Project: {name}`. |
+| `baseBranch` | string | Yes | Base branch for development |
+| `deployBranch` | string | No | Deploy branch. Defaults to `baseBranch`. |
+| `deployUrl` | string | No | Deployment URL |
+| `roleExecution` | `"parallel"` \| `"sequential"` | No | DEV/QA parallelism. Default: `"parallel"`. |
+
+**What it does atomically:**
+
+1. Validates project not already registered
+2. Resolves repo path, auto-detects GitHub/GitLab from git remote
+3. Verifies provider health (CLI installed and authenticated)
+4. Creates all 8 state labels (idempotent — safe to run again)
+5. Adds project entry to `projects.json` with empty worker state
+   - DEV sessions: `{ junior: null, medior: null, senior: null }`
+   - QA sessions: `{ reviewer: null, tester: null }`
+6. Scaffolds prompt files: `projects/roles/<project>/dev.md` and `qa.md`
+7. Writes audit log
+
+---
+
+### `setup`
+
+Agent + workspace initialization.
+
+**Source:** [`lib/tools/setup.ts`](../lib/tools/setup.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `newAgentName` | string | No | Create a new agent. Omit to configure current workspace. |
+| `channelBinding` | `"telegram"` \| `"whatsapp"` | No | Channel to bind (with `newAgentName` only) |
+| `migrateFrom` | string | No | Agent ID to migrate channel binding from |
+| `models` | object | No | Model overrides per role and level (see [Configuration](CONFIGURATION.md#model-tiers)) |
+| `projectExecution` | `"parallel"` \| `"sequential"` | No | Project execution mode |
+
+**What it does:**
+
+1. Creates a new agent or configures existing workspace
+2. Optionally binds messaging channel (Telegram/WhatsApp)
+3. Optionally migrates channel binding from another agent
+4. Writes workspace files: `AGENTS.md`, `HEARTBEAT.md`, `projects/projects.json`
+5. Configures per-role, per-level models in `openclaw.json`
+
+---
+
+### `onboard`
+
+Conversational onboarding guide. Returns step-by-step instructions for the agent to walk the user through setup.
+
+**Source:** [`lib/tools/onboard.ts`](../lib/tools/onboard.ts)
+
+**Context:** Works in DMs and via-agent. Blocks group chats (setup should not happen in project groups).
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `mode` | `"first-run"` \| `"reconfigure"` | No | Auto-detected from current state |
+
+**Flow:**
+
+1. Call `onboard`, which returns Q&A-style step-by-step instructions
+2. Agent walks user through: agent selection, channel binding, model levels
+3. Agent calls `setup` with collected answers
+4. User registers projects via `project_register` in group chats
+
+---
+
+## Completion Rules Reference
+
+The pipeline service (`lib/services/pipeline.ts`) defines declarative completion rules:
+
+```
+dev:done    → Doing    → To Test     (git pull, detect PR)
+dev:blocked → Doing    → To Do       (return to queue)
+qa:pass     → Testing  → Done        (close issue)
+qa:fail     → Testing  → To Improve  (reopen issue)
+qa:refine   → Testing  → Refining    (await human decision)
+qa:blocked  → Testing  → To Test     (return to QA queue)
+```
+
+## Issue Priority Order
+
+When the heartbeat service or `work_heartbeat` fills free worker slots, issues are picked in this order:
+
+1. **To Improve** — QA failures get fixed first (highest priority)
+2. **To Test** — Completed DEV work gets reviewed next
+3. **To Do** — Fresh tasks are picked up last
+
+This ensures rework and pending reviews are cleared before new tasks are started.
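The priority order documented at the end of `docs/TOOLS.md` can be sketched as a small sort over queue labels. This is an illustration under stated assumptions, not the actual `projectTick` implementation: `QueueLabel`, `QueuedIssue`, `PRIORITY`, and `pickNext` are hypothetical names, and the sketch ignores which role (DEV or QA) has a free slot.

```typescript
// Hypothetical types for illustration; not taken from DevClaw's source.
type QueueLabel = "To Improve" | "To Test" | "To Do";

interface QueuedIssue {
  id: number;
  label: QueueLabel;
}

// Lower index means higher priority, mirroring the documented order:
// To Improve > To Test > To Do.
const PRIORITY: QueueLabel[] = ["To Improve", "To Test", "To Do"];

// Returns the highest-priority issue, or undefined for an empty queue.
function pickNext(queue: QueuedIssue[]): QueuedIssue | undefined {
  // Copy before sorting so the caller's array is not mutated.
  return [...queue].sort(
    (a, b) => PRIORITY.indexOf(a.label) - PRIORITY.indexOf(b.label),
  )[0];
}

const next = pickNext([
  { id: 12, label: "To Do" },
  { id: 7, label: "To Test" },
  { id: 9, label: "To Improve" },
]);
```

Here `next` is the `To Improve` issue, even though it appears last in the input. A real scheduler would presumably also filter by role availability and the `roleExecution` setting before picking.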
--- a/lib/templates.ts
+++ b/lib/templates.ts
@@ -102,7 +102,7 @@ All orchestration goes through these tools. You do NOT manually manage sessions,
 | \`status\` | Task queue and worker state per project (lightweight dashboard) |
 | \`health\` | Scan worker health: zombies, stale workers, orphaned state. Pass fix=true to auto-fix |
 | \`work_start\` | End-to-end: label transition, level assignment, session create/reuse, dispatch with role instructions |
-| \`work_finish\` | End-to-end: label transition, state update, issue close/reopen. Auto-ticks queue after completion. |
+| \`work_finish\` | End-to-end: label transition, state update, issue close/reopen. Ticks scheduler after completion. |

 ### Pipeline Flow

@@ -135,10 +135,10 @@ Evaluate each task and pass the appropriate developer level to \`work_start\`:

 ### When Work Completes

-Workers call \`work_finish\` themselves — the label transition, state update, and audit log happen atomically. After completion, \`work_finish\` auto-ticks the queue to fill free slots:
+Workers call \`work_finish\` themselves — the label transition, state update, and audit log happen atomically. After completion, \`work_finish\` ticks the scheduler to fill free slots:

- DEV "done" → issue moves to "To Test" → tick dispatches QA
- QA "fail" → issue moves to "To Improve" → tick dispatches DEV
+- DEV "done" → issue moves to "To Test" → scheduler dispatches QA
+- QA "fail" → issue moves to "To Improve" → scheduler dispatches DEV
 - QA "pass" → Done, no further dispatch
 - QA "refine" / blocked → needs human input