From 553efcc1463b7ae731f1203d9d6f32fa68d28277 Mon Sep 17 00:00:00 2001
From: Claude
Date: Tue, 10 Feb 2026 20:13:22 +0000
Subject: [PATCH 01/14] docs: overhaul documentation for consistency with
implementation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Complete documentation rewrite to match the current codebase:
- README: add benefits section (process consistency, token savings with
estimates, project isolation, continuous planning, feedback loops,
role-based prompts, atomic operations, audit trail), task workflow
with state diagram, model-to-role mapping tables, installation guide
- New TOOLS.md: complete reference for all 11 tools with parameters,
behavior, and execution guards
- New CONFIGURATION.md: full config reference for openclaw.json,
projects.json, heartbeat, notifications, workspace layout
- Fix tool names across all docs: task_pickup→work_start,
task_complete→work_finish
- Fix tier model: QA has reviewer/tester levels, not flat "qa"
- Fix config schema: nested models.dev.*/models.qa.* structure
- Fix prompt path: projects/roles/ not projects/prompts/
- Fix worker state: uses "level" field not "model"/"tier"
- Fix MANAGEMENT.md: remove incorrect model references
- Fix TESTING.md: update model config example to nested structure
- Remove VERIFICATION.md (one-off checklist, no longer needed)
- Add cross-references between all docs pages
https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
---
README.md | 664 +++++++++++++++-----------------------
VERIFICATION.md | 45 ---
docs/ARCHITECTURE.md | 382 +++++++++++-----------
docs/CONFIGURATION.md | 354 ++++++++++++++++++++
docs/CONTEXT-AWARENESS.md | 182 ++++-------
docs/MANAGEMENT.md | 26 +-
docs/ONBOARDING.md | 173 +++++-----
docs/QA_WORKFLOW.md | 60 ++--
docs/ROADMAP.md | 25 +-
docs/TESTING.md | 13 +-
docs/TOOLS.md | 361 +++++++++++++++++++++
11 files changed, 1388 insertions(+), 897 deletions(-)
delete mode 100644 VERIFICATION.md
create mode 100644 docs/CONFIGURATION.md
create mode 100644 docs/TOOLS.md
diff --git a/README.md b/README.md
index a0196e5..5f8c3a5 100644
--- a/README.md
+++ b/README.md
@@ -2,38 +2,223 @@
-# DevClaw - Development Plugin for OpenClaw
+# DevClaw — Development Plugin for OpenClaw
**Every group chat becomes an autonomous development team.**
-Add the agent to a Telegram/WhatsApp group, point it at a GitLab/GitHub repo — that group now has an **orchestrator** managing the backlog, a **DEV** worker session writing code, and a **QA** worker session reviewing it. All autonomous. Add another group, get another team. Each project runs in complete isolation with its own task queue, workers, and session state.
+Add an agent to a Telegram/WhatsApp group, point it at a GitHub/GitLab repo — that group now has an **orchestrator** managing the backlog, a **DEV** worker writing code, and a **QA** worker reviewing it. All autonomous. Add another group, get another team. Each project runs in complete isolation with its own task queue, workers, and session state.
DevClaw is the [OpenClaw](https://openclaw.ai) plugin that makes this work.
-## Why
+## Benefits
-[OpenClaw](https://openclaw.ai) is great at giving AI agents the ability to develop software — spawn worker sessions, manage sessions, work with code. But running a real multi-project development pipeline exposes a gap: the orchestration layer between "agent can write code" and "agent reliably manages multiple projects" is brittle. Every task involves 10+ coordinated steps across GitLab labels, session state, model selection, and audit logging. Agents forget steps, corrupt state, null out session IDs they should preserve, or pick the wrong model for the job.
+### Process consistency
-DevClaw fills that gap with guardrails. It gives the orchestrator atomic tools that make it impossible to forget a label transition, lose a session reference, or skip an audit log entry. The complexity of multi-project orchestration moves from agent instructions (that LLMs follow imperfectly) into deterministic code (that runs the same way every time).
+Every task follows the same fixed pipeline — `Planning → To Do → Doing → To Test → Testing → Done` — across every project. Label transitions, state updates, session dispatch, and audit logging happen atomically inside the plugin. The orchestrator agent **cannot** skip a step, forget a label, or corrupt session state. Hundreds of lines of manual orchestration logic collapse into a single `work_start` call.
-## The idea
+### Token savings
-One orchestrator agent manages all your projects. It reads task backlogs, creates issues, decides priorities, and delegates work. For each task, DevClaw assigns a developer from your **team** — a junior, medior, or senior dev writes the code, then a QA engineer reviews it. Every Telegram/WhatsApp group is a separate project — the orchestrator keeps them completely isolated while managing them all from a single process.
+DevClaw reduces token consumption at three levels:
-DevClaw gives the orchestrator nine tools that replace hundreds of lines of manual orchestration logic. Instead of following a 10-step checklist per task (fetch issue, check labels, pick model, check for existing session, transition label, dispatch task, update state, log audit event...), it calls `task_pickup` and the plugin handles everything atomically — including session dispatch. Workers call `task_complete` themselves for atomic state updates, and can file follow-up issues via `task_create`.
+| Mechanism | How it works | Estimated savings |
+|---|---|---|
+| **Shared sessions** | Each developer level per role maintains one persistent session per project. When a medior dev finishes task A and picks up task B, the plugin reuses the existing session — no codebase re-reading. | **~40-60%** per task (~50K tokens saved per session reuse) |
+| **Tier selection** | Junior for typos (Haiku), medior for features (Sonnet), senior for architecture (Opus). The right model for the job means you're not burning Opus tokens on a CSS fix. | **~30-50%** on simple tasks vs. always using the largest model |
+| **Token-free heartbeat** | The heartbeat service runs every 60s, performing health checks and queue dispatch with deterministic code and CLI calls only. It consumes zero LLM tokens; workers use tokens only when they actually process tasks. | **100%** savings on heartbeat orchestration overhead |
-## Developer tiers
+### Project isolation and parallelization
-DevClaw uses a developer seniority model. Each tier maps to a configurable LLM model:
+Each project is fully isolated — separate task queue, separate worker state, separate sessions. No cross-project contamination. Two execution modes control parallelism:
-| Tier | Role | Default model | Assigns to |
-| ---------- | ------------------- | ----------------------------- | ------------------------------------------------- |
-| **junior** | Junior developer | `anthropic/claude-haiku-4-5` | Typos, single-file fixes, simple changes |
-| **medior** | Mid-level developer | `anthropic/claude-sonnet-4-5` | Features, bug fixes, multi-file changes |
-| **senior** | Senior developer | `anthropic/claude-opus-4-5` | Architecture, migrations, system-wide refactoring |
-| **qa** | QA engineer | `anthropic/claude-sonnet-4-5` | Code review, test validation |
+- **Project-level**: DEV and QA can work simultaneously on different tasks (parallel, default) or one role at a time (sequential)
+- **Plugin-level**: Multiple projects can have active workers at once (parallel, default) or only one project active at a time (sequential)
-Configure which model each tier uses during setup or in `openclaw.json` plugin config.
+### Continuous planning
+
+The heartbeat service runs a continuous loop: health check → queue scan → dispatch. It detects stale workers (>2 hours), auto-reverts stuck labels, and fills free worker slots — all without human intervention or agent LLM tokens. The orchestrator agent only gets involved when a decision requires judgment.
+
+### Feedback loops
+
+Three automated feedback loops keep the pipeline self-correcting:
+
+1. **Auto-chaining** — DEV "done" automatically dispatches QA. QA "fail" automatically re-dispatches DEV. No orchestrator action needed.
+2. **Stale worker watchdog** — Workers active for more than 2 hours are detected automatically. Their labels revert to the queue, the workers are deactivated, and the tasks become available for retry.
+3. **Completion enforcement** — Every task message includes a mandatory `work_finish` section. Workers use `"blocked"` if stuck. A three-layer guarantee keeps tasks from getting stuck indefinitely.
+
+### Role-based instruction prompts
+
+Workers receive customizable, project-specific instructions loaded at dispatch time:
+
+```
+workspace/projects/roles/
+├── my-webapp/
+│   ├── dev.md  ← "Run npm test before committing. Deploy URL: ..."
+│   └── qa.md   ← "Check OAuth flow. Verify mobile responsiveness."
+└── default/
+    ├── dev.md  ← Fallback for projects without custom instructions
+    └── qa.md
+```
+
+Edit these files to inject deployment steps, test commands, acceptance criteria, or coding standards — per project, per role.
+
+### Atomic operations with rollback
+
+Every tool call wraps multiple operations (label transition + state update + session dispatch + audit log) into a single atomic action. If session dispatch fails, the label transition is rolled back. No orphaned state. No half-completed operations.
+
+### Full audit trail
+
+Every tool call automatically appends an NDJSON entry to `log/audit.log`. Query with `jq` to trace any task's full history. No manual logging required from the orchestrator.
+
+---
+
+## The model-to-role mapping
+
+DevClaw doesn't expose raw model names. You're assigning a _junior developer_ to fix a typo, not configuring `anthropic/claude-haiku-4-5`. Each developer level maps to a configurable LLM:
+
+### DEV levels
+
+| Level | Who they are | Default model | Assigns to |
+|---|---|---|---|
+| `junior` | The intern | `anthropic/claude-haiku-4-5` | Typos, single-file fixes, CSS changes |
+| `medior` | The reliable mid-level | `anthropic/claude-sonnet-4-5` | Features, bug fixes, multi-file changes |
+| `senior` | The architect | `anthropic/claude-opus-4-5` | Architecture, migrations, system-wide refactoring |
+
+### QA levels
+
+| Level | Who they are | Default model | Assigns to |
+|---|---|---|---|
+| `reviewer` | The code reviewer | `anthropic/claude-sonnet-4-5` | Code review, test validation, PR inspection |
+| `tester` | The QA tester | `anthropic/claude-haiku-4-5` | Manual testing, smoke tests |
+
+The orchestrator LLM evaluates each issue and picks the appropriate level. A keyword-based heuristic in `model-selector.ts` serves as a fallback when the orchestrator omits the level. Override which model powers each level in [`openclaw.json`](docs/CONFIGURATION.md#model-tiers).
+
+---
+
+## Task workflow
+
+Every task (issue) moves through a fixed pipeline of label states. DevClaw tools handle every transition atomically.
+
+```mermaid
+stateDiagram-v2
+ [*] --> Planning
+ Planning --> ToDo: Ready for development
+
+ ToDo --> Doing: work_start (DEV) ⇄ blocked
+ Doing --> ToTest: work_finish (DEV done)
+
+ ToTest --> Testing: work_start (QA) / auto-chain ⇄ blocked
+ Testing --> Done: work_finish (QA pass)
+ Testing --> ToImprove: work_finish (QA fail)
+ Testing --> Refining: work_finish (QA refine)
+
+ ToImprove --> Doing: work_start (DEV fix) or auto-chain
+ Refining --> ToDo: Human decision
+
+ Done --> [*]
+```
+
+### The eight state labels
+
+| Label | Color | Meaning |
+|---|---|---|
+| **Planning** | Blue-grey | Pre-work review — issue exists but not ready for development |
+| **To Do** | Blue | Ready for DEV pickup |
+| **Doing** | Orange | DEV actively working |
+| **To Test** | Cyan | Ready for QA pickup |
+| **Testing** | Purple | QA actively reviewing |
+| **Done** | Green | Complete — issue closed |
+| **To Improve** | Red | QA failed — back to DEV |
+| **Refining** | Yellow | Awaiting human decision |
+
+### Worker self-reporting
+
+Workers call `work_finish` directly when they're done — no orchestrator involvement needed for the state transition. Workers can also call `task_create` to file follow-up issues they discover during work.
+
+### Auto-chaining
+
+When a project has auto-chaining enabled:
+
+- **DEV "done"** → QA is dispatched immediately (using the reviewer level)
+- **QA "fail"** → DEV fix is dispatched immediately (reuses previous DEV level)
+- **QA "pass" / "refine" / "blocked"** → no chaining (pipeline done, needs human input, or returned to queue)
+- **DEV "blocked"** → no chaining (returned to queue for retry)
+
+### Completion enforcement
+
+Three layers guarantee tasks never get stuck:
+
+1. **Completion contract** — Every task message includes a mandatory section requiring `work_finish`, even on failure. Workers use `"blocked"` if stuck.
+2. **Blocked result** — Both DEV and QA can gracefully put a task back in queue (`Doing → To Do`, `Testing → To Test`).
+3. **Stale worker watchdog** — Heartbeat detects workers active >2 hours and auto-reverts labels to queue.
+
+---
+
+## Installation
+
+### Requirements
+
+| Requirement | Why | Verify |
+|---|---|---|
+| [OpenClaw](https://openclaw.ai) | DevClaw is an OpenClaw plugin | `openclaw --version` |
+| Node.js >= 20 | Plugin runtime | `node --version` |
+| [`gh`](https://cli.github.com) or [`glab`](https://gitlab.com/gitlab-org/cli) CLI | Issue tracker provider (auto-detected from git remote) | `gh --version` / `glab --version` |
+| CLI authenticated | Plugin calls gh/glab for every label transition | `gh auth status` / `glab auth status` |
+
+### Install the plugin
+
+```bash
+cp -r devclaw ~/.openclaw/extensions/
+```
+
+Verify:
+
+```bash
+openclaw plugins list
+# Should show: DevClaw | devclaw | loaded
+```
+
+### Run setup
+
+Three options — pick one:
+
+**Option A: Conversational onboarding (recommended)**
+
+Call the `onboard` tool from any agent with DevClaw loaded. It walks through configuration step by step.
+
+**Option B: CLI wizard**
+
+```bash
+openclaw devclaw setup
+```
+
+**Option C: Non-interactive CLI**
+
+```bash
+openclaw devclaw setup --new-agent "My Orchestrator"
+```
+
+Setup creates an agent, configures model tiers, writes workspace files (AGENTS.md, HEARTBEAT.md, role templates), and optionally binds a messaging channel.
+
+### Register a project
+
+In the Telegram/WhatsApp group for the project:
+
+> "Register project my-app at ~/git/my-app with base branch main"
+
+The agent calls `project_register`, which atomically creates all 8 state labels, scaffolds role instruction files, and adds the project to `projects.json`.
+
+### Start working
+
+```
+"Check the queue"            → agent calls status
+"Pick up issue #1 for DEV"   → agent calls work_start
+[DEV works autonomously]     → calls work_finish when done
+[Heartbeat fills next slot]  → QA dispatched automatically
+```
+
+See the [Onboarding Guide](docs/ONBOARDING.md) for detailed step-by-step instructions.
+
+---
## How it works
@@ -41,429 +226,114 @@ Configure which model each tier uses during setup or in `openclaw.json` plugin c
graph TB
subgraph "Group Chat A"
direction TB
- A_O["🎯 Orchestrator"]
- A_GL[GitLab Issues]
- A_DEV["🔧 DEV (worker session)"]
- A_QA["🔍 QA (worker session)"]
- A_O -->|task_pickup| A_GL
- A_O -->|task_pickup dispatches| A_DEV
- A_O -->|task_pickup dispatches| A_QA
+ A_O["Orchestrator"]
+ A_GL[GitHub/GitLab Issues]
+ A_DEV["DEV (worker session)"]
+ A_QA["QA (worker session)"]
+ A_O -->|work_start| A_GL
+ A_O -->|dispatches| A_DEV
+ A_O -->|dispatches| A_QA
end
subgraph "Group Chat B"
direction TB
- B_O["🎯 Orchestrator"]
- B_GL[GitLab Issues]
- B_DEV["🔧 DEV (worker session)"]
- B_QA["🔍 QA (worker session)"]
- B_O -->|task_pickup| B_GL
- B_O -->|task_pickup dispatches| B_DEV
- B_O -->|task_pickup dispatches| B_QA
- end
-
- subgraph "Group Chat C"
- direction TB
- C_O["🎯 Orchestrator"]
- C_GL[GitLab Issues]
- C_DEV["🔧 DEV (worker session)"]
- C_QA["🔍 QA (worker session)"]
- C_O -->|task_pickup| C_GL
- C_O -->|task_pickup dispatches| C_DEV
- C_O -->|task_pickup dispatches| C_QA
+ B_O["Orchestrator"]
+ B_GL[GitHub/GitLab Issues]
+ B_DEV["DEV (worker session)"]
+ B_QA["QA (worker session)"]
+ B_O -->|work_start| B_GL
+ B_O -->|dispatches| B_DEV
+ B_O -->|dispatches| B_QA
end
AGENT["Single OpenClaw Agent"]
AGENT --- A_O
AGENT --- B_O
- AGENT --- C_O
```
-It's the same agent process — but each group chat gives it a different project context. The orchestrator role, the workers, the task queue, and all state are fully isolated per group.
+Same agent process — each group chat gives it a different project context. The orchestrator role, the workers, the task queue, and all state are fully isolated per group.
-## Task lifecycle
-
-Every task (GitLab issue) moves through a fixed pipeline of label states. Issues are created by the orchestrator agent or by worker sessions — not manually. DevClaw tools handle every transition atomically — label change, state update, audit log, and session management in a single call.
-
-```mermaid
-stateDiagram-v2
- [*] --> Planning
- Planning --> ToDo: Ready for development
-
- ToDo --> Doing: task_pickup (DEV) ⇄ blocked
- Doing --> ToTest: task_complete (DEV done)
-
- ToTest --> Testing: task_pickup (QA) / auto-chain ⇄ blocked
- Testing --> Done: task_complete (QA pass)
- Testing --> ToImprove: task_complete (QA fail)
- Testing --> Refining: task_complete (QA refine)
-
- ToImprove --> Doing: task_pickup (DEV fix) or auto-chain
- Refining --> ToDo: Human decision
-
- Done --> [*]
-```
-
-### Worker self-reporting
-
-Workers (DEV/QA sub-agent sessions) call `task_complete` directly when they finish — no orchestrator involvement needed for the state transition. Workers can also call `task_create` to file follow-up issues they discover during work.
-
-### Completion enforcement
-
-Three layers guarantee that `task_complete` always runs, preventing tasks from getting stuck in "Doing" or "Testing" forever:
-
-1. **Completion contract** — Every task message includes a mandatory section requiring the worker to call `task_complete`, even on failure. Workers use `"blocked"` if stuck.
-2. **Blocked result** — Both DEV and QA can return `"blocked"` to gracefully put a task back in queue (`Doing → To Do`, `Testing → To Test`) instead of silently dying.
-3. **Stale worker watchdog** — The heartbeat health check detects workers active >2 hours and auto-reverts labels to queue, catching sessions that crashed or ran out of context.
-
-### Auto-chaining
-
-When a project has `autoChain: true`, `task_complete` automatically dispatches the next step:
-
-- **DEV "done"** → QA is dispatched immediately (using the qa tier)
-- **QA "fail"** → DEV fix is dispatched immediately (reuses previous DEV tier)
-- **QA "pass" / "refine" / "blocked"** → no chaining (pipeline done, needs human input, or returned to queue)
-- **DEV "blocked"** → no chaining (returned to queue for retry)
-
-When `autoChain` is false, `task_complete` returns a `nextAction` hint for the orchestrator to act on.
+---
## Session reuse
-Worker sessions are expensive to start — each new spawn requires the session to read the full codebase (~50K tokens). DevClaw maintains **separate sessions per tier per role** (session-per-tier design). When a medior dev finishes task A and picks up task B on the same project, the plugin detects the existing session and sends the task directly — no new session needed.
+Worker sessions are expensive to start — each new spawn reads the full codebase (~50K tokens). DevClaw maintains **separate sessions per level per role** (session-per-level design). When a medior dev finishes task A and picks up task B on the same project, the plugin detects the existing session and sends the task directly.
-The plugin handles session dispatch internally via OpenClaw CLI. The orchestrator agent never calls `sessions_spawn` or `sessions_send` — it just calls `task_pickup` and the plugin does the rest.
+The plugin handles session dispatch internally via OpenClaw CLI. The orchestrator agent never calls `sessions_spawn` or `sessions_send` — it calls `work_start` and the plugin does the rest.
```mermaid
sequenceDiagram
participant O as Orchestrator
participant DC as DevClaw Plugin
- participant GL as GitLab
+ participant IT as Issue Tracker
participant S as Worker Session
- O->>DC: task_pickup({ issueId: 42, role: "dev" })
- DC->>GL: Fetch issue, verify label
- DC->>DC: Assign tier (junior/medior/senior)
- DC->>DC: Check existing session for assigned tier
- DC->>GL: Transition label (To Do → Doing)
+ O->>DC: work_start({ issueId: 42, role: "dev" })
+ DC->>IT: Fetch issue, verify label
+ DC->>DC: Assign level (junior/medior/senior)
+ DC->>DC: Check existing session for assigned level
+ DC->>IT: Transition label (To Do → Doing)
DC->>S: Dispatch task via CLI (create or reuse session)
DC->>DC: Update projects.json, write audit log
- DC-->>O: { success: true, announcement: "🔧 DEV (medior) picking up #42" }
+ DC-->>O: { success: true, announcement: "..." }
```
-## Developer assignment
-
-The orchestrator LLM evaluates each issue's title, description, and labels to assign the appropriate developer tier, then passes it to `task_pickup` via the `model` parameter. This gives the LLM full context for the decision — it can weigh factors like codebase familiarity, task dependencies, and recent failure history that keyword matching would miss.
-
-The keyword heuristic in `model-selector.ts` serves as a **fallback only**, used when the orchestrator omits the `model` parameter.
-
-| Tier | Role | When |
-| ------ | ------------------- | ----------------------------------------------------------- |
-| junior | Junior developer | Typos, CSS, renames, copy changes |
-| medior | Mid-level developer | Features, bug fixes, multi-file changes |
-| senior | Senior developer | Architecture, migrations, security, system-wide refactoring |
-| qa | QA engineer | All QA tasks (code review, test validation) |
-
-## State management
-
-All project state lives in a single `projects/projects.json` file in the orchestrator's workspace, keyed by Telegram group ID:
-
-```json
-{
- "projects": {
- "-1234567890": {
- "name": "my-webapp",
- "repo": "~/git/my-webapp",
- "groupName": "Dev - My Webapp",
- "baseBranch": "development",
- "autoChain": true,
- "dev": {
- "active": false,
- "issueId": null,
- "model": "medior",
- "sessions": {
- "junior": "agent:orchestrator:subagent:a9e4d078-...",
- "medior": "agent:orchestrator:subagent:b3f5c912-...",
- "senior": null
- }
- },
- "qa": {
- "active": false,
- "issueId": null,
- "model": "qa",
- "sessions": {
- "qa": "agent:orchestrator:subagent:18707821-..."
- }
- }
- }
- }
-}
-```
-
-Key design decisions:
-
-- **Session-per-tier** — each tier gets its own worker session, accumulating context independently. Tier selection maps directly to a session key.
-- **Sessions preserved on completion** — when a worker completes a task, `sessions` map is **preserved** (only `active` and `issueId` are cleared). This enables session reuse on the next pickup.
-- **Plugin-controlled dispatch** — the plugin creates and dispatches to sessions via OpenClaw CLI (`sessions.patch` + `openclaw agent`). The orchestrator agent never calls `sessions_spawn` or `sessions_send`.
-- **Sessions persist indefinitely** — no auto-cleanup. `session_health` handles manual cleanup when needed.
-
-All writes go through atomic temp-file-then-rename to prevent corruption.
+---
## Tools
-### `devclaw_setup`
+DevClaw registers **11 tools**, grouped by function:
-Set up DevClaw in an agent's workspace. Creates AGENTS.md, HEARTBEAT.md, role templates, and configures models. Can optionally create a new agent.
+### Worker lifecycle
-**Parameters:**
+| Tool | Description |
+|---|---|
+| [`work_start`](docs/TOOLS.md#work_start) | Pick up a task — handles level assignment, label transition, session dispatch, audit |
+| [`work_finish`](docs/TOOLS.md#work_finish) | Complete a task — handles label transition, state update, auto-chaining, queue tick |
-- `newAgentName` (string, optional) — Create a new agent with this name
-- `models` (object, optional) — Model overrides per tier: `{ junior, medior, senior, qa }`
+### Task management
-### `task_pickup`
+| Tool | Description |
+|---|---|
+| [`task_create`](docs/TOOLS.md#task_create) | Create a new issue in the tracker |
+| [`task_update`](docs/TOOLS.md#task_update) | Change an issue's state label manually |
+| [`task_comment`](docs/TOOLS.md#task_comment) | Add a comment to an issue |
-Pick up a task from the issue queue for a DEV or QA worker.
+### Operations
-**Parameters:**
+| Tool | Description |
+|---|---|
+| [`status`](docs/TOOLS.md#status) | Queue counts + worker state dashboard |
+| [`health`](docs/TOOLS.md#health) | Worker health checks + zombie detection |
+| [`work_heartbeat`](docs/TOOLS.md#work_heartbeat) | Manual trigger for health + queue dispatch |
-- `issueId` (number, required) — Issue ID
-- `role` ("dev" | "qa", required) — Worker role
-- `projectGroupId` (string, required) — Telegram group ID
-- `model` (string, optional) — Developer tier (junior, medior, senior, qa). The orchestrator should evaluate the task complexity and choose. Falls back to keyword heuristic if omitted.
+### Setup
-**What it does atomically:**
+| Tool | Description |
+|---|---|
+| [`project_register`](docs/TOOLS.md#project_register) | One-time project setup (labels, prompts, state) |
+| [`setup`](docs/TOOLS.md#setup) | Agent + workspace initialization |
+| [`onboard`](docs/TOOLS.md#onboard) | Conversational onboarding guide |
-1. Resolves project from `projects.json`
-2. Validates no active worker for this role
-3. Fetches issue from issue tracker, verifies correct label state
-4. Assigns tier (LLM-chosen via `model` param, keyword heuristic fallback)
-5. Loads prompt instructions from `projects/prompts//.md`
-6. Looks up existing session for assigned tier (session-per-tier)
-7. Transitions label (e.g. `To Do` → `Doing`)
-8. Creates session via Gateway RPC if new (`sessions.patch`)
-9. Dispatches task to worker session via CLI (`openclaw agent`) with role instructions appended
-10. Updates `projects.json` state (active, issueId, tier, session key)
-11. Writes audit log entry
-12. Returns announcement text for the orchestrator to post
+See the [Tools Reference](docs/TOOLS.md) for full parameters and usage.
-### `task_complete`
+---
-Complete a task with a result. Called by workers (DEV/QA sub-agent sessions) directly, or by the orchestrator.
+## Documentation
-**Parameters:**
+| Document | Description |
+|---|---|
+| [Architecture](docs/ARCHITECTURE.md) | System design, session-per-level model, data flow, component interactions |
+| [Tools Reference](docs/TOOLS.md) | Complete reference for all 11 tools with parameters and examples |
+| [Configuration](docs/CONFIGURATION.md) | Full config reference — `openclaw.json`, `projects.json`, heartbeat, notifications |
+| [Onboarding Guide](docs/ONBOARDING.md) | Step-by-step setup: install, configure, register projects, test the pipeline |
+| [QA Workflow](docs/QA_WORKFLOW.md) | QA process: review documentation, comment templates, enforcement |
+| [Context Awareness](docs/CONTEXT-AWARENESS.md) | How DevClaw adapts behavior based on interaction context |
+| [Testing Guide](docs/TESTING.md) | Automated test suite: scenarios, fixtures, CI/CD integration |
+| [Management Theory](docs/MANAGEMENT.md) | The delegation theory behind DevClaw's design |
+| [Roadmap](docs/ROADMAP.md) | Planned features: configurable roles, channel-agnostic groups, Jira |
-- `role` ("dev" | "qa", required)
-- `result` ("done" | "pass" | "fail" | "refine" | "blocked", required)
-- `projectGroupId` (string, required)
-- `summary` (string, optional) — For the Telegram announcement
-
-**Results:**
-
-- **DEV "done"** — Pulls latest code, moves label `Doing` → `To Test`, deactivates worker. If `autoChain` enabled, automatically dispatches QA.
-- **DEV "blocked"** — Moves label `Doing` → `To Do`, deactivates worker. Task returns to queue for retry.
-- **QA "pass"** — Moves label `Testing` → `Done`, closes issue, deactivates worker
-- **QA "fail"** — Moves label `Testing` → `To Improve`, reopens issue. If `autoChain` enabled, automatically dispatches DEV fix (reuses previous DEV tier).
-- **QA "refine"** — Moves label `Testing` → `Refining`, awaits human decision
-- **QA "blocked"** — Moves label `Testing` → `To Test`, deactivates worker. Task returns to QA queue for retry.
-
-### `task_update`
-
-Change an issue's state label programmatically without going through the full pickup/complete flow.
-
-**Parameters:**
-
-- `projectGroupId` (string, required) — Telegram/WhatsApp group ID
-- `issueId` (number, required) — Issue ID to update
-- `state` (string, required) — New state label (Planning, To Do, Doing, To Test, Testing, Done, To Improve, Refining)
-- `reason` (string, optional) — Audit log reason for the change
-
-**Use cases:**
-- Manual state adjustments (e.g., Planning → To Do after approval)
-- Failed auto-transitions that need correction
-- Bulk state changes by orchestrator
-
-### `task_comment`
-
-Add a comment to an issue for feedback, notes, or discussion.
-
-**Parameters:**
-
-- `projectGroupId` (string, required) — Telegram/WhatsApp group ID
-- `issueId` (number, required) — Issue ID to comment on
-- `body` (string, required) — Comment body in markdown
-- `authorRole` ("dev" | "qa" | "orchestrator", optional) — Attribution role
-
-**Use cases:**
-- QA adds review feedback without blocking pass/fail
-- DEV posts implementation notes or progress updates
-- Orchestrator adds summary comments
-
-### `task_create`
-
-Create a new issue in the project's issue tracker. Used by workers to file follow-up bugs, or by the orchestrator to create tasks from chat.
-
-**Parameters:**
-
-- `projectGroupId` (string, required) — Telegram group ID
-- `title` (string, required) — Issue title
-- `description` (string, optional) — Full issue body in markdown
-- `label` (string, optional) — State label (defaults to "Planning")
-- `assignees` (string[], optional) — Usernames to assign
-- `pickup` (boolean, optional) — If true, immediately pick up for DEV after creation
-
-### `queue_status`
-
-Returns task queue counts and worker status across all projects (or a specific one).
-
-**Parameters:**
-
-- `projectGroupId` (string, optional) — Omit for all projects
-
-### `session_health`
-
-Detects and optionally fixes state inconsistencies.
-
-**Parameters:**
-
-- `autoFix` (boolean, optional) — Auto-fix zombies and stale state
-
-**What it does:**
-
-- Queries live sessions via Gateway RPC (`sessions.list`)
-- Cross-references with `projects.json` worker state
-
-**Checks:**
-
-- Active worker with no session key (critical, auto-fixable)
-- Active worker whose session is dead — zombie (critical, auto-fixable)
-- Worker active for >2 hours — stale watchdog (warning, auto-fixable: reverts label to queue)
-- Inactive worker with lingering issue ID (warning, auto-fixable)
-
-### `project_register`
-
-Register a new project with DevClaw. Creates all required issue tracker labels (idempotent), scaffolds role instruction files, and adds the project to `projects.json`. One-time setup per project. Auto-detects GitHub/GitLab from git remote.
-
-**Parameters:**
-
-- `projectGroupId` (string, required) — Telegram group ID (key in projects.json)
-- `name` (string, required) — Short project name
-- `repo` (string, required) — Path to git repo (e.g. `~/git/my-project`)
-- `groupName` (string, required) — Telegram group display name
-- `baseBranch` (string, required) — Base branch for development
-- `deployBranch` (string, optional) — Defaults to baseBranch
-- `deployUrl` (string, optional) — Deployment URL
-
-**What it does atomically:**
-
-1. Validates project not already registered
-2. Resolves repo path, auto-detects GitHub/GitLab, and verifies access
-3. Creates all 8 state labels (idempotent — safe to run on existing projects)
-4. Adds project entry to `projects.json` with empty worker state and `autoChain: false`
-5. Scaffolds prompt instruction files: `projects/prompts//dev.md` and `projects/prompts//qa.md`
-6. Writes audit log entry
-7. Returns announcement text
-
-## Audit logging
-
-Every tool call automatically appends an NDJSON entry to `log/audit.log`. No manual logging required from the orchestrator agent.
-
-```jsonl
-{"ts":"2026-02-08T10:30:00Z","event":"task_pickup","project":"my-webapp","issue":42,"role":"dev","tier":"medior","sessionAction":"send"}
-{"ts":"2026-02-08T10:30:01Z","event":"model_selection","issue":42,"role":"dev","tier":"medior","reason":"Standard dev task"}
-{"ts":"2026-02-08T10:45:00Z","event":"task_complete","project":"my-webapp","issue":42,"role":"dev","result":"done"}
-```
-
-## Quick start
-
-```bash
-# 1. Install the plugin
-cp -r devclaw ~/.openclaw/extensions/
-
-# 2. Run setup (interactive — creates agent, configures models, writes workspace files)
-openclaw devclaw setup
-
-# 3. Add bot to Telegram group, then register a project
-# (via the agent in Telegram)
-```
-
-See the [Onboarding Guide](docs/ONBOARDING.md) for detailed instructions.
-
-## Configuration
-
-Model tier configuration in `openclaw.json`:
-
-```json
-{
- "plugins": {
- "entries": {
- "devclaw": {
- "config": {
- "models": {
- "junior": "anthropic/claude-haiku-4-5",
- "medior": "anthropic/claude-sonnet-4-5",
- "senior": "anthropic/claude-opus-4-5",
- "qa": "anthropic/claude-sonnet-4-5"
- }
- }
- }
- }
- }
-}
-```
-
-Restrict tools to your orchestrator agent only:
-
-```json
-{
- "agents": {
- "list": [
- {
- "id": "my-orchestrator",
- "tools": {
- "allow": [
- "devclaw_setup",
- "task_pickup",
- "task_complete",
- "task_update",
- "task_comment",
- "task_create",
- "queue_status",
- "session_health",
- "project_register"
- ]
- }
- }
- ]
- }
-}
-```
-
-> DevClaw uses an `IssueProvider` interface to abstract issue tracker operations. GitLab (via `glab` CLI) and GitHub (via `gh` CLI) are supported — the provider is auto-detected from the git remote URL. Jira is planned.
-
-## Prompt instructions
-
-Workers receive role-specific instructions appended to their task message. `project_register` scaffolds editable files:
-
-```
-workspace/
-├── projects/
-│ ├── projects.json ← project state
-│ └── prompts/
-│ ├── my-webapp/ ← per-project prompts (edit to customize)
-│ │ ├── dev.md
-│ │ └── qa.md
-│ └── another-project/
-│ ├── dev.md
-│ └── qa.md
-├── log/
-│ └── audit.log ← NDJSON event log
-```
-
-`task_pickup` loads `projects/prompts//.md`. Edit these files to customize worker behavior per project — for example, adding project-specific deployment steps or test commands.
-
-## Requirements
-
-- [OpenClaw](https://openclaw.ai)
-- Node.js >= 20
-- [`glab`](https://gitlab.com/gitlab-org/cli) CLI installed and authenticated (GitLab provider), or [`gh`](https://cli.github.com) CLI (GitHub provider)
+---
## License
diff --git a/VERIFICATION.md b/VERIFICATION.md
deleted file mode 100644
index 5b5c92b..0000000
--- a/VERIFICATION.md
+++ /dev/null
@@ -1,45 +0,0 @@
-# Verification: task_create Default State
-
-## Issue #115 Request
-Change default state for new tasks from "To Do" to "Planning"
-
-## Current Implementation Status
-**Already implemented** - The default has been "Planning" since initial commit.
-
-### Code Evidence
-File: `lib/tools/task-create.ts` (line 68)
-```typescript
-const label = (params.label as StateLabel) ?? "Planning";
-```
-
-### Documentation Evidence
-File: `README.md` (line 308)
-```
-- `label` (string, optional) — State label (defaults to "Planning")
-```
-
-### Tool Description
-The tool description itself states:
-```
-The issue is created with a state label (defaults to "Planning").
-```
-
-## Timeline
-- **Feb 9, 2026** (commit 8a79755e): Initial task_create implementation with "Planning" default
-- **Feb 10, 2026**: Issue #115 created requesting this change (already done)
-
-## Verification Test
-Default behavior can be verified by calling task_create without specifying a label:
-
-```javascript
-task_create({
- projectGroupId: "-5239235162",
- title: "Test Issue"
- // label parameter omitted - should default to "Planning"
-})
-```
-
-Expected result: Issue created with "Planning" label, NOT "To Do"
-
-## Conclusion
-The requested feature is already fully implemented. No code changes needed.
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index fa102a2..92b8251 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -6,59 +6,59 @@ Understanding the OpenClaw model is key to understanding how DevClaw works:
- **Agent** — A configured entity in `openclaw.json`. Has a workspace, model, identity files (SOUL.md, IDENTITY.md), and tool permissions. Persists across restarts.
- **Session** — A runtime conversation instance. Each session has its own context window and conversation history, stored as a `.jsonl` transcript file.
-- **Sub-agent session** — A session created under the orchestrator agent for a specific worker role. NOT a separate agent — it's a child session running under the same agent, with its own isolated context. Format: `agent::subagent:`.
+- **Sub-agent session** — A session created under the orchestrator agent for a specific worker role. NOT a separate agent — it's a child session running under the same agent, with its own isolated context. Format: `agent::subagent:--`.
-### Session-per-tier design
+### Session-per-level design
-Each project maintains **separate sessions per developer tier per role**. A project's DEV might have a junior session, a medior session, and a senior session — each accumulating its own codebase context over time.
+Each project maintains **separate sessions per developer level per role**. A project's DEV might have a junior session, a medior session, and a senior session — each accumulating its own codebase context over time.
```
Orchestrator Agent (configured in openclaw.json)
└─ Main session (long-lived, handles all projects)
│
├─ Project A
- │ ├─ DEV sessions: { junior: , medior: , senior: null }
- │ └─ QA sessions: { qa: }
+ │ ├─ DEV sessions: { junior: , medior: , senior: null }
+ │ └─ QA sessions: { reviewer: , tester: null }
│
└─ Project B
- ├─ DEV sessions: { junior: null, medior: , senior: null }
- └─ QA sessions: { qa: }
+ ├─ DEV sessions: { junior: null, medior: , senior: null }
+ └─ QA sessions: { reviewer: , tester: null }
```
-Why per-tier instead of switching models on one session:
+Why per-level instead of switching models on one session:
- **No model switching overhead** — each session always uses the same model
- **Accumulated context** — a junior session that's done 20 typo fixes knows the project well; a medior session that's done 5 features knows it differently
- **No cross-model confusion** — conversation history stays with the model that generated it
-- **Deterministic reuse** — tier selection directly maps to a session key, no patching needed
+- **Deterministic reuse** — level selection directly maps to a session key, no patching needed
### Plugin-controlled session lifecycle
DevClaw controls the **full** session lifecycle end-to-end. The orchestrator agent never calls `sessions_spawn` or `sessions_send` — the plugin handles session creation and task dispatch internally using the OpenClaw CLI:
```
-Plugin dispatch (inside task_pickup):
- 1. Assign tier, look up session, decide spawn vs send
+Plugin dispatch (inside work_start):
+ 1. Assign level, look up session, decide spawn vs send
2. New session: openclaw gateway call sessions.patch → create entry + set model
- openclaw agent --session-id --message "task..."
- 3. Existing: openclaw agent --session-id --message "task..."
+ openclaw gateway call agent → dispatch task
+ 3. Existing: openclaw gateway call agent → dispatch task to existing session
4. Return result to orchestrator (announcement text, no session instructions)
```
-The agent's only job after `task_pickup` returns is to post the announcement to Telegram. Everything else — tier assignment, session creation, task dispatch, state update, audit logging — is deterministic plugin code.
+The agent's only job after `work_start` returns is to post the announcement to Telegram. Everything else — level assignment, session creation, task dispatch, state update, audit logging — is deterministic plugin code.
**Why this matters:** Previously the plugin returned instructions like `{ sessionAction: "spawn", model: "sonnet" }` and the agent had to correctly call `sessions_spawn` with the right params. This was the fragile handoff point where agents would forget `cleanup: "keep"`, use wrong models, or corrupt session state. Moving dispatch into the plugin eliminates that entire class of errors.
-**Session persistence:** Sessions created via `sessions.patch` persist indefinitely (no auto-cleanup). The plugin manages lifecycle explicitly through `session_health`.
+**Session persistence:** Sessions created via `sessions.patch` persist indefinitely (no auto-cleanup). The plugin manages lifecycle explicitly through the `health` tool.
**What we trade off vs. registered sub-agents:**
| Feature | Sub-agent system | Plugin-controlled | DevClaw equivalent |
|---|---|---|---|
| Auto-reporting | Sub-agent reports to parent | No | Heartbeat polls for completion |
-| Concurrency control | `maxConcurrent` | No | `task_pickup` checks `active` flag |
+| Concurrency control | `maxConcurrent` | No | `work_start` checks `active` flag |
| Lifecycle tracking | Parent-child registry | No | `projects.json` tracks all sessions |
-| Timeout detection | `runTimeoutSeconds` | No | `session_health` flags stale >2h |
-| Cleanup | Auto-archive | No | `session_health` manual cleanup |
+| Timeout detection | `runTimeoutSeconds` | No | `health` flags stale >2h |
+| Cleanup | Auto-archive | No | `health` manual cleanup |
DevClaw provides equivalent guardrails for everything except auto-reporting, which the heartbeat handles.
@@ -74,22 +74,22 @@ graph TB
subgraph "OpenClaw Runtime"
MS[Main Session<br/>orchestrator agent]
GW[Gateway RPC<br/>sessions.patch / sessions.list]
- CLI[openclaw agent CLI]
+ CLI[openclaw gateway call agent]
DEV_J[DEV session<br/>junior]
DEV_M[DEV session<br/>medior]
DEV_S[DEV session<br/>senior]
- QA_E[QA session<br/>qa]
+ QA_R[QA session<br/>reviewer]
end
subgraph "DevClaw Plugin"
- TP[task_pickup]
- TC[task_complete]
+ WS[work_start]
+ WF[work_finish]
TCR[task_create]
- QS[queue_status]
- SH[session_health]
+ ST[status]
+ SH[health]
PR[project_register]
- DS[devclaw_setup]
- TIER[Tier Resolver]
+ DS[setup]
+ TIER[Level Resolver]
PJ[projects.json]
AL[audit.log]
end
@@ -103,34 +103,34 @@ graph TB
TG -->|delivers| MS
MS -->|announces| TG
- MS -->|calls| TP
- MS -->|calls| TC
+ MS -->|calls| WS
+ MS -->|calls| WF
MS -->|calls| TCR
- MS -->|calls| QS
+ MS -->|calls| ST
MS -->|calls| SH
MS -->|calls| PR
MS -->|calls| DS
- TP -->|resolves tier| TIER
- TP -->|transitions labels| GL
- TP -->|reads/writes| PJ
- TP -->|appends| AL
- TP -->|creates session| GW
- TP -->|dispatches task| CLI
+ WS -->|resolves level| TIER
+ WS -->|transitions labels| GL
+ WS -->|reads/writes| PJ
+ WS -->|appends| AL
+ WS -->|creates session| GW
+ WS -->|dispatches task| CLI
- TC -->|transitions labels| GL
- TC -->|closes/reopens| GL
- TC -->|reads/writes| PJ
- TC -->|git pull| REPO
- TC -->|auto-chain dispatch| CLI
- TC -->|appends| AL
+ WF -->|transitions labels| GL
+ WF -->|closes/reopens| GL
+ WF -->|reads/writes| PJ
+ WF -->|git pull| REPO
+ WF -->|auto-chain dispatch| CLI
+ WF -->|appends| AL
TCR -->|creates issue| GL
TCR -->|appends| AL
- QS -->|lists issues by label| GL
- QS -->|reads| PJ
- QS -->|appends| AL
+ ST -->|lists issues by label| GL
+ ST -->|reads| PJ
+ ST -->|appends| AL
SH -->|reads/writes| PJ
SH -->|checks sessions| GW
@@ -144,12 +144,12 @@ graph TB
CLI -->|sends task| DEV_J
CLI -->|sends task| DEV_M
CLI -->|sends task| DEV_S
- CLI -->|sends task| QA_E
+ CLI -->|sends task| QA_R
DEV_J -->|writes code, creates MRs| REPO
DEV_M -->|writes code, creates MRs| REPO
DEV_S -->|writes code, creates MRs| REPO
- QA_E -->|reviews code, tests| REPO
+ QA_R -->|reviews code, tests| REPO
```
## End-to-end flow: human to sub-agent
@@ -163,7 +163,7 @@ sequenceDiagram
participant MS as Main Session<br/>(orchestrator)
participant DC as DevClaw Plugin
participant GW as Gateway RPC
- participant CLI as openclaw agent CLI
+ participant CLI as openclaw gateway call agent
participant DEV as DEV Session<br/>(medior)
participant GL as Issue Tracker
@@ -171,34 +171,34 @@ sequenceDiagram
H->>TG: "check status" (or heartbeat triggers)
TG->>MS: delivers message
- MS->>DC: queue_status()
- DC->>GL: glab issue list --label "To Do"
+ MS->>DC: status()
+ DC->>GL: list issues by label "To Do"
DC-->>MS: { toDo: [#42], dev: idle }
Note over MS: Decides to pick up #42 for DEV as medior
- MS->>DC: task_pickup({ issueId: 42, role: "dev", model: "medior", ... })
- DC->>DC: resolve tier "medior" → model ID
+ MS->>DC: work_start({ issueId: 42, role: "dev", level: "medior", ... })
+ DC->>DC: resolve level "medior" → model ID
DC->>DC: lookup dev.sessions.medior → null (first time)
- DC->>GL: glab issue update 42 --unlabel "To Do" --label "Doing"
+ DC->>GL: transition label "To Do" → "Doing"
DC->>GW: sessions.patch({ key: new-session-key, model: "anthropic/claude-sonnet-4-5" })
- DC->>CLI: openclaw agent --session-id --message "Build login page for #42..."
+ DC->>CLI: openclaw gateway call agent --params { sessionKey, message }
CLI->>DEV: creates session, delivers task
DC->>DC: store session key in projects.json + append audit.log
- DC-->>MS: { success: true, announcement: "🔧 DEV (medior) picking up #42" }
+ DC-->>MS: { success: true, announcement: "🔧 Spawning DEV (medior) for #42" }
- MS->>TG: "🔧 DEV (medior) picking up #42: Add login page"
+ MS->>TG: "🔧 Spawning DEV (medior) for #42: Add login page"
TG->>H: sees announcement
Note over DEV: Works autonomously — reads code, writes code, creates MR
- Note over DEV: Calls task_complete when done
+ Note over DEV: Calls work_finish when done
- DEV->>DC: task_complete({ role: "dev", result: "done", ... })
- DC->>GL: glab issue update 42 --unlabel "Doing" --label "To Test"
+ DEV->>DC: work_finish({ role: "dev", result: "done", ... })
+ DC->>GL: transition label "Doing" → "To Test"
DC->>DC: deactivate worker (sessions preserved)
- DC-->>DEV: { announcement: "✅ DEV done #42" }
+ DC-->>DEV: { announcement: "✅ DEV DONE #42" }
- MS->>TG: "✅ DEV done #42 — moved to QA queue"
+ MS->>TG: "✅ DEV DONE #42 — moved to QA queue"
TG->>H: sees announcement
```
@@ -208,16 +208,16 @@ On the **next DEV task** for this project that also assigns medior:
sequenceDiagram
participant MS as Main Session
participant DC as DevClaw Plugin
- participant CLI as openclaw agent CLI
+ participant CLI as openclaw gateway call agent
participant DEV as DEV Session<br/>(medior, existing)
- MS->>DC: task_pickup({ issueId: 57, role: "dev", model: "medior", ... })
- DC->>DC: resolve tier "medior" → model ID
+ MS->>DC: work_start({ issueId: 57, role: "dev", level: "medior", ... })
+ DC->>DC: resolve level "medior" → model ID
DC->>DC: lookup dev.sessions.medior → existing key!
Note over DC: No sessions.patch needed — session already exists
- DC->>CLI: openclaw agent --session-id --message "Fix validation for #57..."
+ DC->>CLI: openclaw gateway call agent --params { sessionKey, message }
CLI->>DEV: delivers task to existing session (has full codebase context)
- DC-->>MS: { success: true, announcement: "⚡ DEV (medior) picking up #57" }
+ DC-->>MS: { success: true, announcement: "⚡ Sending DEV (medior) for #57" }
```
Session reuse saves ~50K tokens per task by not re-reading the codebase.
@@ -228,118 +228,118 @@ This traces a single issue from creation to completion, showing every component
### Phase 1: Issue created
-Issues are created by the orchestrator agent or by sub-agent sessions via `glab`. The orchestrator can create issues based on user requests in Telegram, backlog planning, or QA feedback. Sub-agents can also create issues when they discover bugs or related work during development.
+Issues are created by the orchestrator agent or by sub-agent sessions via `task_create` or directly via `gh`/`glab`. The orchestrator can create issues based on user requests in Telegram, backlog planning, or QA feedback. Sub-agents can also create issues when they discover bugs during development.
```
-Orchestrator Agent → Issue Tracker: creates issue #42 with label "To Do"
+Orchestrator Agent → Issue Tracker: creates issue #42 with label "Planning"
```
-**State:** Issue tracker has issue #42 labeled "To Do". Nothing in DevClaw yet.
+**State:** Issue tracker has issue #42 labeled "Planning". Nothing in DevClaw yet.
### Phase 2: Heartbeat detects work
```
-Heartbeat triggers → Orchestrator calls queue_status()
+Heartbeat triggers → Orchestrator calls status()
```
```mermaid
sequenceDiagram
participant A as Orchestrator
- participant QS as queue_status
+ participant QS as status
participant GL as Issue Tracker
participant PJ as projects.json
participant AL as audit.log
- A->>QS: queue_status({ projectGroupId: "-123" })
+ A->>QS: status({ projectGroupId: "-123" })
QS->>PJ: readProjects()
PJ-->>QS: { dev: idle, qa: idle }
- QS->>GL: glab issue list --label "To Do"
+ QS->>GL: list issues by label "To Do"
GL-->>QS: [{ id: 42, title: "Add login page" }]
- QS->>GL: glab issue list --label "To Test"
+ QS->>GL: list issues by label "To Test"
GL-->>QS: []
- QS->>GL: glab issue list --label "To Improve"
+ QS->>GL: list issues by label "To Improve"
GL-->>QS: []
- QS->>AL: append { event: "queue_status", ... }
+ QS->>AL: append { event: "status", ... }
QS-->>A: { dev: idle, queue: { toDo: [#42] } }
```
-**Orchestrator decides:** DEV is idle, issue #42 is in To Do → pick it up. Evaluates complexity → assigns medior tier.
+**Orchestrator decides:** DEV is idle, issue #42 is in To Do → pick it up. Evaluates complexity → assigns medior level.
### Phase 3: DEV pickup
-The plugin handles everything end-to-end — tier resolution, session lookup, label transition, state update, **and** task dispatch to the worker session. The agent's only job after is to post the announcement.
+The plugin handles everything end-to-end — level resolution, session lookup, label transition, state update, **and** task dispatch to the worker session. The agent's only job after is to post the announcement.
```mermaid
sequenceDiagram
participant A as Orchestrator
- participant TP as task_pickup
+ participant WS as work_start
participant GL as Issue Tracker
- participant TIER as Tier Resolver
+ participant TIER as Level Resolver
participant GW as Gateway RPC
- participant CLI as openclaw agent CLI
+ participant CLI as openclaw gateway call agent
participant PJ as projects.json
participant AL as audit.log
- A->>TP: task_pickup({ issueId: 42, role: "dev", projectGroupId: "-123", model: "medior" })
- TP->>PJ: readProjects()
- TP->>GL: glab issue view 42 --output json
- GL-->>TP: { title: "Add login page", labels: ["To Do"] }
- TP->>TP: Verify label is "To Do" ✓
- TP->>TIER: resolve "medior" → "anthropic/claude-sonnet-4-5"
- TP->>PJ: lookup dev.sessions.medior
- TP->>GL: glab issue update 42 --unlabel "To Do" --label "Doing"
+ A->>WS: work_start({ issueId: 42, role: "dev", projectGroupId: "-123", level: "medior" })
+ WS->>PJ: readProjects()
+ WS->>GL: getIssue(42)
+ GL-->>WS: { title: "Add login page", labels: ["To Do"] }
+ WS->>WS: Verify label is "To Do"
+ WS->>TIER: resolve "medior" → "anthropic/claude-sonnet-4-5"
+ WS->>PJ: lookup dev.sessions.medior
+ WS->>GL: transitionLabel(42, "To Do", "Doing")
alt New session
- TP->>GW: sessions.patch({ key: new-key, model: "anthropic/claude-sonnet-4-5" })
+ WS->>GW: sessions.patch({ key: new-key, model: "anthropic/claude-sonnet-4-5" })
end
- TP->>CLI: openclaw agent --session-id --message "task..."
- TP->>PJ: activateWorker + store session key
- TP->>AL: append task_pickup + model_selection
- TP-->>A: { success: true, announcement: "🔧 ..." }
+ WS->>CLI: openclaw gateway call agent --params { sessionKey, message }
+ WS->>PJ: activateWorker + store session key
+ WS->>AL: append work_start + model_selection
+ WS-->>A: { success: true, announcement: "🔧 ..." }
```
**Writes:**
- `Issue Tracker`: label "To Do" → "Doing"
-- `projects.json`: dev.active=true, dev.issueId="42", dev.model="medior", dev.sessions.medior=key
-- `audit.log`: 2 entries (task_pickup, model_selection)
+- `projects.json`: dev.active=true, dev.issueId="42", dev.level="medior", dev.sessions.medior=key
+- `audit.log`: 2 entries (work_start, model_selection)
- `Session`: task message delivered to worker session via CLI
### Phase 4: DEV works
```
DEV sub-agent session → reads codebase, writes code, creates MR
-DEV sub-agent session → calls task_complete({ role: "dev", result: "done", ... })
+DEV sub-agent session → calls work_finish({ role: "dev", result: "done", ... })
```
-This happens inside the OpenClaw session. The worker calls `task_complete` directly for atomic state updates. If the worker discovers unrelated bugs, it calls `task_create` to file them.
+This happens inside the OpenClaw session. The worker calls `work_finish` directly for atomic state updates. If the worker discovers unrelated bugs, it calls `task_create` to file them.
### Phase 5: DEV complete (worker self-reports)
```mermaid
sequenceDiagram
participant DEV as DEV Session
- participant TC as task_complete
+ participant WF as work_finish
participant GL as Issue Tracker
participant PJ as projects.json
participant AL as audit.log
participant REPO as Git Repo
participant QA as QA Session (auto-chain)
- DEV->>TC: task_complete({ role: "dev", result: "done", projectGroupId: "-123", summary: "Login page with OAuth" })
- TC->>PJ: readProjects()
- PJ-->>TC: { dev: { active: true, issueId: "42" } }
- TC->>REPO: git pull
- TC->>PJ: deactivateWorker(-123, dev)
+ DEV->>WF: work_finish({ role: "dev", result: "done", projectGroupId: "-123", summary: "Login page with OAuth" })
+ WF->>PJ: readProjects()
+ PJ-->>WF: { dev: { active: true, issueId: "42" } }
+ WF->>REPO: git pull
+ WF->>PJ: deactivateWorker(-123, dev)
Note over PJ: active→false, issueId→null<br/>sessions map PRESERVED
- TC->>GL: transition label "Doing" → "To Test"
- TC->>AL: append { event: "task_complete", role: "dev", result: "done" }
+ WF->>GL: transitionLabel "Doing" → "To Test"
+ WF->>AL: append { event: "work_finish", role: "dev", result: "done" }
alt autoChain enabled
- TC->>GL: transition label "To Test" → "Testing"
- TC->>QA: dispatchTask(role: "qa", tier: "qa")
- TC->>PJ: activateWorker(-123, qa)
- TC-->>DEV: { announcement: "✅ DEV done #42", autoChain: { dispatched: true, role: "qa" } }
+ WF->>GL: transitionLabel "To Test" → "Testing"
+ WF->>QA: dispatchTask(role: "qa", level: "reviewer")
+ WF->>PJ: activateWorker(-123, qa)
+ WF-->>DEV: { announcement: "✅ DEV DONE #42", autoChain: { dispatched: true, role: "qa" } }
else autoChain disabled
- TC-->>DEV: { announcement: "✅ DEV done #42", nextAction: "qa_pickup" }
+ WF-->>DEV: { announcement: "✅ DEV DONE #42", nextAction: "qa_pickup" }
end
```
@@ -347,30 +347,30 @@ sequenceDiagram
- `Git repo`: pulled latest (has DEV's merged code)
- `projects.json`: dev.active=false, dev.issueId=null (sessions map preserved for reuse)
- `Issue Tracker`: label "Doing" → "To Test" (+ "To Test" → "Testing" if auto-chain)
-- `audit.log`: 1 entry (task_complete) + optional auto-chain entries
+- `audit.log`: 1 entry (work_finish) + optional auto-chain entries
### Phase 6: QA pickup
-Same as Phase 3, but with `role: "qa"`. Label transitions "To Test" → "Testing". Uses the qa tier.
+Same as Phase 3, but with `role: "qa"`. The label transitions from "To Test" to "Testing", and the session uses the reviewer level.
-### Phase 7: QA result (3 possible outcomes)
+### Phase 7: QA result (4 possible outcomes)
#### 7a. QA Pass
```mermaid
sequenceDiagram
- participant A as Orchestrator
- participant TC as task_complete
+ participant QA as QA Session
+ participant WF as work_finish
participant GL as Issue Tracker
participant PJ as projects.json
participant AL as audit.log
- A->>TC: task_complete({ role: "qa", result: "pass", projectGroupId: "-123" })
- TC->>PJ: deactivateWorker(-123, qa)
- TC->>GL: glab issue update 42 --unlabel "Testing" --label "Done"
- TC->>GL: glab issue close 42
- TC->>AL: append { event: "task_complete", role: "qa", result: "pass" }
- TC-->>A: { announcement: "🎉 QA PASS #42. Issue closed." }
+ QA->>WF: work_finish({ role: "qa", result: "pass", projectGroupId: "-123" })
+ WF->>PJ: deactivateWorker(-123, qa)
+ WF->>GL: transitionLabel(42, "Testing", "Done")
+ WF->>GL: closeIssue(42)
+ WF->>AL: append { event: "work_finish", role: "qa", result: "pass" }
+ WF-->>QA: { announcement: "🎉 QA PASS #42. Issue closed." }
```
**Ticket complete.** Issue closed, label "Done".
@@ -379,18 +379,18 @@ sequenceDiagram
```mermaid
sequenceDiagram
- participant A as Orchestrator
- participant TC as task_complete
+ participant QA as QA Session
+ participant WF as work_finish
participant GL as Issue Tracker
participant PJ as projects.json
participant AL as audit.log
- A->>TC: task_complete({ role: "qa", result: "fail", projectGroupId: "-123", summary: "OAuth redirect broken" })
- TC->>PJ: deactivateWorker(-123, qa)
- TC->>GL: glab issue update 42 --unlabel "Testing" --label "To Improve"
- TC->>GL: glab issue reopen 42
- TC->>AL: append { event: "task_complete", role: "qa", result: "fail" }
- TC-->>A: { announcement: "❌ QA FAIL #42 — OAuth redirect broken. Sent back to DEV." }
+ QA->>WF: work_finish({ role: "qa", result: "fail", projectGroupId: "-123", summary: "OAuth redirect broken" })
+ WF->>PJ: deactivateWorker(-123, qa)
+ WF->>GL: transitionLabel(42, "Testing", "To Improve")
+ WF->>GL: reopenIssue(42)
+ WF->>AL: append { event: "work_finish", role: "qa", result: "fail" }
+ WF-->>QA: { announcement: "❌ QA FAIL #42 — OAuth redirect broken. Sent back to DEV." }
```
**Cycle restarts:** Issue goes to "To Improve". Next heartbeat, DEV picks it up again (Phase 3, but from "To Improve" instead of "To Do").
@@ -414,39 +414,35 @@ Worker cannot complete (missing info, environment errors, etc.). Issue returns t
### Completion enforcement
-Three layers guarantee that `task_complete` always runs:
+Three layers guarantee that `work_finish` always runs:
-1. **Completion contract** — Every task message sent to a worker session includes a mandatory `## MANDATORY: Task Completion` section listing available results and requiring `task_complete` even on failure. Workers are instructed to use `"blocked"` if stuck.
+1. **Completion contract** — Every task message sent to a worker session includes a mandatory `## MANDATORY: Task Completion` section listing available results and requiring `work_finish` even on failure. Workers are instructed to use `"blocked"` if stuck.
2. **Blocked result** — Both DEV and QA can use `"blocked"` to gracefully return a task to queue without losing work. DEV blocked: `Doing → To Do`. QA blocked: `Testing → To Test`. This gives workers an escape hatch instead of silently dying.
-3. **Stale worker watchdog** — The heartbeat's health check detects workers active for >2 hours. With `autoFix=true`, it deactivates the worker and reverts the label back to queue. This catches sessions that crashed, ran out of context, or otherwise failed without calling `task_complete`. The `session_health` tool provides the same check for manual invocation.
+3. **Stale worker watchdog** — The heartbeat's health check detects workers active for >2 hours. With `fix=true`, it deactivates the worker and reverts the label back to queue. This catches sessions that crashed, ran out of context, or otherwise failed without calling `work_finish`. The `health` tool provides the same check for manual invocation.
### Phase 8: Heartbeat (continuous)
-The heartbeat runs periodically (triggered by the agent or a scheduled message). It combines health check + queue scan:
+The heartbeat runs periodically (via a background service or a manual `work_heartbeat` trigger). It combines a health check with a queue scan:
```mermaid
sequenceDiagram
- participant A as Orchestrator
- participant SH as session_health
- participant QS as queue_status
- participant TP as task_pickup
- Note over A: Heartbeat triggered
+ participant HB as Heartbeat Service
+ participant SH as health check
+ participant TK as projectTick
+ participant WS as work_start (dispatch)
+ Note over HB: Tick triggered (every 60s)
- A->>SH: session_health({ autoFix: true })
- Note over SH: Checks sessions via Gateway RPC (sessions.list)
- SH-->>A: { healthy: true }
+ HB->>SH: checkWorkerHealth per project per role
+ Note over SH: Checks for zombies, stale workers
+ SH-->>HB: { fixes applied }
- A->>QS: queue_status()
- QS-->>A: { projects: [{ dev: idle, queue: { toDo: [#43], toTest: [#44] } }] }
-
- Note over A: DEV idle + To Do #43 → assign medior
- A->>TP: task_pickup({ issueId: 43, role: "dev", model: "medior", ... })
- Note over TP: Plugin handles everything:<br/>tier resolve → session lookup →<br/>label transition → dispatch task →<br/>state update → audit log
-
- Note over A: QA idle + To Test #44 → assign qa
- A->>TP: task_pickup({ issueId: 44, role: "qa", model: "qa", ... })
+ HB->>TK: projectTick per project
+ Note over TK: Scans queue: To Improve > To Test > To Do
+ TK->>WS: dispatchTask (fill free slots)
+ WS-->>TK: { dispatched }
+ TK-->>HB: { pickups, skipped }
```
## Data flow map
@@ -455,25 +451,27 @@ Every piece of data and where it lives:
```
┌─────────────────────────────────────────────────────────────────┐
-│ Issue Tracker (source of truth for tasks) │
+│ Issue Tracker (source of truth for tasks) │
│ │
│ Issue #42: "Add login page" │
-│ Labels: [To Do | Doing | To Test | Testing | Done | ...] │
+│ Labels: [Planning | To Do | Doing | To Test | Testing | ...] │
│ State: open / closed │
│ MRs/PRs: linked merge/pull requests │
│ Created by: orchestrator (task_create), workers, or humans │
└─────────────────────────────────────────────────────────────────┘
- ↕ glab/gh CLI (read/write, auto-detected)
+ ↕ gh/glab CLI (read/write, auto-detected)
┌─────────────────────────────────────────────────────────────────┐
│ DevClaw Plugin (orchestration logic) │
│ │
-│ devclaw_setup → agent creation + workspace + model config │
-│ task_pickup → tier + label + dispatch + role instr (e2e) │
-│ task_complete → label + state + git pull + auto-chain │
-│ task_create → create issue in tracker │
-│ queue_status → read labels + read state │
-│ session_health → check sessions + fix zombies │
-│ project_register → labels + prompts + state init (one-time) │
+│ setup → agent creation + workspace + model config │
+│ work_start → level + label + dispatch + role instr (e2e) │
+│ work_finish → label + state + git pull + auto-chain │
+│ task_create → create issue in tracker │
+│ task_update → manual label state change │
+│ task_comment → add comment to issue │
+│ status → read labels + read state │
+│ health → check sessions + fix zombies │
+│ project_register → labels + prompts + state init (one-time) │
└─────────────────────────────────────────────────────────────────┘
↕ atomic file I/O ↕ OpenClaw CLI (plugin shells out)
┌────────────────────────────────┐ ┌──────────────────────────────┐
@@ -481,39 +479,40 @@ Every piece of data and where it lives:
│ │ │ (called by plugin, not agent)│
│ Per project: │ │ │
│ dev: │ │ openclaw gateway call │
-│ active, issueId, model │ │ sessions.patch → create │
+│ active, issueId, level │ │ sessions.patch → create │
│ sessions: │ │ sessions.list → health │
│ junior: │ │ sessions.delete → cleanup │
│ medior: │ │ │
-│ senior: │ │ openclaw agent │
-│ qa: │ │ --session-id │
-│ active, issueId, model │ │ --message "task..." │
+│ senior: │ │ openclaw gateway call agent │
+│ qa: │ │ --params { sessionKey, │
+│ active, issueId, level │ │ message, agentId } │
│ sessions: │ │ → dispatches to session │
-│ qa: │ │ │
+│ reviewer: │ │ │
+│ tester: │ │ │
└────────────────────────────────┘ └──────────────────────────────┘
↕ append-only
┌─────────────────────────────────────────────────────────────────┐
│ log/audit.log (observability) │
│ │
│ NDJSON, one line per event: │
-│ task_pickup, task_complete, model_selection, │
-│ queue_status, health_check, session_spawn, session_reuse, │
-│ project_register, devclaw_setup │
+│ work_start, work_finish, model_selection, │
+│ status, health, task_create, task_update, │
+│ task_comment, project_register, setup, heartbeat_tick │
│ │
-│ Query with: cat audit.log | jq 'select(.event=="task_pickup")' │
+│ Query: cat audit.log | jq 'select(.event=="work_start")' │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
-│ Telegram (user-facing messages) │
+│ Telegram / WhatsApp (user-facing messages) │
│ │
│ Per group chat: │
-│ "🔧 Spawning DEV (medior) for #42: Add login page" │
+│ "🔧 Spawning DEV (medior) for #42: Add login page" │
│ "⚡ Sending DEV (medior) for #57: Fix validation" │
-│ "✅ DEV done #42 — Login page with OAuth. Moved to QA queue."│
+│ "✅ DEV DONE #42 — Login page with OAuth." │
│ "🎉 QA PASS #42. Issue closed." │
-│ "❌ QA FAIL #42 — OAuth redirect broken. Sent back to DEV." │
-│ "🚫 DEV BLOCKED #42 — Missing dependencies. Returned to queue."│
-│ "🚫 QA BLOCKED #42 — Env not available. Returned to QA queue."│
+│ "❌ QA FAIL #42 — OAuth redirect broken." │
+│ "🚫 DEV BLOCKED #42 — Missing dependencies." │
+│ "🚫 QA BLOCKED #42 — Env not available." │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
@@ -521,7 +520,7 @@ Every piece of data and where it lives:
│ │
│ DEV sub-agent sessions: read code, write code, create MRs │
│ QA sub-agent sessions: read code, run tests, review MRs │
-│ task_complete (DEV done): git pull to sync latest │
+│ work_finish (DEV done): git pull to sync latest │
└─────────────────────────────────────────────────────────────────┘
```
@@ -553,7 +552,7 @@ graph LR
subgraph "Sub-agent sessions handle"
CR[Code writing]
MR[MR creation/review]
- TC_W[Task completion via task_complete]
+ WF_W[Task completion via work_finish]
BUG[Bug filing via task_create]
end
@@ -565,20 +564,22 @@ graph LR
## IssueProvider abstraction
-All issue tracker operations go through the `IssueProvider` interface, defined in `lib/issue-provider.ts`. This abstraction allows DevClaw to support multiple issue trackers without changing tool logic.
+All issue tracker operations go through the `IssueProvider` interface, defined in `lib/providers/provider.ts`. This abstraction allows DevClaw to support multiple issue trackers without changing tool logic.
**Interface methods:**
- `ensureLabel` / `ensureAllStateLabels` — idempotent label creation
+- `createIssue` — create issue with label and assignees
- `listIssuesByLabel` / `getIssue` — issue queries
- `transitionLabel` — atomic label state transition (unlabel + label)
- `closeIssue` / `reopenIssue` — issue lifecycle
- `hasStateLabel` / `getCurrentStateLabel` — label inspection
-- `hasMergedMR` — MR/PR verification
+- `hasMergedMR` / `getMergedMRUrl` — MR/PR verification
+- `addComment` — add comment to issue
- `healthCheck` — verify provider connectivity
**Current providers:**
-- **GitLab** (`lib/providers/gitlab.ts`) — wraps `glab` CLI
- **GitHub** (`lib/providers/github.ts`) — wraps `gh` CLI
+- **GitLab** (`lib/providers/gitlab.ts`) — wraps `glab` CLI
**Planned providers:**
- **Jira** — via REST API
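
As a rough illustration of why tool logic can stay provider-agnostic, the sketch below declares a small subset of the interface and an in-memory stand-in. The method names come from the list above; every signature, the `Issue` shape, and the `InMemoryProvider` class are assumptions made for this example, not the contents of `lib/providers/provider.ts`.

```typescript
// Hypothetical subset of the IssueProvider interface. Method names follow
// the documentation; parameter and return types are assumed.
interface Issue {
  id: number;
  title: string;
  labels: string[];
  open: boolean;
}

interface IssueProviderSketch {
  getIssue(id: number): Promise<Issue>;
  listIssuesByLabel(label: string): Promise<Issue[]>;
  transitionLabel(id: number, from: string, to: string): Promise<void>;
  closeIssue(id: number): Promise<void>;
}

// In-memory stand-in so tool logic can be exercised without gh or glab.
class InMemoryProvider implements IssueProviderSketch {
  private issues = new Map<number, Issue>();

  constructor(seed: Issue[]) {
    for (const issue of seed) {
      this.issues.set(issue.id, { ...issue, labels: [...issue.labels] });
    }
  }

  async getIssue(id: number): Promise<Issue> {
    const issue = this.issues.get(id);
    if (!issue) throw new Error(`Issue #${id} not found`);
    return issue;
  }

  async listIssuesByLabel(label: string): Promise<Issue[]> {
    return Array.from(this.issues.values()).filter((i) => i.labels.includes(label));
  }

  // Remove the old state label and add the new one in one step, refusing
  // to proceed when the issue is not in the expected state.
  async transitionLabel(id: number, from: string, to: string): Promise<void> {
    const issue = await this.getIssue(id);
    if (!issue.labels.includes(from)) {
      throw new Error(`Issue #${id} is not labeled "${from}"`);
    }
    issue.labels = issue.labels.filter((l) => l !== from).concat(to);
  }

  async closeIssue(id: number): Promise<void> {
    (await this.getIssue(id)).open = false;
  }
}
```

A tool written against `IssueProviderSketch` would not need to know whether the calls end up in `gh`, `glab`, or a test double like this one.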
@@ -589,16 +590,16 @@ Provider selection is handled by `createProvider()` in `lib/providers/index.ts`.
| Failure | Detection | Recovery |
|---|---|---|
-| Session dies mid-task | `session_health` checks via `sessions.list` Gateway RPC | `autoFix`: reverts label, clears active state, removes dead session from sessions map. Next heartbeat picks up task again (creates fresh session for that tier). |
-| glab command fails | Plugin tool throws error, returns to agent | Agent retries or reports to Telegram group |
-| `openclaw agent` CLI fails | Plugin catches error during dispatch | Plugin rolls back: reverts label, clears active state. Returns error to agent for reporting. |
-| `sessions.patch` fails | Plugin catches error during session creation | Plugin rolls back label transition. Returns error. No orphaned state. |
+| Session dies mid-task | `health` checks via `sessions.list` Gateway RPC | `fix=true`: reverts label, clears active state. Next heartbeat picks up task again (creates fresh session for that level). |
+| gh/glab command fails | Plugin tool throws error, returns to agent | Agent retries or reports to Telegram group |
+| `openclaw gateway call agent` fails | Plugin catches error during dispatch | Plugin rolls back: reverts label, clears active state. Returns error. No orphaned state. |
+| `sessions.patch` fails | Plugin catches error during session creation | Plugin rolls back label transition. Returns error. |
| projects.json corrupted | Tool can't parse JSON | Manual fix needed. Atomic writes (temp+rename) prevent partial writes. |
-| Label out of sync | `task_pickup` verifies label before transitioning | Throws error if label doesn't match expected state. Agent reports mismatch. |
-| Worker already active | `task_pickup` checks `active` flag | Throws error: "DEV worker already active on project". Must complete current task first. |
-| Stale worker (>2h) | `session_health` and heartbeat health check | `autoFix`: deactivates worker, reverts label to queue (To Do / To Test). Task available for next pickup. |
-| Worker stuck/blocked | Worker calls `task_complete` with `"blocked"` | Deactivates worker, reverts label to queue. Issue available for retry. |
-| `project_register` fails | Plugin catches error during label creation or state write | Clean error returned. No partial state — labels are idempotent, projects.json not written until all labels succeed. |
+| Label out of sync | `work_start` verifies label before transitioning | Throws error if label doesn't match expected state. |
+| Worker already active | `work_start` checks `active` flag | Throws error: "DEV already active on project". Must complete current task first. |
+| Stale worker (>2h) | `health` and heartbeat health check | `fix=true`: deactivates worker, reverts label to queue. Task available for next pickup. |
+| Worker stuck/blocked | Worker calls `work_finish` with `"blocked"` | Deactivates worker, reverts label to queue. Issue available for retry. |
+| `project_register` fails | Plugin catches error during label creation or state write | Clean error returned. Labels are idempotent, projects.json not written until all labels succeed. |
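
The stale-worker row can be pictured with a short sketch. The field names follow the worker state documented for `projects.json`; the constant `STALE_AFTER_MS` and both function names are made up for this illustration and are not taken from the plugin source.

```typescript
// Worker state shape as documented for projects.json.
interface WorkerState {
  active: boolean;
  issueId: string | null;
  startTime: string | null; // ISO timestamp
  level: string | null;
  sessions: Record<string, string | null>;
}

// Two-hour threshold taken from the failure table; the name is hypothetical.
const STALE_AFTER_MS = 2 * 60 * 60 * 1000;

// A worker counts as stale when it has been active longer than the threshold.
function isStale(worker: WorkerState, now: Date = new Date()): boolean {
  if (!worker.active || worker.startTime === null) return false;
  return now.getTime() - Date.parse(worker.startTime) > STALE_AFTER_MS;
}

// Deactivate a worker while keeping its sessions map for later reuse.
function deactivate(worker: WorkerState): WorkerState {
  return { ...worker, active: false, issueId: null, startTime: null };
}
```

Reverting the issue label to its queue is a separate provider call and is left out here.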
## File locations
@@ -606,8 +607,9 @@ Provider selection is handled by `createProvider()` in `lib/providers/index.ts`.
|---|---|---|
| Plugin source | `~/.openclaw/extensions/devclaw/` | Plugin code |
| Plugin manifest | `~/.openclaw/extensions/devclaw/openclaw.plugin.json` | Plugin registration |
-| Agent config | `~/.openclaw/openclaw.json` | Agent definition + tool permissions + tier config |
+| Agent config | `~/.openclaw/openclaw.json` | Agent definition + tool permissions + model config |
| Worker state | `~/.openclaw/workspace-/projects/projects.json` | Per-project DEV/QA state |
+| Role instructions | `~/.openclaw/workspace-/projects/roles//` | Per-project `dev.md` and `qa.md` |
| Audit log | `~/.openclaw/workspace-/log/audit.log` | NDJSON event log |
| Session transcripts | `~/.openclaw/agents//sessions/.jsonl` | Conversation history per session |
| Git repos | `~/git//` | Project source code |
diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md
new file mode 100644
index 0000000..7bc4a98
--- /dev/null
+++ b/docs/CONFIGURATION.md
@@ -0,0 +1,354 @@
+# DevClaw — Configuration Reference
+
+All DevClaw configuration lives in two places: `openclaw.json` (plugin-level settings) and `projects.json` (per-project state).
+
+## Plugin Configuration (`openclaw.json`)
+
+DevClaw is configured under `plugins.entries.devclaw.config` in `openclaw.json`.
+
+### Model Tiers
+
+Override which model powers each DEV and QA level:
+
+```json
+{
+ "plugins": {
+ "entries": {
+ "devclaw": {
+ "config": {
+ "models": {
+ "dev": {
+ "junior": "anthropic/claude-haiku-4-5",
+ "medior": "anthropic/claude-sonnet-4-5",
+ "senior": "anthropic/claude-opus-4-5"
+ },
+ "qa": {
+ "reviewer": "anthropic/claude-sonnet-4-5",
+ "tester": "anthropic/claude-haiku-4-5"
+ }
+ }
+ }
+ }
+ }
+ }
+}
+```
+
+**Resolution order** (per `lib/tiers.ts:resolveModel`):
+
+1. Plugin config `models..` — explicit override
+2. `DEFAULT_MODELS[role][level]` — built-in defaults (table below)
+3. Passthrough — treat the level string as a raw model ID
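
The three steps can be sketched as a single lookup chain. This is an illustration only: the function name `resolveModelSketch` and its parameter shapes are assumptions, and the real `resolveModel` in `lib/tiers.ts` may differ. The defaults are copied from the default models table in this section.

```typescript
type Role = "dev" | "qa";
type LevelMap = Partial<Record<string, string>>;
type ModelTable = Partial<Record<Role, LevelMap>>;

// Built-in defaults, copied from the documented default models table.
const DEFAULT_MODELS: Record<Role, LevelMap> = {
  dev: {
    junior: "anthropic/claude-haiku-4-5",
    medior: "anthropic/claude-sonnet-4-5",
    senior: "anthropic/claude-opus-4-5",
  },
  qa: {
    reviewer: "anthropic/claude-sonnet-4-5",
    tester: "anthropic/claude-haiku-4-5",
  },
};

function resolveModelSketch(role: Role, level: string, configured: ModelTable = {}): string {
  return (
    configured[role]?.[level] ?? // 1. explicit override from plugin config
    DEFAULT_MODELS[role][level] ?? // 2. built-in default
    level // 3. passthrough: treat the level string as a raw model ID
  );
}
```

For example, `resolveModelSketch("dev", "senior")` falls through to the built-in default, while an unknown level string is returned unchanged.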
+
+**Default models:**
+
+| Role | Level | Default model |
+|---|---|---|
+| dev | junior | `anthropic/claude-haiku-4-5` |
+| dev | medior | `anthropic/claude-sonnet-4-5` |
+| dev | senior | `anthropic/claude-opus-4-5` |
+| qa | reviewer | `anthropic/claude-sonnet-4-5` |
+| qa | tester | `anthropic/claude-haiku-4-5` |
+
+### Project Execution Mode
+
+Controls cross-project parallelism:
+
+```json
+{
+ "plugins": {
+ "entries": {
+ "devclaw": {
+ "config": {
+ "projectExecution": "parallel"
+ }
+ }
+ }
+ }
+}
+```
+
+| Value | Behavior |
+|---|---|
+| `"parallel"` (default) | Multiple projects can have active workers simultaneously |
+| `"sequential"` | Only one project's workers active at a time. Useful for single-agent deployments. |
+
+Enforced in `work_heartbeat` and the heartbeat service before dispatching.
+
+### Heartbeat Service
+
+Token-free, interval-based health checks and queue dispatch:
+
+```json
+{
+ "plugins": {
+ "entries": {
+ "devclaw": {
+ "config": {
+ "work_heartbeat": {
+ "enabled": true,
+ "intervalSeconds": 60,
+ "maxPickupsPerTick": 4
+ }
+ }
+ }
+ }
+ }
+}
+```
+
+| Setting | Type | Default | Description |
+|---|---|---|---|
+| `enabled` | boolean | `true` | Enable the heartbeat service |
+| `intervalSeconds` | number | `60` | Seconds between ticks |
+| `maxPickupsPerTick` | number | `4` | Maximum worker dispatches per tick (budget control) |
+
+**Source:** [`lib/services/heartbeat.ts`](../lib/services/heartbeat.ts)
+
+The heartbeat runs as a plugin service tied to the gateway lifecycle. Each tick runs a health pass (auto-fixing zombie sessions and stale workers), then a tick pass (filling free worker slots by priority). Neither pass consumes LLM tokens.
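
To make the role of `maxPickupsPerTick` concrete, here is a minimal sketch of how a tick pass might choose dispatches. The `Candidate` shape, the numeric `priority` field (lower meaning more urgent), and the function `planTick` are all assumptions for illustration; the actual selection logic lives in `lib/services/heartbeat.ts` and is not reproduced here.

```typescript
interface Candidate {
  project: string;
  role: "dev" | "qa";
  issueId: string;
  priority: number; // lower = more urgent (assumed convention)
  workerActive: boolean; // true when that project/role slot is busy
}

// Pick at most maxPickupsPerTick candidates, one per free project/role slot.
function planTick(candidates: Candidate[], maxPickupsPerTick: number): Candidate[] {
  const taken = new Set<string>();
  const picks: Candidate[] = [];
  const ordered = [...candidates].sort((a, b) => a.priority - b.priority);
  for (const c of ordered) {
    if (picks.length >= maxPickupsPerTick) break; // budget exhausted
    const slot = `${c.project}:${c.role}`;
    if (c.workerActive || taken.has(slot)) continue; // slot is not free
    taken.add(slot);
    picks.push(c);
  }
  return picks;
}
```

Capping the number of picks per tick bounds how many worker sessions a single tick can start, which is the budget control the settings table refers to.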
+
+### Notifications
+
+Control which lifecycle events send notifications:
+
+```json
+{
+ "plugins": {
+ "entries": {
+ "devclaw": {
+ "config": {
+ "notifications": {
+ "heartbeatDm": true,
+ "workerStart": true,
+ "workerComplete": true
+ }
+ }
+ }
+ }
+ }
+}
+```
+
+| Setting | Default | Description |
+|---|---|---|
+| `heartbeatDm` | `true` | Send heartbeat summary to orchestrator DM |
+| `workerStart` | `true` | Announce when a worker picks up a task |
+| `workerComplete` | `true` | Announce when a worker finishes a task |
+
+### DevClaw Agent IDs
+
+List which agents are recognized as DevClaw orchestrators (used for context detection):
+
+```json
+{
+ "plugins": {
+ "entries": {
+ "devclaw": {
+ "config": {
+ "devClawAgentIds": ["my-orchestrator"]
+ }
+ }
+ }
+ }
+}
+```
+
+### Agent Tool Permissions
+
+Restrict DevClaw tools to your orchestrator agent:
+
+```json
+{
+ "agents": {
+ "list": [
+ {
+ "id": "my-orchestrator",
+ "tools": {
+ "allow": [
+ "work_start",
+ "work_finish",
+ "task_create",
+ "task_update",
+ "task_comment",
+ "status",
+ "health",
+ "work_heartbeat",
+ "project_register",
+ "setup",
+ "onboard"
+ ]
+ }
+ }
+ ]
+ }
+}
+```
+
+---
+
+## Project State (`projects.json`)
+
+All project state lives in `/projects/projects.json`, keyed by group ID.
+
+**Source:** [`lib/projects.ts`](../lib/projects.ts)
+
+### Schema
+
+```json
+{
+ "projects": {
+ "": {
+ "name": "my-webapp",
+ "repo": "~/git/my-webapp",
+ "groupName": "Dev - My Webapp",
+ "baseBranch": "development",
+ "deployBranch": "development",
+ "deployUrl": "https://my-webapp.example.com",
+ "channel": "telegram",
+ "roleExecution": "parallel",
+ "dev": {
+ "active": false,
+ "issueId": null,
+ "startTime": null,
+ "level": null,
+ "sessions": {
+ "junior": null,
+ "medior": "agent:orchestrator:subagent:my-webapp-dev-medior",
+ "senior": null
+ }
+ },
+ "qa": {
+ "active": false,
+ "issueId": null,
+ "startTime": null,
+ "level": null,
+ "sessions": {
+ "reviewer": "agent:orchestrator:subagent:my-webapp-qa-reviewer",
+ "tester": null
+ }
+ }
+ }
+ }
+}
+```
+
+### Project fields
+
+| Field | Type | Description |
+|---|---|---|
+| `name` | string | Short project name |
+| `repo` | string | Path to git repo (supports `~/` expansion) |
+| `groupName` | string | Group display name |
+| `baseBranch` | string | Base branch for development |
+| `deployBranch` | string | Branch that triggers deployment |
+| `deployUrl` | string | Deployment URL |
+| `channel` | string | Messaging channel (`"telegram"`, `"whatsapp"`, etc.) |
+| `roleExecution` | `"parallel"` \| `"sequential"` | DEV/QA parallelism for this project |
+
+### Worker state fields
+
+Each project has `dev` and `qa` worker state objects:
+
+| Field | Type | Description |
+|---|---|---|
+| `active` | boolean | Whether this role has an active worker |
+| `issueId` | string \| null | Issue being worked on (as string) |
+| `startTime` | string \| null | ISO timestamp when worker became active |
+| `level` | string \| null | Current level (`junior`, `medior`, `senior`, `reviewer`, `tester`) |
+| `sessions` | Record | Per-level session keys |
+
+**DEV session keys:** `junior`, `medior`, `senior`
+**QA session keys:** `reviewer`, `tester`
+
+### Key design decisions
+
+- **Session-per-level** — each level gets its own worker session, accumulating context independently. Level selection maps directly to a session key.
+- **Sessions preserved on completion** — when a worker completes a task, the sessions map is preserved (only `active`, `issueId`, and `startTime` are cleared). This enables session reuse.
+- **Atomic writes** — all writes go through temp-file-then-rename to prevent corruption.
+- **Sessions persist indefinitely** — no auto-cleanup. The `health` tool handles manual cleanup.
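
The temp-file-then-rename pattern mentioned above can be sketched as follows. This is a generic Node.js illustration, not the code in `lib/projects.ts`; the function name `writeJsonAtomic` and the temp-file naming scheme are assumptions.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Write JSON to a temp file in the same directory, then rename it over the
// target. A rename within one filesystem replaces the file in a single step,
// so readers see either the old or the new content, never a partial write.
function writeJsonAtomic(filePath: string, data: unknown): void {
  const tmpPath = path.join(
    path.dirname(filePath),
    `.${path.basename(filePath)}.${process.pid}.tmp`,
  );
  fs.writeFileSync(tmpPath, JSON.stringify(data, null, 2) + "\n", "utf8");
  fs.renameSync(tmpPath, filePath);
}
```

Keeping the temp file in the same directory as the target matters: a rename across filesystems is not atomic and can fail outright.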
+
+---
+
+## Workspace File Layout
+
+```
+/
+├── projects/
+│ ├── projects.json ← Project state (auto-managed)
+│ └── roles/
+│ ├── my-webapp/ ← Per-project role instructions (editable)
+│ │ ├── dev.md
+│ │ └── qa.md
+│ ├── another-project/
+│ │ ├── dev.md
+│ │ └── qa.md
+│ └── default/ ← Fallback role instructions
+│ ├── dev.md
+│ └── qa.md
+├── log/
+│ └── audit.log ← NDJSON event log (auto-managed)
+├── AGENTS.md ← Agent identity documentation
+└── HEARTBEAT.md ← Heartbeat operation guide
+```
+
+### Role instruction files
+
+`work_start` loads role instructions from `projects/roles//.md` at dispatch time, falling back to `projects/roles/default/.md`. These files are appended to the task message sent to worker sessions.
+
+Edit these files to customize deployment steps, test commands, acceptance criteria, and coding standards.
+
+**Source:** [`lib/dispatch.ts:loadRoleInstructions`](../lib/dispatch.ts)
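
The fallback lookup can be sketched as below. The path shape is inferred from the workspace layout shown earlier (`projects/roles/my-webapp/dev.md`, `projects/roles/default/dev.md`); the function name, the injected `read` callback, and the empty-string result when neither file exists are assumptions, not the behavior of `loadRoleInstructions` itself.

```typescript
// Reader returns file contents, or null when the file does not exist.
type ReadFile = (relativePath: string) => string | null;

function loadRoleInstructionsSketch(
  project: string,
  role: "dev" | "qa",
  read: ReadFile,
): string {
  const candidates = [
    `projects/roles/${project}/${role}.md`, // project-specific file
    `projects/roles/default/${role}.md`, // fallback
  ];
  for (const candidate of candidates) {
    const content = read(candidate);
    if (content !== null) return content;
  }
  return ""; // assumed behavior when neither file exists
}
```

Passing the reader in as a callback keeps the sketch free of filesystem calls, so the lookup order can be checked with an in-memory map.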
+
+---
+
+## Audit Log
+
+Append-only NDJSON at `/log/audit.log`. Auto-truncated to 250 lines.
+
+**Source:** [`lib/audit.ts`](../lib/audit.ts)
+
+### Event types
+
+| Event | Trigger |
+|---|---|
+| `work_start` | Task dispatched to worker |
+| `model_selection` | Level resolved to model ID |
+| `work_finish` | Task completed |
+| `work_heartbeat` | Heartbeat tick completed |
+| `task_create` | Issue created |
+| `task_update` | Issue state changed |
+| `task_comment` | Comment added to issue |
+| `status` | Queue status queried |
+| `health` | Health scan completed |
+| `heartbeat_tick` | Heartbeat service tick (background) |
+| `project_register` | Project registered |
+
+### Querying
+
+```bash
+# All task dispatches
+cat audit.log | jq 'select(.event=="work_start")'
+
+# All completions for a project
+cat audit.log | jq 'select(.event=="work_finish" and .project=="my-webapp")'
+
+# Model selections
+cat audit.log | jq 'select(.event=="model_selection")'
+```
+
+---
+
+## Issue Provider
+
+DevClaw uses an `IssueProvider` interface (`lib/providers/provider.ts`) to abstract issue tracker operations. The provider is auto-detected from the git remote URL.
+
+**Supported providers:**
+
+| Provider | CLI | Detection |
+|---|---|---|
+| GitHub | `gh` | Remote contains `github.com` |
+| GitLab | `glab` | Remote contains `gitlab` |
+
+**Planned:** Jira (via REST API)
+
+**Source:** [`lib/providers/index.ts`](../lib/providers/index.ts)
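
The detection rule in the table can be sketched as a simple substring check on the remote URL. The function name `detectProvider`, the lowercase normalization, and the error thrown for unknown remotes are assumptions for illustration; `createProvider()` in `lib/providers/index.ts` may behave differently.

```typescript
type ProviderKind = "github" | "gitlab";

// CLI wrapped by each provider, as listed in the table above.
const CLI_FOR_PROVIDER: Record<ProviderKind, string> = {
  github: "gh",
  gitlab: "glab",
};

function detectProvider(remoteUrl: string): ProviderKind {
  const url = remoteUrl.toLowerCase(); // case-insensitive match is an assumption
  if (url.includes("github.com")) return "github";
  if (url.includes("gitlab")) return "gitlab";
  throw new Error(`Unsupported issue provider for remote: ${remoteUrl}`);
}
```

Matching on `gitlab` rather than `gitlab.com` means self-hosted instances with `gitlab` in the hostname are also picked up.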
diff --git a/docs/CONTEXT-AWARENESS.md b/docs/CONTEXT-AWARENESS.md
index bb132ed..2070e64 100644
--- a/docs/CONTEXT-AWARENESS.md
+++ b/docs/CONTEXT-AWARENESS.md
@@ -1,6 +1,6 @@
-# Context-Aware DevClaw
+# DevClaw — Context Awareness
-DevClaw now adapts its behavior based on how you interact with it.
+DevClaw adapts its behavior based on how you interact with it.
## Design Philosophy
@@ -12,170 +12,122 @@ DevClaw enforces strict boundaries between projects:
- Project work happens **inside that project's group**
- Setup and configuration happen **outside project groups**
-This design prevents:
-- ❌ Cross-project contamination (workers picking up wrong project's tasks)
-- ❌ Confusion about which project you're working on
-- ❌ Accidental registration of wrong groups
-- ❌ Setup discussions cluttering project work channels
+This prevents:
+- Cross-project contamination (workers picking up wrong project's tasks)
+- Confusion about which project you're working on
+- Accidental registration of wrong groups
+- Setup discussions cluttering project work channels
This enables:
-- ✅ Clear mental model: "This group = this project"
-- ✅ Isolated work streams: Each project progresses independently
-- ✅ Dedicated teams: Workers focus on one project at a time
-- ✅ Clean separation: Setup vs. operational work
+- Clear mental model: "This group = this project"
+- Isolated work streams: Each project progresses independently
+- Dedicated teams: Workers focus on one project at a time
+- Clean separation: Setup vs. operational work
## Three Interaction Contexts
-### 1. **Via Another Agent** (Setup Mode)
-When you talk to your main agent (like Henk) about DevClaw:
-- ✅ Use: `devclaw_onboard`, `devclaw_setup`
-- ❌ Avoid: `task_pickup`, `queue_status` (operational tools)
+### 1. Via Another Agent (Setup Mode)
+
+When you talk to your main agent about DevClaw:
+- Use: `onboard`, `setup`
+- Avoid: `work_start`, `status` (operational tools)
**Example:**
```
-User → Henk: "Can you help me set up DevClaw?"
-Henk → Calls devclaw_onboard
+User → Main Agent: "Can you help me set up DevClaw?"
+Main Agent → Calls onboard
```
-### 2. **Direct Message to DevClaw Agent**
+### 2. Direct Message to DevClaw Agent
+
When you DM the DevClaw agent directly on Telegram/WhatsApp:
-- ✅ Use: `queue_status` (all projects), `session_health` (system overview)
-- ❌ Avoid: `task_pickup` (project-specific work), setup tools
+- Use: `status` (all projects), `health` (system overview)
+- Avoid: `work_start` (project-specific work), setup tools
**Example:**
```
User → DevClaw DM: "Show me the status of all projects"
-DevClaw → Calls queue_status (shows all projects)
+DevClaw → Calls status (shows all projects)
```
-### 3. **Project Group Chat**
+### 3. Project Group Chat
+
When you message in a Telegram/WhatsApp group bound to a project:
-- ✅ Use: `task_pickup`, `task_complete`, `task_create`, `queue_status` (auto-filtered)
-- ❌ Avoid: Setup tools, system-wide queries
+- Use: `work_start`, `work_finish`, `task_create`, `status` (auto-filtered)
+- Avoid: Setup tools, system-wide queries
**Example:**
```
-User → OpenClaw Dev Group: "@henk pick up issue #42"
-DevClaw → Calls task_pickup (only works in groups)
+User → Project Group: "pick up issue #42"
+DevClaw → Calls work_start (only works in groups)
```
## How It Works
### Context Detection
+
Each tool automatically detects:
-- **Agent ID** - Is this the DevClaw agent or another agent?
-- **Message Channel** - Telegram, WhatsApp, or CLI?
-- **Session Key** - Is this a group chat or direct message?
+- **Agent ID** — Is this the DevClaw agent or another agent?
+- **Message Channel** — Telegram, WhatsApp, or CLI?
+- **Session Key** — Is this a group chat or direct message?
- Format: `agent:{agentId}:{channel}:{type}:{id}`
- Telegram group: `agent:devclaw:telegram:group:-5266044536`
- WhatsApp group: `agent:devclaw:whatsapp:group:120363123@g.us`
- DM: `agent:devclaw:telegram:user:657120585`
-- **Project Binding** - Which project is this group bound to?
+- **Project Binding** — Which project is this group bound to?
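
As an illustration of the session key format listed above, the following sketch splits a key into its parts. The type `SessionKeyParts` and both function names are hypothetical; `lib/context-guard.ts` may detect context differently, for example with the `@g.us` substring check used for WhatsApp groups.

```typescript
interface SessionKeyParts {
  agentId: string;
  channel: string;
  type: string; // "group" or "user" in the documented examples
  id: string;
}

// Parse agent:{agentId}:{channel}:{type}:{id}; return null for other shapes.
function parseSessionKey(key: string): SessionKeyParts | null {
  const [prefix, agentId, channel, type, ...rest] = key.split(":");
  if (prefix !== "agent" || !agentId || !channel || !type || rest.length === 0) {
    return null;
  }
  // Re-join the remainder in case the id itself contains ":".
  return { agentId, channel, type, id: rest.join(":") };
}

function isGroupChat(key: string): boolean {
  return parseSessionKey(key)?.type === "group";
}
```

Keys that do not have all five parts yield `null` rather than a guess.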
### Guardrails
+
Tools include context-aware guidance in their responses:
```json
{
- "contextGuidance": "🛡️ Context: Project Group Chat (telegram)\n
- You're in a Telegram group for project 'openclaw-core'.\n
- Use task_pickup, task_complete for project work.",
+ "contextGuidance": "Context: Project Group Chat (telegram)\n You're in a Telegram group for project 'my-webapp'.\n Use work_start, work_finish for project work.",
...
}
```
-## Integrated Tools
+## Tool Context Requirements
-### ✅ `devclaw_onboard`
-- **Works best:** Via another agent or direct DM
-- **Blocks:** Group chats (setup shouldn't happen in project groups)
+| Tool | Group chat | Direct DM | Via agent |
+|---|---|---|---|
+| `onboard` | Blocked | Works | Works |
+| `setup` | Works | Works | Works |
+| `work_start` | Works | Blocked | Blocked |
+| `work_finish` | Works | Works | Works |
+| `task_create` | Works | Works | Works |
+| `task_update` | Works | Works | Works |
+| `task_comment` | Works | Works | Works |
+| `status` | Auto-filtered | All projects | Suggests onboard |
+| `health` | Auto-filtered | All projects | Works |
+| `work_heartbeat` | Single project | All projects | Works |
+| `project_register` | Works (required) | Blocked | Blocked |
-### ✅ `queue_status`
-- **Group context:** Auto-filters to that project
-- **Direct context:** Shows all projects
-- **Via-agent context:** Suggests using devclaw_onboard instead
-
-### ✅ `task_pickup`
-- **ONLY works:** In project group chats
-- **Blocks:** Direct DMs and setup conversations
-
-### ✅ `project_register`
-- **ONLY works:** In the Telegram/WhatsApp group you're registering
-- **Blocks:** Direct DMs and via-agent conversations
-- **Auto-detects:** Group ID from current chat (projectGroupId parameter now optional)
-
-**Why this matters:**
-- **Project Isolation**: Each group = one project = one dedicated team
-- **Clear Boundaries**: Forces deliberate project registration from within the project's space
-- **Team Clarity**: You're physically in the group when binding it, making the connection explicit
-- **No Mistakes**: Impossible to accidentally register the wrong group when you're in it
-- **Natural Workflow**: "This group is for Project X" → register Project X here
-
-## Testing
-
-### Debug Tool
-Use `context_test` to see what context is detected:
-```
-# In any context:
-context_test
-
-# Returns:
-{
- "detectedContext": { "type": "group", "projectName": "openclaw-core" },
- "guardrails": "🛡️ Context: Project Group Chat..."
-}
-```
-
-### Manual Testing
-1. **Setup Mode:** Message your main agent → "Help me configure DevClaw"
-2. **Status Check:** DM DevClaw agent (Telegram/WhatsApp) → "Show me the queue"
-3. **Project Work:** Post in project group (Telegram/WhatsApp) → "@henk pick up #42"
-
-Each context should trigger different guardrails.
-
-## Configuration
-
-Add to `~/.openclaw/openclaw.json`:
-```json
-"plugins": {
- "entries": {
- "devclaw": {
- "config": {
- "devClawAgentIds": ["henk-development", "devclaw-test"],
- "models": { ... }
- }
- }
- }
-}
-```
-
-The `devClawAgentIds` array lists which agents are DevClaw orchestrators.
-
-## Implementation Details
-
-- **Module:** [lib/context-guard.ts](../lib/context-guard.ts)
-- **Tests:** [tests/unit/context-guard.test.ts](../tests/unit/context-guard.test.ts) (15 passing)
-- **Integrated tools:** 4 key tools (`devclaw_onboard`, `queue_status`, `task_pickup`, `project_register`)
-- **Detection logic:** Checks agentId, messageChannel, sessionKey pattern matching
+**Why `project_register` requires group context:**
+- Forces deliberate project registration from within the project's space
+- You're physically in the group when binding it, making the connection explicit
+- Impossible to accidentally register the wrong group
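
The blocked cells of the table above can be expressed as a small lookup. This is a sketch under assumptions: the `Context` names, the `BLOCKED` map, and `isAllowed` are invented for illustration, and the table's softer behaviors (auto-filtering, suggesting `onboard`) are not modeled.

```typescript
type Context = "group" | "direct" | "via-agent";

// Only the "Blocked" cells from the table; every other combination is allowed.
const BLOCKED: Partial<Record<string, Context[]>> = {
  onboard: ["group"],
  work_start: ["direct", "via-agent"],
  project_register: ["direct", "via-agent"],
};

function isAllowed(tool: string, context: Context): boolean {
  return !(BLOCKED[tool] ?? []).includes(context);
}
```

Listing only the exceptions keeps the default permissive, which matches the table: most tools work in every context.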
## WhatsApp Support
-DevClaw **fully supports WhatsApp** groups with the same architecture as Telegram:
+DevClaw fully supports WhatsApp groups with the same architecture as Telegram:
-- ✅ WhatsApp group detection via `sessionKey.includes("@g.us")`
-- ✅ Projects keyed by WhatsApp group ID (e.g., `"120363123@g.us"`)
-- ✅ Context-aware tools work identically for both channels
-- ✅ One project = one group (Telegram OR WhatsApp)
+- WhatsApp group detection via `sessionKey.includes("@g.us")`
+- Projects keyed by WhatsApp group ID (e.g., `"120363123@g.us"`)
+- Context-aware tools work identically for both channels
+- One project = one group (Telegram OR WhatsApp)
**To register a WhatsApp project:**
1. Go to the WhatsApp group chat
2. Call `project_register` from within the group
3. Group ID auto-detected from context
-The architecture treats Telegram and WhatsApp identically - the only difference is the group ID format.
+## Implementation
-## Future Enhancements
+- **Module:** [`lib/context-guard.ts`](../lib/context-guard.ts)
+- **Detection logic:** Checks agentId, messageChannel, sessionKey pattern matching
+- **Configuration:** `devClawAgentIds` in plugin config lists which agents are DevClaw orchestrators
-- [ ] Integrate into remaining tools (`task_complete`, `session_health`, `task_create`, `devclaw_setup`)
-- [ ] System prompt injection (requires OpenClaw core support)
-- [ ] Context-based tool filtering (hide irrelevant tools)
-- [ ] Per-project context overrides
+## Related
+
+- [Configuration — devClawAgentIds](CONFIGURATION.md#devclaw-agent-ids)
+- [Architecture — Scope boundaries](ARCHITECTURE.md#scope-boundaries)
diff --git a/docs/MANAGEMENT.md b/docs/MANAGEMENT.md
index 86f1a93..c99431e 100644
--- a/docs/MANAGEMENT.md
+++ b/docs/MANAGEMENT.md
@@ -12,14 +12,14 @@ DevClaw exists because of a gap that management theorists identified decades ago
In 1969, Paul Hersey and Ken Blanchard published what would become Situational Leadership Theory. The central idea is deceptively simple: the way you delegate should match the capability and reliability of the person doing the work. You don't hand an intern the system architecture redesign. You don't ask your principal engineer to rename a CSS class.
-DevClaw's model selection does exactly this. When a task comes in, the plugin evaluates complexity from the issue title and description, then routes it to the cheapest model that can handle it:
+DevClaw's level selection does exactly this. When a task comes in, the plugin routes it to the cheapest model that can handle it:
-| Complexity | Model | Analogy |
-| -------------------------------- | ------ | --------------------------- |
-| Simple (typos, renames, copy) | Haiku | Junior dev — just execute |
-| Standard (features, bug fixes) | Sonnet | Mid-level — think and build |
-| Complex (architecture, security) | Opus | Senior — design and reason |
-| Review | Grok | Independent reviewer |
+| Complexity | Level | Analogy |
+| -------------------------------- | -------- | --------------------------- |
+| Simple (typos, renames, copy) | Junior | The intern — just execute |
+| Standard (features, bug fixes) | Medior | Mid-level — think and build |
+| Complex (architecture, security) | Senior | The architect — design and reason |
+| Review | Reviewer | Independent code reviewer |
This isn't just cost optimization. It mirrors what effective managers do instinctively: match the delegation level to the task, not to a fixed assumption about the delegate.
@@ -27,11 +27,11 @@ This isn't just cost optimization. It mirrors what effective managers do instinc
Classical management theory — later formalized by Bernard Bass in his work on Transformational Leadership — introduced a concept called Management by Exception (MBE). The principle: a manager should only be pulled back into a workstream when something deviates from the expected path.
-DevClaw's task lifecycle is built on this. The orchestrator delegates a task via `task_pickup`, then steps away. It only re-engages in three scenarios:
+DevClaw's task lifecycle is built on this. The orchestrator delegates a task via `work_start`, then steps away. It only re-engages in four scenarios:
1. **DEV completes work** → The task moves to QA automatically. No orchestrator involvement needed.
2. **QA passes** → The issue closes. Pipeline complete.
-3. **QA fails** → The task cycles back to DEV with a fix request. The orchestrator may need to adjust the model tier.
+3. **QA fails** → The task cycles back to DEV with a fix request. The orchestrator may need to adjust the model level.
4. **QA refines** → The task enters a holding state that _requires human decision_. This is the explicit escalation boundary.
The "refine" state is the most interesting from a delegation perspective. It's a conscious architectural decision that says: some judgments should not be automated. When the QA agent determines that a task needs rethinking rather than just fixing, it escalates to the only actor who has the full business context — the human.
@@ -61,7 +61,7 @@ One of the most common delegation failures is self-review. You don't ask the per
DevClaw enforces structural separation between development and review by design:
- DEV and QA are separate sub-agent sessions with separate state.
-- QA uses a different model entirely (Grok), introducing genuine independence.
+- QA uses the reviewer level, which can be a different model entirely, introducing genuine independence.
- The review happens after a clean label transition — QA picks up from `To Test`, not from watching DEV work in real time.
This mirrors a principle from organizational design: effective controls require independence between execution and verification. It's the same reason companies separate their audit function from their operations.
@@ -72,7 +72,7 @@ Ronald Coase won a Nobel Prize for explaining why firms exist: transaction costs
DevClaw applies the same logic to AI sessions. Spawning a new sub-agent session costs approximately 50,000 tokens of context loading — the agent needs to read the full codebase before it can do useful work. That's the onboarding cost.
-The plugin tracks session IDs across task completions. When a DEV finishes task A and task B is ready on the same project, DevClaw detects the existing session and returns `"sessionAction": "send"` instead of `"spawn"`. The orchestrator routes the new task to the running session. No re-onboarding. No context reload.
+The plugin tracks session keys across task completions. When a DEV finishes task A and task B is ready on the same project, DevClaw detects the existing session and reuses it instead of spawning a new one. No re-onboarding. No context reload.
In management terms: keep your team stable. Reassigning the same person to the next task on their project is almost always cheaper than bringing in someone new — even if the new person is theoretically better qualified.
@@ -101,11 +101,11 @@ This is the deepest lesson from delegation theory: **good delegation isn't about
Management research points to a few directions that could extend DevClaw's delegation model:
-**Progressive delegation.** Blanchard's model suggests increasing task complexity for delegates as they prove competent. DevClaw could track QA pass rates per model tier and automatically promote — if Haiku consistently passes QA on borderline tasks, start routing more work to it. This is how good managers develop their people, and it reduces cost over time.
+**Progressive delegation.** Blanchard's model suggests increasing task complexity for delegates as they prove competent. DevClaw could track QA pass rates per model level and automatically promote — if junior consistently passes QA on borderline tasks, start routing more work to it. This is how good managers develop their people, and it reduces cost over time.
**Delegation authority expansion.** The Vroom-Yetton decision model maps when a leader should decide alone versus consulting the team. Currently, sub-agents have narrow authority — they execute tasks but can't restructure the backlog. Selectively expanding this (e.g., allowing a DEV agent to split a task it judges too large) would reduce orchestrator bottlenecks, mirroring how managers gradually give high-performers more autonomy.
-**Outcome-based learning.** Delegation research emphasizes that the _delegator_ learns from outcomes too. Aggregated metrics — QA fail rate by model tier, average cycles to Done, time-in-state distributions — would help both the orchestrator agent and the human calibrate their delegation patterns over time.
+**Outcome-based learning.** Delegation research emphasizes that the _delegator_ learns from outcomes too. Aggregated metrics — QA fail rate by model level, average cycles to Done, time-in-state distributions — would help both the orchestrator agent and the human calibrate their delegation patterns over time.
---
diff --git a/docs/ONBOARDING.md b/docs/ONBOARDING.md
index acc0a45..00e7747 100644
--- a/docs/ONBOARDING.md
+++ b/docs/ONBOARDING.md
@@ -1,18 +1,18 @@
# DevClaw — Onboarding Guide
-## What you need before starting
+Step-by-step setup: install the plugin, configure an agent, register projects, and run your first task.
+
+## Prerequisites
| Requirement | Why | How to check |
|---|---|---|
| [OpenClaw](https://openclaw.ai) installed | DevClaw is an OpenClaw plugin | `openclaw --version` |
| Node.js >= 20 | Runtime for plugin | `node --version` |
-| [`glab`](https://gitlab.com/gitlab-org/cli) or [`gh`](https://cli.github.com) CLI | Issue tracker provider (auto-detected from remote) | `glab --version` or `gh --version` |
-| CLI authenticated | Plugin calls glab/gh for every label transition | `glab auth status` or `gh auth status` |
-| A GitLab/GitHub repo with issues | The task backlog lives in the issue tracker | `glab issue list` or `gh issue list` from your repo |
+| [`gh`](https://cli.github.com) or [`glab`](https://gitlab.com/gitlab-org/cli) CLI | Issue tracker provider (auto-detected from git remote) | `gh --version` or `glab --version` |
+| CLI authenticated | Plugin calls gh/glab for every label transition | `gh auth status` or `glab auth status` |
+| A GitHub/GitLab repo with issues | The task backlog lives in the issue tracker | `gh issue list` or `glab issue list` from your repo |
-## Setup
-
-### 1. Install the plugin
+## Step 1: Install the plugin
```bash
# Copy to extensions directory (auto-discovered on next restart)
@@ -25,21 +25,21 @@ openclaw plugins list
# Should show: DevClaw | devclaw | loaded
```
-### 2. Run setup
+## Step 2: Run setup
There are three ways to set up DevClaw:
-#### Option A: Conversational onboarding (recommended)
+### Option A: Conversational onboarding (recommended)
-Call the `devclaw_onboard` tool from any agent that has the DevClaw plugin loaded. The agent will walk you through configuration step by step — asking about:
+Call the `onboard` tool from any agent that has the DevClaw plugin loaded. The agent walks you through configuration step by step — asking about:
- Agent selection (current or create new)
- Channel binding (telegram/whatsapp/none) — for new agents only
-- Model tiers (accept defaults or customize)
+- Model levels (accept defaults or customize)
- Optional project registration
The tool returns instructions that guide the agent through the QA-style setup conversation.
-#### Option B: CLI wizard
+### Option B: CLI wizard
```bash
openclaw devclaw setup
@@ -48,12 +48,13 @@ openclaw devclaw setup
The setup wizard walks you through:
1. **Agent** — Create a new orchestrator agent or configure an existing one
-2. **Developer team** — Choose which LLM model powers each developer tier:
- - **Junior** (fast, cheap tasks) — default: `anthropic/claude-haiku-4-5`
- - **Medior** (standard tasks) — default: `anthropic/claude-sonnet-4-5`
- - **Senior** (complex tasks) — default: `anthropic/claude-opus-4-5`
- - **QA** (code review) — default: `anthropic/claude-sonnet-4-5`
-3. **Workspace** — Writes AGENTS.md, HEARTBEAT.md, role templates, and initializes memory
+2. **Developer team** — Choose which model powers each DEV and QA level:
+ - **DEV junior** (fast, cheap tasks) — default: `anthropic/claude-haiku-4-5`
+ - **DEV medior** (standard tasks) — default: `anthropic/claude-sonnet-4-5`
+ - **DEV senior** (complex tasks) — default: `anthropic/claude-opus-4-5`
+ - **QA reviewer** (code review) — default: `anthropic/claude-sonnet-4-5`
+ - **QA tester** (manual testing) — default: `anthropic/claude-haiku-4-5`
+3. **Workspace** — Writes AGENTS.md, HEARTBEAT.md, role templates, and initializes state
Non-interactive mode:
```bash
@@ -66,45 +67,45 @@ openclaw devclaw setup --agent my-orchestrator \
--senior "anthropic/claude-opus-4-5"
```
-#### Option C: Tool call (agent-driven)
+### Option C: Tool call (agent-driven)
**Conversational onboarding via tool:**
```json
-devclaw_onboard({ mode: "first-run" })
+onboard({ "mode": "first-run" })
```
-The tool returns step-by-step instructions that guide the agent through the QA-style setup conversation.
+The tool returns step-by-step instructions that guide the agent through the setup conversation.
**Direct setup (skip conversation):**
```json
-{
+setup({
"newAgentName": "My Dev Orchestrator",
"channelBinding": "telegram",
"models": {
- "junior": "anthropic/claude-haiku-4-5",
- "senior": "anthropic/claude-opus-4-5"
+ "dev": {
+ "junior": "anthropic/claude-haiku-4-5",
+ "senior": "anthropic/claude-opus-4-5"
+ },
+ "qa": {
+ "reviewer": "anthropic/claude-sonnet-4-5"
+ }
}
-}
+})
```
-This calls `devclaw_setup` directly without conversational prompts.
+## Step 3: Channel binding (optional, for new agents)
-### 3. Channel binding (optional, for new agents)
-
-If you created a new agent during conversational onboarding and selected a channel binding (telegram/whatsapp), the agent is automatically bound and will receive messages from that channel. **Skip to step 4.**
+If you created a new agent during conversational onboarding and selected a channel binding (telegram/whatsapp), the agent is automatically bound. **Skip to step 4.**
**Smart Migration**: If an existing agent already has a channel-wide binding (e.g., the old orchestrator receives all telegram messages), the onboarding agent will:
-1. Call `analyze_channel_bindings` to detect the conflict
+1. Detect the conflict
2. Ask if you want to migrate the binding from the old agent to the new one
3. If you confirm, the binding is automatically moved — no manual config edit needed
-This is useful when you're replacing an old orchestrator with a new one.
+If you didn't bind a channel during setup:
-If you didn't bind a channel during setup, you have two options:
+**Option A: Manually edit `openclaw.json`**
-**Option A: Manually edit `openclaw.json`** (for existing agents or post-creation binding)
-
-Add an entry to the `bindings` array:
```json
{
"bindings": [
@@ -136,131 +137,115 @@ Restart OpenClaw after editing.
**Option B: Add bot to Telegram/WhatsApp group**
-If using a channel-wide binding (no peer filter), the agent will receive all messages from that channel. Add your orchestrator bot to the relevant Telegram group for the project.
+If using a channel-wide binding (no peer filter), the agent receives all messages from that channel. Add your orchestrator bot to the relevant Telegram group.
-### 4. Register your project
+## Step 4: Register your project
-Tell the orchestrator agent to register a new project:
+Go to the Telegram/WhatsApp group for the project and tell the orchestrator agent:
-> "Register project my-project at ~/git/my-project for group -1234567890 with base branch development"
+> "Register project my-project at ~/git/my-project with base branch development"
The agent calls `project_register`, which atomically:
- Validates the repo and auto-detects GitHub/GitLab from remote
- Creates all 8 state labels (idempotent)
-- Scaffolds prompt instruction files (`projects/prompts//dev.md` and `qa.md`)
-- Adds the project entry to `projects.json` with `autoChain: false`
+- Scaffolds role instruction files (`projects/roles//dev.md` and `qa.md`)
+- Adds the project entry to `projects.json`
- Logs the registration event
+**Initial state in `projects.json`:**
+
```json
{
"projects": {
"-1234567890": {
"name": "my-project",
"repo": "~/git/my-project",
- "groupName": "Dev - My Project",
- "deployUrl": "",
+ "groupName": "Project: my-project",
"baseBranch": "development",
"deployBranch": "development",
- "autoChain": false,
+ "channel": "telegram",
+ "roleExecution": "parallel",
"dev": {
"active": false,
"issueId": null,
"startTime": null,
- "model": null,
+ "level": null,
"sessions": { "junior": null, "medior": null, "senior": null }
},
"qa": {
"active": false,
"issueId": null,
"startTime": null,
- "model": null,
- "sessions": { "qa": null }
+ "level": null,
+ "sessions": { "reviewer": null, "tester": null }
}
}
}
}
```
-**Manual fallback:** If you prefer CLI control, you can still create labels manually with `glab label create` and edit `projects.json` directly. See the [Architecture docs](ARCHITECTURE.md) for label names and colors.
+**Finding the Telegram group ID:** The group ID is the numeric ID of your Telegram supergroup (a negative number like `-1234567890`). When you call `project_register` from within the group, the ID is auto-detected from context.
-**Finding the Telegram group ID:** The group ID is the numeric ID of your Telegram supergroup (a negative number like `-1234567890`). You can find it via the Telegram bot API or from message metadata in OpenClaw logs.
-
-### 5. Create your first issue
+## Step 5: Create your first issue
Issues can be created in multiple ways:
- **Via the agent** — Ask the orchestrator in the Telegram group: "Create an issue for adding a login page" (uses `task_create`)
- **Via workers** — DEV/QA workers can call `task_create` to file follow-up bugs they discover
-- **Via CLI** — `cd ~/git/my-project && glab issue create --title "My first task" --label "To Do"` (or `gh issue create`)
+- **Via CLI** — `cd ~/git/my-project && gh issue create --title "My first task" --label "To Do"` (or `glab issue create`)
- **Via web UI** — Create an issue and add the "To Do" label
-### 6. Test the pipeline
+Note: `task_create` defaults to the "Planning" label. Use "To Do" explicitly when the task is ready for immediate work.
+
+## Step 6: Test the pipeline
Ask the agent in the Telegram group:
> "Check the queue status"
-The agent should call `queue_status` and report the "To Do" issue. Then:
+The agent should call `status` and report the "To Do" issue. Then:
> "Pick up issue #1 for DEV"
-The agent calls `task_pickup`, which assigns a developer tier, transitions the label to "Doing", creates or reuses a worker session, and dispatches the task — all in one call. The agent just posts the announcement.
+The agent calls `work_start`, which assigns a developer level, transitions the label to "Doing", creates or reuses a worker session, and dispatches the task — all in one call. The agent posts the announcement.
## Adding more projects
-Tell the agent to register a new project (step 3) and add the bot to the new Telegram group (step 4). That's it — `project_register` handles labels and state setup.
+Tell the agent to register a new project (step 4) from within the new project's Telegram group. That's it — `project_register` handles labels and state setup.
Each project is fully isolated — separate queue, separate workers, separate state.
-## Developer tiers
+## Developer levels
-DevClaw assigns tasks to developer tiers instead of raw model names. This makes the system intuitive — you're assigning a "junior dev" to fix a typo, not configuring model parameters.
+DevClaw assigns tasks to developer levels instead of raw model names. This makes the system intuitive — you're assigning a "junior dev" to fix a typo, not configuring model parameters.
-| Tier | Role | Default model | When to assign |
-|------|------|---------------|----------------|
-| **junior** | Junior developer | `anthropic/claude-haiku-4-5` | Typos, single-file fixes, CSS changes |
-| **medior** | Mid-level developer | `anthropic/claude-sonnet-4-5` | Features, bug fixes, multi-file changes |
-| **senior** | Senior developer | `anthropic/claude-opus-4-5` | Architecture, migrations, system-wide refactoring |
-| **qa** | QA engineer | `anthropic/claude-sonnet-4-5` | Code review, test validation |
+| Role | Level | Default model | When to assign |
+|------|-------|---------------|----------------|
+| DEV | **junior** | `anthropic/claude-haiku-4-5` | Typos, single-file fixes, CSS changes |
+| DEV | **medior** | `anthropic/claude-sonnet-4-5` | Features, bug fixes, multi-file changes |
+| DEV | **senior** | `anthropic/claude-opus-4-5` | Architecture, migrations, system-wide refactoring |
+| QA | **reviewer** | `anthropic/claude-sonnet-4-5` | Code review, test validation |
+| QA | **tester** | `anthropic/claude-haiku-4-5` | Manual testing, smoke tests |
-Change which model powers each tier in `openclaw.json`:
-```json
-{
- "plugins": {
- "entries": {
- "devclaw": {
- "config": {
- "models": {
- "junior": "anthropic/claude-haiku-4-5",
- "medior": "anthropic/claude-sonnet-4-5",
- "senior": "anthropic/claude-opus-4-5",
- "qa": "anthropic/claude-sonnet-4-5"
- }
- }
- }
- }
- }
-}
-```
+Change which model powers each level in `openclaw.json` — see [Configuration](CONFIGURATION.md#model-tiers).
## What the plugin handles vs. what you handle
| Responsibility | Who | Details |
|---|---|---|
| Plugin installation | You (once) | `cp -r devclaw ~/.openclaw/extensions/` |
-| Agent + workspace setup | Plugin (`devclaw_setup`) | Creates agent, configures models, writes workspace files |
-| Channel binding analysis | Plugin (`analyze_channel_bindings`) | Detects channel conflicts, validates channel configuration |
-| Channel binding migration | Plugin (`devclaw_setup` with `migrateFrom`) | Automatically moves channel-wide bindings between agents |
-| Label setup | Plugin (`project_register`) | 8 labels, created idempotently via `IssueProvider` |
-| Prompt file scaffolding | Plugin (`project_register`) | Creates `projects/prompts//dev.md` and `qa.md` |
+| Agent + workspace setup | Plugin (`setup`) | Creates agent, configures models, writes workspace files |
+| Channel binding migration | Plugin (`setup` with `migrateFrom`) | Automatically moves channel-wide bindings between agents |
+| Label setup | Plugin (`project_register`) | 8 labels, created idempotently via IssueProvider |
+| Prompt file scaffolding | Plugin (`project_register`) | Creates `projects/roles//dev.md` and `qa.md` |
| Project registration | Plugin (`project_register`) | Entry in `projects.json` with empty worker state |
| Telegram group setup | You (once per project) | Add bot to group |
| Issue creation | Plugin (`task_create`) | Orchestrator or workers create issues from chat |
-| Label transitions | Plugin | Atomic label transitions via issue tracker CLI |
-| Developer assignment | Plugin | LLM-selected tier by orchestrator, keyword heuristic fallback |
+| Label transitions | Plugin | Atomic transitions via issue tracker CLI |
+| Developer assignment | Plugin | LLM-selected level by orchestrator, keyword heuristic fallback |
| State management | Plugin | Atomic read/write to `projects.json` |
| Session management | Plugin | Creates, reuses, and dispatches to sessions via CLI. Agent never touches session tools. |
-| Task completion | Plugin (`task_complete`) | Workers self-report. Auto-chains if enabled. |
-| Prompt instructions | Plugin (`task_pickup`) | Loaded from `projects/prompts//.md`, appended to task message |
+| Task completion | Plugin (`work_finish`) | Workers self-report. Auto-chains if enabled. |
+| Prompt instructions | Plugin (`work_start`) | Loaded from `projects/roles//.md`, appended to task message |
| Audit logging | Plugin | Automatic NDJSON append per tool call |
-| Zombie detection | Plugin | `session_health` checks active vs alive |
-| Queue scanning | Plugin | `queue_status` queries issue tracker per project |
+| Zombie detection | Plugin | `health` checks active vs alive |
+| Queue scanning | Plugin | `status` queries issue tracker per project |
diff --git a/docs/QA_WORKFLOW.md b/docs/QA_WORKFLOW.md
index d0abfe9..27fd659 100644
--- a/docs/QA_WORKFLOW.md
+++ b/docs/QA_WORKFLOW.md
@@ -1,8 +1,6 @@
-# QA Workflow
+# DevClaw — QA Workflow
-## Overview
-
-Quality Assurance (QA) in DevClaw follows a structured workflow that ensures every review is documented and traceable.
+Quality Assurance in DevClaw follows a structured workflow that ensures every review is documented and traceable.
## Required Steps
@@ -28,10 +26,10 @@ task_comment({
### 3. Complete the Task
-After posting your comment, call `task_complete`:
+After posting your comment, call `work_finish`:
```javascript
-task_complete({
+work_finish({
role: "qa",
projectGroupId: "",
result: "pass", // or "fail", "refine", "blocked"
@@ -39,15 +37,24 @@ task_complete({
})
```
+## QA Results
+
+| Result | Label transition | Meaning |
+|---|---|---|
+| `"pass"` | Testing → Done | Approved. Issue closed. |
+| `"fail"` | Testing → To Improve | Issues found. Issue reopened, sent back to DEV. |
+| `"refine"` | Testing → Refining | Needs human decision. Pipeline pauses. |
+| `"blocked"` | Testing → To Test | Cannot complete (env issues, etc.). Returns to QA queue. |
+
## Why Comments Are Required
-1. **Audit Trail**: Every review decision is documented
-2. **Knowledge Sharing**: Future reviewers understand what was tested
-3. **Quality Metrics**: Enables tracking of test coverage
-4. **Debugging**: When issues arise later, we know what was checked
-5. **Compliance**: Some projects require documented QA evidence
+1. **Audit Trail** — Every review decision is documented in the issue tracker
+2. **Knowledge Sharing** — Future reviewers understand what was tested
+3. **Quality Metrics** — Enables tracking of test coverage
+4. **Debugging** — When issues arise later, we know what was checked
+5. **Compliance** — Some projects require documented QA evidence
-## Comment Template
+## Comment Templates
### For Passing Reviews
@@ -61,7 +68,7 @@ task_complete({
**Results:** All tests passed. No regressions found.
-**Environment:**
+**Environment:**
- Browser/Platform: [details]
- Version: [details]
- Test data: [if relevant]
@@ -72,15 +79,14 @@ task_complete({
### For Failing Reviews
```markdown
-## QA Review - Issues Found
+## QA Review — Issues Found
**Tested:**
- [What you tested]
**Issues Found:**
1. [Issue description with steps to reproduce]
-2. [Issue description with steps to reproduce]
-3. [Issue description with expected vs actual behavior]
+2. [Issue description with expected vs actual behavior]
**Environment:**
- [Test environment details]
@@ -90,25 +96,25 @@ task_complete({
## Enforcement
-As of [current date], QA workers are instructed via role templates to:
-- Always call `task_comment` BEFORE `task_complete`
+QA workers receive instructions via role templates to:
+- Always call `task_comment` BEFORE `work_finish`
- Include specific details about what was tested
- Document results, environment, and any notes
Prompt templates affected:
-- `projects/prompts//qa.md`
+- `projects/roles//qa.md`
- All project-specific QA templates should follow this pattern
## Best Practices
-1. **Be Specific**: Don't just say "tested the feature" - list what you tested
-2. **Include Environment**: Version numbers, browser, OS can matter
-3. **Document Edge Cases**: If you tested special scenarios, note them
-4. **Use Screenshots**: For UI issues, screenshots help (link in comment)
-5. **Reference Requirements**: Link back to acceptance criteria from the issue
+1. **Be Specific** — Don't just say "tested the feature" — list what you tested
+2. **Include Environment** — Version numbers, browser, OS can matter
+3. **Document Edge Cases** — If you tested special scenarios, note them
+4. **Reference Requirements** — Link back to acceptance criteria from the issue
+5. **Use Screenshots** — For UI issues, screenshots help (link in comment)
## Related
-- Issue #103: Enforce QA comment on every review (pass or fail)
-- Tool: `task_comment` - Add comments to issues
-- Tool: `task_complete` - Complete QA tasks
+- Tool: [`task_comment`](TOOLS.md#task_comment) — Add comments to issues
+- Tool: [`work_finish`](TOOLS.md#work_finish) — Complete QA tasks
+- Config: [`projects/roles//qa.md`](CONFIGURATION.md#role-instruction-files) — QA role instructions
diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
index bebb933..98e67be 100644
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -15,16 +15,16 @@ This works for the common case but breaks down when you want:
Roles become a configurable list instead of a hardcoded pair. Each role defines:
- **Name** — e.g. `design`, `dev`, `qa`, `devops`
-- **Tiers** — which developer tiers can be assigned (e.g. design only needs `medior`)
+- **Levels** — which developer levels can be assigned (e.g. design only needs `medior`)
- **Pipeline position** — where it sits in the task lifecycle
- **Worker count** — how many concurrent workers (default: 1)
```json
{
"roles": {
- "dev": { "tiers": ["junior", "medior", "senior"], "workers": 1 },
- "qa": { "tiers": ["qa"], "workers": 1 },
- "devops": { "tiers": ["medior", "senior"], "workers": 1 }
+ "dev": { "levels": ["junior", "medior", "senior"], "workers": 1 },
+ "qa": { "levels": ["reviewer", "tester"], "workers": 1 },
+ "devops": { "levels": ["medior", "senior"], "workers": 1 }
},
"pipeline": ["dev", "qa", "devops"]
}
@@ -35,15 +35,15 @@ The pipeline definition replaces the hardcoded `Doing → To Test → Testing
### Open questions
- How do custom labels map? Generate from role names, or let users define?
-- Should roles have their own instruction files (`projects/prompts//.md`) — yes, this already works
+- Should roles have their own instruction files (`projects/roles//.md`) — yes, this already works
- How to handle parallel roles (e.g. frontend + backend DEV in parallel before QA)?
---
-## Channel-agnostic groups
+## Channel-agnostic Groups
Currently DevClaw maps projects to **Telegram group IDs**. The `projectGroupId` is a Telegram-specific negative number. This means:
-- WhatsApp groups can't be used as project channels
+- WhatsApp groups are only partially supported as project channels (via the `channel` field)
- Discord, Slack, or other channels are excluded
- The naming (`groupId`, `groupName`) is Telegram-specific
@@ -77,19 +77,20 @@ Key changes:
- All tool params, state keys, and docs updated accordingly
- Backward compatible: existing Telegram-only keys migrated on read
-This enables any OpenClaw channel (Telegram, WhatsApp, Discord, Slack, etc.) to host a project — each group chat becomes an autonomous dev team regardless of platform.
+This enables any OpenClaw channel (Telegram, WhatsApp, Discord, Slack, etc.) to host a project.
### Open questions
- Should one project be bindable to multiple channels? (e.g. Telegram for devs, WhatsApp for stakeholder updates)
-- How does the orchestrator agent handle cross-channel context? (OpenClaw bindings already route by channel)
+- How does the orchestrator agent handle cross-channel context?
---
-## Other ideas
+## Other Ideas
- **Jira provider** — `IssueProvider` interface already abstracts GitHub/GitLab; Jira is the obvious next addition
-- **Deployment integration** — `task_complete` QA pass could trigger a deploy step via webhook or CLI
-- **Cost tracking** — log token usage per task/tier, surface in `queue_status`
+- **Deployment integration** — `work_finish` QA pass could trigger a deploy step via webhook or CLI
+- **Cost tracking** — log token usage per task/level, surface in `status`
- **Priority scoring** — automatic priority assignment based on labels, age, and dependencies
- **Session archival** — auto-archive idle sessions after configurable timeout (currently indefinite)
+- **Progressive delegation** — track QA pass rates per level and auto-promote (see [Management Theory](MANAGEMENT.md))
diff --git a/docs/TESTING.md b/docs/TESTING.md
index 35a2837..f151e11 100644
--- a/docs/TESTING.md
+++ b/docs/TESTING.md
@@ -59,10 +59,15 @@ npm run test:ui
"devclaw": {
"config": {
"models": {
- "junior": "anthropic/claude-haiku-4-5",
- "medior": "anthropic/claude-sonnet-4-5",
- "senior": "anthropic/claude-opus-4-5",
- "qa": "anthropic/claude-sonnet-4-5"
+ "dev": {
+ "junior": "anthropic/claude-haiku-4-5",
+ "medior": "anthropic/claude-sonnet-4-5",
+ "senior": "anthropic/claude-opus-4-5"
+ },
+ "qa": {
+ "reviewer": "anthropic/claude-sonnet-4-5",
+ "tester": "anthropic/claude-haiku-4-5"
+ }
}
}
}
diff --git a/docs/TOOLS.md b/docs/TOOLS.md
new file mode 100644
index 0000000..15ee4ac
--- /dev/null
+++ b/docs/TOOLS.md
@@ -0,0 +1,361 @@
+# DevClaw — Tools Reference
+
+Complete reference for all 11 tools registered by DevClaw. See [`index.ts`](../index.ts) for registration.
+
+## Worker Lifecycle
+
+### `work_start`
+
+Pick up a task from the issue queue. Handles level assignment, label transition, session creation/reuse, task dispatch, and audit logging — all in one call.
+
+**Source:** [`lib/tools/work-start.ts`](../lib/tools/work-start.ts)
+
+**Context:** Only works in project group chats.
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `issueId` | number | No | Issue ID. If omitted, picks next by priority. |
+| `role` | `"dev"` \| `"qa"` | No | Worker role. Auto-detected from issue label if omitted. |
+| `projectGroupId` | string | No | Project group ID. Auto-detected from group context. |
+| `level` | string | No | Worker level (DEV: `junior`, `medior`, `senior`; QA: `reviewer`, `tester`). Auto-detected if omitted. |
+
+**What it does atomically:**
+
+1. Resolves project from `projects.json`
+2. Validates no active worker for this role
+3. Fetches issue from tracker, verifies correct label state
+4. Assigns level (LLM-chosen via `level` param → label detection → keyword heuristic fallback)
+5. Resolves level to model ID via config or defaults
+6. Loads prompt instructions from `projects/roles//.md`
+7. Looks up existing session for assigned level (session-per-level)
+8. Transitions label (e.g. `To Do` → `Doing`)
+9. Creates session via Gateway RPC if new (`sessions.patch`)
+10. Dispatches task to worker session via CLI (`openclaw gateway call agent`)
+11. Updates `projects.json` state (active, issueId, level, session key)
+12. Writes audit log entries (work_start + model_selection)
+13. Sends notification
+14. Returns announcement text
+
+**Level selection priority:**
+
+1. `level` parameter (LLM-selected) — highest priority
+2. Issue label (e.g. a label named "junior" or "senior")
+3. Keyword heuristic from `model-selector.ts` — fallback
+
+**Execution guards:**
+
+- Rejects if role already has an active worker
+- Respects `roleExecution` (sequential: rejects if other role is active)
+
+**On failure:** Rolls back label transition. No orphaned state.
+
+---
+
+### `work_finish`
+
+Complete a task with a result. Called by workers (DEV/QA sub-agent sessions) directly, or by the orchestrator.
+
+**Source:** [`lib/tools/work-finish.ts`](../lib/tools/work-finish.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `role` | `"dev"` \| `"qa"` | Yes | Worker role |
+| `result` | string | Yes | Completion result (see table below) |
+| `projectGroupId` | string | Yes | Project group ID |
+| `summary` | string | No | Brief summary for the announcement |
+| `prUrl` | string | No | PR/MR URL (auto-detected if omitted) |
+
+**Valid results by role:**
+
+| Role | Result | Label transition | Side effects |
+|---|---|---|---|
+| DEV | `"done"` | Doing → To Test | git pull, auto-detect PR URL |
+| DEV | `"blocked"` | Doing → To Do | Task returns to queue |
+| QA | `"pass"` | Testing → Done | Issue closed |
+| QA | `"fail"` | Testing → To Improve | Issue reopened |
+| QA | `"refine"` | Testing → Refining | Awaits human decision |
+| QA | `"blocked"` | Testing → To Test | Task returns to QA queue |
+
+**What it does atomically:**
+
+1. Validates role:result combination
+2. Resolves project and active worker
+3. Executes completion via pipeline service (label transition + side effects)
+4. Deactivates worker (sessions map preserved for reuse)
+5. Sends notification
+6. Ticks queue to fill free worker slots
+7. Writes audit log
+
+**Auto-chaining** (when enabled on the project): `dev:done` dispatches QA automatically. `qa:fail` re-dispatches DEV using the previous level.
+
+---
+
+## Task Management
+
+### `task_create`
+
+Create a new issue in the project's issue tracker.
+
+**Source:** [`lib/tools/task-create.ts`](../lib/tools/task-create.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | Yes | Project group ID |
+| `title` | string | Yes | Issue title |
+| `description` | string | No | Full issue body (markdown) |
+| `label` | StateLabel | No | State label. Defaults to `"Planning"`. |
+| `assignees` | string[] | No | GitHub/GitLab usernames to assign |
+| `pickup` | boolean | No | If true, immediately pick up for DEV after creation |
+
+**Use cases:**
+
+- Orchestrator creates tasks from chat messages
+- Workers file follow-up bugs discovered during development
+- Breaking down epics into smaller tasks
+
+**Default behavior:** Creates issues in `"Planning"` state. Only use `"To Do"` when the user explicitly requests immediate work.
+
+---
+
+### `task_update`
+
+Change an issue's state label manually without going through the full `work_start`/`work_finish` flow.
+
+**Source:** [`lib/tools/task-update.ts`](../lib/tools/task-update.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | Yes | Project group ID |
+| `issueId` | number | Yes | Issue ID to update |
+| `state` | StateLabel | Yes | New state label |
+| `reason` | string | No | Audit log reason for the change |
+
+**Valid states:** `Planning`, `To Do`, `Doing`, `To Test`, `Testing`, `Done`, `To Improve`, `Refining`
+
+**Use cases:**
+
+- Manual state adjustments (e.g. `Planning → To Do` after approval)
+- Failed auto-transitions that need correction
+- Bulk state changes by orchestrator
+
+---
+
+### `task_comment`
+
+Add a comment to an issue for feedback, notes, or discussion.
+
+**Source:** [`lib/tools/task-comment.ts`](../lib/tools/task-comment.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | Yes | Project group ID |
+| `issueId` | number | Yes | Issue ID to comment on |
+| `body` | string | Yes | Comment body (markdown) |
+| `authorRole` | `"dev"` \| `"qa"` \| `"orchestrator"` | No | Attribution role prefix |
+
+**Use cases:**
+
+- QA adds review feedback before pass/fail decision
+- DEV posts implementation notes or progress updates
+- Orchestrator adds summary comments
+
+When `authorRole` is provided, the comment is prefixed with a role emoji and attribution label.
+
+---
+
+## Operations
+
+### `status`
+
+Lightweight queue + worker state dashboard.
+
+**Source:** [`lib/tools/status.ts`](../lib/tools/status.ts)
+
+**Context:** Auto-filters to project in group chats. Shows all projects in DMs.
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | No | Filter to specific project. Omit for all. |
+
+**Returns per project:**
+
+- Worker state: active/idle, current issue, level, start time
+- Queue counts: To Do, To Test, To Improve
+- Role execution mode
+
+---
+
+### `health`
+
+Worker health scan with optional auto-fix.
+
+**Source:** [`lib/tools/health.ts`](../lib/tools/health.ts)
+
+**Context:** Auto-filters to project in group chats.
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | No | Filter to specific project. Omit for all. |
+| `fix` | boolean | No | Apply fixes for detected issues. Default: `false` (read-only). |
+| `activeSessions` | string[] | No | Active session IDs for zombie detection. |
+
+**Health checks:**
+
+| Issue | Severity | Detection | Auto-fix |
+|---|---|---|---|
+| Active worker with no session key | Critical | `active=true` but no session in map | Deactivate worker |
+| Active worker whose session is dead | Critical | Session key not in active sessions list | Deactivate worker, revert label |
+| Worker active >2 hours | Warning | `startTime` older than 2h | Deactivate worker, revert label to queue |
+| Inactive worker with lingering issue ID | Warning | `active=false` but `issueId` still set | Clear issueId |
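The detection column of the table can be read as a small classification function. The sketch below assumes a simplified worker shape (`sessionKey` is a hypothetical field name, and `activeSessions` is treated as always provided); it only detects problems and does not apply any fix.

```typescript
type Severity = "critical" | "warning";

// Simplified worker state; field names other than active/issueId/startTime are assumptions.
interface WorkerState {
  active: boolean;
  issueId: number | null;
  sessionKey: string | null;
  startTime: string | null; // ISO timestamp
}

interface HealthIssue {
  severity: Severity;
  problem: string;
}

const STALE_MS = 2 * 60 * 60 * 1000; // 2 hours

function checkWorker(worker: WorkerState, activeSessions: string[], now: number): HealthIssue[] {
  const issues: HealthIssue[] = [];
  if (worker.active) {
    if (!worker.sessionKey) {
      issues.push({ severity: "critical", problem: "active worker with no session key" });
    } else if (activeSessions.indexOf(worker.sessionKey) === -1) {
      issues.push({ severity: "critical", problem: "active worker whose session is dead" });
    }
    if (worker.startTime && now - Date.parse(worker.startTime) > STALE_MS) {
      issues.push({ severity: "warning", problem: "worker active for more than 2 hours" });
    }
  } else if (worker.issueId !== null) {
    issues.push({ severity: "warning", problem: "inactive worker with lingering issue ID" });
  }
  return issues;
}
```

Passing `now` in as a parameter keeps the check deterministic, which makes the 2-hour threshold easy to test.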
+
+---
+
+### `work_heartbeat`
+
+Manual trigger for heartbeat: health fix + queue dispatch. Same logic as the background heartbeat service, but invoked on demand.
+
+**Source:** [`lib/tools/work-heartbeat.ts`](../lib/tools/work-heartbeat.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | No | Target single project. Omit for all. |
+| `dryRun` | boolean | No | Report only, don't dispatch. Default: `false`. |
+| `maxPickups` | number | No | Max worker dispatches per tick. |
+| `activeSessions` | string[] | No | Active session IDs for zombie detection. |
+
+**Two-pass sweep:**
+
+1. **Health pass** — Runs `checkWorkerHealth` per project per role. Auto-fixes zombies, stale workers, orphaned state.
+2. **Tick pass** — Calls `projectTick` per project. Fills free worker slots by priority (To Improve > To Test > To Do).
+
+**Execution guards:**
+
+- `projectExecution: "sequential"` — only one project active at a time
+- `roleExecution: "sequential"` — only one role (DEV or QA) active at a time per project (enforced in `projectTick`)
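One way to picture how the two guards combine is a single predicate evaluated before each dispatch. This is a hedged sketch: `ProjectSnapshot`, `activeRoles`, and `canDispatch` are hypothetical names, and the actual checks in `projectTick` may be structured differently.

```typescript
type ExecutionMode = "parallel" | "sequential";
type Role = "dev" | "qa";

// Hypothetical per-project view used only for this illustration.
interface ProjectSnapshot {
  name: string;
  roleExecution: ExecutionMode;
  activeRoles: Role[]; // roles that currently have an active worker
}

// Decides whether `role` may be dispatched in `project` under both guards.
function canDispatch(
  project: ProjectSnapshot,
  role: Role,
  allProjects: ProjectSnapshot[],
  projectExecution: ExecutionMode
): boolean {
  // The slot for this role must be free.
  if (project.activeRoles.indexOf(role) !== -1) return false;

  // Plugin-level guard: in sequential mode, no other project may be active.
  if (projectExecution === "sequential") {
    const otherActive = allProjects.some(
      (p) => p.name !== project.name && p.activeRoles.length > 0
    );
    if (otherActive) return false;
  }

  // Project-level guard: in sequential mode, DEV and QA never run together.
  if (project.roleExecution === "sequential" && project.activeRoles.length > 0) {
    return false;
  }
  return true;
}
```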
+
+---
+
+## Setup
+
+### `project_register`
+
+One-time project setup. Creates state labels, scaffolds prompt files, adds project to state.
+
+**Source:** [`lib/tools/project-register.ts`](../lib/tools/project-register.ts)
+
+**Context:** Only works in the Telegram/WhatsApp group being registered.
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `projectGroupId` | string | No | Auto-detected from current group if omitted |
+| `name` | string | Yes | Short project name (e.g. `my-webapp`) |
+| `repo` | string | Yes | Path to git repo (e.g. `~/git/my-project`) |
+| `groupName` | string | No | Display name. Defaults to `Project: {name}`. |
+| `baseBranch` | string | Yes | Base branch for development |
+| `deployBranch` | string | No | Deploy branch. Defaults to baseBranch. |
+| `deployUrl` | string | No | Deployment URL |
+| `roleExecution` | `"parallel"` \| `"sequential"` | No | DEV/QA parallelism. Default: `"parallel"`. |
+
+**What it does atomically:**
+
+1. Validates project not already registered
+2. Resolves repo path, auto-detects GitHub/GitLab from git remote
+3. Verifies provider health (CLI installed and authenticated)
+4. Creates all 8 state labels (idempotent — safe to run again)
+5. Adds project entry to `projects.json` with empty worker state
+ - DEV sessions: `{ junior: null, medior: null, senior: null }`
+ - QA sessions: `{ reviewer: null, tester: null }`
+6. Scaffolds prompt files: `projects/roles//dev.md` and `qa.md`
+7. Writes audit log
+
+---
+
+### `setup`
+
+Agent + workspace initialization.
+
+**Source:** [`lib/tools/setup.ts`](../lib/tools/setup.ts)
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `newAgentName` | string | No | Create a new agent. Omit to configure current workspace. |
+| `channelBinding` | `"telegram"` \| `"whatsapp"` | No | Channel to bind (with `newAgentName` only) |
+| `migrateFrom` | string | No | Agent ID to migrate channel binding from |
+| `models` | object | No | Model overrides per role and level (see [Configuration](CONFIGURATION.md#model-tiers)) |
+| `projectExecution` | `"parallel"` \| `"sequential"` | No | Project execution mode |
+
+**What it does:**
+
+1. Creates a new agent or configures existing workspace
+2. Optionally binds messaging channel (Telegram/WhatsApp)
+3. Optionally migrates channel binding from another agent
+4. Writes workspace files: AGENTS.md, HEARTBEAT.md, `projects/projects.json`
+5. Configures model tiers in `openclaw.json`
+
+---
+
+### `onboard`
+
+Conversational onboarding guide. Returns step-by-step instructions for the agent to walk the user through setup.
+
+**Source:** [`lib/tools/onboard.ts`](../lib/tools/onboard.ts)
+
+**Context:** Works in DMs and via-agent. Blocks group chats (setup should not happen in project groups).
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `mode` | `"first-run"` \| `"reconfigure"` | No | Auto-detected from current state |
+
+**Flow:**
+
+1. Call `onboard`, which returns Q&A-style step-by-step instructions
+2. Agent walks user through: agent selection, channel binding, model tiers
+3. Agent calls `setup` with collected answers
+4. User registers projects via `project_register` in group chats
+
+---
+
+## Completion Rules Reference
+
+The pipeline service (`lib/services/pipeline.ts`) defines declarative completion rules:
+
+```
+dev:done → Doing → To Test (git pull, detect PR)
+dev:blocked → Doing → To Do (return to queue)
+qa:pass → Testing → Done (close issue)
+qa:fail → Testing → To Improve (reopen issue)
+qa:refine → Testing → Refining (await human decision)
+qa:blocked → Testing → To Test (return to QA queue)
+```
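For illustration, the rules above could be expressed as a lookup table keyed by `role:result`. This is a sketch under assumptions: the `CompletionRule` shape, the side-effect names, and `resolveCompletion` are hypothetical, and `lib/services/pipeline.ts` may model the rules differently.

```typescript
type StateLabel =
  | "Planning" | "To Do" | "Doing" | "To Test"
  | "Testing" | "Done" | "To Improve" | "Refining";

interface CompletionRule {
  from: StateLabel;
  to: StateLabel;
  sideEffects: string[]; // names only; hypothetical identifiers for the actions in parentheses above
}

const COMPLETION_RULES: Record<string, CompletionRule> = {
  "dev:done":    { from: "Doing",   to: "To Test",    sideEffects: ["gitPull", "detectPr"] },
  "dev:blocked": { from: "Doing",   to: "To Do",      sideEffects: [] },
  "qa:pass":     { from: "Testing", to: "Done",       sideEffects: ["closeIssue"] },
  "qa:fail":     { from: "Testing", to: "To Improve", sideEffects: ["reopenIssue"] },
  "qa:refine":   { from: "Testing", to: "Refining",   sideEffects: [] },
  "qa:blocked":  { from: "Testing", to: "To Test",    sideEffects: [] },
};

// Looks up the rule for a role/result pair and rejects combinations that are not defined.
function resolveCompletion(role: string, result: string): CompletionRule {
  const rule = COMPLETION_RULES[`${role}:${result}`];
  if (!rule) throw new Error(`No completion rule for ${role}:${result}`);
  return rule;
}
```

Keeping the rules as data means an undefined combination such as `dev:pass` fails loudly instead of producing an ad hoc transition.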
+
+## Issue Priority Order
+
+When the heartbeat or `work_heartbeat` fills free worker slots, issues are prioritized:
+
+1. **To Improve** — QA failures get fixed first (highest priority)
+2. **To Test** — Completed DEV work gets reviewed next
+3. **To Do** — Fresh tasks are picked up last
+
+This ensures the pipeline clears its backlog before starting new work.
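The ordering can be sketched as a scan over the queue labels from highest to lowest priority. `QueuedIssue` and `pickNext` are hypothetical names for this illustration, and the sketch ignores which role (DEV or QA) has a free slot.

```typescript
type QueueLabel = "To Improve" | "To Test" | "To Do";

interface QueuedIssue {
  id: number;
  label: QueueLabel;
}

// Highest priority first, matching the list above.
const PRIORITY: QueueLabel[] = ["To Improve", "To Test", "To Do"];

// Returns the first issue found in the highest-priority non-empty queue.
function pickNext(queue: QueuedIssue[]): QueuedIssue | undefined {
  for (const label of PRIORITY) {
    for (const issue of queue) {
      if (issue.label === label) return issue;
    }
  }
  return undefined;
}
```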
From 479910848979db749f46185724ca7fb4ed8baf09 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 11 Feb 2026 01:12:54 +0000
Subject: [PATCH 02/14] =?UTF-8?q?docs:=20refine=20README=20structure=20?=
=?UTF-8?q?=E2=80=94=20add=20why=20section,=20simplify=20onboarding,=20mov?=
=?UTF-8?q?e=20diagrams?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
- Add "Why DevClaw" paragraph explaining the gap between raw OpenClaw and development orchestration
- Rename "Shared sessions" to "Session re-use (context preservation)" in token savings
- Add "External task state" benefit covering GitHub/GitLab integration and pluggable IssueProvider
- Simplify installation to conversational onboarding with full example dialogue
- Move "How it works" and "Session reuse" diagrams to ARCHITECTURE.md (keep reference)
- Add Architecture section with link to detailed technical documentation
- Explain tools as guardrails that encode operations as deterministic code
https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
---
README.md | 135 ++++++++++++++-----------------------------
docs/ARCHITECTURE.md | 52 +++++++++++++++++
2 files changed, 94 insertions(+), 93 deletions(-)
diff --git a/README.md b/README.md
index 5f8c3a5..13d16cb 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,10 @@ Add an agent to a Telegram/WhatsApp group, point it at a GitHub/GitLab repo —
DevClaw is the [OpenClaw](https://openclaw.ai) plugin that makes this work.
+## Why DevClaw
+
+OpenClaw gives you a powerful multi-agent runtime — channel bindings, session management, tool permissions, gateway RPC. But it's a general-purpose platform. It doesn't know what "pick up an issue" means, how to transition a label, when to reuse a session, or how to chain DEV completion into QA review. Managing a development workflow on raw OpenClaw means the orchestrator agent handles all of that through fragile, token-expensive LLM reasoning — and it gets it wrong often enough to need constant supervision. DevClaw encodes the entire development lifecycle into deterministic plugin code: level assignment, label transitions, session dispatch, auto-chaining, health checks, and audit logging. The agent calls one tool. The plugin does the rest. That's the difference between "an agent that can write code" and "a team that ships autonomously."
+
## Benefits
### Process consistency
@@ -22,7 +26,7 @@ DevClaw reduces token consumption at three levels:
| Mechanism | How it works | Estimated savings |
|---|---|---|
-| **Shared sessions** | Each developer level per role maintains one persistent session per project. When a medior dev finishes task A and picks up task B, the plugin reuses the existing session — no codebase re-reading. | **~40-60%** per task (~50K tokens saved per session reuse) |
+| **Session re-use (context preservation)** | Each developer level per role maintains one persistent session per project. When a medior dev finishes task A and picks up task B, the accumulated codebase context carries over — no re-reading the repo. | **~40-60%** per task (~50K context tokens saved per reuse) |
| **Tier selection** | Junior for typos (Haiku), medior for features (Sonnet), senior for architecture (Opus). The right model for the job means you're not burning Opus tokens on a CSS fix. | **~30-50%** on simple tasks vs. always using the largest model |
| **Token-free heartbeat** | The heartbeat service runs every 60s doing health checks and queue dispatch using pure deterministic code + CLI calls. Zero LLM tokens consumed. Workers only use tokens when they actually process tasks. | **100%** savings on orchestration overhead |
@@ -33,6 +37,10 @@ Each project is fully isolated — separate task queue, separate worker state, s
- **Project-level**: DEV and QA can work simultaneously on different tasks (parallel, default) or one role at a time (sequential)
- **Plugin-level**: Multiple projects can have active workers at once (parallel, default) or only one project active at a time (sequential)
+### External task state (pluggable issue trackers)
+
+Task state lives in your issue tracker — not in DevClaw's internal storage. Every label transition, issue creation, and status query goes through the `IssueProvider` interface, which abstracts the underlying tracker. GitHub Issues and GitLab Issues are supported today (auto-detected from your git remote); the pluggable architecture means any task manager — Jira, Linear, or a custom system — can be added by implementing the same interface. This gives you full visibility in your existing tools: filter by DevClaw labels in GitHub, build dashboards in GitLab, set up webhooks on label changes. The issue tracker is the source of truth; DevClaw reads from it and writes to it, but never replaces it.
+
### Continuous planning
The heartbeat service runs a continuous loop: health check → queue scan → dispatch. It detects stale workers (>2 hours), auto-reverts stuck labels, and fills free worker slots — all without human intervention or agent LLM tokens. The orchestrator agent only gets involved when a decision requires judgment.
@@ -170,44 +178,37 @@ Three layers guarantee tasks never get stuck:
cp -r devclaw ~/.openclaw/extensions/
```
-Verify:
+### Onboarding
-```bash
-openclaw plugins list
-# Should show: DevClaw | devclaw | loaded
+The recommended way to set up DevClaw is through a conversation with any agent that has the plugin loaded. Call the `onboard` tool and the agent walks you through everything step by step:
+
+```
+You: "Help me set up DevClaw"
+Agent: calls onboard({ mode: "first-run" })
+
+Agent: "I'll help you set up DevClaw. First — should I configure this agent
+ as the orchestrator, or create a new one?"
+You: "Use this agent"
+
+Agent: "Got it. Do you want to bind a messaging channel? (telegram/whatsapp/none)"
+You: "Telegram"
+
+Agent: "Default model tiers:
+ DEV junior → Haiku, medior → Sonnet, senior → Opus
+ QA reviewer → Sonnet, tester → Haiku
+ Accept defaults or customize?"
+You: "Defaults are fine"
+
+Agent: calls setup({ agentId: "my-agent", channelBinding: "telegram", ... })
+Agent: "Done! Workspace configured. Want to register a project now?"
+You: "Yes — register my-app at ~/git/my-app"
+
+Agent: calls project_register({ ... })
+Agent: "Project registered. 8 labels created, role instructions scaffolded.
+ Try: 'check the queue' to see pending issues."
```
-### Run setup
-
-Three options — pick one:
-
-**Option A: Conversational onboarding (recommended)**
-
-Call the `onboard` tool from any agent with DevClaw loaded. It walks through configuration step by step.
-
-**Option B: CLI wizard**
-
-```bash
-openclaw devclaw setup
-```
-
-**Option C: Non-interactive CLI**
-
-```bash
-openclaw devclaw setup --new-agent "My Orchestrator"
-```
-
-Setup creates an agent, configures model tiers, writes workspace files (AGENTS.md, HEARTBEAT.md, role templates), and optionally binds a messaging channel.
-
-### Register a project
-
-In the Telegram/WhatsApp group for the project:
-
-> "Register project my-app at ~/git/my-app with base branch main"
-
-The agent calls `project_register`, which atomically creates all 8 state labels, scaffolds role instruction files, and adds the project to `projects.json`.
-
-### Start working
+After setup, work flows naturally through conversation in your project's group chat:
```
"Check the queue" → agent calls status
@@ -216,72 +217,20 @@ The agent calls `project_register`, which atomically creates all 8 state labels,
[Heartbeat fills next slot] → QA dispatched automatically
```
-See the [Onboarding Guide](docs/ONBOARDING.md) for detailed step-by-step instructions.
+DevClaw also supports a [CLI wizard and non-interactive setup](docs/ONBOARDING.md#step-2-run-setup) for scripted or headless environments. See the [Onboarding Guide](docs/ONBOARDING.md) for the full step-by-step reference.
---
-## How it works
+## Architecture
-```mermaid
-graph TB
- subgraph "Group Chat A"
- direction TB
- A_O["Orchestrator"]
- A_GL[GitHub/GitLab Issues]
- A_DEV["DEV (worker session)"]
- A_QA["QA (worker session)"]
- A_O -->|work_start| A_GL
- A_O -->|dispatches| A_DEV
- A_O -->|dispatches| A_QA
- end
-
- subgraph "Group Chat B"
- direction TB
- B_O["Orchestrator"]
- B_GL[GitHub/GitLab Issues]
- B_DEV["DEV (worker session)"]
- B_QA["QA (worker session)"]
- B_O -->|work_start| B_GL
- B_O -->|dispatches| B_DEV
- B_O -->|dispatches| B_QA
- end
-
- AGENT["Single OpenClaw Agent"]
- AGENT --- A_O
- AGENT --- B_O
-```
-
-Same agent process — each group chat gives it a different project context. The orchestrator role, the workers, the task queue, and all state are fully isolated per group.
-
----
-
-## Session reuse
-
-Worker sessions are expensive to start — each new spawn reads the full codebase (~50K tokens). DevClaw maintains **separate sessions per level per role** (session-per-level design). When a medior dev finishes task A and picks up task B on the same project, the plugin detects the existing session and sends the task directly.
-
-The plugin handles session dispatch internally via OpenClaw CLI. The orchestrator agent never calls `sessions_spawn` or `sessions_send` — it calls `work_start` and the plugin does the rest.
-
-```mermaid
-sequenceDiagram
- participant O as Orchestrator
- participant DC as DevClaw Plugin
- participant IT as Issue Tracker
- participant S as Worker Session
-
- O->>DC: work_start({ issueId: 42, role: "dev" })
- DC->>IT: Fetch issue, verify label
- DC->>DC: Assign level (junior/medior/senior)
- DC->>DC: Check existing session for assigned level
- DC->>IT: Transition label (To Do → Doing)
- DC->>S: Dispatch task via CLI (create or reuse session)
- DC->>DC: Update projects.json, write audit log
- DC-->>O: { success: true, announcement: "..." }
-```
+For detailed technical diagrams — system overview, end-to-end flows, session-per-level design, session reuse mechanics, data flow map, and the complete ticket lifecycle from creation to completion — see the [Architecture documentation](docs/ARCHITECTURE.md).
---
## Tools
+DevClaw's tools are the guardrails that make autonomous development reliable. Without them, an LLM orchestrator would need to reason about label transitions, session lifecycle, state serialization, and audit logging on every action — and get it wrong often enough to require constant supervision. Each tool encodes one operation as deterministic code: the agent provides intent ("pick up issue #42 for DEV"), the tool handles the mechanics (verify label, resolve level, transition state, dispatch session, log event, return announcement). The agent can't skip a step, use the wrong label, or forget to update state — those decisions are made by the plugin, not the model.
+
DevClaw registers **11 tools**, grouped by function:
### Worker lifecycle
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index 92b8251..ec0a630 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -1,5 +1,57 @@
# DevClaw — Architecture & Component Interaction
+## How it works
+
+One OpenClaw agent process serves multiple group chats — each group gives it a different project context. The orchestrator role, the workers, the task queue, and all state are fully isolated per group.
+
+```mermaid
+graph TB
+ subgraph "Group Chat A"
+ direction TB
+ A_O["Orchestrator"]
+ A_GL[GitHub/GitLab Issues]
+ A_DEV["DEV (worker session)"]
+ A_QA["QA (worker session)"]
+ A_O -->|work_start| A_GL
+ A_O -->|dispatches| A_DEV
+ A_O -->|dispatches| A_QA
+ end
+
+ subgraph "Group Chat B"
+ direction TB
+ B_O["Orchestrator"]
+ B_GL[GitHub/GitLab Issues]
+ B_DEV["DEV (worker session)"]
+ B_QA["QA (worker session)"]
+ B_O -->|work_start| B_GL
+ B_O -->|dispatches| B_DEV
+ B_O -->|dispatches| B_QA
+ end
+
+ AGENT["Single OpenClaw Agent"]
+ AGENT --- A_O
+ AGENT --- B_O
+```
+
+Worker sessions are expensive to start — each new spawn reads the full codebase (~50K tokens). DevClaw maintains **separate sessions per level per role** ([session-per-level design](#session-per-level-design)). When a medior dev finishes task A and picks up task B on the same project, the accumulated context carries over — no re-reading the repo. The plugin handles all session dispatch internally via OpenClaw CLI; the orchestrator agent never calls `sessions_spawn` or `sessions_send`.
+
+```mermaid
+sequenceDiagram
+ participant O as Orchestrator
+ participant DC as DevClaw Plugin
+ participant IT as Issue Tracker
+ participant S as Worker Session
+
+ O->>DC: work_start({ issueId: 42, role: "dev" })
+ DC->>IT: Fetch issue, verify label
+ DC->>DC: Assign level (junior/medior/senior)
+ DC->>DC: Check existing session for assigned level
+ DC->>IT: Transition label (To Do → Doing)
+ DC->>S: Dispatch task via CLI (create or reuse session)
+ DC->>DC: Update projects.json, write audit log
+ DC-->>O: { success: true, announcement: "..." }
+```
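The "create or reuse session" step in the diagram can be sketched as a lookup in a per-level session map. This is an illustration under assumptions: `SessionMap`, `planDispatch`, and the key generator are hypothetical, and the plugin's actual dispatch goes through the OpenClaw CLI rather than an in-memory map.

```typescript
type DevLevel = "junior" | "medior" | "senior";

// One slot per level, mirroring the `{ junior: null, medior: null, senior: null }` worker state.
type SessionMap = Record<DevLevel, string | null>;

interface DispatchPlan {
  action: "create" | "reuse";
  sessionKey: string;
}

// Reuses the session already stored for the level, or records a freshly created one.
function planDispatch(sessions: SessionMap, level: DevLevel, newKey: () => string): DispatchPlan {
  const existing = sessions[level];
  if (existing !== null) return { action: "reuse", sessionKey: existing };
  const created = newKey();
  sessions[level] = created;
  return { action: "create", sessionKey: created };
}
```

Because the map is keyed by level, a medior session's accumulated context is never handed to a junior or senior task.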
+
## Agents vs Sessions
Understanding the OpenClaw model is key to understanding how DevClaw works:
From 1c68113ef4e49a8f330a278e905053ff6470b9af Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 11 Feb 2026 01:37:09 +0000
Subject: [PATCH 03/14] =?UTF-8?q?docs:=20add=20README2.md=20=E2=80=94=20fr?=
=?UTF-8?q?esh=20rewrite=20for=20OpenClaw=20users?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Complete from-scratch rewrite of the README with an OpenClaw user perspective:
- Opens with a concrete group chat interaction showing the full flow
- Frames the problem as "babysitting the thing you built to avoid babysitting"
- "Meet your team" section makes the model mapping fun and relatable
- Pipeline explained as a story, not a spec
- Behind-the-scenes section covers session reuse, heartbeat, auto-chaining
- Issue tracker integration framed as "your issues, your tracker"
- Onboarding shown as a natural conversation
- Tools framed as guardrails, not API endpoints
https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
---
README2.md | 304 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 304 insertions(+)
create mode 100644 README2.md
diff --git a/README2.md b/README2.md
new file mode 100644
index 0000000..b1cc988
--- /dev/null
+++ b/README2.md
@@ -0,0 +1,304 @@
+
+
+
+
+# DevClaw
+
+**Turn any group chat into a dev team that ships.**
+
+DevClaw is a plugin for [OpenClaw](https://openclaw.ai) that turns your orchestrator agent into a development manager. It hires developers, assigns tasks, reviews code, and keeps the pipeline moving — across as many projects as you have group chats.
+
+---
+
+## What it looks like
+
+Add your OpenClaw agent to a Telegram group. Register a project. That's it — you now have a dev team:
+
+```
+You: "Check the queue"
+Agent: "3 issues in To Do. DEV is idle. QA is idle."
+
+You: "Pick up #42 for DEV"
+Agent: "⚡ Sending DEV (medior) for #42: Add login page"
+ (a Sonnet session opens, reads the repo, starts coding)
+
+ ... 10 minutes later ...
+
+Agent: "✅ DEV DONE #42 — Login page with OAuth. Moved to QA."
+ (a reviewer session opens automatically, starts reviewing)
+
+ ... 5 minutes later ...
+
+Agent: "🎉 QA PASS #42. Issue closed."
+```
+
+No configuration between those steps. No manual handoff. The developer finished, QA started automatically, the issue closed itself. You watched it happen in your group chat.
+
+Add another group → another project. Same agent, fully isolated teams.
+
+---
+
+## The problem DevClaw solves
+
+OpenClaw is a great multi-agent runtime. It handles sessions, tools, channels, gateway RPC — everything you need to run AI agents. But it's a general-purpose platform. It has no opinion about how software gets built.
+
+Without DevClaw, your orchestrator agent has to figure out on its own how to:
+- Pick the right model for the task complexity
+- Create or reuse the right worker session
+- Transition issue labels in the right order
+- Track which worker is doing what across projects
+- Chain DEV completion into QA review
+- Detect crashed workers and recover
+- Log everything for auditability
+
+That's a lot of reasoning per task. LLMs do it imperfectly — they forget steps, corrupt state, pick the wrong model, lose session references. You end up babysitting the thing you built to avoid babysitting.
+
+DevClaw moves all of that into deterministic plugin code. The agent says "pick up issue #42." The plugin handles every remaining step atomically. Every time, the same way, zero reasoning tokens spent on orchestration.
+
+---
+
+## Meet your team
+
+DevClaw doesn't think in model IDs. It thinks in people.
+
+When a task comes in, you don't configure `anthropic/claude-sonnet-4-5` — you assign a **medior developer**. The orchestrator evaluates task complexity and picks the right person for the job:
+
+### Developers
+
+| Who | What they do | Under the hood |
+|---|---|---|
+| **Junior** | Typos, CSS fixes, renames, single-file changes | Haiku |
+| **Medior** | Features, bug fixes, multi-file changes | Sonnet |
+| **Senior** | Architecture, migrations, system-wide refactoring | Opus |
+
+### QA
+
+| Who | What they do | Under the hood |
+|---|---|---|
+| **Reviewer** | Code review, test validation, PR inspection | Sonnet |
+| **Tester** | Manual testing, smoke tests | Haiku |
+
+A CSS typo gets the intern. A database migration gets the architect. You're not burning Opus tokens on a color change, and you're not sending Haiku to redesign your auth system.
+
+Every mapping is [configurable](docs/CONFIGURATION.md#model-tiers) — swap in any model you want per level.
+
+---
+
+## How a task moves through the pipeline
+
+Every issue follows the same path, no exceptions. DevClaw enforces it:
+
+```
+Planning → To Do → Doing → To Test → Testing → Done
+```
+
+```mermaid
+stateDiagram-v2
+ [*] --> Planning
+ Planning --> ToDo: Ready for development
+
+ ToDo --> Doing: DEV picks up
+ Doing --> ToTest: DEV done
+
+ ToTest --> Testing: QA picks up (or auto-chains)
+ Testing --> Done: QA pass (issue closed)
+ Testing --> ToImprove: QA fail (back to DEV)
+ Testing --> Refining: QA needs human input
+
+ ToImprove --> Doing: DEV fixes (or auto-chains)
+ Refining --> ToDo: Human decides
+
+ Done --> [*]
+```
+
+These labels live on your actual GitHub/GitLab issues. Not in some internal database — in the tool you already use. Filter by `Doing` in GitHub to see what's in progress. Set up a webhook on `Done` to trigger deploys. The issue tracker is the source of truth.
+
+### What "atomic" means here
+
+When you say "pick up #42 for DEV", the plugin does all of this in one operation:
+1. Verifies the issue is in the right state
+2. Picks the developer level (or uses what you specified)
+3. Transitions the label (`To Do` → `Doing`)
+4. Creates or reuses the right worker session
+5. Dispatches the task with project-specific instructions
+6. Updates internal state
+7. Logs an audit entry
+
+If step 4 fails, step 3 is rolled back. No half-states, no orphaned labels, no "the issue says Doing but nobody's working on it."
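The rollback behaviour can be sketched as a compensating action around the dispatch step. This is a simplified, synchronous illustration: the `Tracker` interface and `startWork` are hypothetical, and the real plugin talks to the issue tracker and sessions asynchronously.

```typescript
// Hypothetical minimal tracker interface for this illustration.
interface Tracker {
  setLabel(issueId: number, label: string): void;
}

// Transitions the label, then dispatches; undoes the transition if dispatch fails.
function startWork(tracker: Tracker, issueId: number, dispatch: () => void): void {
  tracker.setLabel(issueId, "Doing");
  try {
    dispatch();
  } catch (err) {
    tracker.setLabel(issueId, "To Do"); // compensating action: revert the label
    throw err; // surface the original failure to the caller
  }
}
```

Rethrowing after the revert lets the caller report the failure while the issue sits back in the queue.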
+
+---
+
+## What happens behind the scenes
+
+### Workers report back themselves
+
+When a developer finishes, they call `work_finish` directly — no orchestrator involved:
+
+- **DEV "done"** → label moves to `To Test`, QA starts automatically
+- **DEV "blocked"** → label moves back to `To Do`, task returns to queue
+- **QA "pass"** → label moves to `Done`, issue closes
+- **QA "fail"** → label moves to `To Improve`, DEV gets re-dispatched
+
+The orchestrator doesn't need to poll, check, or coordinate. Workers are self-reporting.
+
+### Sessions accumulate context
+
+Each developer level gets its own persistent session per project. Your medior dev that's done 5 features on `my-app` already knows the codebase — it doesn't re-read 50K tokens of source code every time it picks up a new task.
+
+That's a **~40-60% token saving per task** from session reuse alone.
+
+Combined with tier selection (not using Opus when Haiku will do) and the token-free heartbeat (more on that next), DevClaw significantly reduces your token bill versus running everything through one large model.
+
+### The heartbeat runs for free
+
+Every 60 seconds, a background service:
+- Checks if any workers have been stuck for >2 hours (reverts them)
+- Scans the queue for available tasks
+- Dispatches workers to fill empty slots
+
+All of this is deterministic code — CLI calls and JSON reads. Zero LLM tokens. Workers only consume tokens when they're actually writing code or reviewing PRs.
+
+### Everything is logged
+
+Every tool call writes an NDJSON line to `audit.log`:
+
+```bash
+jq 'select(.event=="work_start")' audit.log
+```
+
+Full trace of every task, every level selection, every label transition, every health fix. No manual logging needed.
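An NDJSON log is one JSON object per line, which is what makes the `jq` filter above work. A minimal sketch of producing such a line follows; only the `event` field is evidenced by the filter, while the `ts` field and the `toAuditLine` helper are assumptions made for illustration.

```typescript
interface AuditEntry {
  ts: string; // assumed timestamp field
  event: string;
  [key: string]: unknown;
}

// Serializes one event as a single newline-terminated JSON object.
function toAuditLine(event: string, data: Record<string, unknown>, now: Date): string {
  const entry: AuditEntry = { ...data, ts: now.toISOString(), event };
  return JSON.stringify(entry) + "\n";
}
```

`JSON.stringify` escapes newlines inside string values, so each entry stays on exactly one line.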
+
+---
+
+## Your issues, your tracker
+
+DevClaw doesn't replace your issue tracker — it uses it. All task state lives in GitHub Issues or GitLab Issues (auto-detected from your git remote). The eight pipeline labels are created on your repo when you register a project.
+
+The abstraction layer (`IssueProvider`) is pluggable. GitHub and GitLab work today. Jira, Linear, or anything else just needs to implement the same interface.
+
+This means:
+- Your project manager sees task progress in GitHub/GitLab without knowing DevClaw exists
+- Your CI/CD can trigger on label changes
+- Your existing dashboards and filters keep working
+- If you stop using DevClaw, your issues and labels stay exactly where they are
+
+---
+
+## Custom instructions per project
+
+Each project gets its own instruction files that workers receive with every task:
+
+```
+workspace/projects/roles/
+├── my-webapp/
+│ ├── dev.md "Run npm test before committing. Deploy URL: staging.example.com"
+│ └── qa.md "Check OAuth flow. Verify mobile responsiveness."
+├── my-api/
+│ ├── dev.md "Run cargo test. Follow REST conventions in CONTRIBUTING.md"
+│ └── qa.md "Verify all endpoints return correct status codes."
+└── default/
+ ├── dev.md (fallback for projects without custom instructions)
+ └── qa.md
+```
+
+Deployment steps, test commands, coding standards, acceptance criteria — all injected automatically at dispatch time.
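The `default/` fallback in the tree above suggests a simple two-step lookup. The sketch below assumes paths relative to the workspace and takes the list of existing files as input instead of touching the filesystem; `resolveRoleFile` is a hypothetical helper, not a function from the plugin.

```typescript
type Role = "dev" | "qa";

// Picks the project-specific instruction file if present, else the default one.
function resolveRoleFile(project: string, role: Role, existing: string[]): string | null {
  const candidates = [
    `projects/roles/${project}/${role}.md`,
    `projects/roles/default/${role}.md`,
  ];
  for (const path of candidates) {
    if (existing.indexOf(path) !== -1) return path;
  }
  return null; // no instructions available for this role
}
```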
+
+---
+
+## Getting started
+
+### Prerequisites
+
+- [OpenClaw](https://openclaw.ai) installed (`openclaw --version`)
+- Node.js >= 20
+- `gh` CLI ([GitHub](https://cli.github.com)) or `glab` CLI ([GitLab](https://gitlab.com/gitlab-org/cli)), authenticated
+
+### Install
+
+```bash
+cp -r devclaw ~/.openclaw/extensions/
+```
+
+### Set up through conversation
+
+The easiest way to configure DevClaw is to just talk to your agent:
+
+```
+You: "Help me set up DevClaw"
+Agent: "I'll walk you through it. Should I use this agent as the
+ orchestrator, or create a new one?"
+You: "Use this one"
+
+Agent: "Want to bind a messaging channel?"
+You: "Telegram"
+
+Agent: "Here are the default developer assignments:
+ Junior → Haiku, Medior → Sonnet, Senior → Opus
+ Reviewer → Sonnet, Tester → Haiku
+ Keep these or customize?"
+You: "Keep them"
+
+Agent: "Done. Want to register a project?"
+You: "Yes — my-app at ~/git/my-app, main branch"
+
+Agent: "Project registered. 8 labels created on your repo.
+ Role instructions scaffolded. Try: 'check the queue'"
+```
+
+You can also use the [CLI wizard or non-interactive setup](docs/ONBOARDING.md#step-2-run-setup) for scripted environments.
+
+---
+
+## The toolbox
+
+DevClaw gives the orchestrator 11 tools. These aren't just convenience wrappers — they're **guardrails**. Each tool encodes a complex multi-step operation into a single atomic call. The agent provides intent, the plugin handles mechanics. The agent physically cannot skip a label transition, forget to update state, or dispatch to the wrong session — those decisions are made by deterministic code, not LLM reasoning.
+
+| Tool | What it does |
+|---|---|
+| `work_start` | Pick up a task — resolves level, transitions label, dispatches session, logs audit |
+| `work_finish` | Complete a task — transitions label, updates state, auto-chains next step, ticks queue |
+| `task_create` | Create a new issue (used by workers to file bugs they discover) |
+| `task_update` | Manually change an issue's state label |
+| `task_comment` | Add a comment to an issue (with role attribution) |
+| `status` | Dashboard: queue counts + who's working on what |
+| `health` | Detect zombie workers, stale sessions, state inconsistencies |
+| `work_heartbeat` | Manually trigger a health check + queue dispatch cycle |
+| `project_register` | One-time project setup: creates labels, scaffolds instructions, initializes state |
+| `setup` | Agent + workspace initialization |
+| `onboard` | Conversational setup guide |
+
+Full parameters and usage in the [Tools Reference](docs/TOOLS.md).
+
+---
+
+## Parallel everything
+
+Each project is fully isolated — its own queue, workers, sessions, and state. No cross-project contamination.
+
+Execution mode is configurable at two levels:
+- **Project-level** — DEV and QA work simultaneously on different tasks (default) or take turns
+- **Plugin-level** — all projects run in parallel (default) or one at a time
+
+One agent, many groups, many projects, all at once.
+
+---
+
+## Documentation
+
+| | |
+|---|---|
+| **[Architecture](docs/ARCHITECTURE.md)** | System design, session model, data flow, end-to-end diagrams |
+| **[Tools Reference](docs/TOOLS.md)** | Complete reference for all 11 tools |
+| **[Configuration](docs/CONFIGURATION.md)** | `openclaw.json`, `projects.json`, heartbeat, notifications |
+| **[Onboarding Guide](docs/ONBOARDING.md)** | Full step-by-step setup |
+| **[QA Workflow](docs/QA_WORKFLOW.md)** | QA process and review templates |
+| **[Context Awareness](docs/CONTEXT-AWARENESS.md)** | How tools adapt to group vs. DM vs. agent context |
+| **[Testing](docs/TESTING.md)** | Test suite, fixtures, CI/CD |
+| **[Management Theory](docs/MANAGEMENT.md)** | The delegation model behind the design |
+| **[Roadmap](docs/ROADMAP.md)** | What's coming next |
+
+---
+
+## License
+
+MIT
From 7a59c7670958c7814c104ae6b16e05f7c173fa30 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 11 Feb 2026 02:06:38 +0000
Subject: [PATCH 04/14] docs: add task management section, auto-scheduling with
configs, simplify team tables
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
- Bundle task management: issue tracker integration, creating/updating/commenting,
custom instructions per project — all in one section
- Add automatic scheduling section: heartbeat service, auto-chaining, execution
modes, with full config snippets and settings table
- Remove standalone "Parallel everything" section (folded into scheduling)
- Simplify team tables: Level / Assigns to / Model columns
https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
---
README2.md | 137 ++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 103 insertions(+), 34 deletions(-)
diff --git a/README2.md b/README2.md
index b1cc988..ed066a6 100644
--- a/README2.md
+++ b/README2.md
@@ -65,7 +65,7 @@ When a task comes in, you don't configure `anthropic/claude-sonnet-4-5` — you
### Developers
-| Who | What they do | Under the hood |
+| Level | Assigns to | Model |
|---|---|---|
| **Junior** | Typos, CSS fixes, renames, single-file changes | Haiku |
| **Medior** | Features, bug fixes, multi-file changes | Sonnet |
@@ -73,7 +73,7 @@ When a task comes in, you don't configure `anthropic/claude-sonnet-4-5` — you
### QA
-| Who | What they do | Under the hood |
+| Level | Assigns to | Model |
|---|---|---|
| **Reviewer** | Code review, test validation, PR inspection | Sonnet |
| **Tester** | Manual testing, smoke tests | Haiku |
@@ -149,15 +149,6 @@ That's a **~40-60% token saving per task** from session reuse alone.
Combined with tier selection (not using Opus when Haiku will do) and the token-free heartbeat (more on that next), DevClaw significantly reduces your token bill versus running everything through one large model.
-### The heartbeat runs for free
-
-Every 60 seconds, a background service:
-- Checks if any workers have been stuck for >2 hours (reverts them)
-- Scans the queue for available tasks
-- Dispatches workers to fill empty slots
-
-All of this is deterministic code — CLI calls and JSON reads. Zero LLM tokens. Workers only consume tokens when they're actually writing code or reviewing PRs.
-
### Everything is logged
Every tool call writes an NDJSON line to `audit.log`:
@@ -170,23 +161,113 @@ Full trace of every task, every level selection, every label transition, every h
---
-## Your issues, your tracker
+## Automatic scheduling
-DevClaw doesn't replace your issue tracker — it uses it. All task state lives in GitHub Issues or GitLab Issues (auto-detected from your git remote). The eight pipeline labels are created on your repo when you register a project.
+DevClaw doesn't wait for you to tell it what to do next. A background heartbeat service continuously scans for available work and dispatches workers — zero LLM tokens, pure deterministic code.
-The abstraction layer (`IssueProvider`) is pluggable. GitHub and GitLab work today. Jira, Linear, or anything else just needs to implement the same interface.
+### The heartbeat
-This means:
-- Your project manager sees task progress in GitHub/GitLab without knowing DevClaw exists
-- Your CI/CD can trigger on label changes
-- Your existing dashboards and filters keep working
-- If you stop using DevClaw, your issues and labels stay exactly where they are
+Every tick, the service runs two passes:
+
+1. **Health pass** — detects workers stuck for >2 hours, reverts their labels back to queue, deactivates them. Catches crashed sessions, context overflows, or workers that died without reporting back.
+2. **Queue pass** — scans for available tasks by priority (`To Improve` > `To Test` > `To Do`), fills free worker slots. DEV and QA slots are filled independently.
+
+All CLI calls and JSON reads. Workers only consume tokens when they actually start coding or reviewing.
+
+### Auto-chaining
+
+When enabled, task completions automatically trigger the next step:
+
+- **DEV "done"** → QA reviewer is dispatched immediately
+- **QA "fail"** → DEV is re-dispatched at the same level that originally worked on it
+- **QA "pass"** → issue closes, pipeline done
+- **"blocked"** → task returns to queue for retry, no chaining
+
+No orchestrator involvement. The worker calls `work_finish`, the plugin handles the rest.
+
+### Execution modes
+
+Each project is fully isolated — its own queue, workers, sessions, state. No cross-project contamination. Two levels of parallelism control how work gets scheduled:
+
+- **Project-level (`roleExecution`)** — DEV and QA work simultaneously on different tasks (default: `parallel`) or take turns (`sequential`)
+- **Plugin-level (`projectExecution`)** — all registered projects dispatch workers independently (default: `parallel`) or only one project runs at a time (`sequential`)
+
+### Configuration
+
+All scheduling behavior is configurable in `openclaw.json`:
+
+```json
+{
+ "plugins": {
+ "entries": {
+ "devclaw": {
+ "config": {
+ "work_heartbeat": {
+ "enabled": true,
+ "intervalSeconds": 60,
+ "maxPickupsPerTick": 4
+ },
+ "projectExecution": "parallel"
+ }
+ }
+ }
+ }
+}
+```
+
+Per-project settings live in `projects.json`:
+
+```json
+{
+ "-1234567890": {
+ "name": "my-app",
+ "autoChain": true,
+ "roleExecution": "parallel"
+ }
+}
+```
+
+| Setting | Where | Default | What it controls |
+|---|---|---|---|
+| `work_heartbeat.enabled` | `openclaw.json` | `true` | Turn the heartbeat on/off |
+| `work_heartbeat.intervalSeconds` | `openclaw.json` | `60` | Seconds between ticks |
+| `work_heartbeat.maxPickupsPerTick` | `openclaw.json` | `4` | Max workers dispatched per tick |
+| `projectExecution` | `openclaw.json` | `"parallel"` | All projects at once, or one at a time |
+| `autoChain` | `projects.json` | `false` | Auto-dispatch next step on completion |
+| `roleExecution` | `projects.json` | `"parallel"` | DEV+QA at once, or one role at a time |
+
+See the [Configuration reference](docs/CONFIGURATION.md) for the full schema.
---
-## Custom instructions per project
+## Task management
-Each project gets its own instruction files that workers receive with every task:
+### Your issues stay in your tracker
+
+DevClaw doesn't have its own task database. All task state lives in **GitHub Issues** or **GitLab Issues** — auto-detected from your git remote. The eight pipeline labels are created on your repo when you register a project. Your project manager sees progress in GitHub without knowing DevClaw exists. Your CI/CD can trigger on label changes. If you stop using DevClaw, your issues and labels stay exactly where they are.
+
+The provider is pluggable (`IssueProvider` interface). GitHub and GitLab work today. Jira, Linear, or anything else just needs to implement the same interface.
+
+### Creating, updating, and commenting
+
+Tasks can come from anywhere — the orchestrator creates them from chat, workers file bugs they discover mid-task, or you create them directly in GitHub/GitLab:
+
+```
+You: "Create an issue: fix the broken OAuth redirect"
+Agent: creates issue #43 with label "Planning"
+
+You: "Move #43 to To Do"
+Agent: transitions label Planning → To Do
+
+You: "Add a comment on #42: needs to handle the edge case for expired tokens"
+Agent: adds comment attributed to "orchestrator"
+```
+
+Workers can also comment during work — QA leaves review feedback, DEV posts implementation notes. Every comment carries role attribution so you know who said what.
+
+### Custom instructions per project
+
+Each project gets instruction files that workers receive with every task they pick up:
```
workspace/projects/roles/
@@ -201,7 +282,7 @@ workspace/projects/roles/
└── qa.md
```
-Deployment steps, test commands, coding standards, acceptance criteria — all injected automatically at dispatch time.
+Deployment steps, test commands, coding standards, acceptance criteria — all injected at dispatch time, per project, per role.
---
@@ -271,18 +352,6 @@ Full parameters and usage in the [Tools Reference](docs/TOOLS.md).
---
-## Parallel everything
-
-Each project is fully isolated — its own queue, workers, sessions, and state. No cross-project contamination.
-
-Two execution modes:
-- **Project-level** — DEV and QA work simultaneously on different tasks (default) or take turns
-- **Plugin-level** — all projects run in parallel (default) or one at a time
-
-One agent, many groups, many projects, all at once.
-
----
-
## Documentation
| | |
From 089664a675152ba04d2d5f08a069182417355f90 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 11 Feb 2026 02:08:41 +0000
Subject: [PATCH 05/14] docs: rewrite 'what it looks like' to show
multi-project auto-scheduling
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Show two projects running overnight with heartbeat-driven dispatch,
auto-chaining, QA failures cycling back to DEV, and different developer
levels — all without human involvement. Manual mode shown as secondary.
https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
---
README2.md | 48 +++++++++++++++++++++++++++++++-----------------
1 file changed, 31 insertions(+), 17 deletions(-)
diff --git a/README2.md b/README2.md
index ed066a6..2062303 100644
--- a/README2.md
+++ b/README2.md
@@ -12,29 +12,43 @@ DevClaw is a plugin for [OpenClaw](https://openclaw.ai) that turns your orchestr
## What it looks like
-Add your OpenClaw agent to a Telegram group. Register a project. That's it — you now have a dev team:
+You have two projects in two Telegram groups. You go to bed. You wake up:
+
+```
+── Group: "Dev - My Webapp" ──────────────────────────────
+
+Agent: "⚡ Sending DEV (medior) for #42: Add login page"
+Agent: "✅ DEV DONE #42 — Login page with OAuth. Moved to QA."
+Agent: "🔍 Sending QA (reviewer) for #42: Add login page"
+Agent: "🎉 QA PASS #42. Issue closed."
+Agent: "⚡ Sending DEV (junior) for #43: Fix button color on /settings"
+Agent: "✅ DEV DONE #43 — Updated to brand blue. Moved to QA."
+Agent: "🔍 Sending QA (reviewer) for #43: Fix button color on /settings"
+Agent: "❌ QA FAIL #43 — Color doesn't match dark mode. Back to DEV."
+Agent: "⚡ Sending DEV (junior) for #43: Fix button color on /settings"
+
+── Group: "Dev - My API" ─────────────────────────────────
+
+Agent: "🧠 Spawning DEV (senior) for #18: Migrate auth to OAuth2"
+Agent: "✅ DEV DONE #18 — OAuth2 provider with refresh tokens. Moved to QA."
+Agent: "🔍 Sending QA (reviewer) for #18: Migrate auth to OAuth2"
+Agent: "🎉 QA PASS #18. Issue closed."
+Agent: "⚡ Sending DEV (medior) for #19: Add rate limiting to /api/search"
+```
+
+Three issues shipped, one sent back for a fix (and auto-retried), another project's migration completed — all while you slept. The heartbeat scanned the queues, dispatched workers, chained DEV into QA, and chained QA failures back to DEV. No human in the loop.
+
+You can also drive it manually:
```
You: "Check the queue"
-Agent: "3 issues in To Do. DEV is idle. QA is idle."
+Agent: "2 issues in To Do. DEV is idle. QA is idle."
-You: "Pick up #42 for DEV"
-Agent: "⚡ Sending DEV (medior) for #42: Add login page"
- (a Sonnet session opens, reads the repo, starts coding)
-
- ... 10 minutes later ...
-
-Agent: "✅ DEV DONE #42 — Login page with OAuth. Moved to QA."
- (a reviewer session opens automatically, starts reviewing)
-
- ... 5 minutes later ...
-
-Agent: "🎉 QA PASS #42. Issue closed."
+You: "Pick up #44 for DEV"
+Agent: "⚡ Sending DEV (medior) for #44: Refactor user profile page"
```
-No configuration between those steps. No manual handoff. The developer finished, QA started automatically, the issue closed itself. You watched it happen in your group chat.
-
-Add another group → another project. Same agent, fully isolated teams.
+Same agent, as many groups as you want, fully isolated teams per project.
---
From 40215f5c289d2ee542f868ef3310c413f8ebedcc Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 11 Feb 2026 02:12:50 +0000
Subject: [PATCH 06/14] docs: show planning and steering in the same group chat
demo
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Manual interaction example now shows creating issues, sequencing work,
and parking tasks in Planning — all in the same conversation where
autonomous execution happens.
https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
---
README2.md | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/README2.md b/README2.md
index 2062303..8c4a40b 100644
--- a/README2.md
+++ b/README2.md
@@ -38,17 +38,24 @@ Agent: "⚡ Sending DEV (medior) for #19: Add rate limiting to /api/search"
Three issues shipped, one sent back for a fix (and auto-retried), another project's migration completed — all while you slept. The heartbeat scanned the queues, dispatched workers, chained DEV into QA, and chained QA failures back to DEV. No human in the loop.
-You can also drive it manually:
+You can also plan and steer in the same chat — the orchestrator is right there:
```
You: "Check the queue"
Agent: "2 issues in To Do. DEV is idle. QA is idle."
-You: "Pick up #44 for DEV"
+You: "We need to refactor the profile page before we add settings.
+ Create an issue for that, mark it as To Do, and pick it up."
+Agent: creates #44 "Refactor user profile page" with label "To Do"
Agent: "⚡ Sending DEV (medior) for #44: Refactor user profile page"
+
+You: "Also create an issue for adding the settings page.
+ Keep it in Planning until #44 is done."
+Agent: creates #45 "Add settings page" with label "Planning"
+Agent: "Parked in Planning. I'll move it to To Do once #44 ships."
```
-Same agent, as many groups as you want, fully isolated teams per project.
+Same group chat for planning, prioritizing, and execution. Same agent, as many groups as you want, fully isolated teams per project.
---
From 261babdf613f69521817692aab8f135588f088c0 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 11 Feb 2026 02:20:08 +0000
Subject: [PATCH 07/14] docs: merge demo into one continuous chat view with
external sync
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Single unbroken group-chat transcript now flows from autonomous work,
through human planning/steering, syncing issues to Linear, and back
to autonomous execution — showing everything happens in the same
conversation.
https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
---
README2.md | 31 ++++++++++++++++++++-----------
1 file changed, 20 insertions(+), 11 deletions(-)
diff --git a/README2.md b/README2.md
index 8c4a40b..88214c5 100644
--- a/README2.md
+++ b/README2.md
@@ -34,28 +34,37 @@ Agent: "✅ DEV DONE #18 — OAuth2 provider with refresh tokens. Moved to QA."
Agent: "🔍 Sending QA (reviewer) for #18: Migrate auth to OAuth2"
Agent: "🎉 QA PASS #18. Issue closed."
Agent: "⚡ Sending DEV (medior) for #19: Add rate limiting to /api/search"
-```
-Three issues shipped, one sent back for a fix (and auto-retried), another project's migration completed — all while you slept. The heartbeat scanned the queues, dispatched workers, chained DEV into QA, and chained QA failures back to DEV. No human in the loop.
+── You jump in ─────────────────────────────────────────
-You can also plan and steer in the same chat — the orchestrator is right there:
-
-```
-You: "Check the queue"
-Agent: "2 issues in To Do. DEV is idle. QA is idle."
+You: "How's the frontend looking?"
+Agent: "#41 shipped, #42 shipped, #43 back in DEV (dark-mode fix).
+ 1 issue left in To Do. DEV is working, QA is idle."
You: "We need to refactor the profile page before we add settings.
- Create an issue for that, mark it as To Do, and pick it up."
-Agent: creates #44 "Refactor user profile page" with label "To Do"
+ Create an issue for that, mark it high priority, and pick it up."
+Agent: creates #44 "Refactor user profile page" — To Do, priority: high
Agent: "⚡ Sending DEV (medior) for #44: Refactor user profile page"
You: "Also create an issue for adding the settings page.
Keep it in Planning until #44 is done."
-Agent: creates #45 "Add settings page" with label "Planning"
+Agent: creates #45 "Add settings page" — Planning
Agent: "Parked in Planning. I'll move it to To Do once #44 ships."
+
+You: "Push these to Linear."
+Agent: synced #44 → LIN-312 "Refactor user profile page"
+Agent: synced #45 → LIN-313 "Add settings page"
+Agent: "Both issues synced to Linear."
+
+── Autonomous work continues ───────────────────────────
+
+Agent: "✅ DEV DONE #43 — Fixed dark-mode color. Back to QA."
+Agent: "🎉 QA PASS #43. Issue closed."
+Agent: "✅ DEV DONE #44 — Profile page refactored. Moved to QA."
+Agent: "📋 Moving #45 to To Do — dependency #44 is in QA."
```
-Same group chat for planning, prioritizing, and execution. Same agent, as many groups as you want, fully isolated teams per project.
+Three issues shipped, one sent back for a fix (and auto-retried), another project's migration completed — all while you slept. And when you dropped in, you planned work, reprioritized, and synced to your external tracker without leaving the chat. The heartbeat kept going before, during, and after.
---
From 9d1e253f119bf08d1ab18b0da2a40c227ed62a8e Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 11 Feb 2026 04:20:25 +0000
Subject: [PATCH 08/14] docs: remove auto-chaining, reframe around scheduling
system
Auto-chaining was removed from the codebase. All docs now describe the
scheduling model: work_finish transitions labels, the heartbeat's tick
pass (which also fires immediately after every work_finish) detects
available work and fills free slots. Removed autoChain config references.
Files updated: README.md, README2.md, docs/TOOLS.md, ARCHITECTURE.md,
ROADMAP.md, MANAGEMENT.md, ONBOARDING.md, lib/templates.ts
https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
---
README.md | 24 +++++++-----------------
README2.md | 38 ++++++++++++++++++--------------------
docs/ARCHITECTURE.md | 25 ++++++++++---------------
docs/MANAGEMENT.md | 4 ++--
docs/ONBOARDING.md | 2 +-
docs/ROADMAP.md | 2 +-
docs/TOOLS.md | 2 +-
lib/templates.ts | 8 ++++----
8 files changed, 44 insertions(+), 61 deletions(-)
diff --git a/README.md b/README.md
index 13d16cb..2de7a25 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ DevClaw is the [OpenClaw](https://openclaw.ai) plugin that makes this work.
## Why DevClaw
-OpenClaw gives you a powerful multi-agent runtime — channel bindings, session management, tool permissions, gateway RPC. But it's a general-purpose platform. It doesn't know what "pick up an issue" means, how to transition a label, when to reuse a session, or how to chain DEV completion into QA review. Managing a development workflow on raw OpenClaw means the orchestrator agent handles all of that through fragile, token-expensive LLM reasoning — and it gets it wrong often enough to need constant supervision. DevClaw encodes the entire development lifecycle into deterministic plugin code: level assignment, label transitions, session dispatch, auto-chaining, health checks, and audit logging. The agent calls one tool. The plugin does the rest. That's the difference between "an agent that can write code" and "a team that ships autonomously."
+OpenClaw gives you a powerful multi-agent runtime — channel bindings, session management, tool permissions, gateway RPC. But it's a general-purpose platform. It doesn't know what "pick up an issue" means, how to transition a label, when to reuse a session, or how to schedule QA after DEV completes. Managing a development workflow on raw OpenClaw means the orchestrator agent handles all of that through fragile, token-expensive LLM reasoning — and it gets it wrong often enough to need constant supervision. DevClaw encodes the entire development lifecycle into deterministic plugin code: level assignment, label transitions, session dispatch, scheduling, health checks, and audit logging. The agent calls one tool. The plugin does the rest. That's the difference between "an agent that can write code" and "a team that ships autonomously."
## Benefits
@@ -47,11 +47,10 @@ The heartbeat service runs a continuous loop: health check → queue scan → di
### Feedback loops
-Three automated feedback loops keep the pipeline self-correcting:
+Two automated feedback loops keep the pipeline self-correcting:
-1. **Auto-chaining** — DEV "done" automatically dispatches QA. QA "fail" automatically re-dispatches DEV. No orchestrator action needed.
-2. **Stale worker watchdog** — Workers active >2 hours are auto-detected. Labels revert to queue, workers deactivated. Tasks available for retry.
-3. **Completion enforcement** — Every task message includes a mandatory `work_finish` section. Workers use `"blocked"` if stuck. Three-layer guarantee prevents tasks from getting stuck forever.
+1. **Stale worker watchdog** — Workers active >2 hours are auto-detected. Labels revert to queue, workers deactivated. Tasks available for retry.
+2. **Completion enforcement** — Every task message includes a mandatory `work_finish` section. Workers use `"blocked"` if stuck. Three-layer guarantee prevents tasks from getting stuck forever.
### Role-based instruction prompts
@@ -114,12 +113,12 @@ stateDiagram-v2
ToDo --> Doing: work_start (DEV) ⇄ blocked
Doing --> ToTest: work_finish (DEV done)
- ToTest --> Testing: work_start (QA) / auto-chain ⇄ blocked
+ ToTest --> Testing: work_start (QA) ⇄ blocked
Testing --> Done: work_finish (QA pass)
Testing --> ToImprove: work_finish (QA fail)
Testing --> Refining: work_finish (QA refine)
- ToImprove --> Doing: work_start (DEV fix) or auto-chain
+ ToImprove --> Doing: work_start (DEV fix)
Refining --> ToDo: Human decision
Done --> [*]
@@ -142,15 +141,6 @@ stateDiagram-v2
Workers call `work_finish` directly when they're done — no orchestrator involvement needed for the state transition. Workers can also call `task_create` to file follow-up issues they discover during work.
-### Auto-chaining
-
-When a project has auto-chaining enabled:
-
-- **DEV "done"** → QA is dispatched immediately (using the reviewer level)
-- **QA "fail"** → DEV fix is dispatched immediately (reuses previous DEV level)
-- **QA "pass" / "refine" / "blocked"** → no chaining (pipeline done, needs human input, or returned to queue)
-- **DEV "blocked"** → no chaining (returned to queue for retry)
-
### Completion enforcement
Three layers guarantee tasks never get stuck:
@@ -238,7 +228,7 @@ DevClaw registers **11 tools**, grouped by function:
| Tool | Description |
|---|---|
| [`work_start`](docs/TOOLS.md#work_start) | Pick up a task — handles level assignment, label transition, session dispatch, audit |
-| [`work_finish`](docs/TOOLS.md#work_finish) | Complete a task — handles label transition, state update, auto-chaining, queue tick |
+| [`work_finish`](docs/TOOLS.md#work_finish) | Complete a task — handles label transition, state update, queue tick |
### Task management
diff --git a/README2.md b/README2.md
index 88214c5..10f9379 100644
--- a/README2.md
+++ b/README2.md
@@ -64,7 +64,7 @@ Agent: "✅ DEV DONE #44 — Profile page refactored. Moved to QA."
Agent: "📋 Moving #45 to To Do — dependency #44 is in QA."
```
-Three issues shipped, one sent back for a fix (and auto-retried), another project's migration completed — all while you slept. And when you dropped in, you planned work, reprioritized, and synced to your external tracker without leaving the chat. The heartbeat kept going before, during, and after.
+Three issues shipped, one sent back for a fix (the scheduler retried it automatically), another project's migration completed — all while you slept. And when you dropped in, you planned work, reprioritized, and synced to your external tracker without leaving the chat. The heartbeat kept going before, during, and after.
---
@@ -77,7 +77,7 @@ Without DevClaw, your orchestrator agent has to figure out on its own how to:
- Create or reuse the right worker session
- Transition issue labels in the right order
- Track which worker is doing what across projects
-- Chain DEV completion into QA review
+- Schedule QA after DEV completes, and re-schedule DEV after QA fails
- Detect crashed workers and recover
- Log everything for auditability
@@ -130,12 +130,12 @@ stateDiagram-v2
ToDo --> Doing: DEV picks up
Doing --> ToTest: DEV done
- ToTest --> Testing: QA picks up (or auto-chains)
+ ToTest --> Testing: Scheduler picks up QA
Testing --> Done: QA pass (issue closed)
Testing --> ToImprove: QA fail (back to DEV)
Testing --> Refining: QA needs human input
- ToImprove --> Doing: DEV fixes (or auto-chains)
+ ToImprove --> Doing: Scheduler picks up DEV fix
Refining --> ToDo: Human decides
Done --> [*]
@@ -164,10 +164,10 @@ If step 4 fails, step 3 is rolled back. No half-states, no orphaned labels, no "
When a developer finishes, they call `work_finish` directly — no orchestrator involved:
-- **DEV "done"** → label moves to `To Test`, QA starts automatically
+- **DEV "done"** → label moves to `To Test`, scheduler picks up QA on next tick
- **DEV "blocked"** → label moves back to `To Do`, task returns to queue
- **QA "pass"** → label moves to `Done`, issue closes
-- **QA "fail"** → label moves to `To Improve`, DEV gets re-dispatched
+- **QA "fail"** → label moves to `To Improve`, scheduler picks up DEV on next tick
The orchestrator doesn't need to poll, check, or coordinate. Workers are self-reporting.
@@ -193,27 +193,27 @@ Full trace of every task, every level selection, every label transition, every h
## Automatic scheduling
-DevClaw doesn't wait for you to tell it what to do next. A background heartbeat service continuously scans for available work and dispatches workers — zero LLM tokens, pure deterministic code.
+DevClaw doesn't wait for you to tell it what to do next. A background scheduling system continuously scans for available work and dispatches workers — zero LLM tokens, pure deterministic code. This is the engine that keeps the pipeline moving: when DEV finishes, the scheduler sees a `To Test` issue and dispatches QA. When QA fails, the scheduler sees a `To Improve` issue and dispatches DEV. No hand-offs, no orchestrator reasoning — just label-driven scheduling.
-### The heartbeat
+### The `work_heartbeat`
-Every tick, the service runs two passes:
+Every tick (default: 60 seconds), the scheduler runs two passes:
1. **Health pass** — detects workers stuck for >2 hours, reverts their labels back to queue, deactivates them. Catches crashed sessions, context overflows, or workers that died without reporting back.
2. **Queue pass** — scans for available tasks by priority (`To Improve` > `To Test` > `To Do`), fills free worker slots. DEV and QA slots are filled independently.
-All CLI calls and JSON reads. Workers only consume tokens when they actually start coding or reviewing.
+All CLI calls and JSON reads. Workers only consume tokens when they actually start coding or reviewing. The scheduler also fires immediately after every `work_finish` (as a tick), so transitions happen without waiting for the next interval.
-### Auto-chaining
+### How tasks flow between roles
-When enabled, task completions automatically trigger the next step:
+When a worker calls `work_finish`, the plugin transitions the label. The scheduler picks up the rest:
-- **DEV "done"** → QA reviewer is dispatched immediately
-- **QA "fail"** → DEV is re-dispatched at the same level that originally worked on it
-- **QA "pass"** → issue closes, pipeline done
-- **"blocked"** → task returns to queue for retry, no chaining
+- **DEV "done"** → label moves to `To Test` → next tick dispatches QA
+- **QA "fail"** → label moves to `To Improve` → next tick dispatches DEV (reuses previous level)
+- **QA "pass"** → label moves to `Done`, issue closes
+- **"blocked"** → label reverts to queue (`To Do` or `To Test`) for retry
-No orchestrator involvement. The worker calls `work_finish`, the plugin handles the rest.
+No orchestrator involvement. Workers self-report, the scheduler fills free slots.
### Execution modes
@@ -251,7 +251,6 @@ Per-project settings live in `projects.json`:
{
"-1234567890": {
"name": "my-app",
- "autoChain": true,
"roleExecution": "parallel"
}
}
@@ -263,7 +262,6 @@ Per-project settings live in `projects.json`:
| `work_heartbeat.intervalSeconds` | `openclaw.json` | `60` | Seconds between ticks |
| `work_heartbeat.maxPickupsPerTick` | `openclaw.json` | `4` | Max workers dispatched per tick |
| `projectExecution` | `openclaw.json` | `"parallel"` | All projects at once, or one at a time |
-| `autoChain` | `projects.json` | `false` | Auto-dispatch next step on completion |
| `roleExecution` | `projects.json` | `"parallel"` | DEV+QA at once, or one role at a time |
See the [Configuration reference](docs/CONFIGURATION.md) for the full schema.
@@ -367,7 +365,7 @@ DevClaw gives the orchestrator 11 tools. These aren't just convenience wrappers
| Tool | What it does |
|---|---|
| `work_start` | Pick up a task — resolves level, transitions label, dispatches session, logs audit |
-| `work_finish` | Complete a task — transitions label, updates state, auto-chains next step, ticks queue |
+| `work_finish` | Complete a task — transitions label, updates state, ticks queue for next dispatch |
| `task_create` | Create a new issue (used by workers to file bugs they discover) |
| `task_update` | Manually change an issue's state label |
| `task_comment` | Add a comment to an issue (with role attribution) |
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index ec0a630..c73b7a3 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -174,7 +174,7 @@ graph TB
WF -->|closes/reopens| GL
WF -->|reads/writes| PJ
WF -->|git pull| REPO
- WF -->|auto-chain dispatch| CLI
+ WF -->|tick dispatch| CLI
WF -->|appends| AL
TCR -->|creates issue| GL
@@ -374,7 +374,7 @@ sequenceDiagram
participant PJ as projects.json
participant AL as audit.log
participant REPO as Git Repo
- participant QA as QA Session (auto-chain)
+ participant QA as QA Session
DEV->>WF: work_finish({ role: "dev", result: "done", projectGroupId: "-123", summary: "Login page with OAuth" })
WF->>PJ: readProjects()
@@ -385,21 +385,16 @@ sequenceDiagram
WF->>GL: transitionLabel "Doing" → "To Test"
WF->>AL: append { event: "work_finish", role: "dev", result: "done" }
- alt autoChain enabled
- WF->>GL: transitionLabel "To Test" → "Testing"
- WF->>QA: dispatchTask(role: "qa", level: "reviewer")
- WF->>PJ: activateWorker(-123, qa)
- WF-->>DEV: { announcement: "✅ DEV DONE #42", autoChain: { dispatched: true, role: "qa" } }
- else autoChain disabled
- WF-->>DEV: { announcement: "✅ DEV DONE #42", nextAction: "qa_pickup" }
- end
+ WF->>WF: tick queue (fill free slots)
+ Note over WF: Scheduler sees "To Test" issue, QA slot free → dispatches QA
+ WF-->>DEV: { announcement: "✅ DEV DONE #42", tickPickups: [...] }
```
**Writes:**
- `Git repo`: pulled latest (has DEV's merged code)
- `projects.json`: dev.active=false, dev.issueId=null (sessions map preserved for reuse)
-- `Issue Tracker`: label "Doing" → "To Test" (+ "To Test" → "Testing" if auto-chain)
-- `audit.log`: 1 entry (work_finish) + optional auto-chain entries
+- `Issue Tracker`: label "Doing" → "To Test"
+- `audit.log`: 1 entry (work_finish) + tick entries if workers dispatched
### Phase 6: QA pickup
@@ -462,7 +457,7 @@ DEV Blocked: "Doing" → "To Do"
QA Blocked: "Testing" → "To Test"
```
-Worker cannot complete (missing info, environment errors, etc.). Issue returns to queue for retry. No auto-chain — the task is available for the next heartbeat pickup.
+Worker cannot complete (missing info, environment errors, etc.). Issue returns to queue for retry. The task is available for the next heartbeat pickup.
### Completion enforcement
@@ -517,7 +512,7 @@ Every piece of data and where it lives:
│ │
│ setup → agent creation + workspace + model config │
│ work_start → level + label + dispatch + role instr (e2e) │
-│ work_finish → label + state + git pull + auto-chain │
+│ work_finish → label + state + git pull + tick queue │
│ task_create → create issue in tracker │
│ task_update → manual label state change │
│ task_comment → add comment to issue │
@@ -588,7 +583,7 @@ graph LR
PR[Project registration]
SETUP[Agent + workspace setup]
SD[Session dispatch<br/>create + send via CLI]
- AC[Auto-chaining<br/>DEV→QA, QA fail→DEV]
+ AC[Scheduling<br/>tick queue after work_finish]
RI[Role instructions<br/>loaded per project]
A[Audit logging]
Z[Zombie cleanup]
diff --git a/docs/MANAGEMENT.md b/docs/MANAGEMENT.md
index c99431e..1d6fa0c 100644
--- a/docs/MANAGEMENT.md
+++ b/docs/MANAGEMENT.md
@@ -29,9 +29,9 @@ Classical management theory — later formalized by Bernard Bass in his work on
DevClaw's task lifecycle is built on this. The orchestrator delegates a task via `work_start`, then steps away. It only re-engages in four scenarios:
-1. **DEV completes work** → The task moves to QA automatically. No orchestrator involvement needed.
+1. **DEV completes work** → The label moves to `To Test`. The scheduler dispatches QA on the next tick. No orchestrator involvement needed.
2. **QA passes** → The issue closes. Pipeline complete.
-3. **QA fails** → The task cycles back to DEV with a fix request. The orchestrator may need to adjust the model level.
+3. **QA fails** → The label moves to `To Improve`. The scheduler dispatches DEV on the next tick. The orchestrator may need to adjust the model level.
4. **QA refines** → The task enters a holding state that _requires human decision_. This is the explicit escalation boundary.
The "refine" state is the most interesting from a delegation perspective. It's a conscious architectural decision that says: some judgments should not be automated. When the QA agent determines that a task needs rethinking rather than just fixing, it escalates to the only actor who has the full business context — the human.
diff --git a/docs/ONBOARDING.md b/docs/ONBOARDING.md
index 00e7747..d110641 100644
--- a/docs/ONBOARDING.md
+++ b/docs/ONBOARDING.md
@@ -244,7 +244,7 @@ Change which model powers each level in `openclaw.json` — see [Configuration](
| Developer assignment | Plugin | LLM-selected level by orchestrator, keyword heuristic fallback |
| State management | Plugin | Atomic read/write to `projects.json` |
| Session management | Plugin | Creates, reuses, and dispatches to sessions via CLI. Agent never touches session tools. |
-| Task completion | Plugin (`work_finish`) | Workers self-report. Auto-chains if enabled. |
+| Task completion | Plugin (`work_finish`) | Workers self-report. Scheduler dispatches next role. |
| Prompt instructions | Plugin (`work_start`) | Loaded from `projects/roles//.md`, appended to task message |
| Audit logging | Plugin | Automatic NDJSON append per tool call |
| Zombie detection | Plugin | `health` checks active vs alive |
diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
index 98e67be..35c809e 100644
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -30,7 +30,7 @@ Roles become a configurable list instead of a hardcoded pair. Each role defines:
}
```
-The pipeline definition replaces the hardcoded `Doing → To Test → Testing → Done` flow. Labels and transitions are generated from the pipeline config. Auto-chaining follows the pipeline order.
+The pipeline definition replaces the hardcoded `Doing → To Test → Testing → Done` flow. Labels and transitions are generated from the pipeline config. The scheduler follows the pipeline order when filling free slots.
### Open questions
diff --git a/docs/TOOLS.md b/docs/TOOLS.md
index 15ee4ac..8c586ba 100644
--- a/docs/TOOLS.md
+++ b/docs/TOOLS.md
@@ -90,7 +90,7 @@ Complete a task with a result. Called by workers (DEV/QA sub-agent sessions) dir
6. Ticks queue to fill free worker slots
7. Writes audit log
-**Auto-chaining** (when enabled on the project): `dev:done` dispatches QA automatically. `qa:fail` re-dispatches DEV using the previous level.
+**Scheduling:** After completion, `work_finish` ticks the queue. The scheduler sees the new label (`To Test` or `To Improve`) and dispatches the next worker if a slot is free.
---
diff --git a/lib/templates.ts b/lib/templates.ts
index 8df2344..ad970b2 100644
--- a/lib/templates.ts
+++ b/lib/templates.ts
@@ -102,7 +102,7 @@ All orchestration goes through these tools. You do NOT manually manage sessions,
| \`status\` | Task queue and worker state per project (lightweight dashboard) |
| \`health\` | Scan worker health: zombies, stale workers, orphaned state. Pass fix=true to auto-fix |
| \`work_start\` | End-to-end: label transition, level assignment, session create/reuse, dispatch with role instructions |
-| \`work_finish\` | End-to-end: label transition, state update, issue close/reopen. Auto-ticks queue after completion. |
+| \`work_finish\` | End-to-end: label transition, state update, issue close/reopen. Ticks scheduler after completion. |
### Pipeline Flow
@@ -135,10 +135,10 @@ Evaluate each task and pass the appropriate developer level to \`work_start\`:
### When Work Completes
-Workers call \`work_finish\` themselves — the label transition, state update, and audit log happen atomically. After completion, \`work_finish\` auto-ticks the queue to fill free slots:
+Workers call \`work_finish\` themselves — the label transition, state update, and audit log happen atomically. After completion, \`work_finish\` ticks the scheduler to fill free slots:
-- DEV "done" → issue moves to "To Test" → tick dispatches QA
-- QA "fail" → issue moves to "To Improve" → tick dispatches DEV
+- DEV "done" → issue moves to "To Test" → scheduler dispatches QA
+- QA "fail" → issue moves to "To Improve" → scheduler dispatches DEV
- QA "pass" → Done, no further dispatch
- QA "refine" / blocked → needs human input
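
The completion flow described in this patch can be sketched as a small label table plus a tick function. This is a minimal illustration, not DevClaw's implementation: `NEXT_LABEL`, `QUEUE_ROLE`, `nextLabel`, and `tick` are hypothetical names, one slot per role is assumed, list order stands in for whatever priority the real scheduler applies, and the QA "refine" outcome is left out because its label is not named here.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not DevClaw's API.
type Role = "dev" | "qa";

interface Issue {
  id: number;
  label: string;
}

// Label an issue carries after work_finish, for the outcomes named in the docs.
const NEXT_LABEL: Record<string, string> = {
  "dev:done": "To Test",
  "dev:blocked": "To Do",
  "qa:pass": "Done",
  "qa:fail": "To Improve",
  "qa:blocked": "To Test",
};

// Which role picks up an issue waiting under a given queue label.
const QUEUE_ROLE: Record<string, Role | undefined> = {
  "To Do": "dev",
  "To Improve": "dev",
  "To Test": "qa",
};

function nextLabel(role: Role, result: string): string | undefined {
  return NEXT_LABEL[`${role}:${result}`];
}

// One tick: walk the queue and fill each free role slot at most once.
// Assumes a single slot per role; `busy` holds roles that are already active.
function tick(
  issues: Issue[],
  busy: ReadonlySet<Role>,
): Array<{ role: Role; issueId: number }> {
  const taken = new Set<Role>(busy); // copy so the caller's set is not mutated
  const pickups: Array<{ role: Role; issueId: number }> = [];
  for (const issue of issues) {
    const role = QUEUE_ROLE[issue.label];
    if (role === undefined || taken.has(role)) continue;
    taken.add(role);
    pickups.push({ role, issueId: issue.id });
  }
  return pickups;
}
```

With both slots free and issues #42 (`To Test`) and #43 (`To Do`) queued, `tick` returns one QA pickup and one DEV pickup; a second `To Do` issue waits for a later tick.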
From fce256fe599abdba47030daadd47f773a7952a23 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 11 Feb 2026 04:31:53 +0000
Subject: [PATCH 09/14] docs: add concise three-pillar benefits section with
cross-links
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Autonomous multi-project development, process enforcement, and token
savings — each as a brief paragraph with inline links to the detailed
sections below.
https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
---
README2.md | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/README2.md b/README2.md
index 10f9379..0c577a8 100644
--- a/README2.md
+++ b/README2.md
@@ -68,6 +68,22 @@ Three issues shipped, one sent back for a fix (the scheduler retried it automati
---
+## Why DevClaw
+
+### Autonomous multi-project development
+
+Every project runs in [complete isolation](#execution-modes) with its own queue, workers, and sessions. DEV and QA [execute in parallel](#execution-modes) within each project, and [multiple projects run simultaneously](#execution-modes). The [scheduling engine](#automatic-scheduling) ties it together: a token-free `work_heartbeat` continuously scans queues, dispatches workers, and drives [DEV → QA → DEV feedback loops](#how-tasks-flow-between-roles) — no human in the loop. Workers receive [custom instructions per project per role](#custom-instructions-per-project) at dispatch time: test commands, coding standards, deployment steps.
+
+### Process enforcement
+
+Task state lives in your [existing issue tracker](#your-issues-stay-in-your-tracker) — GitHub or GitLab issues — as the single source of truth. Every tool call is an [atomic operation with rollback](#what-atomic-means-here): label transitions, state updates, session dispatch, and audit logging happen in deterministic code. The agent says what to do; [11 tools enforce how it gets done](#the-toolbox).
+
+### Token savings
+
+[Tier selection](#meet-your-team) routes tasks to the cheapest model that can handle them — Haiku for typos, Opus for architecture (**~30-50%** on simple tasks). [Session reuse](#sessions-accumulate-context) preserves accumulated codebase knowledge across tasks (**~40-60%** per task). The [scheduling engine](#automatic-scheduling) runs on pure CLI calls — **zero** LLM tokens for orchestration. Combined: **~60-80% reduction** versus running everything through one model with fresh context each time.
+
+---
+
## The problem DevClaw solves
OpenClaw is a great multi-agent runtime. It handles sessions, tools, channels, gateway RPC — everything you need to run AI agents. But it's a general-purpose platform. It has no opinion about how software gets built.
From f13e101656b826d7cdd8815b609406d1e0759ed0 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 11 Feb 2026 05:01:38 +0000
Subject: [PATCH 10/14] docs: shorten demo, inline human interaction, use
GitHub issues
Trimmed the chat transcript: removed separate "You jump in" section,
inlined a single human message within the webapp group flow, dropped
Linear sync in favor of GitHub issue creation. Shorter, tighter.
https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
---
README2.md | 39 ++++++++-------------------------------
1 file changed, 8 insertions(+), 31 deletions(-)
diff --git a/README2.md b/README2.md
index 0c577a8..ef8405e 100644
--- a/README2.md
+++ b/README2.md
@@ -23,48 +23,25 @@ Agent: "🔍 Sending QA (reviewer) for #42: Add login page"
Agent: "🎉 QA PASS #42. Issue closed."
Agent: "⚡ Sending DEV (junior) for #43: Fix button color on /settings"
Agent: "✅ DEV DONE #43 — Updated to brand blue. Moved to QA."
-Agent: "🔍 Sending QA (reviewer) for #43: Fix button color on /settings"
Agent: "❌ QA FAIL #43 — Color doesn't match dark mode. Back to DEV."
Agent: "⚡ Sending DEV (junior) for #43: Fix button color on /settings"
+You: "Create an issue for refactoring the profile page, pick it up."
+Agent: created #44 "Refactor user profile page" on GitHub — To Do
+Agent: "⚡ Sending DEV (medior) for #44: Refactor user profile page"
+
+Agent: "✅ DEV DONE #43 — Fixed dark-mode color. Back to QA."
+Agent: "🎉 QA PASS #43. Issue closed."
+
── Group: "Dev - My API" ─────────────────────────────────
Agent: "🧠 Spawning DEV (senior) for #18: Migrate auth to OAuth2"
Agent: "✅ DEV DONE #18 — OAuth2 provider with refresh tokens. Moved to QA."
-Agent: "🔍 Sending QA (reviewer) for #18: Migrate auth to OAuth2"
Agent: "🎉 QA PASS #18. Issue closed."
Agent: "⚡ Sending DEV (medior) for #19: Add rate limiting to /api/search"
-
-── You jump in ─────────────────────────────────────────
-
-You: "How's the frontend looking?"
-Agent: "#41 shipped, #42 shipped, #43 back in DEV (dark-mode fix).
- 1 issue left in To Do. DEV is working, QA is idle."
-
-You: "We need to refactor the profile page before we add settings.
- Create an issue for that, mark it high priority, and pick it up."
-Agent: creates #44 "Refactor user profile page" — To Do, priority: high
-Agent: "⚡ Sending DEV (medior) for #44: Refactor user profile page"
-
-You: "Also create an issue for adding the settings page.
- Keep it in Planning until #44 is done."
-Agent: creates #45 "Add settings page" — Planning
-Agent: "Parked in Planning. I'll move it to To Do once #44 ships."
-
-You: "Push these to Linear."
-Agent: synced #44 → LIN-312 "Refactor user profile page"
-Agent: synced #45 → LIN-313 "Add settings page"
-Agent: "Both issues synced to Linear."
-
-── Autonomous work continues ───────────────────────────
-
-Agent: "✅ DEV DONE #43 — Fixed dark-mode color. Back to QA."
-Agent: "🎉 QA PASS #43. Issue closed."
-Agent: "✅ DEV DONE #44 — Profile page refactored. Moved to QA."
-Agent: "📋 Moving #45 to To Do — dependency #44 is in QA."
```
-Three issues shipped, one sent back for a fix (the scheduler retried it automatically), another project's migration completed — all while you slept. And when you dropped in, you planned work, reprioritized, and synced to your external tracker without leaving the chat. The heartbeat kept going before, during, and after.
+Multiple issues shipped, a QA failure automatically retried, and a second project's migration completed — all while you slept. When you dropped in mid-stream to create an issue, the scheduler kept going before, during, and after.
---
From 8cebeee31415f38a9263b0122b183723afd62ef6 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 11 Feb 2026 05:07:14 +0000
Subject: [PATCH 11/14] docs: visually differentiate human message in demo
transcript
Indent the "You:" line differently from the Agent lines so it stands
out as a human dropping into an otherwise autonomous flow.
https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
---
README2.md | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/README2.md b/README2.md
index ef8405e..baa41a4 100644
--- a/README2.md
+++ b/README2.md
@@ -26,7 +26,8 @@ Agent: "✅ DEV DONE #43 — Updated to brand blue. Moved to QA."
Agent: "❌ QA FAIL #43 — Color doesn't match dark mode. Back to DEV."
Agent: "⚡ Sending DEV (junior) for #43: Fix button color on /settings"
-You: "Create an issue for refactoring the profile page, pick it up."
+ You: "Create an issue for refactoring the profile page, pick it up."
+
Agent: created #44 "Refactor user profile page" on GitHub — To Do
Agent: "⚡ Sending DEV (medior) for #44: Refactor user profile page"
From 348b33f40b17df1cb700dac2b967674126bb75d6 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 11 Feb 2026 06:00:59 +0000
Subject: [PATCH 12/14] docs: title with OpenClaw, savings in header, vertical
architecture diagram
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
- Title: "DevClaw — Development Plugin for OpenClaw"
- Renamed "Token savings" header to "~60-80% token savings"
- Added vertical ASCII diagram showing heartbeat dispatching to
isolated projects with parallel DEV/QA — mobile-friendly layout
https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
---
README2.md | 34 +++++++++++++++++++++++++++++++---
1 file changed, 31 insertions(+), 3 deletions(-)
diff --git a/README2.md b/README2.md
index baa41a4..10c43d5 100644
--- a/README2.md
+++ b/README2.md
@@ -2,7 +2,7 @@
-# DevClaw
+# DevClaw — Development Plugin for OpenClaw
**Turn any group chat into a dev team that ships.**
@@ -52,13 +52,41 @@ Multiple issues shipped, a QA failure automatically retried, and a second projec
Every project runs in [complete isolation](#execution-modes) with its own queue, workers, and sessions. DEV and QA [execute in parallel](#execution-modes) within each project, and [multiple projects run simultaneously](#execution-modes). The [scheduling engine](#automatic-scheduling) ties it together: a token-free `work_heartbeat` continuously scans queues, dispatches workers, and drives [DEV → QA → DEV feedback loops](#how-tasks-flow-between-roles) — no human in the loop. Workers receive [custom instructions per project per role](#custom-instructions-per-project) at dispatch time: test commands, coding standards, deployment steps.
+```
+┌─ work_heartbeat ─────────────────┐
+│ health → queue → dispatch │
+│ every 60s · zero LLM tokens │
+└──────────┬───────────────────────┘
+ │
+ ┌─────▼─────────────────────┐
+ │ My Webapp │
+ │ │
+ │ DEV (medior) ──▶ QA │
+ │ #43 #42 │
+ │ │
+ │ dev.md · qa.md │
+ └───────────────────────────┘
+ │
+ ┌─────▼─────────────────────┐
+ │ My API │
+ │ │
+ │ DEV (senior) ──▶ QA │
+ │ #19 #18 │
+ │ │
+ │ dev.md · qa.md │
+ └───────────────────────────┘
+
+ each project fully isolated:
+ own queue · own workers · own sessions
+```
+
### Process enforcement
Task state lives in your [existing issue tracker](#your-issues-stay-in-your-tracker) — GitHub or GitLab issues — as the single source of truth. Every tool call is an [atomic operation with rollback](#what-atomic-means-here): label transitions, state updates, session dispatch, and audit logging happen in deterministic code. The agent says what to do; [11 tools enforce how it gets done](#the-toolbox).
-### Token savings
+### ~60-80% token savings
-[Tier selection](#meet-your-team) routes tasks to the cheapest model that can handle them — Haiku for typos, Opus for architecture (**~30-50%** on simple tasks). [Session reuse](#sessions-accumulate-context) preserves accumulated codebase knowledge across tasks (**~40-60%** per task). The [scheduling engine](#automatic-scheduling) runs on pure CLI calls — **zero** LLM tokens for orchestration. Combined: **~60-80% reduction** versus running everything through one model with fresh context each time.
+[Tier selection](#meet-your-team) routes tasks to the cheapest model that can handle them — Haiku for typos, Opus for architecture (~30-50% on simple tasks). [Session reuse](#sessions-accumulate-context) preserves accumulated codebase knowledge across tasks (~40-60% per task). The [scheduling engine](#automatic-scheduling) runs on pure CLI calls — zero LLM tokens for orchestration.
---
From 13abef8bc1465efa2c2e1c0a562db0b897bbb1ca Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 11 Feb 2026 07:23:04 +0000
Subject: [PATCH 13/14] docs: remove diagram, restructure Why DevClaw into
clean bullet lists
Replaced dense inline-linked paragraphs with a short intro sentence
per pillar followed by bullet points. Each bullet is one concept with
one link. Removed the ASCII parallelization diagram.
https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
---
README2.md | 44 ++++++++++++++------------------------------
1 file changed, 14 insertions(+), 30 deletions(-)
diff --git a/README2.md b/README2.md
index 10c43d5..9770381 100644
--- a/README2.md
+++ b/README2.md
@@ -50,43 +50,27 @@ Multiple issues shipped, a QA failure automatically retried, and a second projec
### Autonomous multi-project development
-Every project runs in [complete isolation](#execution-modes) with its own queue, workers, and sessions. DEV and QA [execute in parallel](#execution-modes) within each project, and [multiple projects run simultaneously](#execution-modes). The [scheduling engine](#automatic-scheduling) ties it together: a token-free `work_heartbeat` continuously scans queues, dispatches workers, and drives [DEV → QA → DEV feedback loops](#how-tasks-flow-between-roles) — no human in the loop. Workers receive [custom instructions per project per role](#custom-instructions-per-project) at dispatch time: test commands, coding standards, deployment steps.
+Each project is fully isolated — own queue, workers, sessions, and state. DEV and QA execute in parallel within each project, and multiple projects run simultaneously. A token-free scheduling engine drives it all autonomously:
-```
-┌─ work_heartbeat ─────────────────┐
-│ health → queue → dispatch │
-│ every 60s · zero LLM tokens │
-└──────────┬───────────────────────┘
- │
- ┌─────▼─────────────────────┐
- │ My Webapp │
- │ │
- │ DEV (medior) ──▶ QA │
- │ #43 #42 │
- │ │
- │ dev.md · qa.md │
- └───────────────────────────┘
- │
- ┌─────▼─────────────────────┐
- │ My API │
- │ │
- │ DEV (senior) ──▶ QA │
- │ #19 #18 │
- │ │
- │ dev.md · qa.md │
- └───────────────────────────┘
-
- each project fully isolated:
- own queue · own workers · own sessions
-```
+- **[Scheduling engine](#automatic-scheduling)** — `work_heartbeat` continuously scans queues, dispatches workers, and drives DEV → QA → DEV [feedback loops](#how-tasks-flow-between-roles)
+- **[Project isolation](#execution-modes)** — parallel workers per project, parallel projects across the system
+- **[Role instructions](#custom-instructions-per-project)** — per-project, per-role prompts injected at dispatch time
### Process enforcement
-Task state lives in your [existing issue tracker](#your-issues-stay-in-your-tracker) — GitHub or GitLab issues — as the single source of truth. Every tool call is an [atomic operation with rollback](#what-atomic-means-here): label transitions, state updates, session dispatch, and audit logging happen in deterministic code. The agent says what to do; [11 tools enforce how it gets done](#the-toolbox).
+GitHub/GitLab issues are the single source of truth — not an internal database. Every tool call wraps the full operation into deterministic code with rollback on failure:
+
+- **[External task state](#your-issues-stay-in-your-tracker)** — labels, transitions, and status queries go through your issue tracker
+- **[Atomic operations](#what-atomic-means-here)** — label transition + state update + session dispatch + audit log in one call
+- **[Tool-based guardrails](#the-toolbox)** — 11 tools enforce the process; the agent provides intent, the plugin handles mechanics
### ~60-80% token savings
-[Tier selection](#meet-your-team) routes tasks to the cheapest model that can handle them — Haiku for typos, Opus for architecture (~30-50% on simple tasks). [Session reuse](#sessions-accumulate-context) preserves accumulated codebase knowledge across tasks (~40-60% per task). The [scheduling engine](#automatic-scheduling) runs on pure CLI calls — zero LLM tokens for orchestration.
+Three mechanisms compound to cut token usage dramatically versus running one large model with fresh context each time:
+
+- **[Tier selection](#meet-your-team)** — Haiku for typos, Sonnet for features, Opus for architecture (~30-50% on simple tasks)
+- **[Session reuse](#sessions-accumulate-context)** — workers accumulate codebase knowledge across tasks (~40-60% per task)
+- **[Token-free scheduling](#automatic-scheduling)** — `work_heartbeat` runs on pure CLI calls, zero LLM tokens for orchestration
---
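
The tier selection bullet above can be illustrated with a small sketch. It assumes the junior/medior/senior levels map to the Haiku/Sonnet/Opus families and that a keyword heuristic is the fallback when the orchestrator passes no level. `DEV_MODELS`, `pickDevLevel`, and both keyword lists are hypothetical and chosen only to match the example issue titles in the demo transcript; the real heuristic may differ.

```typescript
// Hypothetical sketch of level selection; not DevClaw's actual code.
type DevLevel = "junior" | "medior" | "senior";

// Level to model family. Exact model IDs are configurable, so only family
// names are used here.
const DEV_MODELS: Record<DevLevel, string> = {
  junior: "haiku",
  medior: "sonnet",
  senior: "opus",
};

// Illustrative keyword stems, not a confirmed list.
const SENIOR_HINTS = ["architecture", "migrat"];
const JUNIOR_HINTS = ["typo", "css", "rename", "color"];

function pickDevLevel(title: string, requested?: DevLevel): DevLevel {
  if (requested) return requested; // a level chosen by the orchestrator wins
  const t = title.toLowerCase();
  if (SENIOR_HINTS.some((k) => t.includes(k))) return "senior";
  if (JUNIOR_HINTS.some((k) => t.includes(k))) return "junior";
  return "medior"; // default for features and multi-file changes
}
```

Under these assumptions, "Fix button color on /settings" resolves to junior, "Migrate auth to OAuth2" to senior, and "Add login page" falls through to medior, which matches the levels shown in the demo transcript.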
From b8ea37189b263797267c8c33dc8da3bab7cf62e3 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 11 Feb 2026 07:29:17 +0000
Subject: [PATCH 14/14] docs: replace README with README2, add install link to
intro
Removed the old README.md and promoted README2.md to README.md.
Added a "Get started" link after the intro paragraph pointing to
the installation section.
https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
---
README.md | 520 +++++++++++++++++++++++++++++++++--------------------
README2.md | 407 -----------------------------------------
2 files changed, 324 insertions(+), 603 deletions(-)
delete mode 100644 README2.md
diff --git a/README.md b/README.md
index 2de7a25..720bb5e 100644
--- a/README.md
+++ b/README.md
@@ -4,273 +4,401 @@
# DevClaw — Development Plugin for OpenClaw
-**Every group chat becomes an autonomous development team.**
+**Turn any group chat into a dev team that ships.**
-Add an agent to a Telegram/WhatsApp group, point it at a GitHub/GitLab repo — that group now has an **orchestrator** managing the backlog, a **DEV** worker writing code, and a **QA** worker reviewing it. All autonomous. Add another group, get another team. Each project runs in complete isolation with its own task queue, workers, and session state.
+DevClaw is a plugin for [OpenClaw](https://openclaw.ai) that turns your orchestrator agent into a development manager. It hires developers, assigns tasks, reviews code, and keeps the pipeline moving — across as many projects as you have group chats. [Get started →](#getting-started)
-DevClaw is the [OpenClaw](https://openclaw.ai) plugin that makes this work.
+---
+
+## What it looks like
+
+You have two projects in two Telegram groups. You go to bed. You wake up:
+
+```
+── Group: "Dev - My Webapp" ──────────────────────────────
+
+Agent: "⚡ Sending DEV (medior) for #42: Add login page"
+Agent: "✅ DEV DONE #42 — Login page with OAuth. Moved to QA."
+Agent: "🔍 Sending QA (reviewer) for #42: Add login page"
+Agent: "🎉 QA PASS #42. Issue closed."
+Agent: "⚡ Sending DEV (junior) for #43: Fix button color on /settings"
+Agent: "✅ DEV DONE #43 — Updated to brand blue. Moved to QA."
+Agent: "❌ QA FAIL #43 — Color doesn't match dark mode. Back to DEV."
+Agent: "⚡ Sending DEV (junior) for #43: Fix button color on /settings"
+
+ You: "Create an issue for refactoring the profile page, pick it up."
+
+Agent: created #44 "Refactor user profile page" on GitHub — To Do
+Agent: "⚡ Sending DEV (medior) for #44: Refactor user profile page"
+
+Agent: "✅ DEV DONE #43 — Fixed dark-mode color. Back to QA."
+Agent: "🎉 QA PASS #43. Issue closed."
+
+── Group: "Dev - My API" ─────────────────────────────────
+
+Agent: "🧠 Spawning DEV (senior) for #18: Migrate auth to OAuth2"
+Agent: "✅ DEV DONE #18 — OAuth2 provider with refresh tokens. Moved to QA."
+Agent: "🎉 QA PASS #18. Issue closed."
+Agent: "⚡ Sending DEV (medior) for #19: Add rate limiting to /api/search"
+```
+
+Multiple issues shipped, a QA failure automatically retried, and a second project's migration completed — all while you slept. When you dropped in mid-stream to create an issue, the scheduler kept going before, during, and after.
+
+---
## Why DevClaw
-OpenClaw gives you a powerful multi-agent runtime — channel bindings, session management, tool permissions, gateway RPC. But it's a general-purpose platform. It doesn't know what "pick up an issue" means, how to transition a label, when to reuse a session, or how to schedule QA after DEV completes. Managing a development workflow on raw OpenClaw means the orchestrator agent handles all of that through fragile, token-expensive LLM reasoning — and it gets it wrong often enough to need constant supervision. DevClaw encodes the entire development lifecycle into deterministic plugin code: level assignment, label transitions, session dispatch, scheduling, health checks, and audit logging. The agent calls one tool. The plugin does the rest. That's the difference between "an agent that can write code" and "a team that ships autonomously."
+### Autonomous multi-project development
-## Benefits
+Each project is fully isolated — own queue, workers, sessions, and state. DEV and QA execute in parallel within each project, and multiple projects run simultaneously. A token-free scheduling engine drives it all autonomously:
-### Process consistency
+- **[Scheduling engine](#automatic-scheduling)** — `work_heartbeat` continuously scans queues, dispatches workers, and drives DEV → QA → DEV [feedback loops](#how-tasks-flow-between-roles)
+- **[Project isolation](#execution-modes)** — parallel workers per project, parallel projects across the system
+- **[Role instructions](#custom-instructions-per-project)** — per-project, per-role prompts injected at dispatch time
-Every task follows the same fixed pipeline — `Planning → To Do → Doing → To Test → Testing → Done` — across every project. Label transitions, state updates, session dispatch, and audit logging happen atomically inside the plugin. The orchestrator agent **cannot** skip a step, forget a label, or corrupt session state. Hundreds of lines of manual orchestration logic collapse into a single `work_start` call.
+### Process enforcement
-### Token savings
+GitHub/GitLab issues are the single source of truth — not an internal database. Every tool call wraps the full operation into deterministic code with rollback on failure:
-DevClaw reduces token consumption at three levels:
+- **[External task state](#your-issues-stay-in-your-tracker)** — labels, transitions, and status queries go through your issue tracker
+- **[Atomic operations](#what-atomic-means-here)** — label transition + state update + session dispatch + audit log in one call
+- **[Tool-based guardrails](#the-toolbox)** — 11 tools enforce the process; the agent provides intent, the plugin handles mechanics
-| Mechanism | How it works | Estimated savings |
+### ~60-80% token savings
+
+Three mechanisms compound to cut token usage dramatically versus running one large model with fresh context each time:
+
+- **[Tier selection](#meet-your-team)** — Haiku for typos, Sonnet for features, Opus for architecture (~30-50% on simple tasks)
+- **[Session reuse](#sessions-accumulate-context)** — workers accumulate codebase knowledge across tasks (~40-60% per task)
+- **[Token-free scheduling](#automatic-scheduling)** — `work_heartbeat` runs on pure CLI calls, zero LLM tokens for orchestration
+
+---
+
+## The problem DevClaw solves
+
+OpenClaw is a great multi-agent runtime. It handles sessions, tools, channels, gateway RPC — everything you need to run AI agents. But it's a general-purpose platform. It has no opinion about how software gets built.
+
+Without DevClaw, your orchestrator agent has to figure out on its own how to:
+- Pick the right model for the task complexity
+- Create or reuse the right worker session
+- Transition issue labels in the right order
+- Track which worker is doing what across projects
+- Schedule QA after DEV completes, and re-schedule DEV after QA fails
+- Detect crashed workers and recover
+- Log everything for auditability
+
+That's a lot of reasoning per task. LLMs do it imperfectly — they forget steps, corrupt state, pick the wrong model, lose session references. You end up babysitting the thing you built to avoid babysitting.
+
+DevClaw moves all of that into deterministic plugin code. The agent says "pick up issue #42." The plugin handles the other 10 steps atomically. Every time, the same way, zero reasoning tokens spent on orchestration.
+
+---
+
+## Meet your team
+
+DevClaw doesn't think in model IDs. It thinks in people.
+
+When a task comes in, you don't configure `anthropic/claude-sonnet-4-5` — you assign a **medior developer**. The orchestrator evaluates task complexity and picks the right person for the job:
+
+### Developers
+
+| Level | Typical tasks | Model |
|---|---|---|
-| **Session re-use (context preservation)** | Each developer level per role maintains one persistent session per project. When a medior dev finishes task A and picks up task B, the accumulated codebase context carries over — no re-reading the repo. | **~40-60%** per task (~50K context tokens saved per reuse) |
-| **Tier selection** | Junior for typos (Haiku), medior for features (Sonnet), senior for architecture (Opus). The right model for the job means you're not burning Opus tokens on a CSS fix. | **~30-50%** on simple tasks vs. always using the largest model |
-| **Token-free heartbeat** | The heartbeat service runs every 60s doing health checks and queue dispatch using pure deterministic code + CLI calls. Zero LLM tokens consumed. Workers only use tokens when they actually process tasks. | **100%** savings on orchestration overhead |
+| **Junior** | Typos, CSS fixes, renames, single-file changes | Haiku |
+| **Medior** | Features, bug fixes, multi-file changes | Sonnet |
+| **Senior** | Architecture, migrations, system-wide refactoring | Opus |
-### Project isolation and parallelization
+### QA
-Each project is fully isolated — separate task queue, separate worker state, separate sessions. No cross-project contamination. Two execution modes control parallelism:
+| Level | Typical tasks | Model |
+|---|---|---|
+| **Reviewer** | Code review, test validation, PR inspection | Sonnet |
+| **Tester** | Manual testing, smoke tests | Haiku |
-- **Project-level**: DEV and QA can work simultaneously on different tasks (parallel, default) or one role at a time (sequential)
-- **Plugin-level**: Multiple projects can have active workers at once (parallel, default) or only one project active at a time (sequential)
+A CSS typo gets the intern. A database migration gets the architect. You're not burning Opus tokens on a color change, and you're not sending Haiku to redesign your auth system.
-### External task state (pluggable issue trackers)
-
-Task state lives in your issue tracker — not in DevClaw's internal storage. Every label transition, issue creation, and status query goes through the `IssueProvider` interface, which abstracts the underlying tracker. GitHub Issues and GitLab Issues are supported today (auto-detected from your git remote); the pluggable architecture means any task manager — Jira, Linear, or a custom system — can be added by implementing the same interface. This gives you full visibility in your existing tools: filter by DevClaw labels in GitHub, build dashboards in GitLab, set up webhooks on label changes. The issue tracker is the source of truth; DevClaw reads from it and writes to it, but never replaces it.
-
-### Continuous planning
-
-The heartbeat service runs a continuous loop: health check → queue scan → dispatch. It detects stale workers (>2 hours), auto-reverts stuck labels, and fills free worker slots — all without human intervention or agent LLM tokens. The orchestrator agent only gets involved when a decision requires judgment.
-
-### Feedback loops
-
-Two automated feedback loops keep the pipeline self-correcting:
-
-1. **Stale worker watchdog** — Workers active >2 hours are auto-detected. Labels revert to queue, workers deactivated. Tasks available for retry.
-2. **Completion enforcement** — Every task message includes a mandatory `work_finish` section. Workers use `"blocked"` if stuck. Three-layer guarantee prevents tasks from getting stuck forever.
-
-### Role-based instruction prompts
-
-Workers receive customizable, project-specific instructions loaded at dispatch time:
-
-```
-workspace/projects/roles/
-├── my-webapp/
-│ ├── dev.md ← "Run npm test before committing. Deploy URL: ..."
-│ └── qa.md ← "Check OAuth flow. Verify mobile responsiveness."
-└── default/
- ├── dev.md ← Fallback for projects without custom instructions
- └── qa.md
-```
-
-Edit these files to inject deployment steps, test commands, acceptance criteria, or coding standards — per project, per role.
-
-### Atomic operations with rollback
-
-Every tool call wraps multiple operations (label transition + state update + session dispatch + audit log) into a single atomic action. If session dispatch fails, the label transition is rolled back. No orphaned state. No half-completed operations.
-
-### Full audit trail
-
-Every tool call automatically appends an NDJSON entry to `log/audit.log`. Query with `jq` to trace any task's full history. No manual logging required from the orchestrator.
+Every mapping is [configurable](docs/CONFIGURATION.md#model-tiers) — swap in any model you want per level.
---
-## The model-to-role mapping
+## How a task moves through the pipeline
-DevClaw doesn't expose raw model names. You're assigning a _junior developer_ to fix a typo, not configuring `anthropic/claude-haiku-4-5`. Each developer level maps to a configurable LLM:
+Every issue follows the same path, no exceptions. DevClaw enforces it:
-### DEV levels
-
-| Level | Who they are | Default model | Assigns to |
-|---|---|---|---|
-| `junior` | The intern | `anthropic/claude-haiku-4-5` | Typos, single-file fixes, CSS changes |
-| `medior` | The reliable mid-level | `anthropic/claude-sonnet-4-5` | Features, bug fixes, multi-file changes |
-| `senior` | The architect | `anthropic/claude-opus-4-5` | Architecture, migrations, system-wide refactoring |
-
-### QA levels
-
-| Level | Who they are | Default model | Assigns to |
-|---|---|---|---|
-| `reviewer` | The code reviewer | `anthropic/claude-sonnet-4-5` | Code review, test validation, PR inspection |
-| `tester` | The QA tester | `anthropic/claude-haiku-4-5` | Manual testing, smoke tests |
-
-The orchestrator LLM evaluates each issue and picks the appropriate level. A keyword-based heuristic in `model-selector.ts` serves as fallback when the orchestrator omits the level. Override which model powers each level in [`openclaw.json`](docs/CONFIGURATION.md#model-tiers).
-
----
-
-## Task workflow
-
-Every task (issue) moves through a fixed pipeline of label states. DevClaw tools handle every transition atomically.
+```
+Planning → To Do → Doing → To Test → Testing → Done
+```
```mermaid
stateDiagram-v2
[*] --> Planning
Planning --> ToDo: Ready for development
- ToDo --> Doing: work_start (DEV) ⇄ blocked
- Doing --> ToTest: work_finish (DEV done)
+ ToDo --> Doing: DEV picks up
+ Doing --> ToTest: DEV done
- ToTest --> Testing: work_start (QA) ⇄ blocked
- Testing --> Done: work_finish (QA pass)
- Testing --> ToImprove: work_finish (QA fail)
- Testing --> Refining: work_finish (QA refine)
+ ToTest --> Testing: Scheduler picks up QA
+ Testing --> Done: QA pass (issue closed)
+ Testing --> ToImprove: QA fail (back to DEV)
+ Testing --> Refining: QA needs human input
- ToImprove --> Doing: work_start (DEV fix)
- Refining --> ToDo: Human decision
+ ToImprove --> Doing: Scheduler picks up DEV fix
+ Refining --> ToDo: Human decides
Done --> [*]
```
-### The eight state labels
+These labels live on your actual GitHub/GitLab issues. Not in some internal database — in the tool you already use. Filter by `Doing` in GitHub to see what's in progress. Set up a webhook on `Done` to trigger deploys. The issue tracker is the source of truth.
-| Label | Color | Meaning |
-|---|---|---|
-| **Planning** | Blue-grey | Pre-work review — issue exists but not ready for development |
-| **To Do** | Blue | Ready for DEV pickup |
-| **Doing** | Orange | DEV actively working |
-| **To Test** | Cyan | Ready for QA pickup |
-| **Testing** | Purple | QA actively reviewing |
-| **Done** | Green | Complete — issue closed |
-| **To Improve** | Red | QA failed — back to DEV |
-| **Refining** | Yellow | Awaiting human decision |
+### What "atomic" means here
-### Worker self-reporting
+When you say "pick up #42 for DEV", the plugin does all of this in one operation:
+1. Verifies the issue is in the right state
+2. Picks the developer level (or uses what you specified)
+3. Transitions the label (`To Do` → `Doing`)
+4. Creates or reuses the right worker session
+5. Dispatches the task with project-specific instructions
+6. Updates internal state
+7. Logs an audit entry
-Workers call `work_finish` directly when they're done — no orchestrator involvement needed for the state transition. Workers can also call `task_create` to file follow-up issues they discover during work.
-
-### Completion enforcement
-
-Three layers guarantee tasks never get stuck:
-
-1. **Completion contract** — Every task message includes a mandatory section requiring `work_finish`, even on failure. Workers use `"blocked"` if stuck.
-2. **Blocked result** — Both DEV and QA can gracefully put a task back in queue (`Doing → To Do`, `Testing → To Test`).
-3. **Stale worker watchdog** — Heartbeat detects workers active >2 hours and auto-reverts labels to queue.
+If step 4 fails, step 3 is rolled back. No half-states, no orphaned labels, no "the issue says Doing but nobody's working on it."
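The rollback behavior described above can be sketched roughly as follows. This is a minimal illustration, not DevClaw's actual code: `Tracker`, `pickUp`, and `memoryTracker` are hypothetical names, and level selection, state updates, and audit logging are left out to keep the focus on the label rollback.

```typescript
// Hypothetical sketch: none of these names come from the DevClaw codebase.
type Label = "To Do" | "Doing";

interface Tracker {
  // Minimal stand-in for an issue-tracker client.
  getLabel(issue: number): Label;
  setLabel(issue: number, label: Label): void;
}

function pickUp(
  tracker: Tracker,
  issue: number,
  dispatch: (issue: number) => void, // session creation + dispatch, may throw
): { ok: boolean; error?: string } {
  // Verify the issue is in the expected state before touching anything.
  if (tracker.getLabel(issue) !== "To Do") {
    return { ok: false, error: "issue is not in To Do" };
  }
  // Transition the label.
  tracker.setLabel(issue, "Doing");
  try {
    dispatch(issue);
  } catch (err) {
    // Roll the label back so it never claims work that nobody is doing.
    tracker.setLabel(issue, "To Do");
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
  return { ok: true };
}

// In-memory tracker, used here only for demonstration.
function memoryTracker(labels: Map<number, Label>): Tracker {
  return {
    getLabel: (issue) => labels.get(issue) ?? "To Do",
    setLabel: (issue, label) => {
      labels.set(issue, label);
    },
  };
}
```

Calling `pickUp` with a `dispatch` callback that throws leaves the issue in `To Do`, which is the "no orphaned labels" guarantee in miniature.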
---
-## Installation
+## What happens behind the scenes
-### Requirements
+### Workers report back themselves
-| Requirement | Why | Verify |
-|---|---|---|
-| [OpenClaw](https://openclaw.ai) | DevClaw is an OpenClaw plugin | `openclaw --version` |
-| Node.js >= 20 | Plugin runtime | `node --version` |
-| [`gh`](https://cli.github.com) or [`glab`](https://gitlab.com/gitlab-org/cli) CLI | Issue tracker provider (auto-detected from git remote) | `gh --version` / `glab --version` |
-| CLI authenticated | Plugin calls gh/glab for every label transition | `gh auth status` / `glab auth status` |
+When a worker (DEV or QA) finishes, it calls `work_finish` directly, with no orchestrator involved:
-### Install the plugin
+- **DEV "done"** → label moves to `To Test`, scheduler picks up QA on next tick
+- **DEV "blocked"** → label moves back to `To Do`, task returns to queue
+- **QA "pass"** → label moves to `Done`, issue closes
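+- **QA "fail"** → label moves to `To Improve`, scheduler picks up DEV on next tick
+- **QA "refine"** → label moves to `Refining`, task waits for a human decision
+- **QA "blocked"** → label moves back to `To Test`, task returns to queue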
+
+The orchestrator doesn't need to poll, check, or coordinate. Workers are self-reporting.
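One way to picture these transitions is as a lookup table from role and result to the next label. The sketch below is illustrative only: the labels come from the pipeline diagram, but the `Role` and `Result` identifiers and the `nextLabel` helper are assumptions, not the plugin's real parameter names.

```typescript
// Hypothetical sketch of the work_finish transitions described in this README.
type Role = "dev" | "qa";
type Result = "done" | "blocked" | "pass" | "fail" | "refine";

const NEXT_LABEL: Record<Role, Partial<Record<Result, string>>> = {
  dev: { done: "To Test", blocked: "To Do" },
  qa: { pass: "Done", fail: "To Improve", refine: "Refining", blocked: "To Test" },
};

function nextLabel(role: Role, result: Result): string {
  const label = NEXT_LABEL[role][result];
  if (label === undefined) {
    // For example, DEV cannot report "pass"; only QA can.
    throw new Error(`"${result}" is not a valid result for ${role}`);
  }
  return label;
}
```

Keeping the mapping in data rather than in branching logic makes it easy to check that every result a role can report has exactly one destination label.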
+
+### Sessions accumulate context
+
+Each developer level gets its own persistent session per project. Your medior dev that's done 5 features on `my-app` already knows the codebase — it doesn't re-read 50K tokens of source code every time it picks up a new task.
+
+That's an estimated **~40-60% token saving per task** from session reuse alone.
+
+Combined with tier selection (not using Opus when Haiku will do) and the token-free heartbeat (more on that next), DevClaw significantly reduces your token bill versus running everything through one large model.
+
+### Everything is logged
+
+Every tool call writes an NDJSON line to `audit.log`:
+
+```bash
+jq 'select(.event=="work_start")' audit.log
+```
+
+Full trace of every task, every level selection, every label transition, every health fix. No manual logging needed.
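If you would rather process the log in code than with `jq`, NDJSON is simple to parse: one JSON object per line. The sketch below assumes only the `event` field used in the `jq` example; the `issue` field in the sample data is made up for illustration, and `parseNdjson` and `byEvent` are hypothetical helpers.

```typescript
interface AuditEntry {
  event: string; // the only field the jq example relies on
  [key: string]: unknown; // any other fields are left untyped
}

// Parse NDJSON text: one JSON object per non-empty line.
function parseNdjson(text: string): AuditEntry[] {
  const entries: AuditEntry[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    const value: unknown = JSON.parse(trimmed);
    if (
      typeof value === "object" &&
      value !== null &&
      typeof (value as AuditEntry).event === "string"
    ) {
      entries.push(value as AuditEntry);
    }
  }
  return entries;
}

// Equivalent of: jq 'select(.event=="<event>")'
function byEvent(entries: AuditEntry[], event: string): AuditEntry[] {
  return entries.filter((entry) => entry.event === event);
}

// Made-up sample lines; real entries will carry different fields.
const sample = [
  '{"event":"work_start","issue":42}',
  '{"event":"work_finish","issue":42}',
  '{"event":"work_start","issue":43}',
].join("\n");
```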
+
+---
+
+## Automatic scheduling
+
+DevClaw doesn't wait for you to tell it what to do next. A background scheduling system continuously scans for available work and dispatches workers — zero LLM tokens, pure deterministic code. This is the engine that keeps the pipeline moving: when DEV finishes, the scheduler sees a `To Test` issue and dispatches QA. When QA fails, the scheduler sees a `To Improve` issue and dispatches DEV. No hand-offs, no orchestrator reasoning — just label-driven scheduling.
+
+### The `work_heartbeat`
+
+Every tick (default: 60 seconds), the scheduler runs two passes:
+
+1. **Health pass** — detects workers stuck for >2 hours, reverts their labels back to queue, deactivates them. Catches crashed sessions, context overflows, or workers that died without reporting back.
+2. **Queue pass** — scans for available tasks by priority (`To Improve` > `To Test` > `To Do`), fills free worker slots. DEV and QA slots are filled independently.
+
+Both passes are plain CLI calls and JSON reads, so scheduling itself costs no LLM tokens. Workers only consume tokens when they actually start coding or reviewing. The scheduler also runs a tick immediately after every `work_finish`, so transitions happen without waiting for the next interval.
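The two passes can be sketched as two small pure functions. This is a simplified illustration under stated assumptions: `WorkerState`, `QueuedIssue`, `staleWorkers`, and `pickups` are hypothetical names, and the sketch treats free slots as a single pool, whereas DevClaw fills DEV and QA slots independently.

```typescript
// ">2 hours" threshold from the health pass.
const STALE_AFTER_MS = 2 * 60 * 60 * 1000;
// Queue priority from the queue pass: earlier entries are dispatched first.
const PRIORITY = ["To Improve", "To Test", "To Do"] as const;
type QueueLabel = (typeof PRIORITY)[number];

// Hypothetical shapes; the real worker state has different fields.
interface WorkerState {
  id: string;
  active: boolean;
  startedAt: number; // epoch milliseconds
}
interface QueuedIssue {
  id: number;
  label: QueueLabel;
}

// Health pass: active workers that have been running longer than the threshold.
function staleWorkers(workers: WorkerState[], now: number): WorkerState[] {
  return workers.filter((w) => w.active && now - w.startedAt > STALE_AFTER_MS);
}

// Queue pass: highest-priority issues first, capped by free slots and the per-tick limit.
function pickups(queue: QueuedIssue[], freeSlots: number, maxPerTick: number): QueuedIssue[] {
  const limit = Math.max(0, Math.min(freeSlots, maxPerTick));
  return [...queue]
    .sort((a, b) => PRIORITY.indexOf(a.label) - PRIORITY.indexOf(b.label))
    .slice(0, limit);
}
```

Neither function calls a model, which is the point: the decisions in a tick are comparisons and sorting, not reasoning.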
+
+### How tasks flow between roles
+
+When a worker calls `work_finish`, the plugin transitions the label. The scheduler picks up the rest:
+
+- **DEV "done"** → label moves to `To Test` → next tick dispatches QA
+- **QA "fail"** → label moves to `To Improve` → next tick dispatches DEV (reuses previous level)
+- **QA "pass"** → label moves to `Done`, issue closes
+- **"blocked"** → label reverts to queue (`To Do` or `To Test`) for retry
+
+No orchestrator involvement. Workers self-report, the scheduler fills free slots.
+
+### Execution modes
+
+Each project is fully isolated — its own queue, workers, sessions, state. No cross-project contamination. Two levels of parallelism control how work gets scheduled:
+
+- **Project-level (`roleExecution`)** — DEV and QA work simultaneously on different tasks (default: `parallel`) or take turns (`sequential`)
+- **Plugin-level (`projectExecution`)** — all registered projects dispatch workers independently (default: `parallel`) or only one project runs at a time (`sequential`)
+
+### Configuration
+
+Plugin-wide scheduling behavior is configured in `openclaw.json`:
+
+```json
+{
+ "plugins": {
+ "entries": {
+ "devclaw": {
+ "config": {
+ "work_heartbeat": {
+ "enabled": true,
+ "intervalSeconds": 60,
+ "maxPickupsPerTick": 4
+ },
+ "projectExecution": "parallel"
+ }
+ }
+ }
+ }
+}
+```
+
+Per-project settings live in `projects.json`:
+
+```json
+{
+ "-1234567890": {
+ "name": "my-app",
+ "roleExecution": "parallel"
+ }
+}
+```
+
+| Setting | Where | Default | What it controls |
+|---|---|---|---|
+| `work_heartbeat.enabled` | `openclaw.json` | `true` | Turn the heartbeat on/off |
+| `work_heartbeat.intervalSeconds` | `openclaw.json` | `60` | Seconds between ticks |
+| `work_heartbeat.maxPickupsPerTick` | `openclaw.json` | `4` | Max workers dispatched per tick |
+| `projectExecution` | `openclaw.json` | `"parallel"` | All projects at once, or one at a time |
+| `roleExecution` | `projects.json` | `"parallel"` | DEV+QA at once, or one role at a time |
+
+See the [Configuration reference](docs/CONFIGURATION.md) for the full schema.
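To make the defaults in the table concrete, here is a minimal sketch of how a partial `work_heartbeat` block could be merged with them. The default values come from the table above; `HeartbeatConfig` and `resolveHeartbeat` are hypothetical names, and the real plugin may resolve its configuration differently.

```typescript
interface HeartbeatConfig {
  enabled: boolean;
  intervalSeconds: number;
  maxPickupsPerTick: number;
}

// Defaults as documented in the settings table.
const HEARTBEAT_DEFAULTS: HeartbeatConfig = {
  enabled: true,
  intervalSeconds: 60,
  maxPickupsPerTick: 4,
};

// Settings supplied by the user win; anything omitted falls back to the default.
function resolveHeartbeat(user: Partial<HeartbeatConfig> = {}): HeartbeatConfig {
  return { ...HEARTBEAT_DEFAULTS, ...user };
}
```

With this shape, a config that only sets `intervalSeconds` still ends up with `enabled: true` and `maxPickupsPerTick: 4`.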
+
+---
+
+## Task management
+
+### Your issues stay in your tracker
+
+DevClaw doesn't have its own task database. All task state lives in **GitHub Issues** or **GitLab Issues** — auto-detected from your git remote. The eight pipeline labels are created on your repo when you register a project. Your project manager sees progress in GitHub without knowing DevClaw exists. Your CI/CD can trigger on label changes. If you stop using DevClaw, your issues and labels stay exactly where they are.
+
+The provider is pluggable (`IssueProvider` interface). GitHub and GitLab work today. Jira, Linear, or anything else just needs to implement the same interface.
+
+### Creating, updating, and commenting
+
+Tasks can come from anywhere — the orchestrator creates them from chat, workers file bugs they discover mid-task, or you create them directly in GitHub/GitLab:
+
+```
+You: "Create an issue: fix the broken OAuth redirect"
+Agent: creates issue #43 with label "Planning"
+
+You: "Move #43 to To Do"
+Agent: transitions label Planning → To Do
+
+You: "Add a comment on #42: needs to handle the edge case for expired tokens"
+Agent: adds comment attributed to "orchestrator"
+```
+
+Workers can also comment during work — QA leaves review feedback, DEV posts implementation notes. Every comment carries role attribution so you know who said what.
+
+### Custom instructions per project
+
+Each project gets instruction files that workers receive with every task they pick up:
+
+```
+workspace/projects/roles/
+├── my-webapp/
+│ ├── dev.md "Run npm test before committing. Deploy URL: staging.example.com"
+│ └── qa.md "Check OAuth flow. Verify mobile responsiveness."
+├── my-api/
+│ ├── dev.md "Run cargo test. Follow REST conventions in CONTRIBUTING.md"
+│ └── qa.md "Verify all endpoints return correct status codes."
+└── default/
+ ├── dev.md (fallback for projects without custom instructions)
+ └── qa.md
+```
+
+Deployment steps, test commands, coding standards, acceptance criteria — all injected at dispatch time, per project, per role.
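The fallback to `default/` can be sketched as a small path-resolution helper. The directory layout is taken from the tree above; `instructionPath` and its `exists` callback are hypothetical, introduced so the logic can be shown without touching the file system.

```typescript
// Resolve the instruction file for a project and role, falling back to default/.
function instructionPath(
  project: string,
  role: "dev" | "qa",
  exists: (path: string) => boolean, // e.g. a wrapper around a file-system check
): string {
  const root = "workspace/projects/roles";
  const specific = `${root}/${project}/${role}.md`;
  return exists(specific) ? specific : `${root}/default/${role}.md`;
}
```

A project with its own `dev.md` resolves to that file; a project without one resolves to `workspace/projects/roles/default/dev.md`.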
+
+---
+
+## Getting started
+
+### Prerequisites
+
+- [OpenClaw](https://openclaw.ai) installed (`openclaw --version`)
+- Node.js >= 20
+- `gh` CLI ([GitHub](https://cli.github.com)) or `glab` CLI ([GitLab](https://gitlab.com/gitlab-org/cli)), authenticated
+
+### Install
```bash
cp -r devclaw ~/.openclaw/extensions/
```
-### Onboarding
+### Set up through conversation
-The recommended way to set up DevClaw is through a conversation with any agent that has the plugin loaded. Call the `onboard` tool and the agent walks you through everything step by step:
+The easiest way to configure DevClaw is to just talk to your agent:
```
You: "Help me set up DevClaw"
-Agent: calls onboard({ mode: "first-run" })
+Agent: "I'll walk you through it. Should I use this agent as the
+ orchestrator, or create a new one?"
+You: "Use this one"
-Agent: "I'll help you set up DevClaw. First — should I configure this agent
- as the orchestrator, or create a new one?"
-You: "Use this agent"
-
-Agent: "Got it. Do you want to bind a messaging channel? (telegram/whatsapp/none)"
+Agent: "Want to bind a messaging channel?"
You: "Telegram"
-Agent: "Default model tiers:
- DEV junior → Haiku, medior → Sonnet, senior → Opus
- QA reviewer → Sonnet, tester → Haiku
- Accept defaults or customize?"
-You: "Defaults are fine"
+Agent: "Here are the default developer assignments:
+ Junior → Haiku, Medior → Sonnet, Senior → Opus
+ Reviewer → Sonnet, Tester → Haiku
+ Keep these or customize?"
+You: "Keep them"
-Agent: calls setup({ agentId: "my-agent", channelBinding: "telegram", ... })
-Agent: "Done! Workspace configured. Want to register a project now?"
-You: "Yes — register my-app at ~/git/my-app"
+Agent: "Done. Want to register a project?"
+You: "Yes — my-app at ~/git/my-app, main branch"
-Agent: calls project_register({ ... })
-Agent: "Project registered. 8 labels created, role instructions scaffolded.
- Try: 'check the queue' to see pending issues."
+Agent: "Project registered. 8 labels created on your repo.
+ Role instructions scaffolded. Try: 'check the queue'"
```
-After setup, work flows naturally through conversation in your project's group chat:
-
-```
-"Check the queue" → agent calls status
-"Pick up issue #1 for DEV" → agent calls work_start
-[DEV works autonomously] → calls work_finish when done
-[Heartbeat fills next slot] → QA dispatched automatically
-```
-
-DevClaw also supports a [CLI wizard and non-interactive setup](docs/ONBOARDING.md#step-2-run-setup) for scripted or headless environments. See the [Onboarding Guide](docs/ONBOARDING.md) for the full step-by-step reference.
+You can also use the [CLI wizard or non-interactive setup](docs/ONBOARDING.md#step-2-run-setup) for scripted environments.
---
-## Architecture
+## The toolbox
-For detailed technical diagrams — system overview, end-to-end flows, session-per-level design, session reuse mechanics, data flow map, and the complete ticket lifecycle from creation to completion — see the [Architecture documentation](docs/ARCHITECTURE.md).
+DevClaw gives the orchestrator 11 tools. These aren't just convenience wrappers — they're **guardrails**. Each tool encodes a complex multi-step operation into a single atomic call. The agent provides intent, the plugin handles mechanics. The agent physically cannot skip a label transition, forget to update state, or dispatch to the wrong session — those decisions are made by deterministic code, not LLM reasoning.
----
-
-## Tools
-
-DevClaw's tools are the guardrails that make autonomous development reliable. Without them, an LLM orchestrator would need to reason about label transitions, session lifecycle, state serialization, and audit logging on every action — and get it wrong often enough to require constant supervision. Each tool encodes one operation as deterministic code: the agent provides intent ("pick up issue #42 for DEV"), the tool handles the mechanics (verify label, resolve level, transition state, dispatch session, log event, return announcement). The agent can't skip a step, use the wrong label, or forget to update state — those decisions are made by the plugin, not the model.
-
-DevClaw registers **11 tools**, grouped by function:
-
-### Worker lifecycle
-
-| Tool | Description |
+| Tool | What it does |
|---|---|
-| [`work_start`](docs/TOOLS.md#work_start) | Pick up a task — handles level assignment, label transition, session dispatch, audit |
-| [`work_finish`](docs/TOOLS.md#work_finish) | Complete a task — handles label transition, state update, queue tick |
+| `work_start` | Pick up a task — resolves level, transitions label, dispatches session, logs audit |
+| `work_finish` | Complete a task — transitions label, updates state, ticks queue for next dispatch |
+| `task_create` | Create a new issue (also used by workers to file bugs they discover) |
+| `task_update` | Manually change an issue's state label |
+| `task_comment` | Add a comment to an issue (with role attribution) |
+| `status` | Dashboard: queue counts + who's working on what |
+| `health` | Detect zombie workers, stale sessions, state inconsistencies |
+| `work_heartbeat` | Manually trigger a health check + queue dispatch cycle |
+| `project_register` | One-time project setup: creates labels, scaffolds instructions, initializes state |
+| `setup` | Agent + workspace initialization |
+| `onboard` | Conversational setup guide |
-### Task management
-
-| Tool | Description |
-|---|---|
-| [`task_create`](docs/TOOLS.md#task_create) | Create a new issue in the tracker |
-| [`task_update`](docs/TOOLS.md#task_update) | Change an issue's state label manually |
-| [`task_comment`](docs/TOOLS.md#task_comment) | Add a comment to an issue |
-
-### Operations
-
-| Tool | Description |
-|---|---|
-| [`status`](docs/TOOLS.md#status) | Queue counts + worker state dashboard |
-| [`health`](docs/TOOLS.md#health) | Worker health checks + zombie detection |
-| [`work_heartbeat`](docs/TOOLS.md#work_heartbeat) | Manual trigger for health + queue dispatch |
-
-### Setup
-
-| Tool | Description |
-|---|---|
-| [`project_register`](docs/TOOLS.md#project_register) | One-time project setup (labels, prompts, state) |
-| [`setup`](docs/TOOLS.md#setup) | Agent + workspace initialization |
-| [`onboard`](docs/TOOLS.md#onboard) | Conversational onboarding guide |
-
-See the [Tools Reference](docs/TOOLS.md) for full parameters and usage.
+Full parameters and usage in the [Tools Reference](docs/TOOLS.md).
---
## Documentation
-| Document | Description |
+| | |
|---|---|
-| [Architecture](docs/ARCHITECTURE.md) | System design, session-per-level model, data flow, component interactions |
-| [Tools Reference](docs/TOOLS.md) | Complete reference for all 11 tools with parameters and examples |
-| [Configuration](docs/CONFIGURATION.md) | Full config reference — `openclaw.json`, `projects.json`, heartbeat, notifications |
-| [Onboarding Guide](docs/ONBOARDING.md) | Step-by-step setup: install, configure, register projects, test the pipeline |
-| [QA Workflow](docs/QA_WORKFLOW.md) | QA process: review documentation, comment templates, enforcement |
-| [Context Awareness](docs/CONTEXT-AWARENESS.md) | How DevClaw adapts behavior based on interaction context |
-| [Testing Guide](docs/TESTING.md) | Automated test suite: scenarios, fixtures, CI/CD integration |
-| [Management Theory](docs/MANAGEMENT.md) | The delegation theory behind DevClaw's design |
-| [Roadmap](docs/ROADMAP.md) | Planned features: configurable roles, channel-agnostic groups, Jira |
+| **[Architecture](docs/ARCHITECTURE.md)** | System design, session model, data flow, end-to-end diagrams |
+| **[Tools Reference](docs/TOOLS.md)** | Complete reference for all 11 tools |
+| **[Configuration](docs/CONFIGURATION.md)** | `openclaw.json`, `projects.json`, heartbeat, notifications |
+| **[Onboarding Guide](docs/ONBOARDING.md)** | Full step-by-step setup |
+| **[QA Workflow](docs/QA_WORKFLOW.md)** | QA process and review templates |
+| **[Context Awareness](docs/CONTEXT-AWARENESS.md)** | How tools adapt to group vs. DM vs. agent context |
+| **[Testing](docs/TESTING.md)** | Test suite, fixtures, CI/CD |
+| **[Management Theory](docs/MANAGEMENT.md)** | The delegation model behind the design |
+| **[Roadmap](docs/ROADMAP.md)** | What's coming next |
---
diff --git a/README2.md b/README2.md
deleted file mode 100644
index 9770381..0000000
--- a/README2.md
+++ /dev/null
@@ -1,407 +0,0 @@
-
-
-
-
-# DevClaw — Development Plugin for OpenClaw
-
-**Turn any group chat into a dev team that ships.**
-
-DevClaw is a plugin for [OpenClaw](https://openclaw.ai) that turns your orchestrator agent into a development manager. It hires developers, assigns tasks, reviews code, and keeps the pipeline moving — across as many projects as you have group chats.
-
----
-
-## What it looks like
-
-You have two projects in two Telegram groups. You go to bed. You wake up:
-
-```
-── Group: "Dev - My Webapp" ──────────────────────────────
-
-Agent: "⚡ Sending DEV (medior) for #42: Add login page"
-Agent: "✅ DEV DONE #42 — Login page with OAuth. Moved to QA."
-Agent: "🔍 Sending QA (reviewer) for #42: Add login page"
-Agent: "🎉 QA PASS #42. Issue closed."
-Agent: "⚡ Sending DEV (junior) for #43: Fix button color on /settings"
-Agent: "✅ DEV DONE #43 — Updated to brand blue. Moved to QA."
-Agent: "❌ QA FAIL #43 — Color doesn't match dark mode. Back to DEV."
-Agent: "⚡ Sending DEV (junior) for #43: Fix button color on /settings"
-
- You: "Create an issue for refactoring the profile page, pick it up."
-
-Agent: created #44 "Refactor user profile page" on GitHub — To Do
-Agent: "⚡ Sending DEV (medior) for #44: Refactor user profile page"
-
-Agent: "✅ DEV DONE #43 — Fixed dark-mode color. Back to QA."
-Agent: "🎉 QA PASS #43. Issue closed."
-
-── Group: "Dev - My API" ─────────────────────────────────
-
-Agent: "🧠 Spawning DEV (senior) for #18: Migrate auth to OAuth2"
-Agent: "✅ DEV DONE #18 — OAuth2 provider with refresh tokens. Moved to QA."
-Agent: "🎉 QA PASS #18. Issue closed."
-Agent: "⚡ Sending DEV (medior) for #19: Add rate limiting to /api/search"
-```
-
-Multiple issues shipped, a QA failure automatically retried, and a second project's migration completed — all while you slept. When you dropped in mid-stream to create an issue, the scheduler kept going before, during, and after.
-
----
-
-## Why DevClaw
-
-### Autonomous multi-project development
-
-Each project is fully isolated — own queue, workers, sessions, and state. DEV and QA execute in parallel within each project, and multiple projects run simultaneously. A token-free scheduling engine drives it all autonomously:
-
-- **[Scheduling engine](#automatic-scheduling)** — `work_heartbeat` continuously scans queues, dispatches workers, and drives DEV → QA → DEV [feedback loops](#how-tasks-flow-between-roles)
-- **[Project isolation](#execution-modes)** — parallel workers per project, parallel projects across the system
-- **[Role instructions](#custom-instructions-per-project)** — per-project, per-role prompts injected at dispatch time
-
-### Process enforcement
-
-GitHub/GitLab issues are the single source of truth — not an internal database. Every tool call wraps the full operation into deterministic code with rollback on failure:
-
-- **[External task state](#your-issues-stay-in-your-tracker)** — labels, transitions, and status queries go through your issue tracker
-- **[Atomic operations](#what-atomic-means-here)** — label transition + state update + session dispatch + audit log in one call
-- **[Tool-based guardrails](#the-toolbox)** — 11 tools enforce the process; the agent provides intent, the plugin handles mechanics
-
-### ~60-80% token savings
-
-Three mechanisms compound to cut token usage dramatically versus running one large model with fresh context each time:
-
-- **[Tier selection](#meet-your-team)** — Haiku for typos, Sonnet for features, Opus for architecture (~30-50% on simple tasks)
-- **[Session reuse](#sessions-accumulate-context)** — workers accumulate codebase knowledge across tasks (~40-60% per task)
-- **[Token-free scheduling](#automatic-scheduling)** — `work_heartbeat` runs on pure CLI calls, zero LLM tokens for orchestration
-
----
-
-## The problem DevClaw solves
-
-OpenClaw is a great multi-agent runtime. It handles sessions, tools, channels, gateway RPC — everything you need to run AI agents. But it's a general-purpose platform. It has no opinion about how software gets built.
-
-Without DevClaw, your orchestrator agent has to figure out on its own how to:
-- Pick the right model for the task complexity
-- Create or reuse the right worker session
-- Transition issue labels in the right order
-- Track which worker is doing what across projects
-- Schedule QA after DEV completes, and re-schedule DEV after QA fails
-- Detect crashed workers and recover
-- Log everything for auditability
-
-That's a lot of reasoning per task. LLMs do it imperfectly — they forget steps, corrupt state, pick the wrong model, lose session references. You end up babysitting the thing you built to avoid babysitting.
-
-DevClaw moves all of that into deterministic plugin code. The agent says "pick up issue #42." The plugin handles the other 10 steps atomically. Every time, the same way, zero reasoning tokens spent on orchestration.
-
----
-
-## Meet your team
-
-DevClaw doesn't think in model IDs. It thinks in people.
-
-When a task comes in, you don't configure `anthropic/claude-sonnet-4-5` — you assign a **medior developer**. The orchestrator evaluates task complexity and picks the right person for the job:
-
-### Developers
-
-| Level | Assigns to | Model |
-|---|---|---|
-| **Junior** | Typos, CSS fixes, renames, single-file changes | Haiku |
-| **Medior** | Features, bug fixes, multi-file changes | Sonnet |
-| **Senior** | Architecture, migrations, system-wide refactoring | Opus |
-
-### QA
-
-| Level | Assigns to | Model |
-|---|---|---|
-| **Reviewer** | Code review, test validation, PR inspection | Sonnet |
-| **Tester** | Manual testing, smoke tests | Haiku |
-
-A CSS typo gets the intern. A database migration gets the architect. You're not burning Opus tokens on a color change, and you're not sending Haiku to redesign your auth system.
-
-Every mapping is [configurable](docs/CONFIGURATION.md#model-tiers) — swap in any model you want per level.
-
----
-
-## How a task moves through the pipeline
-
-Every issue follows the same path, no exceptions. DevClaw enforces it:
-
-```
-Planning → To Do → Doing → To Test → Testing → Done
-```
-
-```mermaid
-stateDiagram-v2
- [*] --> Planning
- Planning --> ToDo: Ready for development
-
- ToDo --> Doing: DEV picks up
- Doing --> ToTest: DEV done
-
- ToTest --> Testing: Scheduler picks up QA
- Testing --> Done: QA pass (issue closed)
- Testing --> ToImprove: QA fail (back to DEV)
- Testing --> Refining: QA needs human input
-
- ToImprove --> Doing: Scheduler picks up DEV fix
- Refining --> ToDo: Human decides
-
- Done --> [*]
-```
-
-These labels live on your actual GitHub/GitLab issues. Not in some internal database — in the tool you already use. Filter by `Doing` in GitHub to see what's in progress. Set up a webhook on `Done` to trigger deploys. The issue tracker is the source of truth.
-
-### What "atomic" means here
-
-When you say "pick up #42 for DEV", the plugin does all of this in one operation:
-1. Verifies the issue is in the right state
-2. Picks the developer level (or uses what you specified)
-3. Transitions the label (`To Do` → `Doing`)
-4. Creates or reuses the right worker session
-5. Dispatches the task with project-specific instructions
-6. Updates internal state
-7. Logs an audit entry
-
-If step 4 fails, step 3 is rolled back. No half-states, no orphaned labels, no "the issue says Doing but nobody's working on it."
-
----
-
-## What happens behind the scenes
-
-### Workers report back themselves
-
-When a developer finishes, they call `work_finish` directly — no orchestrator involved:
-
-- **DEV "done"** → label moves to `To Test`, scheduler picks up QA on next tick
-- **DEV "blocked"** → label moves back to `To Do`, task returns to queue
-- **QA "pass"** → label moves to `Done`, issue closes
-- **QA "fail"** → label moves to `To Improve`, scheduler picks up DEV on next tick
-
-The orchestrator doesn't need to poll, check, or coordinate. Workers are self-reporting.
-
-### Sessions accumulate context
-
-Each developer level gets its own persistent session per project. Your medior dev that's done 5 features on `my-app` already knows the codebase — it doesn't re-read 50K tokens of source code every time it picks up a new task.
-
-That's a **~40-60% token saving per task** from session reuse alone.
-
-Combined with tier selection (not using Opus when Haiku will do) and the token-free heartbeat (more on that next), DevClaw significantly reduces your token bill versus running everything through one large model.
-
-### Everything is logged
-
-Every tool call writes an NDJSON line to `audit.log`:
-
-```bash
-cat audit.log | jq 'select(.event=="work_start")'
-```
-
-Full trace of every task, every level selection, every label transition, every health fix. No manual logging needed.
-
----
-
-## Automatic scheduling
-
-DevClaw doesn't wait for you to tell it what to do next. A background scheduling system continuously scans for available work and dispatches workers — zero LLM tokens, pure deterministic code. This is the engine that keeps the pipeline moving: when DEV finishes, the scheduler sees a `To Test` issue and dispatches QA. When QA fails, the scheduler sees a `To Improve` issue and dispatches DEV. No hand-offs, no orchestrator reasoning — just label-driven scheduling.
-
-### The `work_heartbeat`
-
-Every tick (default: 60 seconds), the scheduler runs two passes:
-
-1. **Health pass** — detects workers stuck for >2 hours, reverts their labels back to queue, deactivates them. Catches crashed sessions, context overflows, or workers that died without reporting back.
-2. **Queue pass** — scans for available tasks by priority (`To Improve` > `To Test` > `To Do`), fills free worker slots. DEV and QA slots are filled independently.
-
-All CLI calls and JSON reads. Workers only consume tokens when they actually start coding or reviewing. The scheduler also fires immediately after every `work_finish` (as a tick), so transitions happen without waiting for the next interval.
-
-### How tasks flow between roles
-
-When a worker calls `work_finish`, the plugin transitions the label. The scheduler picks up the rest:
-
-- **DEV "done"** → label moves to `To Test` → next tick dispatches QA
-- **QA "fail"** → label moves to `To Improve` → next tick dispatches DEV (reuses previous level)
-- **QA "pass"** → label moves to `Done`, issue closes
-- **"blocked"** → label reverts to queue (`To Do` or `To Test`) for retry
-
-No orchestrator involvement. Workers self-report, the scheduler fills free slots.
-
-### Execution modes
-
-Each project is fully isolated — its own queue, workers, sessions, state. No cross-project contamination. Two levels of parallelism control how work gets scheduled:
-
-- **Project-level (`roleExecution`)** — DEV and QA work simultaneously on different tasks (default: `parallel`) or take turns (`sequential`)
-- **Plugin-level (`projectExecution`)** — all registered projects dispatch workers independently (default: `parallel`) or only one project runs at a time (`sequential`)
-
-### Configuration
-
-All scheduling behavior is configurable in `openclaw.json`:
-
-```json
-{
-  "plugins": {
-    "entries": {
-      "devclaw": {
-        "config": {
-          "work_heartbeat": {
-            "enabled": true,
-            "intervalSeconds": 60,
-            "maxPickupsPerTick": 4
-          },
-          "projectExecution": "parallel"
-        }
-      }
-    }
-  }
-}
-```
-
-Per-project settings live in `projects.json`:
-
-```json
-{
-  "-1234567890": {
-    "name": "my-app",
-    "roleExecution": "parallel"
-  }
-}
-```
-
-| Setting | Where | Default | What it controls |
-|---|---|---|---|
-| `work_heartbeat.enabled` | `openclaw.json` | `true` | Turn the heartbeat on/off |
-| `work_heartbeat.intervalSeconds` | `openclaw.json` | `60` | Seconds between ticks |
-| `work_heartbeat.maxPickupsPerTick` | `openclaw.json` | `4` | Max workers dispatched per tick |
-| `projectExecution` | `openclaw.json` | `"parallel"` | All projects at once, or one at a time |
-| `roleExecution` | `projects.json` | `"parallel"` | DEV+QA at once, or one role at a time |
-
-See the [Configuration reference](docs/CONFIGURATION.md) for the full schema.
-
----
-
-## Task management
-
-### Your issues stay in your tracker
-
-DevClaw doesn't have its own task database. All task state lives in **GitHub Issues** or **GitLab Issues** — auto-detected from your git remote. The eight pipeline labels are created on your repo when you register a project. Your project manager sees progress in GitHub without knowing DevClaw exists. Your CI/CD can trigger on label changes. If you stop using DevClaw, your issues and labels stay exactly where they are.
-
-The provider is pluggable (`IssueProvider` interface). GitHub and GitLab work today. Jira, Linear, or anything else just needs to implement the same interface.
-
-### Creating, updating, and commenting
-
-Tasks can come from anywhere — the orchestrator creates them from chat, workers file bugs they discover mid-task, or you create them directly in GitHub/GitLab:
-
-```
-You: "Create an issue: fix the broken OAuth redirect"
-Agent: creates issue #43 with label "Planning"
-
-You: "Move #43 to To Do"
-Agent: transitions label Planning → To Do
-
-You: "Add a comment on #42: needs to handle the edge case for expired tokens"
-Agent: adds comment attributed to "orchestrator"
-```
-
-Workers can also comment during work — QA leaves review feedback, DEV posts implementation notes. Every comment carries role attribution so you know who said what.
-
-### Custom instructions per project
-
-Each project gets instruction files that workers receive with every task they pick up:
-
-```
-workspace/projects/roles/
-├── my-webapp/
-│   ├── dev.md     "Run npm test before committing. Deploy URL: staging.example.com"
-│   └── qa.md      "Check OAuth flow. Verify mobile responsiveness."
-├── my-api/
-│   ├── dev.md     "Run cargo test. Follow REST conventions in CONTRIBUTING.md"
-│   └── qa.md      "Verify all endpoints return correct status codes."
-└── default/
-    ├── dev.md     (fallback for projects without custom instructions)
-    └── qa.md
-```
-
-Deployment steps, test commands, coding standards, acceptance criteria — all injected at dispatch time, per project, per role.
-
----
-
-## Getting started
-
-### Prerequisites
-
-- [OpenClaw](https://openclaw.ai) installed (`openclaw --version`)
-- Node.js >= 20
-- `gh` CLI ([GitHub](https://cli.github.com)) or `glab` CLI ([GitLab](https://gitlab.com/gitlab-org/cli)), authenticated
-
-### Install
-
-```bash
-cp -r devclaw ~/.openclaw/extensions/
-```
-
-### Set up through conversation
-
-The easiest way to configure DevClaw is to just talk to your agent:
-
-```
-You: "Help me set up DevClaw"
-Agent: "I'll walk you through it. Should I use this agent as the
-        orchestrator, or create a new one?"
-You: "Use this one"
-
-Agent: "Want to bind a messaging channel?"
-You: "Telegram"
-
-Agent: "Here are the default developer assignments:
-          Junior → Haiku, Medior → Sonnet, Senior → Opus
-          Reviewer → Sonnet, Tester → Haiku
-        Keep these or customize?"
-You: "Keep them"
-
-Agent: "Done. Want to register a project?"
-You: "Yes — my-app at ~/git/my-app, main branch"
-
-Agent: "Project registered. 8 labels created on your repo.
-        Role instructions scaffolded. Try: 'check the queue'"
-```
-
-You can also use the [CLI wizard or non-interactive setup](docs/ONBOARDING.md#step-2-run-setup) for scripted environments.
-
----
-
-## The toolbox
-
-DevClaw gives the orchestrator 11 tools. These aren't just convenience wrappers; they're **guardrails**. Each tool encodes a complex multi-step operation into a single atomic call. The agent provides intent, and the plugin handles the mechanics. The agent cannot skip a label transition, forget to update state, or dispatch to the wrong session, because those steps are performed by deterministic code, not LLM reasoning.
-
-| Tool | What it does |
-|---|---|
-| `work_start` | Pick up a task — resolves level, transitions label, dispatches session, logs audit |
-| `work_finish` | Complete a task — transitions label, updates state, ticks queue for next dispatch |
-| `task_create` | Create a new issue (used by workers to file bugs they discover) |
-| `task_update` | Manually change an issue's state label |
-| `task_comment` | Add a comment to an issue (with role attribution) |
-| `status` | Dashboard: queue counts + who's working on what |
-| `health` | Detect zombie workers, stale sessions, state inconsistencies |
-| `work_heartbeat` | Manually trigger a health check + queue dispatch cycle |
-| `project_register` | One-time project setup: creates labels, scaffolds instructions, initializes state |
-| `setup` | Agent + workspace initialization |
-| `onboard` | Conversational setup guide |
-
-Full parameters and usage in the [Tools Reference](docs/TOOLS.md).
-
----
-
-## Documentation
-
-| | |
-|---|---|
-| **[Architecture](docs/ARCHITECTURE.md)** | System design, session model, data flow, end-to-end diagrams |
-| **[Tools Reference](docs/TOOLS.md)** | Complete reference for all 11 tools |
-| **[Configuration](docs/CONFIGURATION.md)** | `openclaw.json`, `projects.json`, heartbeat, notifications |
-| **[Onboarding Guide](docs/ONBOARDING.md)** | Full step-by-step setup |
-| **[QA Workflow](docs/QA_WORKFLOW.md)** | QA process and review templates |
-| **[Context Awareness](docs/CONTEXT-AWARENESS.md)** | How tools adapt to group vs. DM vs. agent context |
-| **[Testing](docs/TESTING.md)** | Test suite, fixtures, CI/CD |
-| **[Management Theory](docs/MANAGEMENT.md)** | The delegation model behind the design |
-| **[Roadmap](docs/ROADMAP.md)** | What's coming next |
-
----
-
-## License
-
-MIT