docs: remove auto-chaining, reframe around scheduling system

Auto-chaining was removed from the codebase. All docs now describe the
scheduling model: work_finish transitions the label, and the heartbeat's
tick pass (which also fires immediately after every work_finish) detects
available work and fills free slots. References to the autoChain config
option are removed.

Files updated: README.md, README2.md, docs/TOOLS.md, ARCHITECTURE.md,
ROADMAP.md, MANAGEMENT.md, ONBOARDING.md, lib/templates.ts

https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
Claude
2026-02-11 04:20:25 +00:00
parent 261babdf61
commit 9d1e253f11
8 changed files with 44 additions and 61 deletions

@@ -12,7 +12,7 @@ DevClaw is the [OpenClaw](https://openclaw.ai) plugin that makes this work.
## Why DevClaw ## Why DevClaw
OpenClaw gives you a powerful multi-agent runtime — channel bindings, session management, tool permissions, gateway RPC. But it's a general-purpose platform. It doesn't know what "pick up an issue" means, how to transition a label, when to reuse a session, or how to chain DEV completion into QA review. Managing a development workflow on raw OpenClaw means the orchestrator agent handles all of that through fragile, token-expensive LLM reasoning — and it gets it wrong often enough to need constant supervision. DevClaw encodes the entire development lifecycle into deterministic plugin code: level assignment, label transitions, session dispatch, auto-chaining, health checks, and audit logging. The agent calls one tool. The plugin does the rest. That's the difference between "an agent that can write code" and "a team that ships autonomously." OpenClaw gives you a powerful multi-agent runtime — channel bindings, session management, tool permissions, gateway RPC. But it's a general-purpose platform. It doesn't know what "pick up an issue" means, how to transition a label, when to reuse a session, or how to schedule QA after DEV completes. Managing a development workflow on raw OpenClaw means the orchestrator agent handles all of that through fragile, token-expensive LLM reasoning — and it gets it wrong often enough to need constant supervision. DevClaw encodes the entire development lifecycle into deterministic plugin code: level assignment, label transitions, session dispatch, scheduling, health checks, and audit logging. The agent calls one tool. The plugin does the rest. That's the difference between "an agent that can write code" and "a team that ships autonomously."
## Benefits ## Benefits
@@ -47,11 +47,10 @@ The heartbeat service runs a continuous loop: health check → queue scan → di
### Feedback loops ### Feedback loops
Three automated feedback loops keep the pipeline self-correcting: Two automated feedback loops keep the pipeline self-correcting:
1. **Auto-chaining** — DEV "done" automatically dispatches QA. QA "fail" automatically re-dispatches DEV. No orchestrator action needed. 1. **Stale worker watchdog** — Workers active >2 hours are auto-detected. Labels revert to queue, workers deactivated. Tasks available for retry.
2. **Stale worker watchdog** — Workers active >2 hours are auto-detected. Labels revert to queue, workers deactivated. Tasks available for retry. 2. **Completion enforcement** — Every task message includes a mandatory `work_finish` section. Workers use `"blocked"` if stuck. Three-layer guarantee prevents tasks from getting stuck forever.
3. **Completion enforcement** — Every task message includes a mandatory `work_finish` section. Workers use `"blocked"` if stuck. Three-layer guarantee prevents tasks from getting stuck forever.
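The stale worker watchdog described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the `ActiveWorker` shape, the `startedAt` field, and the `findStaleWorkers` helper are hypothetical names, and the revert targets assume a stale worker is handled like a "blocked" one (DEV back to `To Do`, QA back to `To Test`).

```typescript
type Role = "dev" | "qa";

// Hypothetical shape of an active worker record; the real state file may differ.
interface ActiveWorker {
  role: Role;
  issueId: number;
  startedAt: number; // milliseconds since epoch (assumed field)
}

interface Reversion {
  issueId: number;
  role: Role;
  revertTo: string;
}

// Two-hour threshold taken from the watchdog description.
const STALE_AFTER_MS = 2 * 60 * 60 * 1000;

// Assumption: a stale worker's issue returns to the same queue label as a blocked one.
const QUEUE_LABEL: Record<Role, string> = { dev: "To Do", qa: "To Test" };

// Returns the label reversions a health pass would apply at time `now`.
function findStaleWorkers(workers: ActiveWorker[], now: number): Reversion[] {
  return workers
    .filter((w) => now - w.startedAt > STALE_AFTER_MS)
    .map((w) => ({ issueId: w.issueId, role: w.role, revertTo: QUEUE_LABEL[w.role] }));
}
```

A caller would apply each reversion to the issue tracker and mark the worker inactive, which makes the task available to the next queue pass.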
### Role-based instruction prompts ### Role-based instruction prompts
@@ -114,12 +113,12 @@ stateDiagram-v2
ToDo --> Doing: work_start (DEV) ⇄ blocked ToDo --> Doing: work_start (DEV) ⇄ blocked
Doing --> ToTest: work_finish (DEV done) Doing --> ToTest: work_finish (DEV done)
ToTest --> Testing: work_start (QA) / auto-chain ⇄ blocked ToTest --> Testing: work_start (QA) ⇄ blocked
Testing --> Done: work_finish (QA pass) Testing --> Done: work_finish (QA pass)
Testing --> ToImprove: work_finish (QA fail) Testing --> ToImprove: work_finish (QA fail)
Testing --> Refining: work_finish (QA refine) Testing --> Refining: work_finish (QA refine)
ToImprove --> Doing: work_start (DEV fix) or auto-chain ToImprove --> Doing: work_start (DEV fix)
Refining --> ToDo: Human decision Refining --> ToDo: Human decision
Done --> [*] Done --> [*]
@@ -142,15 +141,6 @@ stateDiagram-v2
Workers call `work_finish` directly when they're done — no orchestrator involvement needed for the state transition. Workers can also call `task_create` to file follow-up issues they discover during work. Workers call `work_finish` directly when they're done — no orchestrator involvement needed for the state transition. Workers can also call `task_create` to file follow-up issues they discover during work.
### Auto-chaining
When a project has auto-chaining enabled:
- **DEV "done"** → QA is dispatched immediately (using the reviewer level)
- **QA "fail"** → DEV fix is dispatched immediately (reuses previous DEV level)
- **QA "pass" / "refine" / "blocked"** → no chaining (pipeline done, needs human input, or returned to queue)
- **DEV "blocked"** → no chaining (returned to queue for retry)
### Completion enforcement ### Completion enforcement
Three layers guarantee tasks never get stuck: Three layers guarantee tasks never get stuck:
@@ -238,7 +228,7 @@ DevClaw registers **11 tools**, grouped by function:
| Tool | Description | | Tool | Description |
|---|---| |---|---|
| [`work_start`](docs/TOOLS.md#work_start) | Pick up a task — handles level assignment, label transition, session dispatch, audit | | [`work_start`](docs/TOOLS.md#work_start) | Pick up a task — handles level assignment, label transition, session dispatch, audit |
| [`work_finish`](docs/TOOLS.md#work_finish) | Complete a task — handles label transition, state update, auto-chaining, queue tick | | [`work_finish`](docs/TOOLS.md#work_finish) | Complete a task — handles label transition, state update, queue tick |
### Task management ### Task management

@@ -64,7 +64,7 @@ Agent: "✅ DEV DONE #44 — Profile page refactored. Moved to QA."
Agent: "📋 Moving #45 to To Do — dependency #44 is in QA." Agent: "📋 Moving #45 to To Do — dependency #44 is in QA."
``` ```
Three issues shipped, one sent back for a fix (and auto-retried), another project's migration completed — all while you slept. And when you dropped in, you planned work, reprioritized, and synced to your external tracker without leaving the chat. The heartbeat kept going before, during, and after. Three issues shipped, one sent back for a fix (the scheduler retried it automatically), another project's migration completed — all while you slept. And when you dropped in, you planned work, reprioritized, and synced to your external tracker without leaving the chat. The heartbeat kept going before, during, and after.
--- ---
@@ -77,7 +77,7 @@ Without DevClaw, your orchestrator agent has to figure out on its own how to:
- Create or reuse the right worker session - Create or reuse the right worker session
- Transition issue labels in the right order - Transition issue labels in the right order
- Track which worker is doing what across projects - Track which worker is doing what across projects
- Chain DEV completion into QA review - Schedule QA after DEV completes, and re-schedule DEV after QA fails
- Detect crashed workers and recover - Detect crashed workers and recover
- Log everything for auditability - Log everything for auditability
@@ -130,12 +130,12 @@ stateDiagram-v2
ToDo --> Doing: DEV picks up ToDo --> Doing: DEV picks up
Doing --> ToTest: DEV done Doing --> ToTest: DEV done
ToTest --> Testing: QA picks up (or auto-chains) ToTest --> Testing: Scheduler picks up QA
Testing --> Done: QA pass (issue closed) Testing --> Done: QA pass (issue closed)
Testing --> ToImprove: QA fail (back to DEV) Testing --> ToImprove: QA fail (back to DEV)
Testing --> Refining: QA needs human input Testing --> Refining: QA needs human input
ToImprove --> Doing: DEV fixes (or auto-chains) ToImprove --> Doing: Scheduler picks up DEV fix
Refining --> ToDo: Human decides Refining --> ToDo: Human decides
Done --> [*] Done --> [*]
@@ -164,10 +164,10 @@ If step 4 fails, step 3 is rolled back. No half-states, no orphaned labels, no "
When a developer finishes, they call `work_finish` directly — no orchestrator involved: When a developer finishes, they call `work_finish` directly — no orchestrator involved:
- **DEV "done"** → label moves to `To Test`, QA starts automatically - **DEV "done"** → label moves to `To Test`, scheduler picks up QA on next tick
- **DEV "blocked"** → label moves back to `To Do`, task returns to queue - **DEV "blocked"** → label moves back to `To Do`, task returns to queue
- **QA "pass"** → label moves to `Done`, issue closes - **QA "pass"** → label moves to `Done`, issue closes
- **QA "fail"** → label moves to `To Improve`, DEV gets re-dispatched - **QA "fail"** → label moves to `To Improve`, scheduler picks up DEV on next tick
The orchestrator doesn't need to poll, check, or coordinate. Workers are self-reporting. The orchestrator doesn't need to poll, check, or coordinate. Workers are self-reporting.
@@ -193,27 +193,27 @@ Full trace of every task, every level selection, every label transition, every h
## Automatic scheduling ## Automatic scheduling
DevClaw doesn't wait for you to tell it what to do next. A background heartbeat service continuously scans for available work and dispatches workers — zero LLM tokens, pure deterministic code. DevClaw doesn't wait for you to tell it what to do next. A background scheduling system continuously scans for available work and dispatches workers — zero LLM tokens, pure deterministic code. This is the engine that keeps the pipeline moving: when DEV finishes, the scheduler sees a `To Test` issue and dispatches QA. When QA fails, the scheduler sees a `To Improve` issue and dispatches DEV. No hand-offs, no orchestrator reasoning — just label-driven scheduling.
### The heartbeat ### The `work_heartbeat`
Every tick, the service runs two passes: Every tick (default: 60 seconds), the scheduler runs two passes:
1. **Health pass** — detects workers stuck for >2 hours, reverts their labels back to queue, deactivates them. Catches crashed sessions, context overflows, or workers that died without reporting back. 1. **Health pass** — detects workers stuck for >2 hours, reverts their labels back to queue, deactivates them. Catches crashed sessions, context overflows, or workers that died without reporting back.
2. **Queue pass** — scans for available tasks by priority (`To Improve` > `To Test` > `To Do`), fills free worker slots. DEV and QA slots are filled independently. 2. **Queue pass** — scans for available tasks by priority (`To Improve` > `To Test` > `To Do`), fills free worker slots. DEV and QA slots are filled independently.
All CLI calls and JSON reads. Workers only consume tokens when they actually start coding or reviewing. All CLI calls and JSON reads. Workers only consume tokens when they actually start coding or reviewing. The scheduler also fires immediately after every `work_finish` (as a tick), so transitions happen without waiting for the next interval.
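As a rough illustration of the queue pass, the sketch below orders candidates by label priority and fills free slots per role, capped by `maxPickupsPerTick`. The types, the `queuePass` function, the label-to-role mapping, and the idea of a per-role free-slot count are assumptions made for this example and are not taken from the plugin source.

```typescript
type Role = "dev" | "qa";
type QueueLabel = "To Improve" | "To Test" | "To Do";

// Hypothetical shapes for illustration only.
interface QueuedIssue {
  id: number;
  label: QueueLabel;
}

interface Pickup {
  issueId: number;
  role: Role;
}

// Priority order as stated in the queue pass description.
const PRIORITY: QueueLabel[] = ["To Improve", "To Test", "To Do"];

// Assumed mapping from queue label to the role that picks it up.
const ROLE_FOR_LABEL: Record<QueueLabel, Role> = {
  "To Improve": "dev",
  "To Test": "qa",
  "To Do": "dev",
};

// Selects issues to dispatch this tick without mutating the inputs.
function queuePass(
  issues: QueuedIssue[],
  freeSlots: Record<Role, number>,
  maxPickupsPerTick = 4,
): Pickup[] {
  const slots = { ...freeSlots };
  const pickups: Pickup[] = [];
  for (const label of PRIORITY) {
    for (const issue of issues) {
      if (issue.label !== label) continue;
      if (pickups.length >= maxPickupsPerTick) return pickups;
      const role = ROLE_FOR_LABEL[label];
      if (slots[role] <= 0) continue; // DEV and QA slots are tracked independently
      slots[role] -= 1;
      pickups.push({ issueId: issue.id, role });
    }
  }
  return pickups;
}
```

With one free DEV slot and one free QA slot, a `To Improve` issue takes the DEV slot ahead of any `To Do` issue, while a `To Test` issue still gets the QA slot.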
### Auto-chaining ### How tasks flow between roles
When enabled, task completions automatically trigger the next step: When a worker calls `work_finish`, the plugin transitions the label. The scheduler picks up the rest:
- **DEV "done"** → QA reviewer is dispatched immediately - **DEV "done"** → label moves to `To Test` → next tick dispatches QA
- **QA "fail"** → DEV is re-dispatched at the same level that originally worked on it - **QA "fail"** → label moves to `To Improve` → next tick dispatches DEV (reuses previous level)
- **QA "pass"** → issue closes, pipeline done - **QA "pass"** → label moves to `Done`, issue closes
- **"blocked"** → task returns to queue for retry, no chaining - **"blocked"** → label reverts to queue (`To Do` or `To Test`) for retry
No orchestrator involvement. The worker calls `work_finish`, the plugin handles the rest. No orchestrator involvement. Workers self-report, the scheduler fills free slots.
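The label transitions that `work_finish` performs can be summarized as a lookup table. The sketch below collects the transitions shown in the state diagrams and the blocked-result notes in this commit; the `TRANSITIONS` table and the `nextLabel` helper are hypothetical names for illustration, not the plugin's API.

```typescript
type Role = "dev" | "qa";
type Result = "done" | "pass" | "fail" | "refine" | "blocked";

interface Transition {
  from: string;
  to: string;
}

// Keys use the role:result notation; values follow the documented state diagram.
const TRANSITIONS: Record<string, Transition> = {
  "dev:done": { from: "Doing", to: "To Test" },
  "dev:blocked": { from: "Doing", to: "To Do" },
  "qa:pass": { from: "Testing", to: "Done" },
  "qa:fail": { from: "Testing", to: "To Improve" },
  "qa:refine": { from: "Testing", to: "Refining" },
  "qa:blocked": { from: "Testing", to: "To Test" },
};

// Returns the label an issue should carry after the given completion result.
function nextLabel(role: Role, result: Result): string {
  const transition = TRANSITIONS[`${role}:${result}`];
  if (!transition) {
    throw new Error(`Invalid result "${result}" for role "${role}"`);
  }
  return transition.to;
}
```

Keeping the table free of any dispatch logic mirrors the scheduling model: the completion step only changes the label, and the next tick decides whether a worker is dispatched.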
### Execution modes ### Execution modes
@@ -251,7 +251,6 @@ Per-project settings live in `projects.json`:
{ {
"-1234567890": { "-1234567890": {
"name": "my-app", "name": "my-app",
"autoChain": true,
"roleExecution": "parallel" "roleExecution": "parallel"
} }
} }
@@ -263,7 +262,6 @@ Per-project settings live in `projects.json`:
| `work_heartbeat.intervalSeconds` | `openclaw.json` | `60` | Seconds between ticks | | `work_heartbeat.intervalSeconds` | `openclaw.json` | `60` | Seconds between ticks |
| `work_heartbeat.maxPickupsPerTick` | `openclaw.json` | `4` | Max workers dispatched per tick | | `work_heartbeat.maxPickupsPerTick` | `openclaw.json` | `4` | Max workers dispatched per tick |
| `projectExecution` | `openclaw.json` | `"parallel"` | All projects at once, or one at a time | | `projectExecution` | `openclaw.json` | `"parallel"` | All projects at once, or one at a time |
| `autoChain` | `projects.json` | `false` | Auto-dispatch next step on completion |
| `roleExecution` | `projects.json` | `"parallel"` | DEV+QA at once, or one role at a time | | `roleExecution` | `projects.json` | `"parallel"` | DEV+QA at once, or one role at a time |
See the [Configuration reference](docs/CONFIGURATION.md) for the full schema. See the [Configuration reference](docs/CONFIGURATION.md) for the full schema.
@@ -367,7 +365,7 @@ DevClaw gives the orchestrator 11 tools. These aren't just convenience wrappers
| Tool | What it does | | Tool | What it does |
|---|---| |---|---|
| `work_start` | Pick up a task — resolves level, transitions label, dispatches session, logs audit | | `work_start` | Pick up a task — resolves level, transitions label, dispatches session, logs audit |
| `work_finish` | Complete a task — transitions label, updates state, auto-chains next step, ticks queue | | `work_finish` | Complete a task — transitions label, updates state, ticks queue for next dispatch |
| `task_create` | Create a new issue (used by workers to file bugs they discover) | | `task_create` | Create a new issue (used by workers to file bugs they discover) |
| `task_update` | Manually change an issue's state label | | `task_update` | Manually change an issue's state label |
| `task_comment` | Add a comment to an issue (with role attribution) | | `task_comment` | Add a comment to an issue (with role attribution) |

@@ -174,7 +174,7 @@ graph TB
WF -->|closes/reopens| GL WF -->|closes/reopens| GL
WF -->|reads/writes| PJ WF -->|reads/writes| PJ
WF -->|git pull| REPO WF -->|git pull| REPO
WF -->|auto-chain dispatch| CLI WF -->|tick dispatch| CLI
WF -->|appends| AL WF -->|appends| AL
TCR -->|creates issue| GL TCR -->|creates issue| GL
@@ -374,7 +374,7 @@ sequenceDiagram
participant PJ as projects.json participant PJ as projects.json
participant AL as audit.log participant AL as audit.log
participant REPO as Git Repo participant REPO as Git Repo
participant QA as QA Session (auto-chain) participant QA as QA Session
DEV->>WF: work_finish({ role: "dev", result: "done", projectGroupId: "-123", summary: "Login page with OAuth" }) DEV->>WF: work_finish({ role: "dev", result: "done", projectGroupId: "-123", summary: "Login page with OAuth" })
WF->>PJ: readProjects() WF->>PJ: readProjects()
@@ -385,21 +385,16 @@ sequenceDiagram
WF->>GL: transitionLabel "Doing" → "To Test" WF->>GL: transitionLabel "Doing" → "To Test"
WF->>AL: append { event: "work_finish", role: "dev", result: "done" } WF->>AL: append { event: "work_finish", role: "dev", result: "done" }
alt autoChain enabled WF->>WF: tick queue (fill free slots)
WF->>GL: transitionLabel "To Test" → "Testing" Note over WF: Scheduler sees "To Test" issue, QA slot free → dispatches QA
WF->>QA: dispatchTask(role: "qa", level: "reviewer") WF-->>DEV: { announcement: "✅ DEV DONE #42", tickPickups: [...] }
WF->>PJ: activateWorker(-123, qa)
WF-->>DEV: { announcement: "✅ DEV DONE #42", autoChain: { dispatched: true, role: "qa" } }
else autoChain disabled
WF-->>DEV: { announcement: "✅ DEV DONE #42", nextAction: "qa_pickup" }
end
``` ```
**Writes:** **Writes:**
- `Git repo`: pulled latest (has DEV's merged code) - `Git repo`: pulled latest (has DEV's merged code)
- `projects.json`: dev.active=false, dev.issueId=null (sessions map preserved for reuse) - `projects.json`: dev.active=false, dev.issueId=null (sessions map preserved for reuse)
- `Issue Tracker`: label "Doing" → "To Test" (+ "To Test" → "Testing" if auto-chain) - `Issue Tracker`: label "Doing" → "To Test"
- `audit.log`: 1 entry (work_finish) + optional auto-chain entries - `audit.log`: 1 entry (work_finish) + tick entries if workers dispatched
### Phase 6: QA pickup ### Phase 6: QA pickup
@@ -462,7 +457,7 @@ DEV Blocked: "Doing" → "To Do"
QA Blocked: "Testing" → "To Test" QA Blocked: "Testing" → "To Test"
``` ```
Worker cannot complete (missing info, environment errors, etc.). Issue returns to queue for retry. No auto-chain — the task is available for the next heartbeat pickup. Worker cannot complete (missing info, environment errors, etc.). Issue returns to queue for retry. The task is available for the next heartbeat pickup.
### Completion enforcement ### Completion enforcement
@@ -517,7 +512,7 @@ Every piece of data and where it lives:
│ │ │ │
│ setup → agent creation + workspace + model config │ │ setup → agent creation + workspace + model config │
│ work_start → level + label + dispatch + role instr (e2e) │ │ work_start → level + label + dispatch + role instr (e2e) │
│ work_finish → label + state + git pull + auto-chain │ work_finish → label + state + git pull + tick queue
│ task_create → create issue in tracker │ │ task_create → create issue in tracker │
│ task_update → manual label state change │ │ task_update → manual label state change │
│ task_comment → add comment to issue │ │ task_comment → add comment to issue │
@@ -588,7 +583,7 @@ graph LR
PR[Project registration] PR[Project registration]
SETUP[Agent + workspace setup] SETUP[Agent + workspace setup]
SD[Session dispatch<br/>create + send via CLI] SD[Session dispatch<br/>create + send via CLI]
AC[Auto-chaining<br/>DEV→QA, QA fail→DEV] AC[Scheduling<br/>tick queue after work_finish]
RI[Role instructions<br/>loaded per project] RI[Role instructions<br/>loaded per project]
A[Audit logging] A[Audit logging]
Z[Zombie cleanup] Z[Zombie cleanup]

@@ -29,9 +29,9 @@ Classical management theory — later formalized by Bernard Bass in his work on
DevClaw's task lifecycle is built on this. The orchestrator delegates a task via `work_start`, then steps away. It only re-engages in three scenarios: DevClaw's task lifecycle is built on this. The orchestrator delegates a task via `work_start`, then steps away. It only re-engages in three scenarios:
1. **DEV completes work** → The task moves to QA automatically. No orchestrator involvement needed. 1. **DEV completes work** → The label moves to `To Test`. The scheduler dispatches QA on the next tick. No orchestrator involvement needed.
2. **QA passes** → The issue closes. Pipeline complete. 2. **QA passes** → The issue closes. Pipeline complete.
3. **QA fails** → The task cycles back to DEV with a fix request. The orchestrator may need to adjust the model level. 3. **QA fails** → The label moves to `To Improve`. The scheduler dispatches DEV on the next tick. The orchestrator may need to adjust the model level.
4. **QA refines** → The task enters a holding state that _requires human decision_. This is the explicit escalation boundary. 4. **QA refines** → The task enters a holding state that _requires human decision_. This is the explicit escalation boundary.
The "refine" state is the most interesting from a delegation perspective. It's a conscious architectural decision that says: some judgments should not be automated. When the QA agent determines that a task needs rethinking rather than just fixing, it escalates to the only actor who has the full business context — the human. The "refine" state is the most interesting from a delegation perspective. It's a conscious architectural decision that says: some judgments should not be automated. When the QA agent determines that a task needs rethinking rather than just fixing, it escalates to the only actor who has the full business context — the human.

@@ -244,7 +244,7 @@ Change which model powers each level in `openclaw.json` — see [Configuration](
| Developer assignment | Plugin | LLM-selected level by orchestrator, keyword heuristic fallback | | Developer assignment | Plugin | LLM-selected level by orchestrator, keyword heuristic fallback |
| State management | Plugin | Atomic read/write to `projects.json` | | State management | Plugin | Atomic read/write to `projects.json` |
| Session management | Plugin | Creates, reuses, and dispatches to sessions via CLI. Agent never touches session tools. | | Session management | Plugin | Creates, reuses, and dispatches to sessions via CLI. Agent never touches session tools. |
| Task completion | Plugin (`work_finish`) | Workers self-report. Auto-chains if enabled. | | Task completion | Plugin (`work_finish`) | Workers self-report. Scheduler dispatches next role. |
| Prompt instructions | Plugin (`work_start`) | Loaded from `projects/roles/<project>/<role>.md`, appended to task message | | Prompt instructions | Plugin (`work_start`) | Loaded from `projects/roles/<project>/<role>.md`, appended to task message |
| Audit logging | Plugin | Automatic NDJSON append per tool call | | Audit logging | Plugin | Automatic NDJSON append per tool call |
| Zombie detection | Plugin | `health` checks active vs alive | | Zombie detection | Plugin | `health` checks active vs alive |

@@ -30,7 +30,7 @@ Roles become a configurable list instead of a hardcoded pair. Each role defines:
} }
``` ```
The pipeline definition replaces the hardcoded `Doing → To Test → Testing → Done` flow. Labels and transitions are generated from the pipeline config. Auto-chaining follows the pipeline order. The pipeline definition replaces the hardcoded `Doing → To Test → Testing → Done` flow. Labels and transitions are generated from the pipeline config. The scheduler follows the pipeline order when filling free slots.
### Open questions ### Open questions

@@ -90,7 +90,7 @@ Complete a task with a result. Called by workers (DEV/QA sub-agent sessions) dir
6. Ticks queue to fill free worker slots 6. Ticks queue to fill free worker slots
7. Writes audit log 7. Writes audit log
**Auto-chaining** (when enabled on the project): `dev:done` dispatches QA automatically. `qa:fail` re-dispatches DEV using the previous level. **Scheduling:** After completion, `work_finish` ticks the queue. The scheduler sees the new label (`To Test` or `To Improve`) and dispatches the next worker if a slot is free.
--- ---

@@ -102,7 +102,7 @@ All orchestration goes through these tools. You do NOT manually manage sessions,
| \`status\` | Task queue and worker state per project (lightweight dashboard) | | \`status\` | Task queue and worker state per project (lightweight dashboard) |
| \`health\` | Scan worker health: zombies, stale workers, orphaned state. Pass fix=true to auto-fix | | \`health\` | Scan worker health: zombies, stale workers, orphaned state. Pass fix=true to auto-fix |
| \`work_start\` | End-to-end: label transition, level assignment, session create/reuse, dispatch with role instructions | | \`work_start\` | End-to-end: label transition, level assignment, session create/reuse, dispatch with role instructions |
| \`work_finish\` | End-to-end: label transition, state update, issue close/reopen. Auto-ticks queue after completion. | | \`work_finish\` | End-to-end: label transition, state update, issue close/reopen. Ticks scheduler after completion. |
### Pipeline Flow ### Pipeline Flow
@@ -135,10 +135,10 @@ Evaluate each task and pass the appropriate developer level to \`work_start\`:
### When Work Completes ### When Work Completes
Workers call \`work_finish\` themselves — the label transition, state update, and audit log happen atomically. After completion, \`work_finish\` auto-ticks the queue to fill free slots: Workers call \`work_finish\` themselves — the label transition, state update, and audit log happen atomically. After completion, \`work_finish\` ticks the scheduler to fill free slots:
- DEV "done" → issue moves to "To Test" → tick dispatches QA - DEV "done" → issue moves to "To Test" → scheduler dispatches QA
- QA "fail" → issue moves to "To Improve" → tick dispatches DEV - QA "fail" → issue moves to "To Improve" → scheduler dispatches DEV
- QA "pass" → Done, no further dispatch - QA "pass" → Done, no further dispatch
- QA "refine" / blocked → needs human input - QA "refine" / blocked → needs human input