docs: remove auto-chaining, reframe around scheduling system

Auto-chaining was removed from the codebase. All docs now describe the scheduling model: work_finish transitions labels, and the heartbeat's tick pass (which also fires immediately after every work_finish) detects available work and fills free slots. Removed autoChain config references.

Files updated: README.md, README2.md, docs/ARCHITECTURE.md, docs/MANAGEMENT.md, docs/ONBOARDING.md, docs/ROADMAP.md, docs/TOOLS.md, lib/templates.ts

https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
2026-02-11 04:20:25 +00:00
parent 261babdf61
commit 9d1e253f11
8 changed files with 44 additions and 61 deletions
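
For orientation, the queue pass that these docs now describe can be sketched roughly as follows. This is a minimal illustration, not DevClaw's actual code: `queuePass`, `Issue`, `Pickup`, and the one-slot-per-role model are hypothetical names and assumptions. Only the priority order (`To Improve` > `To Test` > `To Do`), the label-to-role mapping, and the default of 4 pickups per tick come from the documentation changed below.

```typescript
// Hypothetical sketch of the scheduler's queue pass (illustrative names only).
type Role = "dev" | "qa";
type QueueLabel = "To Improve" | "To Test" | "To Do";

interface Issue {
  id: number;
  label: QueueLabel;
}

interface Pickup {
  issueId: number;
  role: Role;
}

// Priority order stated in the docs: To Improve > To Test > To Do.
const PRIORITY: QueueLabel[] = ["To Improve", "To Test", "To Do"];

// Which role picks up each queue label, per the state diagrams.
const ROLE_FOR_LABEL: Record<QueueLabel, Role> = {
  "To Improve": "dev",
  "To Test": "qa",
  "To Do": "dev",
};

// Assumption: one slot per role per project; DEV and QA are filled independently.
function queuePass(
  issues: Issue[],
  freeSlots: Record<Role, boolean>,
  maxPickups = 4,
): Pickup[] {
  const free = { ...freeSlots }; // copy so the caller's state is untouched
  const pickups: Pickup[] = [];
  for (const label of PRIORITY) {
    for (const issue of issues) {
      if (pickups.length >= maxPickups) return pickups;
      if (issue.label !== label) continue;
      const role = ROLE_FOR_LABEL[label];
      if (!free[role]) continue; // slot already taken, leave the issue queued
      free[role] = false;
      pickups.push({ issueId: issue.id, role });
    }
  }
  return pickups;
}
```

Under this sketch, nothing needs to "chain": a `To Test` label left behind by `work_finish` is simply the next thing the pass finds when a QA slot is free.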
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ DevClaw is the [OpenClaw](https://openclaw.ai) plugin that makes this work.
 ## Why DevClaw
-OpenClaw gives you a powerful multi-agent runtime — channel bindings, session management, tool permissions, gateway RPC. But it's a general-purpose platform. It doesn't know what "pick up an issue" means, how to transition a label, when to reuse a session, or how to chain DEV completion into QA review. Managing a development workflow on raw OpenClaw means the orchestrator agent handles all of that through fragile, token-expensive LLM reasoning — and it gets it wrong often enough to need constant supervision. DevClaw encodes the entire development lifecycle into deterministic plugin code: level assignment, label transitions, session dispatch, auto-chaining, health checks, and audit logging. The agent calls one tool. The plugin does the rest. That's the difference between "an agent that can write code" and "a team that ships autonomously."
+OpenClaw gives you a powerful multi-agent runtime — channel bindings, session management, tool permissions, gateway RPC. But it's a general-purpose platform. It doesn't know what "pick up an issue" means, how to transition a label, when to reuse a session, or how to schedule QA after DEV completes. Managing a development workflow on raw OpenClaw means the orchestrator agent handles all of that through fragile, token-expensive LLM reasoning — and it gets it wrong often enough to need constant supervision. DevClaw encodes the entire development lifecycle into deterministic plugin code: level assignment, label transitions, session dispatch, scheduling, health checks, and audit logging. The agent calls one tool. The plugin does the rest. That's the difference between "an agent that can write code" and "a team that ships autonomously."
 ## Benefits
@@ -47,11 +47,10 @@ The heartbeat service runs a continuous loop: health check → queue scan → di
 ### Feedback loops
-Three automated feedback loops keep the pipeline self-correcting:
+Two automated feedback loops keep the pipeline self-correcting:
-1. **Auto-chaining** — DEV "done" automatically dispatches QA. QA "fail" automatically re-dispatches DEV. No orchestrator action needed.
+1. **Stale worker watchdog** — Workers active >2 hours are auto-detected. Labels revert to queue, workers deactivated. Tasks available for retry.
-2. **Stale worker watchdog** — Workers active >2 hours are auto-detected. Labels revert to queue, workers deactivated. Tasks available for retry.
+2. **Completion enforcement** — Every task message includes a mandatory `work_finish` section. Workers use `"blocked"` if stuck. Three-layer guarantee prevents tasks from getting stuck forever.
-3. **Completion enforcement** — Every task message includes a mandatory `work_finish` section. Workers use `"blocked"` if stuck. Three-layer guarantee prevents tasks from getting stuck forever.
 ### Role-based instruction prompts
@@ -114,12 +113,12 @@ stateDiagram-v2
    ToDo --> Doing: work_start (DEV) ⇄ blocked
    Doing --> ToTest: work_finish (DEV done)
-    ToTest --> Testing: work_start (QA) / auto-chain ⇄ blocked
+    ToTest --> Testing: work_start (QA) ⇄ blocked
    Testing --> Done: work_finish (QA pass)
    Testing --> ToImprove: work_finish (QA fail)
    Testing --> Refining: work_finish (QA refine)
-    ToImprove --> Doing: work_start (DEV fix) or auto-chain
+    ToImprove --> Doing: work_start (DEV fix)
    Refining --> ToDo: Human decision
    Done --> [*]
@@ -142,15 +141,6 @@ stateDiagram-v2
 Workers call `work_finish` directly when they're done — no orchestrator involvement needed for the state transition. Workers can also call `task_create` to file follow-up issues they discover during work.
-### Auto-chaining
-When a project has auto-chaining enabled:
-- **DEV "done"** → QA is dispatched immediately (using the reviewer level)
-- **QA "fail"** → DEV fix is dispatched immediately (reuses previous DEV level)
-- **QA "pass" / "refine" / "blocked"** → no chaining (pipeline done, needs human input, or returned to queue)
-- **DEV "blocked"** → no chaining (returned to queue for retry)
 ### Completion enforcement
 Three layers guarantee tasks never get stuck:
@@ -238,7 +228,7 @@ DevClaw registers **11 tools**, grouped by function:
 | Tool | Description |
 |---|---|
 | [`work_start`](docs/TOOLS.md#work_start) | Pick up a task — handles level assignment, label transition, session dispatch, audit |
-| [`work_finish`](docs/TOOLS.md#work_finish) | Complete a task — handles label transition, state update, auto-chaining, queue tick |
+| [`work_finish`](docs/TOOLS.md#work_finish) | Complete a task — handles label transition, state update, queue tick |
 ### Task management
--- a/README2.md
+++ b/README2.md
@@ -64,7 +64,7 @@ Agent:  "✅ DEV DONE #44 — Profile page refactored. Moved to QA."
 Agent:  "📋 Moving #45 to To Do — dependency #44 is in QA."
 ```
-Three issues shipped, one sent back for a fix (and auto-retried), another project's migration completed — all while you slept. And when you dropped in, you planned work, reprioritized, and synced to your external tracker without leaving the chat. The heartbeat kept going before, during, and after.
+Three issues shipped, one sent back for a fix (the scheduler retried it automatically), another project's migration completed — all while you slept. And when you dropped in, you planned work, reprioritized, and synced to your external tracker without leaving the chat. The heartbeat kept going before, during, and after.
 ---
@@ -77,7 +77,7 @@ Without DevClaw, your orchestrator agent has to figure out on its own how to:
 - Create or reuse the right worker session
 - Transition issue labels in the right order
 - Track which worker is doing what across projects
-- Chain DEV completion into QA review
+- Schedule QA after DEV completes, and re-schedule DEV after QA fails
 - Detect crashed workers and recover
 - Log everything for auditability
@@ -130,12 +130,12 @@ stateDiagram-v2
    ToDo --> Doing: DEV picks up
    Doing --> ToTest: DEV done
-    ToTest --> Testing: QA picks up (or auto-chains)
+    ToTest --> Testing: Scheduler picks up QA
    Testing --> Done: QA pass (issue closed)
    Testing --> ToImprove: QA fail (back to DEV)
    Testing --> Refining: QA needs human input
-    ToImprove --> Doing: DEV fixes (or auto-chains)
+    ToImprove --> Doing: Scheduler picks up DEV fix
    Refining --> ToDo: Human decides
    Done --> [*]
@@ -164,10 +164,10 @@ If step 4 fails, step 3 is rolled back. No half-states, no orphaned labels, no "
 When a developer finishes, they call `work_finish` directly — no orchestrator involved:
-- **DEV "done"** → label moves to `To Test`, QA starts automatically
+- **DEV "done"** → label moves to `To Test`, scheduler picks up QA on next tick
 - **DEV "blocked"** → label moves back to `To Do`, task returns to queue
 - **QA "pass"** → label moves to `Done`, issue closes
-- **QA "fail"** → label moves to `To Improve`, DEV gets re-dispatched
+- **QA "fail"** → label moves to `To Improve`, scheduler picks up DEV on next tick
 The orchestrator doesn't need to poll, check, or coordinate. Workers are self-reporting.
@@ -193,27 +193,27 @@ Full trace of every task, every level selection, every label transition, every h
 ## Automatic scheduling
-DevClaw doesn't wait for you to tell it what to do next. A background heartbeat service continuously scans for available work and dispatches workers — zero LLM tokens, pure deterministic code.
+DevClaw doesn't wait for you to tell it what to do next. A background scheduling system continuously scans for available work and dispatches workers — zero LLM tokens, pure deterministic code. This is the engine that keeps the pipeline moving: when DEV finishes, the scheduler sees a `To Test` issue and dispatches QA. When QA fails, the scheduler sees a `To Improve` issue and dispatches DEV. No hand-offs, no orchestrator reasoning — just label-driven scheduling.
-### The heartbeat
+### The `work_heartbeat`
-Every tick, the service runs two passes:
+Every tick (default: 60 seconds), the scheduler runs two passes:
 1. **Health pass** — detects workers stuck for >2 hours, reverts their labels back to queue, deactivates them. Catches crashed sessions, context overflows, or workers that died without reporting back.
 2. **Queue pass** — scans for available tasks by priority (`To Improve` > `To Test` > `To Do`), fills free worker slots. DEV and QA slots are filled independently.
-All CLI calls and JSON reads. Workers only consume tokens when they actually start coding or reviewing.
+All CLI calls and JSON reads. Workers only consume tokens when they actually start coding or reviewing. The scheduler also fires immediately after every `work_finish` (as a tick), so transitions happen without waiting for the next interval.
-### Auto-chaining
+### How tasks flow between roles
-When enabled, task completions automatically trigger the next step:
+When a worker calls `work_finish`, the plugin transitions the label. The scheduler picks up the rest:
-- **DEV "done"** → QA reviewer is dispatched immediately
+- **DEV "done"** → label moves to `To Test` → next tick dispatches QA
-- **QA "fail"** → DEV is re-dispatched at the same level that originally worked on it
+- **QA "fail"** → label moves to `To Improve` → next tick dispatches DEV (reuses previous level)
-- **QA "pass"** → issue closes, pipeline done
+- **QA "pass"** → label moves to `Done`, issue closes
-- **"blocked"** → task returns to queue for retry, no chaining
+- **"blocked"** → label reverts to queue (`To Do` or `To Test`) for retry
-No orchestrator involvement. The worker calls `work_finish`, the plugin handles the rest.
+No orchestrator involvement. Workers self-report, the scheduler fills free slots.
 ### Execution modes
@@ -251,7 +251,6 @@ Per-project settings live in `projects.json`:
 {
  "-1234567890": {
    "name": "my-app",
-   "autoChain": true,
    "roleExecution": "parallel"
  }
 }
@@ -263,7 +262,6 @@ Per-project settings live in `projects.json`:
 | `work_heartbeat.intervalSeconds` | `openclaw.json` | `60` | Seconds between ticks |
 | `work_heartbeat.maxPickupsPerTick` | `openclaw.json` | `4` | Max workers dispatched per tick |
 | `projectExecution` | `openclaw.json` | `"parallel"` | All projects at once, or one at a time |
-| `autoChain` | `projects.json` | `false` | Auto-dispatch next step on completion |
 | `roleExecution` | `projects.json` | `"parallel"` | DEV+QA at once, or one role at a time |
 See the [Configuration reference](docs/CONFIGURATION.md) for the full schema.
@@ -367,7 +365,7 @@ DevClaw gives the orchestrator 11 tools. These aren't just convenience wrappers
 | Tool | What it does |
 |---|---|
 | `work_start` | Pick up a task — resolves level, transitions label, dispatches session, logs audit |
-| `work_finish` | Complete a task — transitions label, updates state, auto-chains next step, ticks queue |
+| `work_finish` | Complete a task — transitions label, updates state, ticks queue for next dispatch |
 | `task_create` | Create a new issue (used by workers to file bugs they discover) |
 | `task_update` | Manually change an issue's state label |
 | `task_comment` | Add a comment to an issue (with role attribution) |
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -174,7 +174,7 @@ graph TB
    WF -->|closes/reopens| GL
    WF -->|reads/writes| PJ
    WF -->|git pull| REPO
-    WF -->|auto-chain dispatch| CLI
+    WF -->|tick dispatch| CLI
    WF -->|appends| AL
    TCR -->|creates issue| GL
@@ -374,7 +374,7 @@ sequenceDiagram
    participant PJ as projects.json
    participant AL as audit.log
    participant REPO as Git Repo
-    participant QA as QA Session (auto-chain)
+    participant QA as QA Session
    DEV->>WF: work_finish({ role: "dev", result: "done", projectGroupId: "-123", summary: "Login page with OAuth" })
    WF->>PJ: readProjects()
@@ -385,21 +385,16 @@ sequenceDiagram
    WF->>GL: transitionLabel "Doing" → "To Test"
    WF->>AL: append { event: "work_finish", role: "dev", result: "done" }
-    alt autoChain enabled
+    WF->>WF: tick queue (fill free slots)
-        WF->>GL: transitionLabel "To Test" → "Testing"
+    Note over WF: Scheduler sees "To Test" issue, QA slot free → dispatches QA
-        WF->>QA: dispatchTask(role: "qa", level: "reviewer")
+    WF-->>DEV: { announcement: "✅ DEV DONE #42", tickPickups: [...] }
-        WF->>PJ: activateWorker(-123, qa)
-        WF-->>DEV: { announcement: "✅ DEV DONE #42", autoChain: { dispatched: true, role: "qa" } }
-    else autoChain disabled
-        WF-->>DEV: { announcement: "✅ DEV DONE #42", nextAction: "qa_pickup" }
-    end
 ```
 **Writes:**
 - `Git repo`: pulled latest (has DEV's merged code)
 - `projects.json`: dev.active=false, dev.issueId=null (sessions map preserved for reuse)
-- `Issue Tracker`: label "Doing" → "To Test" (+ "To Test" → "Testing" if auto-chain)
+- `Issue Tracker`: label "Doing" → "To Test"
-- `audit.log`: 1 entry (work_finish) + optional auto-chain entries
+- `audit.log`: 1 entry (work_finish) + tick entries if workers dispatched
 ### Phase 6: QA pickup
@@ -462,7 +457,7 @@ DEV Blocked: "Doing" → "To Do"
 QA Blocked:  "Testing" → "To Test"
 ```
-Worker cannot complete (missing info, environment errors, etc.). Issue returns to queue for retry. No auto-chain — the task is available for the next heartbeat pickup.
+Worker cannot complete (missing info, environment errors, etc.). Issue returns to queue for retry. The task is available for the next heartbeat pickup.
 ### Completion enforcement
@@ -517,7 +512,7 @@ Every piece of data and where it lives:
 │                                                                 │
 │  setup          → agent creation + workspace + model config     │
 │  work_start     → level + label + dispatch + role instr (e2e)   │
-│  work_finish    → label + state + git pull + auto-chain         │
+│  work_finish    → label + state + git pull + tick queue         │
 │  task_create    → create issue in tracker                       │
 │  task_update    → manual label state change                     │
 │  task_comment   → add comment to issue                          │
@@ -588,7 +583,7 @@ graph LR
        PR[Project registration]
        SETUP[Agent + workspace setup]
        SD[Session dispatch<br/>create + send via CLI]
-        AC[Auto-chaining<br/>DEV→QA, QA fail→DEV]
+        AC[Scheduling<br/>tick queue after work_finish]
        RI[Role instructions<br/>loaded per project]
        A[Audit logging]
        Z[Zombie cleanup]
--- a/docs/MANAGEMENT.md
+++ b/docs/MANAGEMENT.md
@@ -29,9 +29,9 @@ Classical management theory — later formalized by Bernard Bass in his work on
 DevClaw's task lifecycle is built on this. The orchestrator delegates a task via `work_start`, then steps away. It only re-engages in three scenarios:
-1. **DEV completes work** → The task moves to QA automatically. No orchestrator involvement needed.
+1. **DEV completes work** → The label moves to `To Test`. The scheduler dispatches QA on the next tick. No orchestrator involvement needed.
 2. **QA passes** → The issue closes. Pipeline complete.
-3. **QA fails** → The task cycles back to DEV with a fix request. The orchestrator may need to adjust the model level.
+3. **QA fails** → The label moves to `To Improve`. The scheduler dispatches DEV on the next tick. The orchestrator may need to adjust the model level.
 4. **QA refines** → The task enters a holding state that _requires human decision_. This is the explicit escalation boundary.
 The "refine" state is the most interesting from a delegation perspective. It's a conscious architectural decision that says: some judgments should not be automated. When the QA agent determines that a task needs rethinking rather than just fixing, it escalates to the only actor who has the full business context — the human.
--- a/docs/ONBOARDING.md
+++ b/docs/ONBOARDING.md
@@ -244,7 +244,7 @@ Change which model powers each level in `openclaw.json` — see [Configuration](
 | Developer assignment | Plugin | LLM-selected level by orchestrator, keyword heuristic fallback |
 | State management | Plugin | Atomic read/write to `projects.json` |
 | Session management | Plugin | Creates, reuses, and dispatches to sessions via CLI. Agent never touches session tools. |
-| Task completion | Plugin (`work_finish`) | Workers self-report. Auto-chains if enabled. |
+| Task completion | Plugin (`work_finish`) | Workers self-report. Scheduler dispatches next role. |
 | Prompt instructions | Plugin (`work_start`) | Loaded from `projects/roles/<project>/<role>.md`, appended to task message |
 | Audit logging | Plugin | Automatic NDJSON append per tool call |
 | Zombie detection | Plugin | `health` checks active vs alive |
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -30,7 +30,7 @@ Roles become a configurable list instead of a hardcoded pair. Each role defines:
 }
 ```
-The pipeline definition replaces the hardcoded `Doing → To Test → Testing → Done` flow. Labels and transitions are generated from the pipeline config. Auto-chaining follows the pipeline order.
+The pipeline definition replaces the hardcoded `Doing → To Test → Testing → Done` flow. Labels and transitions are generated from the pipeline config. The scheduler follows the pipeline order when filling free slots.
 ### Open questions
--- a/docs/TOOLS.md
+++ b/docs/TOOLS.md
@@ -90,7 +90,7 @@ Complete a task with a result. Called by workers (DEV/QA sub-agent sessions) dir
 6. Ticks queue to fill free worker slots
 7. Writes audit log
-**Auto-chaining** (when enabled on the project): `dev:done` dispatches QA automatically. `qa:fail` re-dispatches DEV using the previous level.
+**Scheduling:** After completion, `work_finish` ticks the queue. The scheduler sees the new label (`To Test` or `To Improve`) and dispatches the next worker if a slot is free.
 ---
--- a/lib/templates.ts
+++ b/lib/templates.ts
@@ -102,7 +102,7 @@ All orchestration goes through these tools. You do NOT manually manage sessions,
 | \`status\` | Task queue and worker state per project (lightweight dashboard) |
 | \`health\` | Scan worker health: zombies, stale workers, orphaned state. Pass fix=true to auto-fix |
 | \`work_start\` | End-to-end: label transition, level assignment, session create/reuse, dispatch with role instructions |
-| \`work_finish\` | End-to-end: label transition, state update, issue close/reopen. Auto-ticks queue after completion. |
+| \`work_finish\` | End-to-end: label transition, state update, issue close/reopen. Ticks scheduler after completion. |
 ### Pipeline Flow
@@ -135,10 +135,10 @@ Evaluate each task and pass the appropriate developer level to \`work_start\`:
 ### When Work Completes
-Workers call \`work_finish\` themselves — the label transition, state update, and audit log happen atomically. After completion, \`work_finish\` auto-ticks the queue to fill free slots:
+Workers call \`work_finish\` themselves — the label transition, state update, and audit log happen atomically. After completion, \`work_finish\` ticks the scheduler to fill free slots:
-- DEV "done" → issue moves to "To Test" → tick dispatches QA
+- DEV "done" → issue moves to "To Test" → scheduler dispatches QA
-- QA "fail" → issue moves to "To Improve" → tick dispatches DEV
+- QA "fail" → issue moves to "To Improve" → scheduler dispatches DEV
 - QA "pass" → Done, no further dispatch
 - QA "refine" / blocked → needs human input
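
The label transitions that `work_finish` performs, as described across the files in this commit, can be summarized as a small lookup table. The sketch below is illustrative only: `nextLabel`, `TRANSITIONS`, and the type names are hypothetical and are not taken from `lib/`. The label names and the `role:result` pairs are the ones the updated docs use.

```typescript
// Hypothetical summary of work_finish label transitions (illustrative names only).
type WorkerRole = "dev" | "qa";
type FinishResult = "done" | "pass" | "fail" | "refine" | "blocked";
type IssueLabel =
  | "To Do"
  | "Doing"
  | "To Test"
  | "Testing"
  | "To Improve"
  | "Refining"
  | "Done";

// Target label for each role:result pair, as documented in the docs above.
const TRANSITIONS: Record<string, IssueLabel | undefined> = {
  "dev:done": "To Test", // scheduler dispatches QA when a QA slot is free
  "dev:blocked": "To Do", // back to the queue for retry
  "qa:pass": "Done", // issue closes
  "qa:fail": "To Improve", // scheduler dispatches DEV when a DEV slot is free
  "qa:refine": "Refining", // waits for a human decision
  "qa:blocked": "To Test", // back to the queue for retry
};

function nextLabel(role: WorkerRole, result: FinishResult): IssueLabel {
  const target = TRANSITIONS[`${role}:${result}`];
  if (target === undefined) {
    throw new Error(`unsupported result "${result}" for role "${role}"`);
  }
  return target;
}
```

Keeping the table free of any dispatch logic mirrors the reframing in this commit: `work_finish` only moves the label, and the scheduler's tick decides what runs next.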