docs: overhaul documentation for consistency with implementation
Complete documentation rewrite to match the current codebase:

- README: add benefits section (process consistency, token savings with estimates, project isolation, continuous planning, feedback loops, role-based prompts, atomic operations, audit trail), task workflow with state diagram, model-to-role mapping tables, installation guide
- New TOOLS.md: complete reference for all 11 tools with parameters, behavior, and execution guards
- New CONFIGURATION.md: full config reference for openclaw.json, projects.json, heartbeat, notifications, workspace layout
- Fix tool names across all docs: task_pickup→work_start, task_complete→work_finish
- Fix tier model: QA has reviewer/tester levels, not flat "qa"
- Fix config schema: nested models.dev.*/models.qa.* structure
- Fix prompt path: projects/roles/ not projects/prompts/
- Fix worker state: uses "level" field not "model"/"tier"
- Fix MANAGEMENT.md: remove incorrect model references
- Fix TESTING.md: update model config example to nested structure
- Remove VERIFICATION.md (one-off checklist, no longer needed)
- Add cross-references between all docs pages

https://claude.ai/code/session_01R3rGevPY748gP4uK2ggYag
@@ -12,14 +12,14 @@ DevClaw exists because of a gap that management theorists identified decades ago

In 1969, Paul Hersey and Ken Blanchard published what would become Situational Leadership Theory. The central idea is deceptively simple: the way you delegate should match the capability and reliability of the person doing the work. You don't hand an intern the system architecture redesign. You don't ask your principal engineer to rename a CSS class.

-DevClaw's model selection does exactly this. When a task comes in, the plugin evaluates complexity from the issue title and description, then routes it to the cheapest model that can handle it:
+DevClaw's level selection does exactly this. When a task comes in, the plugin routes it to the cheapest model that can handle it:

-| Complexity                       | Model  | Analogy                     |
-| -------------------------------- | ------ | --------------------------- |
-| Simple (typos, renames, copy)    | Haiku  | Junior dev — just execute   |
-| Standard (features, bug fixes)   | Sonnet | Mid-level — think and build |
-| Complex (architecture, security) | Opus   | Senior — design and reason  |
-| Review                           | Grok   | Independent reviewer        |
+| Complexity                       | Level    | Analogy                           |
+| -------------------------------- | -------- | --------------------------------- |
+| Simple (typos, renames, copy)    | Junior   | The intern — just execute         |
+| Standard (features, bug fixes)   | Medior   | Mid-level — think and build       |
+| Complex (architecture, security) | Senior   | The architect — design and reason |
+| Review                           | Reviewer | Independent code reviewer         |
This isn't just cost optimization. It mirrors what effective managers do instinctively: match the delegation level to the task, not to a fixed assumption about the delegate.
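The routing in the table above can be sketched as a small lookup. This is an illustrative sketch only, not DevClaw's implementation: the level names come from the table, while `route_task` and its `complexity` argument are hypothetical.

```python
# Hypothetical sketch of complexity-based level routing, mirroring the table above.
# Level names come from the docs; the function and its inputs are illustrative.

def route_task(complexity: str) -> str:
    """Route a task to the cheapest level that can handle it."""
    mapping = {
        "simple": "junior",    # typos, renames, copy changes
        "standard": "medior",  # features, bug fixes
        "complex": "senior",   # architecture, security
        "review": "reviewer",  # independent code review
    }
    # Unknown complexity: fail toward the more capable level rather than under-delegate.
    return mapping.get(complexity, "senior")
```

The default-to-senior fallback reflects the same managerial instinct: when you can't judge a task's difficulty, over-delegating down is the expensive mistake.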
@@ -27,11 +27,11 @@ This isn't just cost optimization. It mirrors what effective managers do instinc

Classical management theory — later formalized by Bernard Bass in his work on Transformational Leadership — introduced a concept called Management by Exception (MBE). The principle: a manager should only be pulled back into a workstream when something deviates from the expected path.

-DevClaw's task lifecycle is built on this. The orchestrator delegates a task via `task_pickup`, then steps away. It only re-engages in three scenarios:
+DevClaw's task lifecycle is built on this. The orchestrator delegates a task via `work_start`, then steps away. It only re-engages in three scenarios:

1. **DEV completes work** → The task moves to QA automatically. No orchestrator involvement needed.
2. **QA passes** → The issue closes. Pipeline complete.
-3. **QA fails** → The task cycles back to DEV with a fix request. The orchestrator may need to adjust the model tier.
+3. **QA fails** → The task cycles back to DEV with a fix request. The orchestrator may need to adjust the model level.
4. **QA refines** → The task enters a holding state that _requires human decision_. This is the explicit escalation boundary.
The "refine" state is the most interesting from a delegation perspective. It's a conscious architectural decision that says: some judgments should not be automated. When the QA agent determines that a task needs rethinking rather than just fixing, it escalates to the only actor who has the full business context — the human.
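The lifecycle above amounts to a small state machine with one human-escalation edge. A minimal sketch, assuming hypothetical state and event names (only the `refine` holding state and automatic DEV→QA handoff are taken from the docs):

```python
# Illustrative sketch of the management-by-exception lifecycle described above.
# Event and state names are hypothetical; the escalation boundary is from the docs.

def next_state(event: str) -> tuple[str, bool]:
    """Return (next_state, needs_human) for a lifecycle event."""
    transitions = {
        "dev_done": ("to_test", False),  # moves to QA automatically
        "qa_pass": ("done", False),      # issue closes, pipeline complete
        "qa_fail": ("in_dev", False),    # cycles back to DEV with a fix request
        "qa_refine": ("refine", True),   # holding state: explicit human escalation
    }
    return transitions[event]
```

Note that exactly one transition sets `needs_human`: automation handles the expected paths, and only "needs rethinking" crosses the boundary.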
@@ -61,7 +61,7 @@ One of the most common delegation failures is self-review. You don't ask the per

DevClaw enforces structural separation between development and review by design:

- DEV and QA are separate sub-agent sessions with separate state.
-- QA uses a different model entirely (Grok), introducing genuine independence.
+- QA uses the reviewer level, which can be a different model entirely, introducing genuine independence.
- The review happens after a clean label transition — QA picks up from `To Test`, not from watching DEV work in real time.
This mirrors a principle from organizational design: effective controls require independence between execution and verification. It's the same reason companies separate their audit function from their operations.
@@ -72,7 +72,7 @@ Ronald Coase won a Nobel Prize for explaining why firms exist: transaction costs

DevClaw applies the same logic to AI sessions. Spawning a new sub-agent session costs approximately 50,000 tokens of context loading — the agent needs to read the full codebase before it can do useful work. That's the onboarding cost.

-The plugin tracks session IDs across task completions. When a DEV finishes task A and task B is ready on the same project, DevClaw detects the existing session and returns `"sessionAction": "send"` instead of `"spawn"`. The orchestrator routes the new task to the running session. No re-onboarding. No context reload.
+The plugin tracks session keys across task completions. When a DEV finishes task A and task B is ready on the same project, DevClaw detects the existing session and reuses it instead of spawning a new one. No re-onboarding. No context reload.
In management terms: keep your team stable. Reassigning the same person to the next task on their project is almost always cheaper than bringing in someone new — even if the new person is theoretically better qualified.
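The reuse check can be sketched as a lookup keyed by project and role. The `"sessionAction": "send"`/`"spawn"` values appear in the diff's earlier wording; the registry shape and helper below are hypothetical:

```python
# Hypothetical sketch of session reuse. "send" vs "spawn" is from the docs;
# the registry structure and key format are illustrative.

sessions: dict[tuple[str, str], str] = {}  # (project, role) -> session key

def session_action(project: str, role: str) -> dict:
    key = (project, role)
    if key in sessions:
        # A live session exists: route the next task to it, no re-onboarding.
        return {"sessionAction": "send", "session": sessions[key]}
    # No session yet: pay the ~50k-token onboarding cost once.
    sessions[key] = f"{project}:{role}"
    return {"sessionAction": "spawn", "session": sessions[key]}
```

The first task on a project spawns; every subsequent ready task on the same project is sent to the running session.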
@@ -101,11 +101,11 @@ This is the deepest lesson from delegation theory: **good delegation isn't about

Management research points to a few directions that could extend DevClaw's delegation model:

-**Progressive delegation.** Blanchard's model suggests increasing task complexity for delegates as they prove competent. DevClaw could track QA pass rates per model tier and automatically promote — if Haiku consistently passes QA on borderline tasks, start routing more work to it. This is how good managers develop their people, and it reduces cost over time.
+**Progressive delegation.** Blanchard's model suggests increasing task complexity for delegates as they prove competent. DevClaw could track QA pass rates per model level and automatically promote — if junior consistently passes QA on borderline tasks, start routing more work to it. This is how good managers develop their people, and it reduces cost over time.
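One way the pass-rate tracking could work, as a sketch: record QA outcomes per level and flag a level once it clears a sample-size and pass-rate bar. Everything here (helper names, thresholds) is hypothetical, not an existing DevClaw feature:

```python
# Illustrative sketch of progressive delegation: track QA pass rates per level
# and flag when routing more work downward looks safe. Thresholds are invented.
from collections import defaultdict

results: dict[str, list[bool]] = defaultdict(list)

def record_qa(level: str, passed: bool) -> None:
    results[level].append(passed)

def ready_for_promotion(level: str, min_samples: int = 20,
                        threshold: float = 0.9) -> bool:
    """True once a level has enough samples and a high enough pass rate."""
    outcomes = results[level]
    if len(outcomes) < min_samples:
        return False  # not enough evidence to change routing
    return sum(outcomes) / len(outcomes) >= threshold
```

The minimum-sample guard matters: promoting on a handful of lucky passes is how both managers and routers over-delegate.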
**Delegation authority expansion.** The Vroom-Yetton decision model maps when a leader should decide alone versus consulting the team. Currently, sub-agents have narrow authority — they execute tasks but can't restructure the backlog. Selectively expanding this (e.g., allowing a DEV agent to split a task it judges too large) would reduce orchestrator bottlenecks, mirroring how managers gradually give high-performers more autonomy.
-**Outcome-based learning.** Delegation research emphasizes that the _delegator_ learns from outcomes too. Aggregated metrics — QA fail rate by model tier, average cycles to Done, time-in-state distributions — would help both the orchestrator agent and the human calibrate their delegation patterns over time.
+**Outcome-based learning.** Delegation research emphasizes that the _delegator_ learns from outcomes too. Aggregated metrics — QA fail rate by model level, average cycles to Done, time-in-state distributions — would help both the orchestrator agent and the human calibrate their delegation patterns over time.
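A minimal sketch of the aggregation, assuming hypothetical per-task records (the field names and shapes are invented; only the metrics themselves come from the paragraph above):

```python
# Hypothetical sketch of outcome aggregation: QA fail rate by level and
# average QA cycles per task. Task record fields are illustrative.

def summarize(tasks: list[dict]) -> dict:
    by_level: dict[str, dict] = {}
    for task in tasks:
        stats = by_level.setdefault(task["level"],
                                    {"fails": 0, "reviews": 0, "cycles": []})
        stats["fails"] += task["qa_fails"]
        stats["reviews"] += task["qa_reviews"]
        stats["cycles"].append(task["qa_reviews"])  # one cycle per QA review
    return {
        level: {
            "fail_rate": s["fails"] / s["reviews"] if s["reviews"] else 0.0,
            "avg_cycles": sum(s["cycles"]) / len(s["cycles"]),
        }
        for level, s in by_level.items()
    }
```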
---