refactor: rename QA role to Tester and update related documentation

- Updated role references from "QA" to "Tester" in workflow and code comments.
- Revised documentation to reflect the new role structure, including role instructions and completion rules.
- Enhanced the testing guide with clearer instructions and examples for unit and E2E tests.
- Improved tools reference to align with the new role definitions and completion rules.
- Adjusted the roadmap to highlight recent changes in role configuration and workflow state machine.
2026-02-16 13:55:38 +08:00
parent 371e760d94
commit f7aa47102f
8 changed files with 928 additions and 634 deletions
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -1,53 +1,77 @@
 # DevClaw — Roadmap

-## Configurable Roles
+## Recently Completed

-Currently DevClaw has two hardcoded roles: **DEV** and **QA**. Each project gets one worker slot per role. The pipeline is fixed: DEV writes code, QA reviews it.
+### Dynamic Roles and Role Registry

-This works for the common case but breaks down when you want:
-- A **design** role that creates mockups before DEV starts
-- A **devops** role that handles deployment after QA passes
-- A **PM** role that triages and prioritizes the backlog
-- Multiple DEV workers in parallel (e.g. frontend + backend)
-- A project with no QA step at all
+Roles are no longer hardcoded. The `ROLE_REGISTRY` in `lib/roles/registry.ts` defines three built-in roles — **developer**, **tester**, **architect** — each with configurable levels, models, emoji, and completion results. Adding a new role means adding one entry to the registry; everything else (workers, sessions, labels, prompts) derives from it.

-### Planned: role configuration per project
+All roles use a unified junior/medior/senior level scheme (architect uses junior/senior). Per-role model overrides live in `workflow.yaml`.

-Roles become a configurable list instead of a hardcoded pair. Each role defines:
-- **Name** — e.g. `design`, `dev`, `qa`, `devops`
-- **Levels** — which developer levels can be assigned (e.g. design only needs `medior`)
-- **Pipeline position** — where it sits in the task lifecycle
-- **Worker count** — how many concurrent workers (default: 1)
+### Workflow State Machine

-```json
-{
-  "roles": {
-    "dev": { "levels": ["junior", "medior", "senior"], "workers": 1 },
-    "qa": { "levels": ["reviewer", "tester"], "workers": 1 },
-    "devops": { "levels": ["medior", "senior"], "workers": 1 }
-  },
-  "pipeline": ["dev", "qa", "devops"]
-}
+The issue lifecycle is now a configurable state machine defined in `workflow.yaml`. The default workflow defines 11 states:
+
+```
+Planning → To Do → Doing → To Test → Testing → Done
+                         ↘ In Review → (PR merged) → To Test
+                                    ↘ To Improve → Doing
+                                    ↘ Refining → (human decision)
+To Design → Designing → Planning
 ```

-The pipeline definition replaces the hardcoded `Doing → To Test → Testing → Done` flow. Labels and transitions are generated from the pipeline config. The scheduler follows the pipeline order when filling free slots.
+States have types (`queue`, `active`, `hold`, `review`, `terminal`), transitions with actions (`gitPull`, `detectPr`, `closeIssue`, `reopenIssue`), and review checks (`prMerged`, `prApproved`).

-### Open questions
+### Three-Layer Configuration

-- How do custom labels map? Generate from role names, or let users define?
-- Should roles have their own instruction files (`projects/roles/<project>/<role>.md`) — yes, this already works
-- How to handle parallel roles (e.g. frontend + backend DEV in parallel before QA)?
+Config resolution follows three layers, each partially overriding the one below:
+
+1. **Built-in defaults** — `ROLE_REGISTRY` + `DEFAULT_WORKFLOW`
+2. **Workspace** — `<workspace>/devclaw/workflow.yaml`
+3. **Project** — `<workspace>/devclaw/projects/<project>/workflow.yaml`
+
+Validated at load time with Zod schemas (`lib/config/schema.ts`). Integrity checks verify transition targets exist, queue states have roles, and terminal states have no outgoing transitions.
+
+### Provider Resilience
+
+All issue tracker calls (GitHub via `gh`, GitLab via `glab`) are wrapped with cockatiel retry (3 attempts, exponential backoff) and circuit breaker (opens after 5 consecutive failures, half-opens after 30s). See `lib/providers/resilience.ts`.
+
+### Bootstrap Hook for Role Instructions
+
+Worker sessions receive role-specific instructions via the `agent:bootstrap` hook at session startup, not appended to the task message. The hook reads from `devclaw/projects/<project>/prompts/<role>.md`, falling back to `devclaw/prompts/<role>.md`. Supports source tracking with `loadRoleInstructions(dir, { withSource: true })`.
+
+### In Review State and PR Polling
+
+DEVELOPER can submit a PR for human review (`result: "review"`), which transitions the issue to `In Review`. The heartbeat's review pass polls PR status via `getPrStatus()` on the provider. When the PR is merged, the issue auto-transitions to `To Test` for TESTER pickup.
+
+### Architect Role
+
+The architect role enables design investigations. `design_task` creates a `To Design` issue and dispatches an architect worker. The architect completes with `done` (→ Planning) or `blocked` (→ Refining).
+
+### Workspace Layout Migration
+
+Data directory moved from `<workspace>/projects/` to `<workspace>/devclaw/`. Automatic migration on first load — see `lib/setup/migrate-layout.ts`.
+
+### E2E Test Infrastructure
+
+Purpose-built test harness (`lib/testing/`) with:
+- `TestProvider` — in-memory `IssueProvider` with call tracking
+- `createTestHarness()` — scaffolds temp workspace, mock `runCommand`, test provider
+- `simulateBootstrap()` — tests the full bootstrap hook chain without a live gateway
+- `CommandInterceptor` — captures and filters CLI calls

 ---

-## Channel-agnostic Groups
+## Planned
+
+### Channel-agnostic Groups

 Currently DevClaw maps projects to **Telegram group IDs**. The `projectGroupId` is a Telegram-specific negative number. This means:
 - WhatsApp groups can't be used as project channels (partially supported now via `channel` field)
 - Discord, Slack, or other channels are excluded
 - The naming (`groupId`, `groupName`) is Telegram-specific

-### Planned: abstract channel binding
+**Planned: abstract channel binding**

 Replace Telegram-specific group IDs with a generic channel identifier that works across any OpenClaw channel.

@@ -57,14 +81,12 @@ Replace Telegram-specific group IDs with a generic channel identifier that works
    "whatsapp:120363140032870788@g.us": {
      "name": "my-project",
      "channel": "whatsapp",
-      "peer": "120363140032870788@g.us",
-      ...
+      "peer": "120363140032870788@g.us"
    },
    "telegram:-1234567890": {
      "name": "other-project",
      "channel": "telegram",
-      "peer": "-1234567890",
-      ...
+      "peer": "-1234567890"
    }
  }
 }
@@ -79,7 +101,7 @@ Key changes:

 This enables any OpenClaw channel (Telegram, WhatsApp, Discord, Slack, etc.) to host a project.

-### Open questions
+#### Open questions

 - Should one project be bindable to multiple channels? (e.g. Telegram for devs, WhatsApp for stakeholder updates)
 - How does the orchestrator agent handle cross-channel context?
@@ -89,8 +111,9 @@ This enables any OpenClaw channel (Telegram, WhatsApp, Discord, Slack, etc.) to
 ## Other Ideas

 - **Jira provider** — `IssueProvider` interface already abstracts GitHub/GitLab; Jira is the obvious next addition
-- **Deployment integration** — `work_finish` QA pass could trigger a deploy step via webhook or CLI
+- **Deployment integration** — `work_finish` TESTER pass could trigger a deploy step via webhook or CLI
 - **Cost tracking** — log token usage per task/level, surface in `status`
 - **Priority scoring** — automatic priority assignment based on labels, age, and dependencies
 - **Session archival** — auto-archive idle sessions after configurable timeout (currently indefinite)
-- **Progressive delegation** — track QA pass rates per level and auto-promote (see [Management Theory](MANAGEMENT.md))
+- **Progressive delegation** — track TESTER pass rates per level and auto-promote (see [Management Theory](MANAGEMENT.md))
+- **Custom workflow actions** — user-defined actions in `workflow.yaml` (e.g. deploy scripts, notifications)
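
Reviewer note (not part of the commit): the roadmap text in this diff says integrity checks verify that transition targets exist, that queue states have roles, and that terminal states have no outgoing transitions. The sketch below illustrates those three checks in TypeScript. It is only a reading aid under stated assumptions: the `WorkflowState` shape, the `role` and `transitions` field names, and the `checkIntegrity` function are hypothetical and are not taken from `lib/config/schema.ts`, which may model the workflow differently.

```typescript
// Hypothetical shapes for illustration; the real schema may differ.
type StateType = "queue" | "active" | "hold" | "review" | "terminal";

interface WorkflowState {
  type: StateType;
  role?: string; // assumed: the role that picks up a queue state
  transitions?: Record<string, string>; // assumed: event name -> target state key
}

type Workflow = Record<string, WorkflowState>;

// Returns a list of human-readable problems; an empty list means the
// workflow passed all three checks named in the roadmap.
function checkIntegrity(workflow: Workflow): string[] {
  const errors: string[] = [];

  for (const [name, state] of Object.entries(workflow)) {
    const transitions = Object.entries(state.transitions ?? {});

    // 1. Every transition target must be a defined state.
    for (const [event, target] of transitions) {
      if (!Object.prototype.hasOwnProperty.call(workflow, target)) {
        errors.push(`${name}: transition "${event}" targets unknown state "${target}"`);
      }
    }

    // 2. Queue states need a role so a worker can pick them up.
    if (state.type === "queue" && !state.role) {
      errors.push(`${name}: queue state has no role`);
    }

    // 3. Terminal states must not have outgoing transitions.
    if (state.type === "terminal" && transitions.length > 0) {
      errors.push(`${name}: terminal state has outgoing transitions`);
    }
  }

  return errors;
}

// Usage: a deliberately broken three-state workflow (state names are made up).
const example: Workflow = {
  toDo: { type: "queue", transitions: { pickup: "doing" } }, // missing role
  doing: { type: "active", transitions: { finish: "shipped" } }, // unknown target
  done: { type: "terminal" },
};
console.log(checkIntegrity(example));
```

Collecting every problem into a list, rather than throwing on the first one, lets a loader report all configuration mistakes in a single pass; whether DevClaw does this is not stated in the diff.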