<p align="center">
  <img src="assets/DevClaw.png" width="300" alt="DevClaw Logo">
</p>

# DevClaw

**Turn any group chat into a dev team that ships.**

DevClaw is a plugin for [OpenClaw](https://openclaw.ai) that turns your orchestrator agent into a development manager. It hires developers, assigns tasks, reviews code, and keeps the pipeline moving — across as many projects as you have group chats.

---

## What it looks like

You have two projects in two Telegram groups. You go to bed. You wake up:

```
── Group: "Dev - My Webapp" ──────────────────────────────

Agent: "⚡ Sending DEV (medior) for #42: Add login page"
Agent: "✅ DEV DONE #42 — Login page with OAuth. Moved to QA."
Agent: "🔍 Sending QA (reviewer) for #42: Add login page"
Agent: "🎉 QA PASS #42. Issue closed."
Agent: "⚡ Sending DEV (junior) for #43: Fix button color on /settings"
Agent: "✅ DEV DONE #43 — Updated to brand blue. Moved to QA."
Agent: "🔍 Sending QA (reviewer) for #43: Fix button color on /settings"
Agent: "❌ QA FAIL #43 — Color doesn't match dark mode. Back to DEV."
Agent: "⚡ Sending DEV (junior) for #43: Fix button color on /settings"

── Group: "Dev - My API" ─────────────────────────────────

Agent: "🧠 Spawning DEV (senior) for #18: Migrate auth to OAuth2"
Agent: "✅ DEV DONE #18 — OAuth2 provider with refresh tokens. Moved to QA."
Agent: "🔍 Sending QA (reviewer) for #18: Migrate auth to OAuth2"
Agent: "🎉 QA PASS #18. Issue closed."
Agent: "⚡ Sending DEV (medior) for #19: Add rate limiting to /api/search"
```

Two issues shipped (one of them the API project's OAuth2 migration), a third was sent back for a fix and automatically retried, and a fourth was dispatched, all while you slept. The heartbeat scanned the queues, dispatched workers, chained DEV into QA, and chained QA failures back to DEV. No human in the loop.

You can also drive it manually:

```
You: "Check the queue"
Agent: "2 issues in To Do. DEV is idle. QA is idle."

You: "Pick up #44 for DEV"
Agent: "⚡ Sending DEV (medior) for #44: Refactor user profile page"
```

Same agent, as many groups as you want, fully isolated teams per project.

---

## The problem DevClaw solves

OpenClaw is a great multi-agent runtime. It handles sessions, tools, channels, gateway RPC — everything you need to run AI agents. But it's a general-purpose platform. It has no opinion about how software gets built.

Without DevClaw, your orchestrator agent has to figure out on its own how to:
- Pick the right model for the task complexity
- Create or reuse the right worker session
- Transition issue labels in the right order
- Track which worker is doing what across projects
- Chain DEV completion into QA review
- Detect crashed workers and recover
- Log everything for auditability

That's a lot of reasoning per task. LLMs do it imperfectly — they forget steps, corrupt state, pick the wrong model, lose session references. You end up babysitting the thing you built to avoid babysitting.

DevClaw moves all of that into deterministic plugin code. The agent says "pick up issue #42." The plugin handles every remaining step atomically: every time, the same way, with zero reasoning tokens spent on orchestration.

---

## Meet your team

DevClaw doesn't think in model IDs. It thinks in people.

When a task comes in, you don't configure `anthropic/claude-sonnet-4-5` — you assign a **medior developer**. The orchestrator evaluates task complexity and picks the right person for the job:

### Developers

| Level | Assigns to | Model |
|---|---|---|
| **Junior** | Typos, CSS fixes, renames, single-file changes | Haiku |
| **Medior** | Features, bug fixes, multi-file changes | Sonnet |
| **Senior** | Architecture, migrations, system-wide refactoring | Opus |

### QA

| Level | Assigns to | Model |
|---|---|---|
| **Reviewer** | Code review, test validation, PR inspection | Sonnet |
| **Tester** | Manual testing, smoke tests | Haiku |

A CSS typo gets the intern. A database migration gets the architect. You're not burning Opus tokens on a color change, and you're not sending Haiku to redesign your auth system.

Every mapping is [configurable](docs/CONFIGURATION.md#model-tiers) — swap in any model you want per level.

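
The level-to-model mapping described above can be pictured as a small lookup with per-level overrides. This is a minimal sketch, not DevClaw's actual code: the type names, `DEFAULT_MODELS`, and `resolveModel` are hypothetical, and the model strings are placeholders for whatever IDs your configuration uses.

```typescript
// Hypothetical names throughout: an illustration, not DevClaw's real API.
type DevLevel = "junior" | "medior" | "senior";
type QaLevel = "reviewer" | "tester";
type Level = DevLevel | QaLevel;

// Default tiers as listed in the tables above (placeholder model names).
const DEFAULT_MODELS: Record<Level, string> = {
  junior: "haiku",
  medior: "sonnet",
  senior: "opus",
  reviewer: "sonnet",
  tester: "haiku",
};

// Resolve the model for a level; a per-level override wins over the default.
function resolveModel(
  level: Level,
  overrides: Partial<Record<Level, string>> = {},
): string {
  return overrides[level] ?? DEFAULT_MODELS[level];
}

console.log(resolveModel("junior"));                       // haiku
console.log(resolveModel("senior", { senior: "sonnet" })); // sonnet
```
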

---

## How a task moves through the pipeline

Every issue follows the same path, no exceptions. DevClaw enforces it:

```
Planning → To Do → Doing → To Test → Testing → Done
```

```mermaid
stateDiagram-v2
    [*] --> Planning
    Planning --> ToDo: Ready for development

    ToDo --> Doing: DEV picks up
    Doing --> ToTest: DEV done

    ToTest --> Testing: QA picks up (or auto-chains)
    Testing --> Done: QA pass (issue closed)
    Testing --> ToImprove: QA fail (back to DEV)
    Testing --> Refining: QA needs human input

    ToImprove --> Doing: DEV fixes (or auto-chains)
    Refining --> ToDo: Human decides

    Done --> [*]
```

These labels live on your actual GitHub/GitLab issues. Not in some internal database — in the tool you already use. Filter by `Doing` in GitHub to see what's in progress. Set up a webhook on `Done` to trigger deploys. The issue tracker is the source of truth.

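
One way to picture "DevClaw enforces it" is a transition table that rejects any move the diagram does not draw. The sketch below mirrors only the edges shown in the diagram above; `TRANSITIONS` and `canTransition` are hypothetical names, and the plugin's real validation may be structured differently.

```typescript
// The eight pipeline labels, written as they appear on issues.
type State =
  | "Planning" | "To Do" | "Doing" | "To Test"
  | "Testing" | "To Improve" | "Refining" | "Done";

// Allowed moves, copied edge by edge from the state diagram.
const TRANSITIONS: Record<State, State[]> = {
  "Planning": ["To Do"],
  "To Do": ["Doing"],
  "Doing": ["To Test"],
  "To Test": ["Testing"],
  "Testing": ["Done", "To Improve", "Refining"],
  "To Improve": ["Doing"],
  "Refining": ["To Do"],
  "Done": [],
};

// True only if the diagram has an edge from `from` to `to`.
function canTransition(from: State, to: State): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

console.log(canTransition("Testing", "To Improve")); // true
console.log(canTransition("To Do", "Done"));         // false
```
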
### What "atomic" means here

When you say "pick up #42 for DEV", the plugin does all of this in one operation:
1. Verifies the issue is in the right state
2. Picks the developer level (or uses what you specified)
3. Transitions the label (`To Do` → `Doing`)
4. Creates or reuses the right worker session
5. Dispatches the task with project-specific instructions
6. Updates internal state
7. Logs an audit entry

If step 4 fails, step 3 is rolled back. No half-states, no orphaned labels, no "the issue says Doing but nobody's working on it."

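
The rollback behaviour can be sketched as a list of steps, each with an optional undo that runs in reverse order when a later step fails. This is an illustration under stated assumptions: `Step` and `runAtomically` are hypothetical names, the steps are synchronous for brevity, and DevClaw's real internals may differ.

```typescript
// Hypothetical shape for illustration only.
interface Step {
  name: string;
  run: () => void;   // performs the step, throws on failure
  undo?: () => void; // reverses the step, if it has side effects
}

// Run steps in order; if one throws, undo the completed ones in reverse order.
function runAtomically(steps: Step[]): { ok: boolean; failedAt?: string } {
  const done: Step[] = [];
  for (const step of steps) {
    try {
      step.run();
      done.push(step);
    } catch {
      for (const finished of done.slice().reverse()) {
        if (finished.undo) finished.undo();
      }
      return { ok: false, failedAt: step.name };
    }
  }
  return { ok: true };
}

// Example: the label transition succeeds, session creation fails,
// so the label is restored to its previous value.
let label = "To Do";
const result = runAtomically([
  {
    name: "transition label",
    run: () => { label = "Doing"; },
    undo: () => { label = "To Do"; },
  },
  {
    name: "create session",
    run: () => { throw new Error("session spawn failed"); },
  },
]);

console.log(result.ok, result.failedAt, label); // false create session To Do
```
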

---

## What happens behind the scenes

### Workers report back themselves

When a developer finishes, they call `work_finish` directly — no orchestrator involved:

- **DEV "done"** → label moves to `To Test`, QA starts automatically
- **DEV "blocked"** → label moves back to `To Do`, task returns to queue
- **QA "pass"** → label moves to `Done`, issue closes
- **QA "fail"** → label moves to `To Improve`, DEV gets re-dispatched

The orchestrator doesn't need to poll, check, or coordinate. Workers are self-reporting.

### Sessions accumulate context

Each developer level gets its own persistent session per project. Your medior dev that's done 5 features on `my-app` already knows the codebase — it doesn't re-read 50K tokens of source code every time it picks up a new task.

That's a **~40-60% token saving per task** from session reuse alone.

Combined with tier selection (not using Opus when Haiku will do) and the token-free heartbeat (more on that next), DevClaw significantly reduces your token bill versus running everything through one large model.

### Everything is logged

Every tool call writes an NDJSON line to `audit.log`:

```bash
cat audit.log | jq 'select(.event=="work_start")'
```

Full trace of every task, every level selection, every label transition, every health fix. No manual logging needed.

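
If you would rather process the log in code than with `jq`, NDJSON is simply one JSON object per line. The sketch below parses a sample string and applies the same filter as the `jq` command above. Only the `event` field and the `work_start` value come from this README; the other fields and the `work_finish` event name are assumptions made up for the example.

```typescript
// Only `event` is taken from the README; other fields are illustrative.
interface AuditEntry {
  event: string;
  [key: string]: unknown;
}

// Parse NDJSON text: one JSON object per non-empty line.
function parseNdjson(text: string): AuditEntry[] {
  const entries: AuditEntry[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    entries.push(JSON.parse(trimmed) as AuditEntry);
  }
  return entries;
}

// Made-up sample lines standing in for the contents of audit.log.
const sample = [
  '{"event":"work_start","issue":42,"level":"medior"}',
  '{"event":"work_finish","issue":42,"result":"done"}',
  '{"event":"work_start","issue":43,"level":"junior"}',
].join("\n");

const starts = parseNdjson(sample).filter((e) => e.event === "work_start");
console.log(starts.length); // 2
```
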

---

## Automatic scheduling

DevClaw doesn't wait for you to tell it what to do next. A background heartbeat service continuously scans for available work and dispatches workers — zero LLM tokens, pure deterministic code.

### The heartbeat

Every tick, the service runs two passes:

1. **Health pass** — detects workers stuck for >2 hours, reverts their labels back to queue, deactivates them. Catches crashed sessions, context overflows, or workers that died without reporting back.
2. **Queue pass** — scans for available tasks by priority (`To Improve` > `To Test` > `To Do`), fills free worker slots. DEV and QA slots are filled independently.

All CLI calls and JSON reads. Workers only consume tokens when they actually start coding or reviewing.

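
The queue pass amounts to a sort by label priority followed by slot filling, capped per tick. The sketch below shows that logic for a single project. It assumes `To Test` work goes to QA and everything else to DEV; `queuePass` and the other names are hypothetical, and the real scheduler may differ in details.

```typescript
type QueueLabel = "To Improve" | "To Test" | "To Do";
type Role = "DEV" | "QA";

interface QueuedIssue {
  id: number;
  label: QueueLabel;
}

// Priority from the text: To Improve > To Test > To Do.
const PRIORITY: QueueLabel[] = ["To Improve", "To Test", "To Do"];

// Assumption: To Test work is reviewed by QA, the rest is DEV work.
function roleFor(label: QueueLabel): Role {
  return label === "To Test" ? "QA" : "DEV";
}

// One queue pass: fill free slots in priority order, capped per tick.
function queuePass(
  queue: QueuedIssue[],
  freeSlots: Record<Role, number>,
  maxPickupsPerTick: number,
): { id: number; role: Role }[] {
  const slots = { DEV: freeSlots.DEV, QA: freeSlots.QA };
  const sorted = queue
    .slice()
    .sort((a, b) => PRIORITY.indexOf(a.label) - PRIORITY.indexOf(b.label));
  const picked: { id: number; role: Role }[] = [];
  for (const issue of sorted) {
    if (picked.length >= maxPickupsPerTick) break;
    const role = roleFor(issue.label);
    if (slots[role] > 0) {
      slots[role] -= 1;
      picked.push({ id: issue.id, role });
    }
  }
  return picked;
}

// #43 (To Improve) takes the DEV slot, #45 (To Test) takes the QA slot,
// and #44 (To Do) waits for the next tick.
const picked = queuePass(
  [
    { id: 44, label: "To Do" },
    { id: 43, label: "To Improve" },
    { id: 45, label: "To Test" },
  ],
  { DEV: 1, QA: 1 },
  4,
);
console.log(JSON.stringify(picked));
```
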
### Auto-chaining

When enabled, task completions automatically trigger the next step:

- **DEV "done"** → QA reviewer is dispatched immediately
- **QA "fail"** → DEV is re-dispatched at the same level that originally worked on it
- **QA "pass"** → issue closes, pipeline done
- **"blocked"** → task returns to queue for retry, no chaining

No orchestrator involvement. The worker calls `work_finish`, the plugin handles the rest.

### Execution modes

Each project is fully isolated — its own queue, workers, sessions, state. No cross-project contamination. Two levels of parallelism control how work gets scheduled:

- **Project-level (`roleExecution`)** — DEV and QA work simultaneously on different tasks (default: `parallel`) or take turns (`sequential`)
- **Plugin-level (`projectExecution`)** — all registered projects dispatch workers independently (default: `parallel`) or only one project runs at a time (`sequential`)

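
As a rough illustration of the project-level setting, the check below decides whether a role may start a new task given which roles are already busy. It assumes one worker slot per role per project, which this README suggests but does not state outright; `canDispatch` is a hypothetical name.

```typescript
type ExecutionMode = "parallel" | "sequential";
type WorkerRole = "DEV" | "QA";

// May `role` start a new task, given which roles are busy in this project?
function canDispatch(
  role: WorkerRole,
  busy: Record<WorkerRole, boolean>,
  roleExecution: ExecutionMode,
): boolean {
  if (busy[role]) return false;                  // the role's own slot is taken
  if (roleExecution === "parallel") return true; // roles run independently
  return !busy.DEV && !busy.QA;                  // sequential: one role at a time
}

console.log(canDispatch("QA", { DEV: true, QA: false }, "parallel"));   // true
console.log(canDispatch("QA", { DEV: true, QA: false }, "sequential")); // false
```
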
### Configuration

All scheduling behavior is configurable in `openclaw.json`:

```json
{
  "plugins": {
    "entries": {
      "devclaw": {
        "config": {
          "work_heartbeat": {
            "enabled": true,
            "intervalSeconds": 60,
            "maxPickupsPerTick": 4
          },
          "projectExecution": "parallel"
        }
      }
    }
  }
}
```

Per-project settings live in `projects.json`:

```json
{
  "-1234567890": {
    "name": "my-app",
    "autoChain": true,
    "roleExecution": "parallel"
  }
}
```

| Setting | Where | Default | What it controls |
|---|---|---|---|
| `work_heartbeat.enabled` | `openclaw.json` | `true` | Turn the heartbeat on/off |
| `work_heartbeat.intervalSeconds` | `openclaw.json` | `60` | Seconds between ticks |
| `work_heartbeat.maxPickupsPerTick` | `openclaw.json` | `4` | Max workers dispatched per tick |
| `projectExecution` | `openclaw.json` | `"parallel"` | All projects at once, or one at a time |
| `autoChain` | `projects.json` | `false` | Auto-dispatch next step on completion |
| `roleExecution` | `projects.json` | `"parallel"` | DEV+QA at once, or one role at a time |

See the [Configuration reference](docs/CONFIGURATION.md) for the full schema.

---

## Task management

### Your issues stay in your tracker

DevClaw doesn't have its own task database. All task state lives in **GitHub Issues** or **GitLab Issues** — auto-detected from your git remote. The eight pipeline labels are created on your repo when you register a project. Your project manager sees progress in GitHub without knowing DevClaw exists. Your CI/CD can trigger on label changes. If you stop using DevClaw, your issues and labels stay exactly where they are.

The provider is pluggable (`IssueProvider` interface). GitHub and GitLab work today. Jira, Linear, or anything else just needs to implement the same interface.

### Creating, updating, and commenting

Tasks can come from anywhere — the orchestrator creates them from chat, workers file bugs they discover mid-task, or you create them directly in GitHub/GitLab:

```
You: "Create an issue: fix the broken OAuth redirect"
Agent: creates issue #43 with label "Planning"

You: "Move #43 to To Do"
Agent: transitions label Planning → To Do

You: "Add a comment on #42: needs to handle the edge case for expired tokens"
Agent: adds comment attributed to "orchestrator"
```

Workers can also comment during work — QA leaves review feedback, DEV posts implementation notes. Every comment carries role attribution so you know who said what.

### Custom instructions per project

Each project gets instruction files that workers receive with every task they pick up:

```
workspace/projects/roles/
├── my-webapp/
│   ├── dev.md     "Run npm test before committing. Deploy URL: staging.example.com"
│   └── qa.md      "Check OAuth flow. Verify mobile responsiveness."
├── my-api/
│   ├── dev.md     "Run cargo test. Follow REST conventions in CONTRIBUTING.md"
│   └── qa.md      "Verify all endpoints return correct status codes."
└── default/
    ├── dev.md     (fallback for projects without custom instructions)
    └── qa.md
```

Deployment steps, test commands, coding standards, acceptance criteria — all injected at dispatch time, per project, per role.

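
The fallback to `default/` can be sketched as a path lookup that prefers the project's own file. This example assumes the fallback applies per file rather than per project, which the tree above does not specify. `resolveInstructionPath` is a hypothetical name, and the `exists` check is injected so the sketch does not depend on any particular file system API.

```typescript
type InstructionRole = "dev" | "qa";

// Prefer <root>/<project>/<role>.md, else fall back to <root>/default/<role>.md.
function resolveInstructionPath(
  project: string,
  role: InstructionRole,
  exists: (path: string) => boolean,
  root: string = "workspace/projects/roles",
): string {
  const specific = `${root}/${project}/${role}.md`;
  return exists(specific) ? specific : `${root}/default/${role}.md`;
}

// A stand-in for the file system: only these files "exist".
const files = [
  "workspace/projects/roles/my-webapp/dev.md",
  "workspace/projects/roles/default/dev.md",
  "workspace/projects/roles/default/qa.md",
];
const has = (p: string): boolean => files.indexOf(p) !== -1;

console.log(resolveInstructionPath("my-webapp", "dev", has)); // workspace/projects/roles/my-webapp/dev.md
console.log(resolveInstructionPath("my-api", "qa", has));     // workspace/projects/roles/default/qa.md
```
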

---

## Getting started

### Prerequisites

- [OpenClaw](https://openclaw.ai) installed (`openclaw --version`)
- Node.js >= 20
- `gh` CLI ([GitHub](https://cli.github.com)) or `glab` CLI ([GitLab](https://gitlab.com/gitlab-org/cli)), authenticated

### Install

```bash
cp -r devclaw ~/.openclaw/extensions/
```

### Set up through conversation

The easiest way to configure DevClaw is to just talk to your agent:

```
You: "Help me set up DevClaw"
Agent: "I'll walk you through it. Should I use this agent as the
        orchestrator, or create a new one?"
You: "Use this one"

Agent: "Want to bind a messaging channel?"
You: "Telegram"

Agent: "Here are the default developer assignments:
        Junior → Haiku, Medior → Sonnet, Senior → Opus
        Reviewer → Sonnet, Tester → Haiku
        Keep these or customize?"
You: "Keep them"

Agent: "Done. Want to register a project?"
You: "Yes — my-app at ~/git/my-app, main branch"

Agent: "Project registered. 8 labels created on your repo.
        Role instructions scaffolded. Try: 'check the queue'"
```

You can also use the [CLI wizard or non-interactive setup](docs/ONBOARDING.md#step-2-run-setup) for scripted environments.

---

## The toolbox

DevClaw gives the orchestrator 11 tools. These aren't just convenience wrappers — they're **guardrails**. Each tool encodes a complex multi-step operation into a single atomic call. The agent provides intent, the plugin handles mechanics. The agent physically cannot skip a label transition, forget to update state, or dispatch to the wrong session — those decisions are made by deterministic code, not LLM reasoning.

| Tool | What it does |
|---|---|
| `work_start` | Pick up a task — resolves level, transitions label, dispatches session, logs audit |
| `work_finish` | Complete a task — transitions label, updates state, auto-chains next step, ticks queue |
| `task_create` | Create a new issue (used by workers to file bugs they discover) |
| `task_update` | Manually change an issue's state label |
| `task_comment` | Add a comment to an issue (with role attribution) |
| `status` | Dashboard: queue counts + who's working on what |
| `health` | Detect zombie workers, stale sessions, state inconsistencies |
| `work_heartbeat` | Manually trigger a health check + queue dispatch cycle |
| `project_register` | One-time project setup: creates labels, scaffolds instructions, initializes state |
| `setup` | Agent + workspace initialization |
| `onboard` | Conversational setup guide |

Full parameters and usage in the [Tools Reference](docs/TOOLS.md).

---

## Documentation

| | |
|---|---|
| **[Architecture](docs/ARCHITECTURE.md)** | System design, session model, data flow, end-to-end diagrams |
| **[Tools Reference](docs/TOOLS.md)** | Complete reference for all 11 tools |
| **[Configuration](docs/CONFIGURATION.md)** | `openclaw.json`, `projects.json`, heartbeat, notifications |
| **[Onboarding Guide](docs/ONBOARDING.md)** | Full step-by-step setup |
| **[QA Workflow](docs/QA_WORKFLOW.md)** | QA process and review templates |
| **[Context Awareness](docs/CONTEXT-AWARENESS.md)** | How tools adapt to group vs. DM vs. agent context |
| **[Testing](docs/TESTING.md)** | Test suite, fixtures, CI/CD |
| **[Management Theory](docs/MANAGEMENT.md)** | The delegation model behind the design |
| **[Roadmap](docs/ROADMAP.md)** | What's coming next |

---

## License

MIT