Lauren ten Hoor 2893ba0507 research: document bootstrap hooks for context injection (#181)

Comprehensive investigation of OpenClaw-native alternatives to the
file-read-network pattern in dispatch.ts that triggers security audits.

Key Findings:
- Bootstrap hooks are the recommended solution
- Purpose-built for dynamic workspace file injection
- Plugin-only implementation (no core changes needed)
- Eliminates audit false positive

Deliverables:
- Full research document with pros/cons analysis
- PoC code demonstrating implementation approach
- Migration checklist and testing plan
- Decision matrix comparing alternatives

Recommendation: Implement agent:bootstrap hook to inject role
instructions at system prompt construction time instead of appending
to task message payload.

Addresses issue #181

2026-02-14 14:03:14 +08:00

.github/workflows

fix: update package name and URLs to reflect new ownership

2026-02-12 00:32:35 +08:00

assets

feat: update README to include new DevClaw logo and adjust formatting

2026-02-09 19:27:40 +08:00

docs

research: document bootstrap hooks for context injection (#181)

2026-02-14 14:03:14 +08:00

lib

chore: Add clarifying comment for security audit false positive in dispatch.ts

2026-02-14 13:53:59 +08:00

.gitignore

feat: LLM-powered model auto-configuration and improved onboarding

2026-02-12 20:37:15 +08:00

.npmignore

refactor: reorganize task management imports and update task handling tools

2026-02-10 21:39:41 +08:00

AGENTS.md

feat: implement runCommand wrapper and refactor command executions across modules

2026-02-13 10:50:35 +08:00

CHANGELOG.md

Release v1.1.0 — Security hardening, session resilience, heartbeat fixes

2026-02-13 10:57:21 +08:00

index.ts

feat: implement runCommand wrapper and refactor command executions across modules

2026-02-13 10:50:35 +08:00

LICENSE

Initial commit: DevClaw OpenClaw plugin

2026-02-08 15:26:29 +08:00

openclaw.plugin.json

refactor: remove context awareness documentation and related code; streamline tool registration and context detection

2026-02-12 00:25:34 +08:00

package-lock.json

Release v1.1.0 — Security hardening, session resilience, heartbeat fixes

2026-02-13 10:57:21 +08:00

package.json

chore: bump version to 1.2.1

2026-02-14 14:00:42 +08:00

PUBLISHING.md

fix: update installation commands to reflect new package name

2026-02-12 00:42:15 +08:00

README.md

docs: remove outdated auto-tick references from README.md (#174) (#176)

2026-02-13 20:57:28 +08:00

tsconfig.json

fix: update TypeScript config for proper openclaw plugin-sdk type resolution

2026-02-12 20:47:25 +08:00

README.md

DevClaw — Development Plugin for OpenClaw

Turn any group chat into a dev team that ships.

DevClaw is a plugin for OpenClaw that turns your orchestrator agent into a development manager. It hires developers, assigns tasks, reviews code, and keeps the pipeline moving — across as many projects as you have group chats.

Prerequisites: OpenClaw must be installed and running.

openclaw plugins install @laurentenhoor/devclaw

Then start onboarding by chatting with your agent in any channel:

"Hey, can you help me set up DevClaw?"

What it looks like

You have two projects in two Telegram groups. You go to bed. You wake up:

── Group: "Dev - My Webapp" ──────────────────────────────

Agent:  "⚡ Sending DEV (medior) for #42: Add login page"
Agent:  "✅ DEV DONE #42 — Login page with OAuth. Moved to QA."
Agent:  "🔍 Sending QA (reviewer) for #42: Add login page"
Agent:  "🎉 QA PASS #42. Issue closed."
Agent:  "⚡ Sending DEV (junior) for #43: Fix button color on /settings"
Agent:  "✅ DEV DONE #43 — Updated to brand blue. Moved to QA."
Agent:  "❌ QA FAIL #43 — Color doesn't match dark mode. Back to DEV."
Agent:  "⚡ Sending DEV (junior) for #43: Fix button color on /settings"

  You:  "Create an issue for refactoring the profile page, pick it up."

Agent:  created #44 "Refactor user profile page" on GitHub — To Do
Agent:  "⚡ Sending DEV (medior) for #44: Refactor user profile page"

Agent:  "✅ DEV DONE #43 — Fixed dark-mode color. Back to QA."
Agent:  "🎉 QA PASS #43. Issue closed."

── Group: "Dev - My API" ─────────────────────────────────

Agent:  "🧠 Spawning DEV (senior) for #18: Migrate auth to OAuth2"
Agent:  "✅ DEV DONE #18 — OAuth2 provider with refresh tokens. Moved to QA."
Agent:  "🎉 QA PASS #18. Issue closed."
Agent:  "⚡ Sending DEV (medior) for #19: Add rate limiting to /api/search"

Multiple issues shipped, a QA failure automatically retried, and a second project's migration completed — all while you slept. When you dropped in mid-stream to create an issue, the scheduler kept going before, during, and after.

Why DevClaw

Autonomous multi-project development

Each project is fully isolated — own queue, workers, sessions, and state. DEV and QA execute in parallel within each project, and multiple projects run simultaneously. A token-free scheduling engine drives it all autonomously:

Scheduling engine — work_heartbeat continuously scans queues, dispatches workers, and drives DEV → QA → DEV feedback loops
Project isolation — parallel workers per project, parallel projects across the system
Role instructions — per-project, per-role prompts injected at dispatch time

Process enforcement

GitHub/GitLab issues are the single source of truth — not an internal database. Every tool call wraps the full operation into deterministic code with rollback on failure:

External task state — labels, transitions, and status queries go through your issue tracker
Atomic operations — label transition + state update + session dispatch + audit log in one call
Tool-based guardrails — 11 tools enforce the process; the agent provides intent, the plugin handles mechanics

~60-80% token savings

Three mechanisms compound to cut token usage dramatically versus running one large model with fresh context each time:

Tier selection — Haiku for typos, Sonnet for features, Opus for architecture (~30-50% on simple tasks)
Session reuse — workers accumulate codebase knowledge across tasks (~40-60% per task)
Token-free scheduling — work_heartbeat runs on pure CLI calls, zero LLM tokens for orchestration

The problem DevClaw solves

OpenClaw is a great multi-agent runtime. It handles sessions, tools, channels, gateway RPC — everything you need to run AI agents. But it's a general-purpose platform. It has no opinion about how software gets built.

Without DevClaw, your orchestrator agent has to figure out on its own how to:

Pick the right model for the task complexity
Create or reuse the right worker session
Transition issue labels in the right order
Track which worker is doing what across projects
Schedule QA after DEV completes, and re-schedule DEV after QA fails
Detect crashed workers and recover
Log everything for auditability

That's a lot of reasoning per task. LLMs do it imperfectly — they forget steps, corrupt state, pick the wrong model, lose session references. You end up babysitting the thing you built to avoid babysitting.

DevClaw moves all of that into deterministic plugin code. The agent says "pick up issue #42." The plugin handles the remaining steps atomically. Every time, the same way, zero reasoning tokens spent on orchestration.

Meet your team

DevClaw doesn't think in model IDs. It thinks in people.

When a task comes in, you don't configure anthropic/claude-sonnet-4-5 — you assign a medior developer. The orchestrator evaluates task complexity and picks the right person for the job:

Developers

Level	Assigned to	Model
Junior	Typos, CSS fixes, renames, single-file changes	Haiku
Medior	Features, bug fixes, multi-file changes	Sonnet
Senior	Architecture, migrations, system-wide refactoring	Opus

QA

Level	Assigned to	Model
Reviewer	Code review, test validation, PR inspection	Sonnet
Tester	Manual testing, smoke tests	Haiku

A CSS typo gets the junior. A database migration gets the senior. You're not burning Opus tokens on a color change, and you're not sending Haiku to redesign your auth system.

Every mapping is configurable — swap in any model you want per level.
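As a minimal sketch of the level-to-model mapping described above: the type names, the `DEFAULT_MODELS` table, and `resolveModel` are hypothetical and not taken from the plugin's source. The values are model family names from the tables, not real model IDs.

```typescript
// Hypothetical types: the README names the levels and model families,
// but does not show the plugin's actual configuration shape.
type DevLevel = "junior" | "medior" | "senior";
type QaLevel = "reviewer" | "tester";
type Level = DevLevel | QaLevel;

// Defaults taken from the Developers and QA tables above.
const DEFAULT_MODELS: Record<Level, string> = {
  junior: "haiku",
  medior: "sonnet",
  senior: "opus",
  reviewer: "sonnet",
  tester: "haiku",
};

// Resolve a level to a model; a per-level override wins over the default.
function resolveModel(
  level: Level,
  overrides: Partial<Record<Level, string>> = {},
): string {
  return overrides[level] ?? DEFAULT_MODELS[level];
}
```

Calling `resolveModel("senior", { senior: "my-custom-model" })` would return the override, while any level without an override falls back to the table.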

How a task moves through the pipeline

Every issue follows the same path, no exceptions. DevClaw enforces it:

Planning → To Do → Doing → To Test → Testing → Done

stateDiagram-v2
    [*] --> Planning
    Planning --> ToDo: Ready for development

    ToDo --> Doing: DEV picks up
    Doing --> ToTest: DEV done

    ToTest --> Testing: Scheduler picks up QA
    Testing --> Done: QA pass (issue closed)
    Testing --> ToImprove: QA fail (back to DEV)
    Testing --> Refining: QA needs human input

    ToImprove --> Doing: Scheduler picks up DEV fix
    Refining --> ToDo: Human decides

    Done --> [*]

These labels live on your actual GitHub/GitLab issues. Not in some internal database — in the tool you already use. Filter by Doing in GitHub to see what's in progress. Set up a webhook on Done to trigger deploys. The issue tracker is the source of truth.
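The diagram above can be read as a transition table. The following sketch encodes only the edges shown in the diagram; the `Label` type, `TRANSITIONS`, and `canTransition` are illustrative names, not the plugin's actual code.

```typescript
// The eight pipeline labels, written as they appear on issues.
type Label =
  | "Planning" | "To Do" | "Doing" | "To Test"
  | "Testing" | "Done" | "To Improve" | "Refining";

// Allowed next labels for each label, copied from the state diagram.
const TRANSITIONS: Record<Label, readonly Label[]> = {
  "Planning": ["To Do"],
  "To Do": ["Doing"],
  "Doing": ["To Test"],
  "To Test": ["Testing"],
  "Testing": ["Done", "To Improve", "Refining"],
  "To Improve": ["Doing"],
  "Refining": ["To Do"],
  "Done": [],
};

// True when the diagram contains an edge from `from` to `to`.
function canTransition(from: Label, to: Label): boolean {
  return TRANSITIONS[from].includes(to);
}
```

A guard like this is one way a tool could reject an out-of-order move (for example `To Do` straight to `Done`) before touching the issue tracker.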

What "atomic" means here

When you say "pick up #42 for DEV", the plugin does all of this in one operation:

1. Verifies the issue is in the right state
2. Picks the developer level (or uses what you specified)
3. Transitions the label (To Do → Doing)
4. Creates or reuses the right worker session
5. Dispatches the task with project-specific instructions
6. Updates internal state
7. Logs an audit entry

If step 4 fails, step 3 is rolled back. No half-states, no orphaned labels, no "the issue says Doing but nobody's working on it."
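One common way to get this behavior is to pair each step with an undo action and unwind completed steps in reverse order when a later step throws. The sketch below illustrates that pattern under stated assumptions: `Step` and `runAtomically` are hypothetical names, and the steps are synchronous for brevity, whereas real tracker and session calls would likely be asynchronous.

```typescript
// A unit of work with an optional compensating action (hypothetical shape).
interface Step {
  name: string;
  run: () => void;
  undo?: () => void;
}

// Run steps in order. If one throws, undo the completed steps in reverse
// order and rethrow the original error so the caller sees the failure.
function runAtomically(steps: Step[]): void {
  const completed: Step[] = [];
  for (const step of steps) {
    try {
      step.run();
      completed.push(step);
    } catch (err) {
      for (const done of completed.reverse()) {
        if (done.undo) done.undo();
      }
      throw err;
    }
  }
}
```

With a label transition as one step and session creation as the next, a failure in session creation would trigger the label step's `undo`, which matches the "no orphaned labels" guarantee described above.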

What happens behind the scenes

Workers report back themselves

When a developer finishes, they call work_finish directly — no orchestrator involved:

DEV "done" → label moves to To Test, scheduler picks up QA on next tick
DEV "blocked" → label moves back to To Do, task returns to queue
QA "pass" → label moves to Done, issue closes
QA "fail" → label moves to To Improve, scheduler picks up DEV on next tick

The orchestrator doesn't need to poll, check, or coordinate. Workers are self-reporting.

Sessions accumulate context

Each developer level gets its own persistent session per project. Your medior dev that's done 5 features on my-app already knows the codebase — it doesn't re-read 50K tokens of source code every time it picks up a new task.

That's a ~40-60% token saving per task from session reuse alone.

Combined with tier selection (not using Opus when Haiku will do) and the token-free heartbeat (more on that next), DevClaw significantly reduces your token bill versus running everything through one large model.

Everything is logged

Every tool call writes an NDJSON line to audit.log:

cat audit.log | jq 'select(.event=="work_start")'

Full trace of every task, every level selection, every label transition, every health fix. No manual logging needed.
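For processing the log programmatically rather than with `jq`, NDJSON can be parsed one line at a time. This is a small sketch: only the `event` field is taken from the example above; the `AuditEntry` shape and function names are assumptions.

```typescript
// Only `event` is known from the README; other fields are left open.
interface AuditEntry {
  event: string;
  [key: string]: unknown;
}

// Parse NDJSON text: one JSON object per non-empty line.
function parseAuditLog(ndjson: string): AuditEntry[] {
  return ndjson
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as AuditEntry);
}

// Equivalent of jq's select(.event == "...") over parsed entries.
function filterByEvent(entries: AuditEntry[], event: string): AuditEntry[] {
  return entries.filter((entry) => entry.event === event);
}
```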

Automatic scheduling

DevClaw doesn't wait for you to tell it what to do next. A background scheduling system continuously scans for available work and dispatches workers — zero LLM tokens, pure deterministic code. This is the engine that keeps the pipeline moving: when DEV finishes, the scheduler sees a To Test issue and dispatches QA. When QA fails, the scheduler sees a To Improve issue and dispatches DEV. No hand-offs, no orchestrator reasoning — just label-driven scheduling.

The `work_heartbeat`

Every tick (default: 60 seconds), the scheduler runs two passes:

Health pass — detects workers stuck for >2 hours, reverts their labels back to queue, deactivates them. Catches crashed sessions, context overflows, or workers that died without reporting back.
Queue pass — scans for available tasks by priority (To Improve > To Test > To Do), fills free worker slots. DEV and QA slots are filled independently.

Both passes consist entirely of CLI calls and JSON reads, so a tick costs no LLM tokens. Workers only consume tokens when they actually start coding or reviewing.
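The two passes can be sketched as plain functions. Everything here is illustrative: the type names, the role assigned to each queue label, and the treatment of `maxPickups` as a per-call cap are assumptions, not the plugin's implementation.

```typescript
type QueueLabel = "To Improve" | "To Test" | "To Do";
type Role = "dev" | "qa";

interface QueuedIssue {
  id: number;
  label: QueueLabel;
}

// Priority order from the README: To Improve > To Test > To Do.
const PRIORITY: QueueLabel[] = ["To Improve", "To Test", "To Do"];

// Assumed mapping: To Test goes to QA, the other two queues go to DEV.
const ROLE_FOR: Record<QueueLabel, Role> = {
  "To Improve": "dev",
  "To Test": "qa",
  "To Do": "dev",
};

const STALE_MS = 2 * 60 * 60 * 1000; // health pass threshold: more than 2 hours

// Health pass check: has this worker been running longer than the threshold?
function isStale(startedAtMs: number, nowMs: number): boolean {
  return nowMs - startedAtMs > STALE_MS;
}

// Queue pass: pick issues by priority while the matching role has a free
// slot, stopping once maxPickups issues have been selected.
function queuePass(
  queue: QueuedIssue[],
  freeSlots: Record<Role, number>,
  maxPickups: number,
): QueuedIssue[] {
  const slots = { ...freeSlots };
  const picked: QueuedIssue[] = [];
  const ordered = [...queue].sort(
    (a, b) => PRIORITY.indexOf(a.label) - PRIORITY.indexOf(b.label),
  );
  for (const issue of ordered) {
    if (picked.length >= maxPickups) break;
    const role = ROLE_FOR[issue.label];
    if (slots[role] > 0) {
      slots[role] -= 1;
      picked.push(issue);
    }
  }
  return picked;
}
```

Because DEV and QA slots are tracked separately, a full DEV slot does not block a `To Test` issue from being picked up in the same tick.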

How tasks flow between roles

When a worker calls work_finish, the plugin transitions the label. The scheduler picks up the rest:

DEV "done" → label moves to To Test → next tick dispatches QA
QA "fail" → label moves to To Improve → next tick dispatches DEV (reuses previous level)
QA "pass" → label moves to Done, issue closes
"blocked" → label reverts to queue (To Do or To Test) for retry

No orchestrator involvement. Workers self-report, the scheduler fills free slots.
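The outcome-to-label rules above fit in one small function. This is a sketch with hypothetical names; in particular, sending a blocked QA task back to To Test is an interpretation of "reverts to queue (To Do or To Test)", not something the README states explicitly.

```typescript
type WorkerRole = "dev" | "qa";
type FinishResult = "done" | "pass" | "fail" | "blocked";

// Map a work_finish outcome to the label the issue should carry next.
function labelAfterFinish(role: WorkerRole, result: FinishResult): string {
  if (role === "dev" && result === "done") return "To Test";
  if (role === "qa" && result === "pass") return "Done";
  if (role === "qa" && result === "fail") return "To Improve";
  // Assumption: a blocked task returns to the queue it was picked up from.
  if (result === "blocked") return role === "dev" ? "To Do" : "To Test";
  throw new Error(`Invalid result "${result}" for role "${role}"`);
}
```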

Execution modes

Each project is fully isolated — its own queue, workers, sessions, state. No cross-project contamination. Two levels of parallelism control how work gets scheduled:

Project-level (roleExecution) — DEV and QA work simultaneously on different tasks (default: parallel) or take turns (sequential)
Plugin-level (projectExecution) — all registered projects dispatch workers independently (default: parallel) or only one project runs at a time (sequential)

Configuration

All scheduling behavior is configurable in openclaw.json:

{
  "plugins": {
    "entries": {
      "devclaw": {
        "config": {
          "work_heartbeat": {
            "enabled": true,
            "intervalSeconds": 60,
            "maxPickupsPerTick": 4
          },
          "projectExecution": "parallel"
        }
      }
    }
  }
}

Per-project settings live in projects.json:

{
  "-1234567890": {
    "name": "my-app",
    "roleExecution": "parallel"
  }
}

Setting	Where	Default	What it controls
`work_heartbeat.enabled`	`openclaw.json`	`true`	Turn the heartbeat on/off
`work_heartbeat.intervalSeconds`	`openclaw.json`	`60`	Seconds between ticks
`work_heartbeat.maxPickupsPerTick`	`openclaw.json`	`4`	Max workers dispatched per tick
`projectExecution`	`openclaw.json`	`"parallel"`	All projects at once, or one at a time
`roleExecution`	`projects.json`	`"parallel"`	DEV+QA at once, or one role at a time

See the Configuration reference for the full schema.
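For readers who want the settings table as types, here is a hedged sketch. The keys and default values come from the table above; the interface names and the `withDefaults` helper are assumptions, and the full schema in the Configuration reference may contain more fields.

```typescript
type ExecutionMode = "parallel" | "sequential";

interface HeartbeatConfig {
  enabled: boolean;
  intervalSeconds: number;
  maxPickupsPerTick: number;
}

interface DevClawConfig {
  work_heartbeat: HeartbeatConfig;
  projectExecution: ExecutionMode;
}

// Defaults as listed in the settings table.
const DEFAULTS: DevClawConfig = {
  work_heartbeat: { enabled: true, intervalSeconds: 60, maxPickupsPerTick: 4 },
  projectExecution: "parallel",
};

// Merge a partial user config over the defaults, key by key.
function withDefaults(
  partial: {
    work_heartbeat?: Partial<HeartbeatConfig>;
    projectExecution?: ExecutionMode;
  } = {},
): DevClawConfig {
  return {
    work_heartbeat: { ...DEFAULTS.work_heartbeat, ...partial.work_heartbeat },
    projectExecution: partial.projectExecution ?? DEFAULTS.projectExecution,
  };
}
```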

Task management

Your issues stay in your tracker

DevClaw doesn't have its own task database. All task state lives in GitHub Issues or GitLab Issues — auto-detected from your git remote. The eight pipeline labels are created on your repo when you register a project. Your project manager sees progress in GitHub without knowing DevClaw exists. Your CI/CD can trigger on label changes. If you stop using DevClaw, your issues and labels stay exactly where they are.

The provider is pluggable (IssueProvider interface). GitHub and GitLab work today. Jira, Linear, or anything else just needs to implement the same interface.
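The README does not show the `IssueProvider` interface itself, so the following is a hypothetical shape meant only to illustrate what "implement the same interface" could involve. The method names and the in-memory implementation are assumptions, and the methods are synchronous for brevity where a real provider would likely be asynchronous.

```typescript
interface Issue {
  id: number;
  title: string;
  labels: string[];
}

// Hypothetical provider contract; the real interface may differ.
interface IssueProvider {
  createIssue(title: string, label: string): Issue;
  transitionLabel(id: number, from: string, to: string): void;
  listByLabel(label: string): Issue[];
}

// In-memory stand-in, useful for tests or as a template for a new tracker.
class InMemoryProvider implements IssueProvider {
  private issues = new Map<number, Issue>();
  private nextId = 1;

  createIssue(title: string, label: string): Issue {
    const issue: Issue = { id: this.nextId++, title, labels: [label] };
    this.issues.set(issue.id, issue);
    return issue;
  }

  transitionLabel(id: number, from: string, to: string): void {
    const issue = this.issues.get(id);
    if (!issue) throw new Error(`Issue #${id} not found`);
    if (!issue.labels.includes(from)) {
      throw new Error(`Issue #${id} is not labeled "${from}"`);
    }
    issue.labels = issue.labels.map((l) => (l === from ? to : l));
  }

  listByLabel(label: string): Issue[] {
    return Array.from(this.issues.values()).filter((i) =>
      i.labels.includes(label),
    );
  }
}
```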

Creating, updating, and commenting

Tasks can come from anywhere — the orchestrator creates them from chat, workers file bugs they discover mid-task, or you create them directly in GitHub/GitLab:

You:    "Create an issue: fix the broken OAuth redirect"
Agent:  creates issue #43 with label "Planning"

You:    "Move #43 to To Do"
Agent:  transitions label Planning → To Do

You:    "Add a comment on #42: needs to handle the edge case for expired tokens"
Agent:  adds comment attributed to "orchestrator"

Workers can also comment during work — QA leaves review feedback, DEV posts implementation notes. Every comment carries role attribution so you know who said what.

Custom instructions per project

Each project gets instruction files that workers receive with every task they pick up:

workspace/projects/roles/
├── my-webapp/
│   ├── dev.md     "Run npm test before committing. Deploy URL: staging.example.com"
│   └── qa.md      "Check OAuth flow. Verify mobile responsiveness."
├── my-api/
│   ├── dev.md     "Run cargo test. Follow REST conventions in CONTRIBUTING.md"
│   └── qa.md      "Verify all endpoints return correct status codes."
└── default/
    ├── dev.md     (fallback for projects without custom instructions)
    └── qa.md

Deployment steps, test commands, coding standards, acceptance criteria — all injected at dispatch time, per project, per role.
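The fallback to `default/` can be sketched as a path lookup. The directory layout follows the tree above; the function name and the injected `exists` check are assumptions made so the example runs without touching the file system.

```typescript
type InstructionRole = "dev" | "qa";

// Return the project-specific instruction file if it exists,
// otherwise the shared default for that role.
function resolveInstructionPath(
  project: string,
  role: InstructionRole,
  exists: (path: string) => boolean,
  root: string = "workspace/projects/roles",
): string {
  const specific = `${root}/${project}/${role}.md`;
  if (exists(specific)) return specific;
  return `${root}/default/${role}.md`;
}
```

In real use, `exists` could be backed by a file system check; passing it in keeps the lookup rule itself easy to test.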

The orchestrator's role

The orchestrator is a planner and dispatcher — not a coder. This separation is intentional and enforced.

What the orchestrator does

Plans: Analyzes requirements, breaks down work, decides priorities
Dispatches: Creates issues, assigns developer levels, starts workers
Coordinates: Monitors queue, handles status checks, answers questions
Reads: Can inspect code to understand context (but never writes)

What goes through workers

All implementation work flows through the issue → worker pipeline:

Action	Goes through worker?	Why
Writing or editing code	✅ Yes	Audit trail, tier selection
Git operations (commits, branches, PRs)	✅ Yes	Workers own their worktrees
Running tests	✅ Yes	Part of the dev/QA workflow
Fixing bugs	✅ Yes	Even quick fixes need tracking
Refactoring	✅ Yes	Sonnet/Opus for complexity
Reading code to answer questions	❌ No	Orchestrator can read
Creating issues	❌ No	Orchestrator's job
Status checks	❌ No	Orchestrator's job
Architecture discussions	❌ No	Orchestrator's job

Why this boundary exists

Audit trail — Every code change links to an issue. You can trace any line of code back to a tracked task.
Right model for the job — A typo fix uses Haiku (~$0.001). A migration uses Opus (~$0.20). Without tier selection, you're either overpaying or underperforming on every task.
Parallelization — While workers code, the orchestrator stays free to handle new requests, answer questions, create more issues. No bottleneck.
QA pipeline — Code goes through review before merging. Skip the worker pipeline, skip QA.
Session reuse — Workers accumulate codebase context over multiple tasks. The orchestrator starting fresh every time wastes tokens.

The orchestrator saying "I'll just make this quick fix myself" is like a manager saying "I'll just write that feature instead of assigning it." Technically possible, but it breaks the system that makes everything else work.

Getting started

Prerequisites

OpenClaw installed (openclaw --version)
Node.js >= 20
gh CLI (GitHub) or glab CLI (GitLab), authenticated

Install

openclaw plugins install @laurentenhoor/devclaw

Or for local development:

openclaw plugins install -l ./devclaw

Start onboarding:

openclaw chat "Help me set up DevClaw"

Set up through conversation

The easiest way to configure DevClaw is to just talk to your agent:

You:   "Help me set up DevClaw"
Agent: "I'll walk you through it. Should I use this agent as the
        orchestrator, or create a new one?"
You:   "Use this one"

Agent: "Want to bind a messaging channel?"
You:   "Telegram"

Agent: "Here are the default developer assignments:
        Junior → Haiku, Medior → Sonnet, Senior → Opus
        Reviewer → Sonnet, Tester → Haiku
        Keep these or customize?"
You:   "Keep them"

Agent: "Done. Want to register a project?"
You:   "Yes — my-app at ~/git/my-app, main branch"

Agent: "Project registered. 8 labels created on your repo.
        Role instructions scaffolded. Try: 'check the queue'"

You can also use the CLI wizard or non-interactive setup for scripted environments.

The toolbox

DevClaw gives the orchestrator 11 tools. These aren't just convenience wrappers — they're guardrails. Each tool encodes a complex multi-step operation into a single atomic call. The agent provides intent, the plugin handles mechanics. The agent physically cannot skip a label transition, forget to update state, or dispatch to the wrong session — those decisions are made by deterministic code, not LLM reasoning.

Tool	What it does
`work_start`	Pick up a task — resolves level, transitions label, dispatches session, logs audit
`work_finish`	Complete a task — transitions label, updates state, closes/reopens issue
`task_create`	Create a new issue (used by workers to file bugs they discover)
`task_update`	Manually change an issue's state label
`task_comment`	Add a comment to an issue (with role attribution)
`status`	Dashboard: queue counts + who's working on what
`health`	Detect zombie workers, stale sessions, state inconsistencies
`work_heartbeat`	Manually trigger a health check + queue dispatch cycle
`project_register`	One-time project setup: creates labels, scaffolds instructions, initializes state
`setup`	Agent + workspace initialization
`onboard`	Conversational setup guide

Full parameters and usage in the Tools Reference.

Documentation


Architecture	System design, session model, data flow, end-to-end diagrams
Tools Reference	Complete reference for all 11 tools
Configuration	`openclaw.json`, `projects.json`, heartbeat, notifications
Onboarding Guide	Full step-by-step setup
QA Workflow	QA process and review templates
Testing	Test suite, fixtures, CI/CD
Management Theory	The delegation model behind the design
Roadmap	What's coming next

License

MIT
