DevBoy tools


A research-driven tool bundle for AI coding agents. A single curated set of dev-workflow tools (GitHub, GitLab, Jira, ClickUp, Confluence, Slack, Fireflies) reachable from any agent — Claude Code, Copilot CLI, Codex, Cursor, Kimi, Gemini, … — through three transports: MCP server, CLI, or installable agent skills. Output goes through a token-aware pipeline that compresses responses by 26–69% per call on the data-shape-friendly endpoints it targets (issues, pipelines, large lists) — see paper 2 measurements on a 144k-event production corpus.

npm install -g @devboy-tools/cli   # binary for your platform
devboy onboard                     # detects your AI agent, installs the right skills

That's it. Verify with devboy doctor.


Why DevBoy

DevBoy isn't another aggregator with a long tool list. Four things that aren't standard elsewhere:

  • Research-driven. Every optimisation in the pipeline traces back to a paper grounded in a real corpus — 523 Claude Code sessions, 10,644 MCP tool responses. We don't ship a heuristic without measuring it. See research index.
  • Three transports, one bundle. The exact same tool set is reachable as an MCP server, as a CLI for humans / CI, or as agent skills that call individual tools. Pick whichever fits today; layer the rest later.
  • Privacy by default. Tokens live in the OS keychain (macOS Keychain / Windows Credential Manager / Linux Secret Service), with an env-var fallback for CI. No cloud round-trip just to authenticate.
  • Multi-project context. One server, many project contexts (different GitHub/GitLab/Jira combinations), instant switch — no respawn, no config edit. Concrete: devboy context use dashboard and the same MCP session now talks to a different project's APIs.

| | Generic MCP aggregator | DevBoy |
|---|---|---|
| Output efficiency | Default API JSON | Knapsack-based pagination + format-adaptive encoding (papers 1, 2). Measured per-call savings on the 144k-event corpus: 69% avg on get_issues, 92% top per call, 26% avg on *_pipeline. KV-cache pass on Sonnet 4.5 lifts ~40% of tokens off the input side. |
| Tool catalogue | Static | Dynamic per-project, with provider enrichers (custom field params, enum hints) |
| Transport | MCP only | MCP, CLI, or agent skills (same tools) |
| Onboarding | Manual config + per-agent install | devboy onboard autodetects and bundles |
| Credentials | Cloud / config files | OS keychain, env vars only as fallback |
| Extensibility | Forking | Plugin system (Rust today; WASM, TypeScript planned) |

Quick start (60 seconds)

# 1. Install
npm install -g @devboy-tools/cli

# 2. Bootstrap — picks your agent + installs a curated skill bundle
devboy onboard

# 3. Configure your first provider (interactive)
devboy init

# 4. Verify
devboy doctor

After this, devboy issues returns your open tickets, your agent has the relevant skills loaded, and the MCP server is registered with whatever client you use.

Install via plugin (Claude Code / Codex CLI)

If you live inside Claude Code or Codex CLI, skip the npm step entirely.

Claude Code:

/plugin marketplace add meteora-pro/devboy-tools
/plugin install devboy@meteora-devboy

Codex CLI — reads the same .claude-plugin/marketplace.json (one of the four official marketplace sources), so the install is symmetric:

codex plugin marketplace add meteora-pro/devboy-tools
codex plugin install devboy@meteora-devboy

Either way, the bundled setup skill installs the devboy CLI on first use (npm install -g with a SHA-256-verified GitHub Release tarball as fallback), wires up the MCP server, and runs devboy onboard. After the binary lands, run /reload-plugins (Claude Code) or restart your Codex session once.

OpenCode and Kimi CLI users get the same skills for free — both auto-read ~/.claude/skills/, so installing the Claude Code plugin or running devboy onboard covers them too. See the per-agent guides (Claude Code, Codex) and ADR-018 for the architecture.

If you'd rather pick everything by hand:

Manual install / configuration
# Configure GitHub (GitLab, ClickUp, and Jira are configured the same way)
devboy config set github.owner meteora-pro
devboy config set github.repo devboy-tools
devboy config set-secret github.token <token>     # → OS keychain

# Or via env vars (CI / Docker — keychain unavailable)
export DEVBOY_GITHUB_TOKEN=ghp_...
# Compatibility: GITHUB_TOKEN is read too

# Pick skills explicitly instead of using a profile
devboy skills list
devboy skills install review-mr --agent claude
devboy skills install --all --agent all
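
The lookup order described above (OS keychain first, environment variables as the fallback) can be sketched in Rust as follows. This is a minimal illustration, not DevBoy's implementation: `resolve_github_token`, `TokenSource`, and the keychain key name are hypothetical, and the precedence of DEVBOY_GITHUB_TOKEN over GITHUB_TOKEN is an assumption. Both lookups are injected as closures so the order can be exercised without a real keychain.

```rust
/// Where a resolved token came from (illustrative names).
#[derive(Debug, PartialEq)]
pub enum TokenSource {
    Keychain,
    Env(&'static str),
}

/// Resolve a GitHub token: keychain first, then the two env vars.
/// `keychain` and `env` are injected lookups, so nothing here touches
/// the real OS keychain or the process environment.
pub fn resolve_github_token(
    keychain: impl Fn(&str) -> Option<String>,
    env: impl Fn(&str) -> Option<String>,
) -> Option<(String, TokenSource)> {
    // Hypothetical key name, mirroring `devboy config set-secret github.token`.
    if let Some(tok) = keychain("github.token") {
        return Some((tok, TokenSource::Keychain));
    }
    // Assumed order: the DevBoy-specific variable wins over the generic one.
    for name in ["DEVBOY_GITHUB_TOKEN", "GITHUB_TOKEN"] {
        if let Some(tok) = env(name).filter(|t| !t.is_empty()) {
            return Some((tok, TokenSource::Env(name)));
        }
    }
    None
}
```

In a real host the second closure would typically be `|k| std::env::var(k).ok()`.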

Build from source:

git clone https://github.com/meteora-pro/devboy-tools.git
cd devboy-tools && cargo build --release
./target/release/devboy --version

Skills & onboarding

DevBoy ships a catalogue of skills — one-page Markdown recipes that tell an AI agent how to use the bundle to accomplish a common task. Skills are CLI-first (devboy tools call <name> under the hood), agent-agnostic (Claude Code / Codex / Cursor / Kimi or a vendor-neutral path), and versioned with the binary.

devboy onboard is the fastest path: it scans ~/.claude/, ~/.copilot/, ~/.codex/, ~/.kimi/, Cursor's storage, ~/.gemini/, and ~/.gemini/antigravity/, scores each agent on freshness × volume (recency wins ties), and installs a profile-specific bundle.

devboy onboard                          # auto-detect + install `dev` bundle
devboy onboard --profile pm             # PM bundle (issues + meetings + chat)
devboy onboard --profile oncall         # diagnostics + notifications
devboy onboard --agent kimi --yes       # explicit agent + non-interactive
devboy agents list                      # show all detected agents with score
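
To make "freshness × volume (recency wins ties)" concrete, here is a small Rust sketch of such a ranking. The freshness formula, the `AgentStats` fields, and the function names are assumptions made for illustration; the scoring that devboy onboard actually uses may differ.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct AgentStats {
    pub name: &'static str,
    /// Volume: number of sessions found for the agent.
    pub sessions: u32,
    /// Recency: days since the agent was last used.
    pub days_since_last_use: u32,
}

/// Hypothetical score: session volume damped by age.
pub fn score(a: &AgentStats) -> f64 {
    let freshness = 1.0 / (1.0 + a.days_since_last_use as f64);
    freshness * a.sessions as f64
}

/// Highest score first; on equal scores the more recently used agent wins.
pub fn rank(mut agents: Vec<AgentStats>) -> Vec<AgentStats> {
    agents.sort_by(|a, b| {
        score(b)
            .total_cmp(&score(a))
            .then(a.days_since_last_use.cmp(&b.days_since_last_use))
    });
    agents
}
```

With this formula, an agent with 40 sessions last used yesterday and one with 20 sessions used today both score 20, and the tie goes to the one used today.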

Three profiles ship today; categories below cover the full catalogue.

| Category | Skills |
|---|---|
| self-bootstrap | setup, repair, tools-catalog, pipeline-tune |
| issue-tracking | get-issues, create-issue, update-issue, link-issues, solve-issue |
| code-review | review-mr, fix-review-comments, self-review |
| self-feedback | run-and-verify, daily-report, retro, knowledge-extract, qa-sweep, analyze-usage |
| meeting-notes | meeting-search, meeting-transcript, meeting-to-tasks |
| messenger | chat-search, chat-summary, notify |

Skill installs keep a per-location manifest with SHA-256s so upgrades leave user-modified files alone (ADR-014). Self-feedback skills read session traces from .devboy/sessions/ (ADR-015).
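
The manifest rule above boils down to a three-way digest comparison. The Rust sketch below shows one way to express it; `plan_upgrade` and `UpgradeAction` are hypothetical names, and the digests are passed in as hex strings computed elsewhere. ADR-014 remains the authority on the real behaviour.

```rust
#[derive(Debug, PartialEq)]
pub enum UpgradeAction {
    /// File is untouched since install and the bundle ships a newer version.
    Overwrite,
    /// The user edited the file, so leave it alone.
    KeepUserEdit,
    /// The file already matches what the bundle ships.
    UpToDate,
}

/// Decide what to do with one installed skill file.
/// - `recorded`: digest stored in the manifest at install time
/// - `on_disk`:  digest of the file as it exists now
/// - `incoming`: digest of the file shipped with the new binary
pub fn plan_upgrade(recorded: &str, on_disk: &str, incoming: &str) -> UpgradeAction {
    if on_disk == incoming {
        UpgradeAction::UpToDate
    } else if on_disk != recorded {
        UpgradeAction::KeepUserEdit
    } else {
        UpgradeAction::Overwrite
    }
}
```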

analyze-usage is a featured skill that ships in two parts: a thin baseline (one Markdown file, embedded in the binary) plus a heavier Python backend (~1 MB, sparse-checked-out via curl on first use). It produces graphic monthly / weekly digests of how your AI sessions actually went — biome aquariums, 8-archetype bars, DORA radar, friction markers — plus shareable anonymised parquet bundles. See ./.claude/skills/analyze-usage/.


Three integration modes

The same tool set, three transports — pick what your workflow already uses.

| Mode | When to use | Example |
|---|---|---|
| MCP server | Claude Desktop, Claude Code, any MCP-compatible client | devboy mcp (stdio) |
| CLI | Humans at the terminal, CI jobs, shell scripts | devboy issues, devboy mrs, devboy tools call get_issues '{"limit": 20}' |
| Agent skills | Agents that don't want the full MCP tool-list tax; call only the tools a skill needs | devboy tools call get_issues from inside a skill script |

JSON arguments tip. devboy tools call <name> takes an optional positional JSON string (defaults to {}). POSIX shells: wrap in single quotes. Windows cmd.exe/PowerShell: escape inner quotes — devboy tools call get_issues "{\"limit\": 20}".
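
When the caller is a program rather than a shell, the quoting problem goes away, because the JSON travels as a single argument. A minimal Rust sketch using std::process::Command follows; the helper name `tools_call` is hypothetical, while the argument layout is the documented devboy tools call <name> <json> form.

```rust
use std::process::Command;

/// Build (but do not run) `devboy tools call <tool> <json>`.
/// Each element is passed as its own argument, so the JSON needs no
/// shell-specific quoting or escaping.
pub fn tools_call(tool: &str, json_args: &str) -> Command {
    let mut cmd = Command::new("devboy");
    cmd.args(["tools", "call", tool, json_args]);
    cmd
}
```

Calling `.output()` on the returned command would spawn the process, which requires devboy to be on PATH.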

Claude Code

The fastest way to get started is devboy onboard — it auto-detects which AI agent you actively use (by scanning ~/.claude/, ~/.copilot/, ~/.codex/, ~/.kimi/, Cursor's storage, ~/.gemini/, ~/.gemini/antigravity/) and installs a curated skill bundle for that agent:

devboy onboard                              # detect primary agent, install the `dev` bundle
devboy onboard --profile pm                 # PM bundle (issue tracking + meetings + messenger)
devboy onboard --profile oncall             # on-call bundle (diagnostics + notifications)
devboy onboard --agent kimi --yes           # explicit agent + non-interactive (CI / dotfiles)
devboy agents list                          # show all detected agents with sessions / last-used / score

If you'd rather register the MCP server by hand:

claude mcp add devboy -- devboy mcp
claude mcp list

Claude Desktop

~/Library/Application Support/Claude/claude_desktop_config.json:

{ "mcpServers": { "devboy": { "command": "devboy", "args": ["mcp"] } } }

For Codex / Cursor / Kimi / Copilot CLI / Gemini CLI / Antigravity — devboy onboard autoconfigures the MCP entry; or follow the agent's docs for adding a stdio MCP server pointing at devboy mcp.


Providers

Seven provider plugins ship today — each with a dedicated client + schema enricher so the tool list adapts to your project's actual fields (custom fields, enum values, status taxonomies):

| Provider | Crate | What you get |
|---|---|---|
| GitHub | devboy-github | Issues, pull requests, comments, branches, repos |
| GitLab | devboy-gitlab | Issues, merge requests, discussions, pipelines, MR diffs |
| Jira | devboy-jira | Issues with custom-field metadata, sprints, transitions, project versions (releases) |
| ClickUp | devboy-clickup | Tasks, custom fields, lists, custom task IDs |
| Confluence | devboy-confluence | Knowledge-base pages, search, spaces, create / update with labels (Server / Data Center, v1 + v2 API) |
| Slack | devboy-slack | Chat search, channel summary, post message |
| Fireflies | devboy-fireflies | Meeting transcripts, search, action items |

Adding a provider means writing a Rust crate that implements Provider plus a ToolEnricher (ADR-007).


Secret management

Provider tokens, deploy keys, API tokens — devboy-tools ships a first-class secret framework so values never sit in plaintext config files and AI agents never see raw values.

  • Manifest-driven — projects declare required and optional secrets in .devboy/secrets.toml (ADR-020). The merged inventory is the source of truth for secrets list, doctor, and the inventory view.
  • Pluggable sources — keychain, local-vault, 1Password, Vault (HTTP KV v2), env-store ship in-tree; community plugins extend the set via a stdio JSON-RPC subprocess protocol (ADR-021).
  • Native UI — TUI on ratatui + GUI on egui sharing one view-model layer; backend autodetected from $DISPLAY/$WAYLAND_DISPLAY (ADR-023 §3.4).
  • Agent-safe by construction — MCP secrets_* tools return metadata only. Trust boundary is enforced by a marker trait, a CI grep gate, and a sentinel-based negative test (ADR-023 §3.7).
  • Three deployment modes — desktop (OS keychain), team (local-vault), CI (env-store). End-to-end smoke tests cover all three on every PR.
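
The "agent-safe by construction" idea can be illustrated with a marker trait in a few lines of Rust. Everything named here (`SecretValue`, `SecretMeta`, `AgentSafe`, `to_agent`) is a hypothetical stand-in, not the types from the devboy crates; the point is only that a value type without the marker cannot reach the agent-facing function, and that its Debug output is redacted in case it is logged.

```rust
use std::fmt;

/// Wrapper whose Debug output never reveals the value.
pub struct SecretValue(String);

impl SecretValue {
    pub fn new(v: impl Into<String>) -> Self {
        SecretValue(v.into())
    }
    /// Explicit, greppable escape hatch for trusted code paths only.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for SecretValue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("SecretValue(<redacted>)")
    }
}

/// Marker trait: only types that carry no secret material implement it.
pub trait AgentSafe {}

#[derive(Debug)]
pub struct SecretMeta {
    pub path: String,
    pub provisioned: bool,
}

impl AgentSafe for SecretMeta {}
// Deliberately no `impl AgentSafe for SecretValue`.

/// Anything returned to an agent must satisfy the bound, so passing a
/// `SecretValue` here is a compile error rather than a runtime check.
pub fn to_agent<T: AgentSafe + fmt::Debug>(payload: &T) -> String {
    format!("{payload:?}")
}
```

A sentinel-style negative test then amounts to storing a recognisable value and asserting that it never shows up in any agent-facing output.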

Four guides for the framework live under docs/guide/secrets/. The day-to-day commands:

devboy secrets list                # see what the active context expects
devboy doctor --secrets            # is everything provisioned?
devboy secrets ui                  # TUI/GUI inventory
devboy secrets rotate <path>       # opens provider URL + prompts for new value

For AI-driven setup, point the agent at the setup-secrets skill — it walks the eight-step wizard with a state file at ~/.devboy/secrets/setup-state.toml.


Research

Every non-trivial optimisation in the pipeline is backed by a paper grounded in a real corpus — 523 Claude Code sessions, 10,644 MCP responses from production traffic. The full docs/research/INDEX.md tracks methods, datasets, and reproducibility scripts.

| # | Paper | Status | Headline result |
|---|---|---|---|
| 1 | TrimTree: priority-driven pagination (binary knapsack within a token budget, p₁ metric) | draft (820-line full draft, all experiments complete) | 3.3× p₁ vs uniform on power-law data; FIFO baseline 35% replicated across 3 corpora; KV-cache pass on Sonnet 4.5 ≈ 40% input-side savings (66.5% hit rate) |
| 2 | Format-adaptive tree encoding: multi-choice knapsack picking CSV / table / key:value per subtree | draft | Per-call savings on the corpus: avg 69% on get_issues (top 92%), avg 26% on *_pipeline; ≥ 20% bucket hits 1.25% of all events but most calls of the shape-friendly endpoints |
| 3 (theory · implementation) | Context Enrichment Hypothesis + tool-aware knapsack with provider value models | draft (prefetch dispatcher merged in v0.22; production telemetry pending) | Pearson r = −0.280 between chars_per_item and follow-up enrichment calls; thin issues (< 200 chars/item) → 43% of turns add a get_issue; rich (1.5 k–4 k) → 2% |
| 4 | Dataset-as-context: large responses become queryable Parquet artefacts the LLM pulls from | draft (early concept, no measurements yet) | Hypothesised 60–80% additional savings on top of TrimTree; evaluation harness not yet built |
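
Paper 1 frames pagination as a binary knapsack within a token budget. For readers unfamiliar with the term, the textbook dynamic-programming form of that problem is sketched below in Rust. It is not TrimTree's code: the `Item` fields, the function name, and how priorities are assigned are all assumptions made for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: u32,
    /// Cost of including the item in the response.
    pub tokens: usize,
    /// Value of including the item (higher is more important).
    pub priority: u64,
}

/// Classic 0/1 knapsack: choose the subset of `items` with maximum total
/// priority whose token cost fits `budget`. Returns ids in input order.
pub fn select_within_budget(items: &[Item], budget: usize) -> Vec<u32> {
    let n = items.len();
    // best[i][b] = max priority using the first i items with budget b.
    let mut best = vec![vec![0u64; budget + 1]; n + 1];
    for (i, it) in items.iter().enumerate() {
        for b in 0..=budget {
            let skip = best[i][b];
            let take = if it.tokens <= b {
                best[i][b - it.tokens] + it.priority
            } else {
                0
            };
            best[i + 1][b] = skip.max(take);
        }
    }
    // Walk back through the table to recover which items were taken.
    let mut chosen = Vec::new();
    let mut b = budget;
    for i in (0..n).rev() {
        if best[i + 1][b] != best[i][b] {
            chosen.push(items[i].id);
            b -= items[i].tokens;
        }
    }
    chosen.reverse();
    chosen
}
```

For four items costing 5, 4, 6, and 3 tokens with priorities 10, 40, 30, and 50 and a budget of 10, this picks the second and fourth (priority 90), whereas cutting the list in arrival order would keep the first two (priority 50).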

Other corpus baselines used across papers (the 523 Claude Code sessions / 10,644 MCP-response sample, paper 1 §B):

  • get_merge_request_diffs: P90 = 35 k chars ≈ 10 k tokens — 28% of responses exceed an 8 k-token budget
  • get_epics: P90 = 43 k chars ≈ 12 k tokens — 37% exceed budget
  • After overflow, agents always produce a text response on the next turn — they never retry / paginate (paper 1 §3, paper-1-trimtree.md:30 and §C)

Paper 3's prefetch dispatcher already runs in the format pipeline; papers 1 and 2 land in the next minor version. Paper 4 is at concept stage — no production code yet.
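
As a rough intuition for why paper 2 chooses an encoding per subtree, the Rust sketch below renders the same records two ways and keeps the shorter one. It is a deliberate simplification: byte length stands in for token count, the choice is made greedily for a single subtree rather than by a multi-choice knapsack, CSV cells are not escaped, and all function names are hypothetical.

```rust
/// Render records as CSV: keys once in a header, then one line per row.
/// Missing values become empty cells. No escaping is attempted.
pub fn as_csv(keys: &[&str], rows: &[Vec<Option<&str>>]) -> String {
    let mut out = keys.join(",");
    for row in rows {
        let cells: Vec<&str> = row.iter().map(|c| c.unwrap_or("")).collect();
        out.push('\n');
        out.push_str(&cells.join(","));
    }
    out
}

/// Render the same records as `key: value` lines, skipping missing values.
pub fn as_key_value(keys: &[&str], rows: &[Vec<Option<&str>>]) -> String {
    let mut lines = Vec::new();
    for row in rows {
        for (k, v) in keys.iter().zip(row) {
            if let Some(v) = v {
                lines.push(format!("{k}: {v}"));
            }
        }
    }
    lines.join("\n")
}

/// Pick the shorter encoding, using byte length as a stand-in for tokens.
pub fn cheapest(keys: &[&str], rows: &[Vec<Option<&str>>]) -> (&'static str, String) {
    let csv = as_csv(keys, rows);
    let kv = as_key_value(keys, rows);
    if csv.len() <= kv.len() {
        ("csv", csv)
    } else {
        ("key:value", kv)
    }
}
```

Dense, uniform lists favour CSV because the keys are written once; sparse records favour key:value because empty cells and unused headers cost nothing.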


Architecture

Crate layout
crates/
├── devboy-core/        Traits (Provider, ToolEnricher), shared types, config
├── devboy-executor/    Tool execution engine + enrichment pipeline
├── devboy-mcp/         MCP server (JSON-RPC over stdio)
├── devboy-cli/         CLI binary (`devboy`)
├── devboy-skills/      Skill catalogue, install/upgrade, manifests, traces
├── devboy-storage/     Credential storage (keychain, env vars)
├── devboy-assets/      File attachments (ADR-010)
└── plugins/
    ├── api/            { github, gitlab, jira, clickup, slack, fireflies }
    └── format-pipeline The token-aware output pipeline (papers 1, 2, 3)
Multi-project contexts

One server, many contexts. Each context is its own provider config bundle:

┌─ DevBoy MCP / CLI ────────────────┐
│  context: devboy-tools             │
│    ├── GitHub: meteora-pro/devboy  │
│    └── Slack: #devboy              │
│  context: dashboard                │
│    ├── GitLab: project #42         │
│    ├── ClickUp: list abc123        │
│    └── Jira: DEV                   │
└────────────────────────────────────┘

Switch with devboy context use <name> (CLI) or the use_context tool (MCP). No respawn — the active session re-reads the new bindings on the next call.
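
One way to picture "no respawn" is a registry that every tool call consults for the active bindings. The Rust sketch below assumes that shape; `ContextRegistry`, `Context`, and their fields are illustrative and not taken from the devboy crates.

```rust
use std::collections::HashMap;

/// Provider bindings for one project (fields are illustrative).
#[derive(Debug, Clone, PartialEq)]
pub struct Context {
    pub providers: Vec<String>,
}

pub struct ContextRegistry {
    contexts: HashMap<String, Context>,
    active: String,
}

impl ContextRegistry {
    /// Returns `None` if `active` does not name a known context.
    pub fn new(contexts: HashMap<String, Context>, active: &str) -> Option<Self> {
        if contexts.contains_key(active) {
            Some(Self { contexts, active: active.to_string() })
        } else {
            None
        }
    }

    /// Switch the active context; unknown names leave the state unchanged.
    pub fn use_context(&mut self, name: &str) -> Result<(), String> {
        if self.contexts.contains_key(name) {
            self.active = name.to_string();
            Ok(())
        } else {
            Err(format!("unknown context: {name}"))
        }
    }

    /// Looked up at the start of every tool call, so a switch takes
    /// effect on the next call without restarting the process.
    pub fn active(&self) -> &Context {
        &self.contexts[&self.active]
    }
}
```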

Executor + enricher pipeline
Tool call → Executor
  1. Enrichers transform args   (e.g. cf_story_points → customFields)
  2. Provider factory builds the client from ProviderConfig
  3. Provider executes API calls → typed ToolOutput
  4. Format pipeline encodes output → text (markdown / compact / json)

Three enricher categories, single ToolEnricher trait:

  • Provider enrichers — adapt schemas per provider (drop unsupported params, surface custom-field params, populate enums from project metadata).
  • Pipeline enrichers — add output-control parameters (format enum, pagination knobs).
  • Custom enrichers — third-party plugins.
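
Step 1 of the flow above (enrichers transforming arguments) can be sketched as a chain of small transformers. The trait below is a stand-in written for this example; the real ToolEnricher in devboy-core may have a different shape, and the `customFields` encoding and the default `markdown` format are assumptions.

```rust
use std::collections::BTreeMap;

pub type Args = BTreeMap<String, String>;

/// Stand-in for the single enricher trait (hypothetical signature).
pub trait ArgEnricher {
    fn transform(&self, args: Args) -> Args;
}

/// Provider-style enricher: fold `cf_*` shortcuts into one `customFields` entry.
pub struct CustomFieldEnricher;

impl ArgEnricher for CustomFieldEnricher {
    fn transform(&self, args: Args) -> Args {
        let mut out = Args::new();
        let mut custom = Vec::new();
        for (key, value) in args {
            if key.starts_with("cf_") {
                // Drop the 3-byte `cf_` prefix and keep the field name.
                custom.push(format!("{}={}", &key[3..], value));
            } else {
                out.insert(key, value);
            }
        }
        if !custom.is_empty() {
            out.insert("customFields".to_string(), custom.join(";"));
        }
        out
    }
}

/// Pipeline-style enricher: add an output-control parameter if absent.
pub struct DefaultFormat;

impl ArgEnricher for DefaultFormat {
    fn transform(&self, mut args: Args) -> Args {
        args.entry("format".to_string())
            .or_insert_with(|| "markdown".to_string());
        args
    }
}

/// Run every enricher in order, feeding each one the previous output.
pub fn enrich(enrichers: &[&dyn ArgEnricher], args: Args) -> Args {
    enrichers.iter().fold(args, |acc, e| e.transform(acc))
}
```

Keeping all three categories behind one trait means the executor only ever sees an ordered list of transformers, regardless of who wrote them.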

Architecture details: executor, enrichers, format pipeline.




Use as a library

Beyond the CLI, the workspace ships library crates on crates.io — embed devboy components directly in a Rust project. The catalogue covers the foundation, credential storage, the format pipeline, every API provider, the MCP server, the skills subsystem, and the CLI binary.

| Crate | Description |
|---|---|
| devboy-core | Provider traits, unified types, configuration, errors |
| devboy-storage | OS-keychain credential storage with SecretString plumbing |
| devboy-assets | On-disk asset cache with LRU rotation (ADR-010) |
| devboy-format-pipeline | TOON encoding, MCKP-budget trimming, cursor pagination |
| devboy-gitlab | GitLab provider (issues, merge requests) |
| devboy-github | GitHub provider (issues, pull requests) |
| devboy-jira | Jira provider (issues, project versions) |
| devboy-clickup | ClickUp provider |
| devboy-confluence | Confluence (self-hosted) provider |
| devboy-fireflies | Fireflies meeting transcripts |
| devboy-slack | Slack provider |
| devboy-executor | Tool execution engine + provider factory |
| devboy-mcp | MCP server (JSON-RPC 2.0 over stdio) |
| devboy-skills | Skills subsystem (SKILL.md parser, install lifecycle) |
| devboy-cli | The devboy CLI binary (npm is the primary channel) |

Example — embed a single provider:

[dependencies]
devboy-core = "0.26"
devboy-jira = "0.26"

// Illustrative; not run automatically.
use devboy_core::{Config, IssueProvider};
use devboy_jira::JiraClient;
use secrecy::SecretString;

async fn embed() -> anyhow::Result<()> {
    let cfg = Config::load()?;
    let jira_cfg = cfg.jira.expect("jira section missing in .devboy.toml");

    // In a real devboy setup the token comes from the OS keychain via
    // `devboy_storage::ChainStore`; for an embedded host pass any
    // `SecretString` source you trust.
    let token: SecretString = std::env::var("JIRA_TOKEN")?.into();

    let client = JiraClient::new(
        jira_cfg.url,
        jira_cfg.project_key,
        jira_cfg.email,
        token,
    );

    let issue = client.get_issue("PROJ-123").await?;
    println!("{}", issue.key);
    Ok(())
}

The release procedure is documented in docs/guide/contributing/release.md. See ADR-022 for the architectural decision behind the dual npm + crates.io distribution.


Development

cargo build                        # debug build
cargo test                         # runs the workspace test suite
cargo clippy --all-targets         # lint (CI uses RUSTFLAGS=-Dwarnings)
cargo fmt --all                    # format
cargo run -p devboy-cli -- doctor  # smoke

The CLI reference is gated in CI: after touching clap definitions, run

cargo run -p devboy-cli -- docs cli --output docs/guide/reference/cli.md

so the committed reference matches the binary. Same idea for devboy tools docs and the tool reference.

See CONTRIBUTING.md for the full guide (commit conventions, branch naming, ADR workflow, release process).


Community

  • Issues / feature requests: GitHub Issues
  • Design discussions: GitHub Discussions
  • Code review tooling — open a PR; CI runs Format, Clippy, Test on macOS / Linux / Windows, Coverage, and the docs drift gate

License

Apache License 2.0 — use it, modify it, ship it; if you build something interesting on top, we'd love a heads-up via Discussions.