Research

Every non-trivial optimisation in the format pipeline is backed by a paper whose results were measured against a real corpus: 523 Claude Code sessions and 10,644 MCP tool responses from production traffic. The full source of truth (methods, datasets, reproducibility scripts) lives at docs/research/INDEX.md in the repo. This page is a quick orientation.

Papers

| # | Paper | Status | Headline result |
|---|-------|--------|-----------------|
| 1 | TrimTree: priority-driven pagination (binary knapsack within a token budget, p₁ metric) | draft (820-line full draft, all experiments complete) | 3.3× p₁ vs uniform on power-law data; FIFO baseline 35% replicated across 3 corpora; KV-cache pass on Sonnet 4.5 ≈ 40% input-side savings (66.5% hit rate) |
| 2 | Format-adaptive tree encoding (multi-choice knapsack picking CSV / table / key:value per subtree) | draft | Per-call savings on the corpus: avg 69% on get_issues (top 92%), avg 26% on *_pipeline; ≥ 20% bucket hits 1.25% of all events but most calls of the shape-friendly endpoints |
| 3 (theory · implementation) | Context Enrichment Hypothesis + tool-aware knapsack with provider value models | draft (prefetch dispatcher merged in v0.22; production telemetry pending) | Pearson r = −0.280 between chars_per_item and follow-up enrichment calls; thin issues (< 200 chars/item) → 43% of turns add a get_issue; rich (1.5 k–4 k) → 2% |
| 4 | Dataset-as-context (large responses become queryable Parquet artefacts the LLM pulls from) | draft (early concept, no measurements yet) | Hypothesised 60–80% additional savings on top of TrimTree; evaluation harness not yet built |
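To make the knapsack framing in papers 1 and 2 concrete, here is a minimal sketch of a textbook multiple-choice knapsack solved by dynamic programming. It is not the TrimTree or format-adaptive encoder implementation: the names Option and choose_encodings, the token costs and the value scores are all hypothetical, and the papers' actual value model (the p₁ metric) is not reproduced here. Each subtree takes at most one encoding, and a subtree with a single option reduces to the binary include-or-omit case.

```python
from typing import NamedTuple, Optional


class Option(NamedTuple):
    label: str    # hypothetical encoding name, e.g. "csv" or "key:value"
    tokens: int   # assumed token cost of the subtree in this encoding
    value: float  # assumed usefulness score (a stand-in, not the p1 metric)


def choose_encodings(groups: dict[str, list[Option]], budget: int):
    """Pick at most one option per subtree, maximising value within budget."""
    names = list(groups)
    # value[i][b]: best total value using the first i subtrees within b tokens.
    value = [[0.0] * (budget + 1) for _ in range(len(names) + 1)]
    # pick[i][b]: option chosen for subtree i at budget b (None means omitted).
    pick: list[list[Optional[Option]]] = [
        [None] * (budget + 1) for _ in names
    ]
    for i, name in enumerate(names):
        for b in range(budget + 1):
            value[i + 1][b] = value[i][b]  # default: omit this subtree
            for opt in groups[name]:
                if opt.tokens <= b:
                    candidate = value[i][b - opt.tokens] + opt.value
                    if candidate > value[i + 1][b]:
                        value[i + 1][b] = candidate
                        pick[i][b] = opt
    # Walk the table backwards to recover which option each subtree got.
    chosen: dict[str, str] = {}
    b = budget
    for i in range(len(names) - 1, -1, -1):
        opt = pick[i][b]
        if opt is not None:
            chosen[names[i]] = opt.label
            b -= opt.tokens
    return value[len(names)][budget], chosen


# Illustrative input only; these costs and values are made up.
example = {
    "issues": [Option("csv", 40, 10.0), Option("key:value", 90, 12.0)],
    "pipeline": [Option("table", 50, 6.0), Option("key:value", 70, 7.0)],
    "notes": [Option("key:value", 30, 3.0)],
}
best_value, best_choice = choose_encodings(example, budget=100)
```

With a budget of 100 tokens the sketch keeps issues as CSV and pipeline as a table and omits notes. Raising the budget to 120 fits all three subtrees.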

Corpus baselines

These numbers ground every paper above (paper 1 §B):

  • get_merge_request_diffs: P90 = 35 k chars ≈ 10 k tokens; 28% of responses exceed an 8 k-token budget
  • get_epics: P90 = 43 k chars ≈ 12 k tokens; 37% exceed the same budget
  • After overflow, agents always produce a text response on the next turn; they never retry or paginate
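The baselines above combine a character-to-token conversion, a P90, and an overflow share. The sketch below shows one way to compute such figures from a list of response sizes. The ratio of 3.5 characters per token is an assumption read off the "35 k chars ≈ 10 k tokens" figure, not a tokenizer measurement. The function names and the sample sizes are hypothetical and are not corpus data.

```python
import math

# Assumed rough ratio, inferred from "35 k chars ≈ 10 k tokens" above.
CHARS_PER_TOKEN = 3.5


def estimate_tokens(chars: int) -> int:
    """Approximate token count for a response of the given character length."""
    return math.ceil(chars / CHARS_PER_TOKEN)


def percentile(values: list[int], q: int) -> int:
    """Nearest-rank percentile for an integer q between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # integer ceiling of q*n/100
    return ordered[rank - 1]


def overflow_share(char_lengths: list[int], budget_tokens: int = 8000) -> float:
    """Fraction of responses whose estimated tokens exceed the budget."""
    over = sum(1 for c in char_lengths if estimate_tokens(c) > budget_tokens)
    return over / len(char_lengths)


# Made-up response sizes in characters, for illustration only.
sample = [1000, 5000, 12000, 20000, 27000, 29000, 31000, 35000, 40000, 60000]
p90_chars = percentile(sample, 90)
share_over_budget = overflow_share(sample)
```

Under this assumed ratio an 8 k-token budget corresponds to roughly 28 k characters, so any response longer than that counts as an overflow.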

Status

  • Paper 1 — complete draft, replicated across 3 corpora, lands in the next minor version.
  • Paper 2 — complete draft, lands in the next minor version (the format-adaptive encoder is already in the codebase under feature flag).
  • Paper 3 — prefetch dispatcher merged in v0.22; production telemetry pending.
  • Paper 4 — concept stage, no production code yet.

Notebooks & data

Reproducibility scripts and the corpus notebook are under docs/research/ in the repo (paper-1-repro, benchmarks, data, notebooks).