Kimetsu
Give your coding agent a memory that gets sharper every run.
Kimetsu (鬼滅), "demon slayer." It slays the demon every agent fights: amnesia.

What is Kimetsu
Coding agents are brilliant and forgetful. Every session starts from zero: the same wrong turns, the same re-explained conventions, the same exploration you already paid for last week.
Kimetsu is a sidecar brain for your agent. One Rust binary and one SQLite file per project, wired into Claude Code, Codex, Pi, OpenClaw, or Cursor over MCP, or driven from its own terminal chat. It captures the lessons an agent earns, learns which ones actually help, and hands them back before the next task. The memory pipeline calls no LLM: storage and retrieval are 100% local, free, and offline-capable.
| Result | What it measures |
|---|---|
| 73.3% | BEAM 100K memory benchmark, matching the prior public state of the art, model-free |
| 66.0% | BEAM 1M, ahead of mem0's self-reported 62% at the same bucket |
| 83.0% | LongMemEval, the public long-term-memory benchmark |
| 13x | cheaper per solved task: $0.19 vs $2.47 on a 16-task Terminal-Bench slice |
| ~1M | memories held in ~3 GB RAM with sub-2s retrieval, one SQLite file |
| $0 | API cost to store and recall: zero LLM calls in the memory pipeline |
Quickstart
npm install -g kimetsu-ai
kimetsu npm-flavor embeddings # one-time: enable semantic retrieval
cd /your/project
kimetsu setup --host claude-code # or: codex | openclaw | pi
kimetsu doctor --selftest # records a memory and retrieves it
Other install paths (cargo, prebuilt archives) and host-wiring details are in the install guide.
What it does
- Remembers what matters. Project conventions, failure patterns, the exact command that regenerates your schema. Captured once, retrieved by meaning, even when you phrase it differently.
- Speaks first. Most memory waits to be asked. Kimetsu is proactive: a session-start digest, an episodic resume, and pre-task context mean the agent's first turn already knows the repo, your conventions, and what you were doing last time.
- Learns what helps. Memories the model cites before solving a problem get promoted. Stale advice and silent passengers decay and get pruned.
- Answers, not just injects. kimetsu ask composes a grounded, cited answer from memory using a local model: zero frontier tokens, works offline. Lessons cited often enough graduate into runnable skills.
- Pays for itself. ~13x cheaper per solved task than a no-brain baseline on a recorded Terminal-Bench slice, and the ROI ledger shows the savings on your own work.
- Stays yours. The whole brain is one SQLite file per project. No external vector DB, no cloud, no telemetry. Back it up with cp.
How it works
- Before a task, the broker walks your project brain and your cross-project user brain, scores every candidate, and injects the top few inside an adaptive token budget.
- While it works, Kimetsu surfaces known pitfalls before the first attempt, and the model cites the memories that actually help.
- After the task, cited memories get promoted, unused advice decays on a half-life curve, and non-trivial sessions auto-harvest their lessons.
The full mechanics (scoring, citations, decay, conflict detection, retrieval levels, and the daemon) are covered in How Kimetsu Works.
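To make the promote-and-decay loop concrete, here is a minimal Python sketch of half-life decay combined with budgeted selection. It is an illustration under stated assumptions, not Kimetsu's actual scoring: the `Memory` fields, the 30-day half-life, the `relevance * usefulness` ranking, and the fixed (rather than adaptive) token budget are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    relevance: float   # query match in [0, 1]; hypothetical retriever score
    usefulness: float  # grows when cited, shrinks while unused
    idle_days: float   # days since the memory was last cited
    tokens: int        # cost of injecting this memory into the prompt


def decayed(usefulness: float, idle_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: the value halves every half_life_days."""
    return usefulness * 0.5 ** (idle_days / half_life_days)


def promote(memory: Memory, boost: float = 1.0) -> None:
    """A citation keeps the decayed value, adds a boost, and resets the clock."""
    memory.usefulness = decayed(memory.usefulness, memory.idle_days) + boost
    memory.idle_days = 0.0


def select(memories: list[Memory], token_budget: int) -> list[Memory]:
    """Rank by relevance x decayed usefulness, then fill the budget greedily."""
    ranked = sorted(
        memories,
        key=lambda m: m.relevance * decayed(m.usefulness, m.idle_days),
        reverse=True,
    )
    chosen, used = [], 0
    for m in ranked:
        if used + m.tokens <= token_budget:  # skip anything that would overflow
            chosen.append(m)
            used += m.tokens
    return chosen
```

With this shape, a memory that is never cited loses half its weight every half-life and eventually stops winning a slot in the budget, while each citation pulls it back up.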
Share your brain
A brain is a portable file. Export it as a pack, hand it to a teammate, merge theirs into yours, or swap a whole brain in and out. Onboard a new machine or a new hire with one import.
# export a shareable pack (always gzip-compressed, always security-scrubbed)
kimetsu brain export onboarding.json.gz --name rust-conventions --version 1.0.0
# merge a pack into your brain (additive, dedups against what you already know)
kimetsu brain import onboarding.json.gz
# install straight from a URL
kimetsu brain import https://example.com/packs/rust-conventions.json.gz
# swap: replace your current memories in the pack's scope (reversible)
kimetsu brain import other-brain.json.gz --mode replace --yes
Every export is scrubbed before it leaves your machine: credentials and PII are redacted automatically, and --strict aborts the export if anything was found. Merge is idempotent, so re-importing is safe. Replace supersedes rather than deletes, so a swap can always be undone. For continuous sharing, kimetsu brain sync replicates a brain across machines with no server, and Kimetsu Remote serves one brain per repository to a whole team.
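The two import modes can be pictured with a small Python sketch. It assumes lessons are keyed by a hash of their normalized text and carry an `active` flag. Both are hypothetical modeling choices made for illustration and say nothing about Kimetsu's real schema or dedup logic.

```python
import hashlib


def memory_id(text: str) -> str:
    """Content-derived key: the same lesson always maps to the same ID."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def merge(brain: dict[str, dict], pack: list[str]) -> int:
    """Additive import: insert only unseen lessons; returns how many were added."""
    added = 0
    for text in pack:
        key = memory_id(text)
        if key not in brain:
            brain[key] = {"text": text, "active": True}
            added += 1
    return added


def replace(brain: dict[str, dict], pack: list[str]) -> set[str]:
    """Swap: deactivate current lessons instead of deleting them, then activate the pack.

    Returns the previously active keys so the swap can be undone.
    """
    previous = {key for key, row in brain.items() if row["active"]}
    incoming = {memory_id(text): text for text in pack}
    for key in previous - set(incoming):
        brain[key]["active"] = False
    for key, text in incoming.items():
        brain.setdefault(key, {"text": text, "active": True})["active"] = True
    return previous


def undo_replace(brain: dict[str, dict], previous: set[str]) -> None:
    """Restore exactly the set of lessons that was active before the swap."""
    for key, row in brain.items():
        row["active"] = key in previous
```

Because the key is derived from content, running `merge` twice with the same pack adds nothing the second time, and because `replace` only flips flags, the earlier state is still there to restore.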
Benchmarks vs other memory systems
Kimetsu's memory pipeline (ingest, store, retrieve, rerank) makes zero LLM calls: FTS5 + local embeddings + a local cross-encoder. mem0 / Cognee / Zep / Letta call a model to distill memories at write time and keep an LLM in the retrieval loop (mem0's own 2026 figures report ~7,000 tokens per retrieval call, a metered cost on every question). Kimetsu lands in the same accuracy band without the LLM, the bill, or the cloud.
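For a feel of what a model-free lexical lookup involves, the sketch below uses SQLite's FTS5 extension through Python's standard `sqlite3` module. It covers only the full-text leg (no embeddings, no cross-encoder), the table and column names are hypothetical, and it assumes your Python's SQLite build includes FTS5, which most do.

```python
import sqlite3


def build_index(lessons: list[str]) -> sqlite3.Connection:
    """Create an in-memory FTS5 table and load the lessons into it."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE lessons USING fts5(body)")
    db.executemany("INSERT INTO lessons(body) VALUES (?)", [(t,) for t in lessons])
    return db


def search(db: sqlite3.Connection, query: str, limit: int = 4) -> list[str]:
    """Return the best matches first; bm25() is lower (more negative) for better hits."""
    rows = db.execute(
        "SELECT body FROM lessons WHERE lessons MATCH ? "
        "ORDER BY bm25(lessons) LIMIT ?",
        (query, limit),
    )
    return [body for (body,) in rows]
```

Everything here runs in-process against a single database, so no network call or metered token is involved in a lookup.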
| benchmark | Kimetsu (local, model-free) | mem0 | Cognee |
|---|---|---|---|
| BEAM 1M (matched bucket) | 66.0% | 62% | not reported |
| BEAM 100K | 73.3% | n/a | 79% |
| BEAM 10M | future work | 48.6% | 67% |
| LongMemEval (_s) | 83.0% (200-q) · ~80.9% weighted | 94.4% (full set, their reader) | not reported |
Honest, not cherry-picked. Our LongMemEval run is a 200-question slice, not the full 500. Our BEAM-1M run covers 15 of 35 conversations with a Codex reader, against mem0's full set on their own harness. Cognee (a knowledge-graph system with an LLM in the loop) leads at 100K and 10M. Vendor numbers are self-reported. We ship the exact harness, reader, and settings so ours can be checked. Per-ability tables, caveats, and reproduction steps: the memory benchmark.
Retrieval itself is benchmarked too: recall@4 0.949 and MRR 0.914 at ~138 ms
with the default reranker (up to 0.975 / 0.933 with the quality-best one), on a
210-case dataset of real exported memories. Reproduce or re-tune on your own
corpus with kimetsu brain bench.
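For readers unfamiliar with the two metrics, here is a short Python sketch of how recall@k and MRR are commonly computed. It uses the standard textbook definitions. The exact definitions and case format used by kimetsu brain bench are not shown here and may differ, and the memory IDs in the usage note are made up.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0 if none was retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(cases: list[tuple[list[str], set[str]]], k: int = 4) -> tuple[float, float]:
    """Average both metrics over (ranked results, relevant set) cases."""
    n = len(cases)
    recall = sum(recall_at_k(ranked, rel, k) for ranked, rel in cases) / n
    mrr = sum(reciprocal_rank(ranked, rel) for ranked, rel in cases) / n
    return recall, mrr
```

As a usage example, a case whose single relevant memory is ranked first scores 1.0 on both metrics. A case with two relevant memories at ranks 2 and 5 scores recall@4 of 0.5 and a reciprocal rank of 0.5.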
Command reference
| Command | What it does |
|---|---|
| kimetsu setup --host <h> | Wire the brain into a host agent (init + install + selftest) |
| kimetsu chat | Standalone terminal coding assistant with the same brain |
| kimetsu brain memory add | Record a durable lesson by hand |
| kimetsu brain context "<q>" | Broker-ranked context bundle for a query |
| kimetsu ask "<q>" | Grounded, cited answer from memory (local model) |
| kimetsu resume / kimetsu checkpoint | Pick up where the last session left off |
| kimetsu brain export / import | Share brains: scrubbed packs, merge or replace, file or URL |
| kimetsu brain sync | Replicate your brain across machines, no server |
| kimetsu brain skills | Turn often-cited lessons into runnable skills |
| kimetsu brain insights / roi | Is the brain helping, and did it pay for itself |
| kimetsu brain tune | Self-tune retrieval against your own query history |
| kimetsu brain bench | Benchmark retrieval on your own corpus |
The full command surface, configuration keys, and maintenance commands are in How Kimetsu Works and the install guide.
Kimetsu Remote (beta)
Share one brain per repository from a server over HTTP MCP, for a team or for yourself across machines:
# server
kimetsu-remote serve --addr 0.0.0.0:8787 --data /srv/kimetsu-brains --token <secret>
# each client
kimetsu plugin install claude-code --remote https://kimetsu.example.com:8787
Bearer auth, per-repo brains, an optional shared org-brain, server-side repo ingest, TLS, Prometheus metrics, and a server-side reranker. Full setup in the Kimetsu Remote guide.
Docs
- Install & host wiring: every install path, host wiring, auto-harvest and distiller setup, maintenance commands.
- How Kimetsu Works: the brain, the broker, citations, decay, conflict detection, the MCP surface, retrieval models and benchmarking, configuration, the bridge, and doctor.
- Local models: run fully local with Ollama.
- Kimetsu Remote: server setup, org brain, TLS, clients.
- CHANGELOG: what shipped in each release.
License
Dual-licensed under MIT or Apache-2.0, your choice.