qmd
Search personal knowledge bases, notes, docs, and meeting transcripts locally using qmd — a hybrid retrieval engine with BM25, vector search, and LLM reranking. Supports CLI and MCP integration.
Local, on-device search engine for personal knowledge bases. Indexes markdown notes, meeting transcripts, documentation, and any text-based files, then provides hybrid search combining keyword matching, semantic understanding, and LLM-powered reranking — all running locally with no cloud dependencies.
Created by Tobi Lütke. MIT licensed.
```shell
# Check version
node --version   # must be >= 22

# macOS — install or upgrade via Homebrew
brew install node@22

# Linux — use NodeSource or nvm
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt-get install -y nodejs
# or with nvm:
nvm install 22 && nvm use 22
```
macOS system SQLite lacks extension loading. Install via Homebrew:
```shell
brew install sqlite
```
```shell
npm install -g @tobilu/qmd
# or with Bun:
bun install -g @tobilu/qmd
```
First run auto-downloads 3 local GGUF models (~2GB total):
| Model | Purpose | Size |
|---|---|---|
| embeddinggemma-300M-Q8_0 | Vector embeddings | ~300MB |
| qwen3-reranker-0.6b-q8_0 | Result reranking | ~640MB |
| qmd-query-expansion-1.7B | Query expansion | ~1.1GB |
```shell
qmd --version
qmd status
```
| Command | What It Does | Speed |
|---|---|---|
| `qmd search` | BM25 keyword search (no models) | ~0.2s |
| `qmd vsearch` | Semantic vector search (1 model) | ~3s |
| `qmd query` | Hybrid + reranking (all 3 models) | ~2-3s warm, ~19s cold |
| `qmd get` | Retrieve full document content | instant |
| `qmd multi-get` | Retrieve multiple files | instant |
| `qmd collection add` | Add a directory as a collection | instant |
| `qmd context add` | Add context metadata to improve retrieval | instant |
| `qmd embed` | Generate/update vector embeddings | varies |
| `qmd status` | Show index health and collection info | instant |
| `qmd mcp` | Start MCP server (stdio) | persistent |
| `qmd mcp --http` | Start MCP server (HTTP, warm models) | persistent |
Point qmd at directories containing your documents:
```shell
# Add a notes directory
qmd collection add ~/notes --name notes

# Add project docs
qmd collection add ~/projects/myproject/docs --name project-docs

# Add meeting transcripts
qmd collection add ~/meetings --name meetings

# List all collections
qmd collection list
```
Context metadata helps the search engine understand what each collection contains. This significantly improves retrieval quality:
```shell
qmd context add qmd://notes "Personal notes, ideas, and journal entries"
qmd context add qmd://project-docs "Technical documentation for the main project"
qmd context add qmd://meetings "Meeting transcripts and action items from team syncs"
```
```shell
qmd embed
```
This processes all documents in all collections and generates vector embeddings. Re-run after adding new documents or collections.
```shell
qmd status   # shows index health, collection stats, model info
```
Best for: exact terms, code identifiers, names, known phrases. No models loaded — near-instant results.
```shell
qmd search "authentication middleware"
qmd search "handleError async"
```
Best for: natural language questions, conceptual queries. Loads embedding model (~3s first query).
```shell
qmd vsearch "how does the rate limiter handle burst traffic"
qmd vsearch "ideas for improving onboarding flow"
```
Best for: important queries where quality matters most. Uses all 3 models — query expansion, parallel BM25+vector, reranking.
```shell
qmd query "what decisions were made about the database migration"
```
Combine different search types in a single query for precision:
```shell
# BM25 for exact term + vector for concept
qmd query $'lex: rate limiter\nvec: how does throttling work under load'

# With query expansion
qmd query $'expand: database migration plan\nlex: "schema change"'
```
| Syntax | Effect | Example |
|---|---|---|
| `perf*` | Prefix match | matches "performance" |
| `"exact phrase"` | Exact phrase | `"schema change"` |
| `-term` | Exclude term | |
For complex topics, write what you expect the answer to look like:
```shell
qmd query $'hyde: The migration plan involves three phases. First, we add the new columns without dropping the old ones. Then we backfill data. Finally we cut over and remove legacy columns.'
```
```shell
qmd search "query" --collection notes
qmd query "query" --collection project-docs
```
```shell
qmd search "query" --json              # JSON output (best for parsing)
qmd search "query" --limit 5           # Limit results
qmd get "#abc123"                      # Get by document ID
qmd get "path/to/file.md"              # Get by file path
qmd get "file.md:50" -l 100            # Get specific line range
qmd multi-get "journals/*.md" --json   # Batch retrieve by glob
```
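The `--json` output is the easiest form to consume from scripts. Below is a minimal Python sketch of the parsing step; the `path` and `score` field names are assumptions for illustration, not a documented schema, so check real `qmd search --json` output before relying on them.

```python
import json
import subprocess

def top_hits(query: str, limit: int = 5):
    """Run `qmd search --json` and return (path, score) pairs.

    ASSUMPTION: result objects carry "path" and "score" keys; verify
    against the actual output schema of `qmd search --json`.
    """
    out = subprocess.run(
        ["qmd", "search", query, "--json", "--limit", str(limit)],
        capture_output=True, text=True, check=True,
    ).stdout
    return [(hit.get("path"), hit.get("score")) for hit in json.loads(out)]

# Offline demo of the same parsing step against a mocked payload:
sample = '[{"path": "notes/auth.md", "score": 0.92}]'
hits = [(h.get("path"), h.get("score")) for h in json.loads(sample)]
print(hits)  # [('notes/auth.md', 0.92)]
```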
qmd exposes an MCP server that provides search tools directly to Hermes Agent via the native MCP client. This is the preferred integration — once configured, the agent gets qmd tools automatically without needing to load this skill.
Add to `~/.hermes/config.yaml`:

```yaml
mcp_servers:
  qmd:
    command: "qmd"
    args: ["mcp"]
    timeout: 30
    connect_timeout: 45
```
This registers the tools `mcp_qmd_search`, `mcp_qmd_vsearch`, `mcp_qmd_deep_search`, `mcp_qmd_get`, and `mcp_qmd_status`.
Tradeoff: Models load on first search call (~19s cold start), then stay warm for the session. Acceptable for occasional use.
Start the qmd daemon separately — it keeps models warm in memory:
```shell
# Start daemon (persists across agent restarts)
qmd mcp --http --daemon
# Runs on http://localhost:8181 by default
```
Then configure Hermes Agent to connect via HTTP:
```yaml
mcp_servers:
  qmd:
    url: "http://localhost:8181/mcp"
    timeout: 30
```
Tradeoff: Uses ~2GB RAM while running, but every query is fast (~2-3s). Best for users who search frequently.
```shell
cat > ~/Library/LaunchAgents/com.qmd.daemon.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.qmd.daemon</string>
  <key>ProgramArguments</key>
  <array>
    <string>qmd</string>
    <string>mcp</string>
    <string>--http</string>
    <string>--daemon</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/qmd-daemon.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/qmd-daemon.log</string>
</dict>
</plist>
EOF

launchctl load ~/Library/LaunchAgents/com.qmd.daemon.plist
```
```shell
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/qmd-daemon.service << 'EOF'
[Unit]
Description=QMD MCP Daemon
After=network.target

[Service]
ExecStart=qmd mcp --http --daemon
Restart=on-failure
RestartSec=10
Environment=PATH=/usr/local/bin:/usr/bin:/bin

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
systemctl --user enable --now qmd-daemon
systemctl --user status qmd-daemon
```
Once connected, these tools are available as `mcp_qmd_*`:
| MCP Tool | Maps To | Description |
|---|---|---|
| `mcp_qmd_search` | `qmd search` | BM25 keyword search |
| `mcp_qmd_vsearch` | `qmd vsearch` | Semantic vector search |
| `mcp_qmd_deep_search` | `qmd query` | Hybrid search + reranking |
| `mcp_qmd_get` | `qmd get` | Retrieve document by ID or path |
| `mcp_qmd_status` | `qmd status` | Index health and stats |
The MCP tools accept structured JSON queries for multi-mode search:
```json
{
  "searches": [
    {"type": "lex", "query": "authentication middleware"},
    {"type": "vec", "query": "how user login is verified"}
  ],
  "collections": ["project-docs"],
  "limit": 10
}
```
When MCP is not configured, use qmd directly via terminal:
```
terminal(command="qmd query 'what was decided about the API redesign' --json", timeout=30)
```
For setup and management tasks, always use terminal:
```
terminal(command="qmd collection add ~/Documents/notes --name notes")
terminal(command="qmd context add qmd://notes 'Personal research notes and ideas'")
terminal(command="qmd embed")
terminal(command="qmd status")
```
Understanding the internals helps choose the right search mode:
Smart Chunking: Documents are split at natural break points (headings, code blocks, blank lines) targeting ~900 tokens with 15% overlap. Code blocks are never split mid-block.
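As a rough illustration of that chunking strategy, here is a simplified Python sketch. This is not qmd's actual implementation: tokens are approximated by whitespace splitting, and only headings, blank lines, and code fences are treated as break points.

```python
def segments_of(text: str):
    """Split markdown into segments at headings and blank lines,
    keeping fenced code blocks as single indivisible segments."""
    segs, buf, in_code = [], [], False
    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            if in_code:                        # closing fence: finish code segment
                buf.append(line)
                segs.append("\n".join(buf))
                buf, in_code = [], False
                continue
            if buf:                            # opening fence: flush pending prose
                segs.append("\n".join(buf))
                buf = []
            in_code = True
        elif not in_code and not line.strip():
            if buf:                            # blank line ends a prose segment
                segs.append("\n".join(buf))
                buf = []
            continue
        elif not in_code and line.startswith("#") and buf:
            segs.append("\n".join(buf))        # heading starts a new segment
            buf = []
        buf.append(line)
    if buf:
        segs.append("\n".join(buf))
    return segs

def chunk(text: str, target: int = 900, overlap: float = 0.15):
    """Greedily pack segments into ~target-token chunks, carrying roughly
    overlap * target tokens of trailing segments into the next chunk."""
    ntok = lambda s: len(s.split())            # crude token estimate
    chunks, cur, fresh = [], [], False
    for seg in segments_of(text):
        cur.append(seg)
        fresh = True
        if sum(map(ntok, cur)) >= target:
            chunks.append("\n\n".join(cur))
            keep, total = [], 0
            for s in reversed(cur):            # collect the overlap tail
                keep.insert(0, s)
                total += ntok(s)
                if total >= target * overlap:
                    break
            cur, fresh = keep, False
    if cur and fresh:                          # flush only if new content remains
        chunks.append("\n\n".join(cur))
    return chunks

doc = "# A\n\nalpha beta\n\n```\ncode block\n```\n\ngamma"
for c in chunk(doc, target=4, overlap=0.25):
    print(repr(c))
```

Note how the fenced code block always travels whole: it is never split across chunk boundaries, matching the behavior described above.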
- `qmd context add` dramatically improves retrieval accuracy. Describe what each collection contains.
- `qmd embed` must be re-run when new files are added to collections.
- Use `qmd search` for speed — when you need fast keyword lookup (code identifiers, exact names), BM25 is instant and needs no models.
- Use `qmd query` for quality — when the question is conceptual or the user needs the best possible results, use hybrid search.

Normal — qmd auto-downloads ~2GB of GGUF models on first use. This is a one-time operation.
This happens when models aren't loaded in memory. Solutions:

- Start the daemon (`qmd mcp --http --daemon`) to keep models warm
- Use `qmd search` (BM25 only) when models aren't needed

Install Homebrew SQLite:

```shell
brew install sqlite
```

Then ensure it's on PATH before system SQLite.
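One way to do that, assuming a standard Homebrew install (Homebrew's sqlite formula is keg-only, so it is not linked into the default PATH):

```shell
# Prepend Homebrew's keg-only sqlite to PATH (e.g. in ~/.zshrc)
export PATH="$(brew --prefix sqlite)/bin:$PATH"

# Verify the Homebrew build is picked up instead of /usr/bin/sqlite3
which sqlite3
```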
Run `qmd collection add <path> --name <name>` to add directories, then `qmd embed` to index them.
Set the `QMD_EMBED_MODEL` environment variable for non-English content:

```shell
export QMD_EMBED_MODEL="your-multilingual-model"
```
Index location: `~/.cache/qmd/index.sqlite`

License: MIT
```shell
mkdir -p ~/.hermes/skills/research/qmd && curl -o ~/.hermes/skills/research/qmd/SKILL.md https://raw.githubusercontent.com/NousResearch/hermes-agent/main/optional-skills/research/qmd/SKILL.md
```
© 2026 Torly.ai. All rights reserved.