USP
QMD uniquely combines BM25, vector search, and local LLM re-ranking using GGUF models, all running on-device. It offers seamless integration with AI agents via a dedicated MCP server or CLI, ensuring privacy and speed.
Use cases
- Searching personal markdown notes
- Finding specific information in documentation
- Providing contextual data to AI agents
- Retrieving meeting transcripts
- Building local knowledge bases for AI
Detected files (3)
skills/qmd/SKILL.md (skill, 4079 bytes)
---
name: qmd
description: Search markdown knowledge bases, notes, and documentation using QMD. Use when users ask to search notes, find documents, or look up information.
license: MIT
compatibility: Requires qmd CLI or MCP server. Install via `npm install -g @tobilu/qmd`.
metadata:
  author: tobi
  version: "2.0.0"
allowed-tools: Bash(qmd:*), mcp__qmd__*
---

# QMD - Quick Markdown Search

Local search engine for markdown content.

## Status

!`qmd status 2>/dev/null || echo "Not installed: npm install -g @tobilu/qmd"`

## MCP: `query`

```json
{
  "searches": [
    { "type": "lex", "query": "CAP theorem consistency" },
    { "type": "vec", "query": "tradeoff between consistency and availability" }
  ],
  "collections": ["docs"],
  "limit": 10
}
```

### Query Types

| Type | Method | Input |
|------|--------|-------|
| `lex` | BM25 | Keywords — exact terms, names, code |
| `vec` | Vector | Question — natural language |
| `hyde` | Vector | Answer — hypothetical result (50-100 words) |

### Writing Good Queries

**lex (keyword)**
- 2-5 terms, no filler words
- Exact phrase: `"connection pool"` (quoted)
- Exclude terms: `performance -sports` (minus prefix)
- Code identifiers work: `handleError async`

**vec (semantic)**
- Full natural language question
- Be specific: `"how does the rate limiter handle burst traffic"`
- Include context: `"in the payment service, how are refunds processed"`

**hyde (hypothetical document)**
- Write 50-100 words of what the *answer* looks like
- Use the vocabulary you expect in the result

**expand (auto-expand)**
- Use a single-line query (implicit) or `expand: question` on its own line
- Lets the local LLM generate lex/vec/hyde variations
- Do not mix `expand:` with other typed lines — it's either a standalone expand query or a full query document

### Intent (Disambiguation)

When a query term is ambiguous, add `intent` to steer results:

```json
{
  "searches": [ { "type": "lex", "query": "performance" } ],
  "intent": "web page load times and Core Web Vitals"
}
```

Intent affects expansion, reranking, chunk selection, and snippet extraction. It does not search on its own — it's a steering signal that disambiguates queries like "performance" (web-perf vs team health vs fitness).

### Combining Types

| Goal | Approach |
|------|----------|
| Know exact terms | `lex` only |
| Don't know vocabulary | Use a single-line query (implicit `expand:`) or `vec` |
| Best recall | `lex` + `vec` |
| Complex topic | `lex` + `vec` + `hyde` |
| Ambiguous query | Add `intent` to any combination above |

First query gets 2x weight in fusion — put your best guess first.

### Lex Query Syntax

| Syntax | Meaning | Example |
|--------|---------|---------|
| `term` | Prefix match | `perf` matches "performance" |
| `"phrase"` | Exact phrase | `"rate limiter"` |
| `-term` | Exclude | `performance -sports` |

Note: `-term` only works in lex queries, not vec/hyde.

### Collection Filtering

```json
{ "collections": ["docs"] }          // Single
{ "collections": ["docs", "notes"] } // Multiple (OR)
```

Omit to search all collections.
## Other MCP Tools

| Tool | Use |
|------|-----|
| `get` | Retrieve doc by path or `#docid` |
| `multi_get` | Retrieve multiple by glob/list |
| `status` | Collections and health |

## CLI

```bash
qmd query "question"                      # Auto-expand + rerank
qmd query $'lex: X\nvec: Y'               # Structured
qmd query $'expand: question'             # Explicit expand
qmd query --json --explain "q"            # Show score traces (RRF + rerank blend)
qmd search "keywords"                     # BM25 only (no LLM)
qmd get "#abc123"                         # By docid
qmd multi-get "journals/2026-*.md" -l 40  # Batch pull snippets by glob
qmd multi-get notes/foo.md,notes/bar.md   # Comma-separated list, preserves order
```

## HTTP API

```bash
curl -X POST http://localhost:8181/query \
  -H "Content-Type: application/json" \
  -d '{"searches": [{"type": "lex", "query": "test"}]}'
```

## Setup

```bash
npm install -g @tobilu/qmd
qmd collection add ~/notes --name notes
qmd embed
```

skills/release/SKILL.md (skill, 5389 bytes)
---
name: release
description: Manage releases for this project. Validates changelog, installs git hooks, and cuts releases. Use when user says "/release", "release 1.0.5", "cut a release", or asks about the release process. NOT auto-invoked by the model.
disable-model-invocation: true
---

# Release

Cut a release, validate the changelog, and ensure git hooks are installed.

## Usage

`/release 1.0.5` or `/release patch` (bumps patch from current version).

## Process

When the user triggers `/release <version>`:

1. **Gather context** — run `skills/release/scripts/release-context.sh <version>`. This silently installs git hooks and prints everything needed: version info, working directory status, commits since last release, files changed, current `[Unreleased]` content, and the previous release entry for style reference.
2. **Commit outstanding work** — if the context shows staged, modified, or untracked files that belong in this release, commit them first. Use the /commit skill or make well-formed commits directly.
3. **Write the changelog** — if `[Unreleased]` is empty, write it now using the commits and file changes from the context output. Follow the changelog standard below. Re-run the context script after committing if needed.
4. **Cut the release** — run `scripts/release.sh <version>`. This renames `[Unreleased]` → `[X.Y.Z] - date`, inserts a fresh `[Unreleased]`, bumps `package.json`, commits, and tags.
5. **Show the final changelog** — print the full `[Unreleased]` + minor series rollup via `scripts/extract-changelog.sh <version>`. Ask the user to confirm before pushing.
6. **Push** — after explicit confirmation, run `git push origin main --tags`.
7. **Watch CI** — after the push, start a background dispatch to watch the publish workflow. Use `interactive_shell` in dispatch mode with:
   ```
   gh run watch $(gh run list --workflow=publish.yml --limit=1 --json databaseId --jq '.[0].databaseId') --exit-status
   ```
   The agent will be notified when CI completes and should report the result.
8. **Check dependency updates** — before cutting the release, check for updates to `sqlite-vec` (and platform packages), `node-llama-cpp`, and `better-sqlite3`. Run `pnpm outdated` and report any available updates for these packages. If updates exist, bump them (pinned, no `^` ranges) and re-run tests before proceeding.

If any step fails, stop and explain. Never force-push or skip validation.

## Dependency Policy

All dependencies must be pinned to exact versions (no `^` or `~` ranges). The lockfile ensures reproducible installs. When adding or updating any dependency, always use the exact version string (e.g. `"3.18.1"` not `"^3.18.1"`).

## Changelog Standard

The changelog lives in `CHANGELOG.md` and follows [Keep a Changelog](https://keepachangelog.com/) conventions.

### Heading format

- `## [Unreleased]` — accumulates entries between releases
- `## [X.Y.Z] - YYYY-MM-DD` — released versions

### Structure of a release entry

Each version entry has two parts:

**1. Highlights (optional, 1-4 sentences of prose)**

Immediately after the version heading, before any `###` section. The elevator pitch — what would you tell someone in 30 seconds? Only for significant releases; skip for small patches.

```markdown
## [1.1.0] - 2026-03-01

QMD now runs on both Node.js and Bun, with up to 2.7x faster reranking through parallel contexts. GPU auto-detection replaces the unreliable `gpu: "auto"` with explicit CUDA/Metal/Vulkan probing.
```

**2. Detailed changelog (`### Changes` and `### Fixes`)**

```markdown
### Changes

- Runtime: support Node.js (>=22) alongside Bun. The `qmd` wrapper auto-detects a suitable install via PATH. #149 (thanks @igrigorik)
- Performance: parallel embedding & reranking — up to 2.7x faster on multi-core machines.

### Fixes

- Prevent VRAM waste from duplicate context creation during concurrent `embedBatch` calls. #152 (thanks @jkrems)
```

### Writing guidelines

- **Explain the why, not just the what.** The changelog is for users.
- **Include numbers.** "2.7x faster", "17x less memory".
- **Group by theme, not by file.** "Performance" not "Changes to llm.ts".
- **Don't list every commit.** Aggregate related changes.
- **Credit contributors:** end bullets with `#NNN (thanks @username)` for external PRs. No need to credit the repo owner.

### What not to include

- Internal refactors with no user-visible effect
- Dependency bumps (unless fixing a user-facing bug)
- CI/tooling changes (unless affecting the release artifact)
- Test additions (unless validating a fix worth mentioning)

## GitHub Release Notes

Each GitHub release includes the full changelog for the **minor series** back to x.x.0. The `scripts/extract-changelog.sh` script handles this, and the publish workflow (`publish.yml`) calls it to populate the GitHub release.

## Git Hooks

The pre-push hook (`scripts/pre-push`) blocks `v*` tag pushes unless:

1. `package.json` version matches the tag
2. `CHANGELOG.md` has a `## [X.Y.Z] - date` entry for the version
3. CI passed on GitHub (warns in non-interactive shells, blocks in terminals)

Hooks are installed silently by the context script. They can also be installed manually via `skills/release/scripts/install-hooks.sh` or automatically via `bun install` (prepare script).

.claude-plugin/marketplace.json (marketplace, 546 bytes)
{ "name": "qmd", "owner": { "name": "tobi", "email": "tobi@lutke.com" }, "plugins": [ { "name": "qmd", "source": "./", "description": "Search and retrieve documents from local markdown files.", "version": "0.1.0", "author": { "name": "tobi", "email": "tobi@lutke.com" }, "repository": "https://github.com/tobi/qmd", "license": "MIT", "keywords": ["markdown", "search", "qmd"], "skills": ["./skills/"], "mcpServers": { "qmd": { "command": "qmd", "args": ["mcp"] } } } ] }
README
QMD - Query Markup Documents
An on-device search engine for everything you need to remember. Index your markdown notes, meeting transcripts, documentation, and knowledge bases. Search with keywords or natural language. Ideal for your agentic flows.
QMD combines BM25 full-text search, vector semantic search, and LLM re-ranking—all running locally via node-llama-cpp with GGUF models.

You can read more about QMD's progress in the CHANGELOG.
Quick Start
# Install globally (Node or Bun)
npm install -g @tobilu/qmd
# or
bun install -g @tobilu/qmd
# Or run directly
npx @tobilu/qmd ...
bunx @tobilu/qmd ...
# Create collections for your notes, docs, and meeting transcripts
qmd collection add ~/notes --name notes
qmd collection add ~/Documents/meetings --name meetings
qmd collection add ~/work/docs --name docs
# Add context to improve search results. Each piece of context is returned whenever a
# matching sub-document is returned, and contexts nest as a tree. This is QMD's key
# feature: it lets LLMs make much better contextual choices when selecting documents.
# Don't sleep on it!
qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://meetings "Meeting transcripts and notes"
qmd context add qmd://docs "Work documentation"
# Generate embeddings for semantic search
qmd embed
# Search across everything
qmd search "project timeline" # Fast keyword search
qmd vsearch "how to deploy" # Semantic search
qmd query "quarterly planning process" # Hybrid + reranking (best quality)
# Get a specific document
qmd get "meetings/2024-01-15.md"
# Get a document by docid (shown in search results)
qmd get "#abc123"
# Get multiple documents by glob pattern
qmd multi-get "journals/2025-05*.md"
# Search within a specific collection
qmd search "API" -c notes
# Export all matches for an agent
qmd search "API" --all --files --min-score 0.3
Using with AI Agents
QMD's --json and --files output formats are designed for agentic workflows:
# Get structured results for an LLM
qmd search "authentication" --json -n 10
# List all relevant files above a threshold
qmd query "error handling" --all --files --min-score 0.4
# Retrieve full document content
qmd get "docs/api-reference.md" --full
MCP Server
The tool works perfectly well when you simply tell your agent to use the CLI, but QMD also exposes an MCP (Model Context Protocol) server for tighter integration.
Tools exposed:
- query — Search with typed sub-queries (lex/vec/hyde), combined via RRF + reranking
- get — Retrieve a document by path or docid (with fuzzy matching suggestions)
- multi_get — Batch retrieve by glob pattern, comma-separated list, or docids
- status — Index health and collection info
Claude Desktop configuration (~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"qmd": {
"command": "qmd",
"args": ["mcp"]
}
}
}
Claude Code — Install the plugin (recommended):
claude plugin marketplace add tobi/qmd
claude plugin install qmd@qmd
Or configure MCP manually in ~/.claude/settings.json:
{
"mcpServers": {
"qmd": {
"command": "qmd",
"args": ["mcp"]
}
}
}
HTTP Transport
By default, QMD's MCP server uses stdio (launched as a subprocess by each client). For a shared, long-lived server that avoids repeated model loading, use the HTTP transport:
# Foreground (Ctrl-C to stop)
qmd mcp --http # localhost:8181
qmd mcp --http --port 8080 # custom port
# Background daemon
qmd mcp --http --daemon # start, writes PID to ~/.cache/qmd/mcp.pid
qmd mcp stop # stop via PID file
qmd status # shows "MCP: running (PID ...)" when active
The HTTP server exposes two endpoints:
- POST /mcp — MCP Streamable HTTP (JSON responses, stateless)
- GET /health — liveness check with uptime
LLM models stay loaded in VRAM across requests. Embedding/reranking contexts are disposed after 5 min idle and transparently recreated on the next request (~1s penalty, models remain loaded).
Point any MCP client at http://localhost:8181/mcp to connect.
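For a quick liveness check before wiring up a client, a minimal sketch using Node's built-in fetch (the exact response body is not documented here, so it is simply printed):

```typescript
// Sketch: verify the QMD MCP HTTP server is up before connecting an MCP client.
const res = await fetch("http://localhost:8181/health");
if (!res.ok) throw new Error(`QMD MCP server unhealthy: ${res.status}`);
console.log(await res.text()); // liveness info, including uptime
```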
SDK / Library Usage
Use QMD as a library in your own Node.js or Bun applications.
Installation
npm install @tobilu/qmd
Quick Start
import { createStore } from '@tobilu/qmd'
const store = await createStore({
dbPath: './my-index.sqlite',
config: {
collections: {
docs: { path: '/path/to/docs', pattern: '**/*.md' },
},
},
})
const results = await store.search({ query: "authentication flow" })
console.log(results.map(r => `${r.title} (${Math.round(r.score * 100)}%)`))
await store.close()
Store Creation
createStore() accepts three modes:
import { createStore } from '@tobilu/qmd'
// 1. Inline config — no files needed besides the DB
const store = await createStore({
dbPath: './index.sqlite',
config: {
collections: {
docs: { path: '/path/to/docs', pattern: '**/*.md' },
notes: { path: '/path/to/notes' },
},
},
})
// 2. YAML config file — collections defined in a file
const store2 = await createStore({
dbPath: './index.sqlite',
configPath: './qmd.yml',
})
// 3. DB-only — reopen a previously configured store
const store3 = await createStore({ dbPath: './index.sqlite' })
Search
The unified search() method handles both simple queries and pre-expanded structured queries:
// Simple query — auto-expanded via LLM, then BM25 + vector + reranking
const results = await store.search({ query: "authentication flow" })
// With options
const results2 = await store.search({
query: "rate limiting",
intent: "API throttling and abuse prevention",
collection: "docs",
limit: 5,
minScore: 0.3,
explain: true,
})
// Pre-expanded queries — skip auto-expansion, control each sub-query
const results3 = await store.search({
queries: [
{ type: 'lex', query: '"connection pool" timeout -redis' },
{ type: 'vec', query: 'why do database connections time out under load' },
],
collections: ["docs", "notes"],
})
// Skip reranking for faster results
const fast = await store.search({ query: "auth", rerank: false })
For direct backend access:
// BM25 keyword search (fast, no LLM)
const lexResults = await store.searchLex("auth middleware", { limit: 10 })
// Vector similarity search (embedding model, no reranking)
const vecResults = await store.searchVector("how users log in", { limit: 10 })
// Manual query expansion for full control
const expanded = await store.expandQuery("auth flow", { intent: "user login" })
const results4 = await store.search({ queries: expanded })
Retrieval
// Get a document by path or docid
const doc = await store.get("docs/readme.md")
const byId = await store.get("#abc123")
if (!("error" in doc)) {
console.log(doc.title, doc.displayPath, doc.context)
}
// Get document body with line range
const body = await store.getDocumentBody("docs/readme.md", {
fromLine: 50,
maxLines: 100,
})
// Batch retrieve by glob or comma-separated list
const { docs, errors } = await store.multiGet("docs/**/*.md", {
maxBytes: 20480,
})
Collections
// Add a collection
await store.addCollection("myapp", {
path: "/src/myapp",
pattern: "**/*.ts",
ignore: ["node_modules/**", "*.test.ts"],
})
// List collections with document stats
const collections = await store.listCollections()
// => [{ name, pwd, glob_pattern, doc_count, active_count, last_modified, includeByDefault }]
// Get names of collections included in queries by default
const defaults = await store.getDefaultCollectionNames()
// Remove / rename
await store.removeCollection("myapp")
await store.renameCollection("old-name", "new-name")
Context
Context adds descriptive metadata that improves search relevance and is returned alongside results:
// Add context for a path within a collection
await store.addContext("docs", "/api", "REST API reference documentation")
// Set global context (applies to all collections)
await store.setGlobalContext("Internal engineering documentation")
// List all contexts
const contexts = await store.listContexts()
// => [{ collection, path, context }]
// Remove context
await store.removeContext("docs", "/api")
await store.setGlobalContext(undefined) // clear global
Indexing
// Re-index collections by scanning the filesystem
const result = await store.update({
collections: ["docs"], // optional — defaults to all
onProgress: ({ collection, file, current, total }) => {
console.log(`[${collection}] ${current}/${total} ${file}`)
},
})
// => { collections, indexed, updated, unchanged, removed, needsEmbedding }
// Generate vector embeddings
const embedResult = await store.embed({
force: false, // true to re-embed everything
chunkStrategy: "auto", // "regex" (default) or "auto" (AST for code files)
onProgress: ({ current, total, collection }) => {
console.log(`Embedding ${current}/${total}`)
},
})
Types
Key types exported for SDK consumers:
import type {
QMDStore, // The store interface
SearchOptions, // Options for search()
LexSearchOptions, // Options for searchLex()
VectorSearchOptions, // Options for searchVector()
HybridQueryResult, // Search result with score, snippet, context
SearchResult, // Result from searchLex/searchVector
ExpandedQuery, // Typed sub-query { type: 'lex'|'vec'|'hyde', query }
DocumentResult, // Document metadata + body
DocumentNotFound, // Error with similarFiles suggestions
MultiGetResult, // Batch retrieval result
UpdateProgress, // Progress callback info for update()
UpdateResult, // Aggregated update result
EmbedProgress, // Progress callback info for embed()
EmbedResult, // Embedding result
StoreOptions, // createStore() options
CollectionConfig, // Inline config shape
IndexStatus, // From getStatus()
IndexHealthInfo, // From getIndexHealth()
} from '@tobilu/qmd'
Utility exports:
import {
extractSnippet, // Extract a relevant snippet from text
addLineNumbers, // Add line numbers to text
DEFAULT_MULTI_GET_MAX_BYTES, // Default max file size for multiGet (10KB)
Maintenance, // Database maintenance operations
} from '@tobilu/qmd'
Lifecycle
// Close the store — disposes LLM models and DB connection
await store.close()
The SDK requires explicit dbPath — no defaults are assumed. This makes it safe to embed in any application without side effects.
Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ QMD Hybrid Search Pipeline │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────┐
│ User Query │
└────────┬────────┘
│
┌──────────────┴──────────────┐
▼ ▼
┌────────────────┐ ┌────────────────┐
│ Query Expansion│ │ Original Query│
│ (fine-tuned) │ │ (×2 weight) │
└───────┬────────┘ └───────┬────────┘
│ │
│ 2 alternative queries │
└──────────────┬──────────────┘
│
┌───────────────────────┼───────────────────────┐
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Original Query │ │ Expanded Query 1│ │ Expanded Query 2│
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
┌───────┴───────┐ ┌───────┴───────┐ ┌───────┴───────┐
▼ ▼ ▼ ▼ ▼ ▼
┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
│ BM25 │ │Vector │ │ BM25 │ │Vector │ │ BM25 │ │Vector │
│(FTS5) │ │Search │ │(FTS5) │ │Search │ │(FTS5) │ │Search │
└───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘
│ │ │ │ │ │
└───────┬───────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└────────────────────────┼───────────────────────┘
│
▼
┌───────────────────────┐
│ RRF Fusion + Bonus │
│ Original query: ×2 │
│ Top-rank bonus: +0.05│
│ Top 30 Kept │
└───────────┬───────────┘
│
▼
┌───────────────────────┐
│ LLM Re-ranking │
│ (qwen3-reranker) │
│ Yes/No + logprobs │
└───────────┬───────────┘
│
▼
┌───────────────────────┐
│ Position-Aware Blend │
│ Top 1-3: 75% RRF │
│ Top 4-10: 60% RRF │
│ Top 11+: 40% RRF │
└───────────────────────┘
Score Normalization & Fusion
Search Backends
| Backend | Raw Score | Conversion | Range |
|---|---|---|---|
| FTS (BM25) | SQLite FTS5 BM25 | Math.abs(score) | 0 to ~25+ |
| Vector | Cosine distance | 1 / (1 + distance) | 0.0 to 1.0 |
| Reranker | LLM 0-10 rating | score / 10 | 0.0 to 1.0 |
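In code, the conversions above look roughly like this (the raw input values are made up for illustration):

```typescript
// Illustrative score conversions; input values are arbitrary examples.
const bm25 = -7.3;           // SQLite FTS5 bm25() returns negative scores (better = more negative)
const cosineDistance = 0.42; // vector backend cosine distance
const llmRating = 8;         // reranker rating on a 0-10 scale

const ftsScore = Math.abs(bm25);           // 0 to ~25+
const vecScore = 1 / (1 + cosineDistance); // 0.0 to 1.0
const rerankScore = llmRating / 10;        // 0.0 to 1.0
console.log({ ftsScore, vecScore, rerankScore });
```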
Fusion Strategy
The query command uses Reciprocal Rank Fusion (RRF) with position-aware blending:
- Query Expansion: Original query (×2 for weighting) + 1 LLM variation
- Parallel Retrieval: Each query searches both FTS and vector indexes
- RRF Fusion: Combine all result lists using score = Σ(1 / (k + rank + 1)) where k = 60
- Top-Rank Bonus: Documents ranking #1 in any list get +0.05, #2-3 get +0.02
- Top-K Selection: Take top 30 candidates for reranking
- Re-ranking: LLM scores each document (yes/no with logprobs confidence)
- Position-Aware Blending:
- RRF rank 1-3: 75% retrieval, 25% reranker (preserves exact matches)
- RRF rank 4-10: 60% retrieval, 40% reranker
- RRF rank 11+: 40% retrieval, 60% reranker (trust reranker more)
Why this approach: Pure RRF can dilute exact matches when expanded queries don't match. The top-rank bonus preserves documents that score #1 for the original query. Position-aware blending prevents the reranker from destroying high-confidence retrieval results.
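To make the fusion concrete, here is an illustrative TypeScript sketch of the steps above. It is not the actual QMD source: the result-list shapes and the normalization before blending are assumptions made for readability.

```typescript
// Illustrative fusion sketch (not QMD's implementation).
// rankedLists holds one ranked array of doc ids per (query, backend) pair;
// the first list is assumed to come from the original query and gets double weight.
// rerank maps doc id -> reranker score in 0.0-1.0.
function fuse(rankedLists: string[][], rerank: Map<string, number>): [string, number][] {
  const k = 60;
  const rrf = new Map<string, number>();

  rankedLists.forEach((list, listIdx) => {
    const weight = listIdx === 0 ? 2 : 1; // original query counts twice
    list.forEach((doc, rank) => {
      let s = weight / (k + rank + 1);    // RRF contribution
      if (rank === 0) s += 0.05;          // top-rank bonus for #1 in this list
      else if (rank <= 2) s += 0.02;      // bonus for ranks #2-3
      rrf.set(doc, (rrf.get(doc) ?? 0) + s);
    });
  });

  // Keep the top 30 candidates by fused score.
  const candidates = [...rrf.entries()].sort((a, b) => b[1] - a[1]).slice(0, 30);
  const maxRrf = candidates[0]?.[1] ?? 1; // normalize to 0-1 before blending (assumption)

  // Position-aware blend of retrieval score and reranker score.
  return candidates
    .map(([doc, score], i): [string, number] => {
      const w = i < 3 ? 0.75 : i < 10 ? 0.6 : 0.4; // retrieval weight by RRF rank
      return [doc, w * (score / maxRrf) + (1 - w) * (rerank.get(doc) ?? 0)];
    })
    .sort((a, b) => b[1] - a[1]);
}
```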
Score Interpretation
| Score | Meaning |
|---|---|
| 0.8 - 1.0 | Highly relevant |
| 0.5 - 0.8 | Moderately relevant |
| 0.2 - 0.5 | Somewhat relevant |
| 0.0 - 0.2 | Low relevance |
Requirements
System Requirements
- Node.js >= 22
- Bun >= 1.0.0
- macOS: Homebrew SQLite (for extension support)
brew install sqlite
GGUF Models (via node-llama-cpp)
QMD uses three local GGUF models (auto-downloaded on first use):
| Model | Purpose | Size |
|---|---|---|
| embeddinggemma-300M-Q8_0 | Vector embeddings (default) | ~300MB |
| qwen3-reranker-0.6b-q8_0 | Re-ranking | ~640MB |
| qmd-query-expansion-1.7B-q4_k_m | Query expansion (fine-tuned) | ~1.1GB |
Models are downloaded from HuggingFace and cached in ~/.cache/qmd/models/.
Custom Embedding Model
Override the default embedding model via the QMD_EMBED_MODEL environment variable.
This is useful for multilingual corpora (e.g. Chinese, Japanese, Korean) where
embeddinggemma-300M has limited coverage.
# Use Qwen3-Embedding-0.6B for better multilingual (CJK) support
export QMD_EMBED_MODEL="hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf"
# After changing the model, re-embed all collections:
qmd embed -f
Supported model families:
- embeddinggemma (default) — English-optimized, small footprint
- Qwen3-Embedding — Multilingual (119 languages including CJK), MTEB top-ranked
Note: When switching embedding models, you must re-index with qmd embed -f since vectors are not cross-compatible between models. The prompt format is automatically adjusted for each model family.
Installation
npm install -g @tobilu/qmd
# or
bun install -g @tobilu/qmd
Development
git clone https://github.com/tobi/qmd
cd qmd
npm install
npm link
Usage
Collection Management
# Create a collection from current directory
qmd collection add . --name myproject
# Create a collection with explicit path and custom glob mask
qmd collection add ~/Documents/notes --name notes --mask "**/*.md"
# List all collections
qmd collection list
# Remove a collection
qmd collection remove myproject
# Rename a collection
qmd collection rename myproject my-project
# List files in a collection
qmd ls notes
qmd ls notes/subfolder
Generate Vector Embeddings
# Embed all indexed documents (900 tokens/chunk, 15% overlap)
qmd embed
# Force re-embed everything
qmd embed -f
# Enable AST-aware chunking for code files (TS, JS, Python, Go, Rust)
qmd embed --chunk-strategy auto
# Also works with query for consistent chunk selection
qmd query "auth flow" --chunk-strategy auto
AST-aware chunking (--chunk-strategy auto) uses tree-sitter to chunk code
files at function, class, and import boundaries instead of arbitrary text
positions. This produces higher-quality chunks and better search results for
codebases. Markdown and other file types always use regex-based chunking
regardless of strategy.
The default is regex (existing behavior). Use --chunk-strategy auto to
opt in. Run qmd status to verify which grammars are available.
Note: Tree-sitter grammars are optional dependencies. If they are not installed, --chunk-strategy auto falls back to regex-only chunking automatically. Tested on both Node.js and Bun.
Context Management
Context adds descriptive metadata to collections and paths, helping search understand your content.
# Add context to a collection (using qmd:// virtual paths)
qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://docs/api "API documentation"
# Add context from within a collection directory
cd ~/notes && qmd context add "Personal notes and ideas"
cd ~/notes/work && qmd context add "Work-related notes"
# Add global context (applies to all collections)
qmd context add / "Knowledge base for my projects"
# List all contexts
qmd context list
# Remove context
qmd context rm qmd://notes/old
Search Commands
┌──────────────────────────────────────────────────────────────────┐
│ Search Modes │
├──────────┬───────────────────────────────────────────────────────┤
│ search │ BM25 full-text search only │
│ vsearch │ Vector semantic search only │
│ query │ Hybrid: FTS + Vector + Query Expansion + Re-ranking │
└──────────┴───────────────────────────────────────────────────────┘
# Full-text search (fast, keyword-based)
qmd search "authentication flow"
# Vector search (semantic similarity)
qmd vsearch "how to login"
# Hybrid search with re-ranking (best quality)
qmd query "user authentication"
Options
# Search options
-n <num> # Number of results (default: 5, or 20 for --files/--json)
-c, --collection # Restrict search to a specific collection
--all # Return all matches (use with --min-score to filter)
--min-score <num> # Minimum score threshold (default: 0)
--full # Show full document content
--line-numbers # Add line numbers to output
--explain # Include retrieval score traces (query, JSON/CLI output)
--index <name> # Use named index
# Output formats (for search and multi-get)
--files # Output: docid,score,filepath,context
--json # JSON output with snippets
--csv # CSV output
--md # Markdown output
--xml # XML output
# Get options
qmd get <file>[:line] # Get document, optionally starting at line
-l <num> # Maximum lines to return
--from <num> # Start from line number
# Multi-get options
-l <num> # Maximum lines per file
--max-bytes <num> # Skip files larger than N bytes (default: 10KB)
Output Format
Default output is colorized CLI format (respects NO_COLOR env).
When stdout is a TTY, result paths are emitted as clickable terminal hyperlinks (OSC 8). Clicking a path opens the file in your editor using an editor URI template.
When stdout is not a TTY (for example piped to another command or redirected to a file), QMD emits plain text paths with no escape sequences.
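For reference, a clickable OSC 8 hyperlink is just an escape sequence wrapped around the visible text. An illustrative sketch (not QMD's code) of the TTY/non-TTY branching:

```typescript
// Sketch: how a clickable OSC 8 terminal hyperlink is emitted.
function osc8Link(url: string, text: string): string {
  return `\x1b]8;;${url}\x1b\\${text}\x1b]8;;\x1b\\`;
}

if (process.stdout.isTTY) {
  // Clicking the path opens it via the editor URI template (see below).
  console.log(osc8Link("vscode://file//Users/me/docs/guide.md:42:1", "docs/guide.md:42"));
} else {
  console.log("docs/guide.md:42"); // plain text when piped or redirected
}
```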
TTY example:
docs/guide.md:42 #a1b2c3
Title: Software Craftsmanship
Context: Work documentation
Score: 93%
This section covers the **craftsmanship** of building
quality software with attention to detail.
See also: engineering principles
notes/meeting.md:15 #d4e5f6
Title: Q4 Planning
Context: Personal notes and ideas
Score: 67%
Discussion about code quality and craftsmanship
in the development process.
Configure the editor link target with QMD_EDITOR_URI (or editor_uri in config):
# VS Code (default)
export QMD_EDITOR_URI="vscode://file/{path}:{line}:{col}"
# Cursor
export QMD_EDITOR_URI="cursor://file/{path}:{line}:{col}"
# Zed
export QMD_EDITOR_URI="zed://file/{path}:{line}:{col}"
# Sublime Text
export QMD_EDITOR_URI="subl://open?url=file://{path}&line={line}"
Template placeholders:
- {path}: absolute filesystem path (URI-encoded)
- {line}: 1-based line number
- {col} or {column}: 1-based column number

Each result shows:
- Path: Collection-relative path (e.g., docs/guide.md)
- Docid: Short hash identifier (e.g., #a1b2c3); use with qmd get "#a1b2c3"
- Title: Extracted from document (first heading or filename)
- Context: Path context if configured via qmd context add
- Score: Color-coded (green >70%, yellow >40%, dim otherwise)
- Snippet: Context around match with query terms highlighted
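As a rough illustration of how such a template is filled in (hypothetical helper, not part of QMD's API; the exact encoding may differ):

```typescript
// Hypothetical helper showing how an editor URI template could be expanded.
function expandEditorUri(template: string, path: string, line = 1, col = 1): string {
  return template
    .replace("{path}", encodeURI(path)) // absolute path, URI-encoded (assumption: slashes kept)
    .replace("{line}", String(line))
    .replace(/\{col(umn)?\}/, String(col));
}

// expandEditorUri("vscode://file/{path}:{line}:{col}", "/Users/me/notes/guide.md", 42)
// => "vscode://file//Users/me/notes/guide.md:42:1"
```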
Examples
# Get 10 results with minimum score 0.3
qmd query -n 10 --min-score 0.3 "API design patterns"
# Output as markdown for LLM context
qmd search --md --full "error handling"
# JSON output for scripting
qmd query --json "quarterly reports"
# Inspect how each result was scored (RRF + rerank blend)
qmd query --json --explain "quarterly reports"
# Use separate index for different knowledge base
qmd --index work search "quarterly reports"
Index Maintenance
# Show index status and collections with contexts
qmd status
# Re-index all collections
qmd update
# Re-index with git pull first (for remote repos)
qmd update --pull
# Get document by filepath (with fuzzy matching suggestions)
qmd get notes/meeting.md
# Get document by docid (from search results)
qmd get "#abc123"
# Get document starting at line 50, max 100 lines
qmd get notes/meeting.md:50 -l 100
# Get multiple documents by glob pattern
qmd multi-get "journals/2025-05*.md"
# Get multiple documents by comma-separated list (supports docids)
qmd multi-get "doc1.md, doc2.md, #abc123"
# Limit multi-get to files under 20KB
qmd multi-get "docs/*.md" --max-bytes 20480
# Output multi-get as JSON for agent processing
qmd multi-get "docs/*.md" --json
# Clean up cache and orphaned data
qmd cleanup
Data Storage
Index stored in: ~/.cache/qmd/index.sqlite
Schema
collections -- Indexed directories with name and glob patterns
path_contexts -- Context descriptions by virtual path (qmd://...)
documents -- Markdown content with metadata and docid (6-char hash)
documents_fts -- FTS5 full-text index
content_vectors -- Embedding chunks (hash, seq, pos, 900 tokens each)
vectors_vec -- sqlite-vec vector index (hash_seq key)
llm_cache -- Cached LLM responses (query expansion, rerank scores)
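If you want to peek at the index directly, it is a regular SQLite file. A small read-only sketch using better-sqlite3 (already a QMD dependency), assuming the default cache location:

```typescript
// Sketch: open the QMD index read-only and list its tables.
// Assumes the default location; adjust if XDG_CACHE_HOME is set.
import Database from "better-sqlite3";
import { homedir } from "node:os";
import { join } from "node:path";

const db = new Database(join(homedir(), ".cache", "qmd", "index.sqlite"), { readonly: true });
const tables = db
  .prepare("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
  .all();
console.log(tables); // collections, documents, documents_fts, path_contexts, ...
db.close();
```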
Environment Variables
| Variable | Default | Description |
|---|---|---|
| XDG_CACHE_HOME | ~/.cache | Cache directory location |
How It Works
Indexing Flow
Collection ──► Glob Pattern ──► Markdown Files ──► Parse Title ──► Hash Content
│ │ │
│ │ ▼
│ │ Generate docid
│ │ (6-char hash)
│ │ │
└──────────────────────────────────────────────────►└──► Store in SQLite
│
▼
FTS5 Index
Embedding Flow
Documents are chunked into ~900-token pieces with 15% overlap using smart boundary detection:
Document ──► Smart Chunk (~900 tokens) ──► Format each chunk ──► node-llama-cpp ──► Store Vectors
│ "title | text" embedBatch()
│
└─► Chunks stored with:
- hash: document hash
- seq: chunk sequence (0, 1, 2...)
- pos: character position in original
Smart Chunking
Instead of cutting at hard token boundaries, QMD uses a scoring algorithm to find natural markdown break points. This keeps semantic units (sections, paragraphs, code blocks) together.
Break Point Scores:
| Pattern | Score | Description |
|---|---|---|
| # Heading | 100 | H1 - major section |
| ## Heading | 90 | H2 - subsection |
| ### Heading | 80 | H3 |
| #### Heading | 70 | H4 |
| ##### Heading | 60 | H5 |
| ###### Heading | 50 | H6 |
| ``` | 80 | Code block boundary |
| --- / *** | 60 | Horizontal rule |
| Blank line | 20 | Paragraph boundary |
| - item / 1. item | 5 | List item |
| Line break | 1 | Minimal break |
Algorithm:
- Scan document for all break points with scores
- When approaching the 900-token target, search a 200-token window before the cutoff
- Score each break point: finalScore = baseScore × (1 - (distance/window)² × 0.7)
- Cut at the highest-scoring break point
The squared distance decay means a heading 200 tokens back (score ~30) still beats a simple line break at the target (score 1), but a closer heading wins over a distant one.
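A minimal sketch of that selection step (illustrative, not the actual QMD implementation; positions are measured in tokens):

```typescript
// Sketch: pick the best break point in the window before the chunk-size target.
interface BreakPoint {
  pos: number;   // token offset of the candidate break
  score: number; // base score from the table above (heading, code fence, blank line, ...)
}

function chooseBreakPoint(breaks: BreakPoint[], target: number, window = 200): number {
  let bestPos = target; // fall back to a hard cut at the target
  let bestScore = 0;
  for (const bp of breaks) {
    const distance = target - bp.pos;
    if (distance < 0 || distance > window) continue; // only look back within the window
    const finalScore = bp.score * (1 - (distance / window) ** 2 * 0.7);
    if (finalScore > bestScore) {
      bestScore = finalScore;
      bestPos = bp.pos;
    }
  }
  return bestPos;
}

// A heading 200 tokens back scores 100 * (1 - 1 * 0.7) = 30,
// which still beats a line break (score 1) right at the target.
```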
Code Fence Protection: Break points inside code blocks are ignored—code stays together. If a code block exceeds the chunk size, it's kept whole when possible.
AST-Aware Chunking (Code Files):
For supported code files, QMD also parses the source with tree-sitter and adds AST-derived break points that are merged with the regex scores above:
| AST Node | Score | Languages |
|---|---|---|
| Class / interface / struct / impl / trait | 100 | All |
| Function / method | 90 | All |
| Type alias / enum | 80 | All |
| Import / use declaration | 60 | All |
Supported for .ts, .tsx, .js, .jsx, .py, .go, and .rs files. Enable with --chunk-strategy auto. Markdown and other file types always use regex chunking.
Query Flow (Hybrid)
Query ──► LLM Expansion ──► [Original, Variant 1, Variant 2]
│
┌─────────┴─────────┐
▼ ▼
For each query: FTS (BM25)
│ │
▼ ▼
Vector Search Ranked List
│
▼
Ranked List
│
└─────────┬─────────┘
▼
RRF Fusion (k=60)
Original query ×2 weight
Top-rank bonus: +0.05/#1, +0.02/#2-3
│
▼
Top 30 candidates
│
▼
LLM Re-ranking
(yes/no + logprob confidence)
│
▼
Position-Aware Blend
Rank 1-3: 75% RRF / 25% reranker
Rank 4-10: 60% RRF / 40% reranker
Rank 11+: 40% RRF / 60% reranker
│
▼
Final Results
Model Configuration
Models are configured in src/llm.ts as HuggingFace URIs:
const DEFAULT_EMBED_MODEL = "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf";
const DEFAULT_RERANK_MODEL = "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf";
const DEFAULT_GENERATE_MODEL = "hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf";
EmbeddingGemma Prompt Format
// For queries
"task: search result | query: {query}"
// For documents
"title: {title} | text: {content}"
Qwen3-Reranker
Uses node-llama-cpp's createRankingContext() and rankAndSort() API for cross-encoder reranking. Returns documents sorted by relevance score (0.0 - 1.0).
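An illustrative sketch of that ranking API (a sketch only; the model path below is a hypothetical local path, since QMD itself resolves and caches the GGUF under ~/.cache/qmd/models/):

```typescript
// Sketch: cross-encoder reranking with node-llama-cpp's ranking context.
import { getLlama } from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
  modelPath: "/path/to/qwen3-reranker-0.6b-q8_0.gguf", // hypothetical local path
});
const ranker = await model.createRankingContext();

const query = "how does the rate limiter handle burst traffic";
const documents = [
  "The limiter allows short bursts via a token bucket before throttling.",
  "Quarterly planning notes for the marketing team.",
];

// rankAndSort returns the documents sorted by relevance score (0.0 - 1.0).
const ranked = await ranker.rankAndSort(query, documents);
console.log(ranked);
```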
Qwen3 (Query Expansion)
Used for generating query variations via LlamaChatSession.
License
MIT