# llm-wiki
Karpathy's LLM Wiki — build and maintain a persistent, interlinked markdown knowledge base. Ingest sources, query compiled knowledge, and lint for consistency.
Build and maintain a persistent, compounding knowledge base as interlinked markdown files. Based on Andrej Karpathy's LLM Wiki pattern.
Unlike traditional RAG (which rediscovers knowledge from scratch per query), the wiki compiles knowledge once and keeps it current. Cross-references are already there. Contradictions have already been flagged. Synthesis reflects everything ingested.
Division of labor: The human curates sources and directs analysis. The agent summarizes, cross-references, files, and maintains consistency.
Use this skill when the user wants to build a wiki, ingest sources into it, query its compiled knowledge, or lint it for consistency.
Location: set via the `WIKI_PATH` environment variable (e.g. in `~/.hermes/.env`). If unset, defaults to `~/wiki`:

```shell
WIKI="${WIKI_PATH:-$HOME/wiki}"
```
The wiki is just a directory of markdown files — open it in Obsidian, VS Code, or any editor. No database, no special tooling required.
```
wiki/
├── SCHEMA.md        # Conventions, structure rules, domain config
├── index.md         # Sectioned content catalog with one-line summaries
├── log.md           # Chronological action log (append-only, rotated yearly)
├── raw/             # Layer 1: Immutable source material
│   ├── articles/    # Web articles, clippings
│   ├── papers/      # PDFs, arxiv papers
│   ├── transcripts/ # Meeting notes, interviews
│   └── assets/      # Images, diagrams referenced by sources
├── entities/        # Layer 2: Entity pages (people, orgs, products, models)
├── concepts/        # Layer 2: Concept/topic pages
├── comparisons/     # Layer 2: Side-by-side analyses
└── queries/         # Layer 2: Filed query results worth keeping
```
- **Layer 1 — Raw Sources:** Immutable. The agent reads but never modifies these.
- **Layer 2 — The Wiki:** Agent-owned markdown files. Created, updated, and cross-referenced by the agent.
- **Layer 3 — The Schema:** `SCHEMA.md` defines structure, conventions, and tag taxonomy.
When the user has an existing wiki, always orient yourself before doing anything:
① Read `SCHEMA.md` — understand the domain, conventions, and tag taxonomy.
② Read `index.md` — learn what pages exist and their summaries.
③ Scan recent `log.md` — read the last 20-30 entries to understand recent activity.
```shell
WIKI="${WIKI_PATH:-$HOME/wiki}"

# Orientation reads at session start
read_file "$WIKI/SCHEMA.md"
read_file "$WIKI/index.md"
read_file "$WIKI/log.md" offset=<last 30 lines>
```
Only after orientation should you ingest, query, or lint. This prevents duplicate pages, convention drift, and redundant work.

For large wikis (100+ pages), also run a quick `search_files` for the topic at hand before creating anything new.
When the user asks to create or start a wiki:
① Determine the location (`$WIKI_PATH` env var, or ask the user; default `~/wiki`)
② Create `SCHEMA.md` customized to the domain (see template below)
③ Create `index.md` with sectioned header
④ Create `log.md` with creation entry

Adapt to the user's domain. The schema constrains agent behavior and ensures consistency:
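Steps ① through ④ can be sketched programmatically. This is a minimal illustration: the `init_wiki` helper and the exact seed file contents are assumptions for the sketch, not prescribed by the skill.

```python
from datetime import date
from pathlib import Path

SUBDIRS = [
    "raw/articles", "raw/papers", "raw/transcripts", "raw/assets",
    "entities", "concepts", "comparisons", "queries",
]

def init_wiki(root: str, domain: str) -> Path:
    """Create the wiki skeleton: directories plus SCHEMA.md, index.md, log.md."""
    wiki = Path(root).expanduser()
    for sub in SUBDIRS:
        (wiki / sub).mkdir(parents=True, exist_ok=True)
    today = date.today().isoformat()
    schema = wiki / "SCHEMA.md"
    if not schema.exists():
        schema.write_text(f"# Wiki Schema\n\n## Domain\n{domain}\n")
    index = wiki / "index.md"
    if not index.exists():
        index.write_text(
            "# Wiki Index\n\n## Entities\n\n## Concepts\n\n"
            "## Comparisons\n\n## Queries\n"
        )
    log = wiki / "log.md"
    if not log.exists():
        log.write_text(f"# Wiki Log\n\n## [{today}] create | Wiki initialized\n")
    return wiki
```

Existing files are never overwritten, so re-running it against a live wiki is safe.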
````markdown
# Wiki Schema

## Domain
[What this wiki covers — e.g., "AI/ML research", "personal health", "startup intelligence"]

## Conventions
- File names: lowercase, hyphens, no spaces (e.g., `transformer-architecture.md`)
- Every wiki page starts with YAML frontmatter (see below)
- Use `[[wikilinks]]` to link between pages (minimum 2 outbound links per page)
- When updating a page, always bump the `updated` date
- Every new page must be added to `index.md` under the correct section
- Every action must be appended to `log.md`
- **Provenance markers:** On pages that synthesize 3+ sources, append `^[raw/articles/source-file.md]` at the end of paragraphs whose claims come from a specific source. This lets a reader trace each claim back without re-reading the whole raw file. Optional on single-source pages where the `sources:` frontmatter is enough.

## Frontmatter
```yaml
---
title: Page Title
created: YYYY-MM-DD
updated: YYYY-MM-DD
type: entity | concept | comparison | query | summary
tags: [from taxonomy below]
sources: [raw/articles/source-name.md]
# Optional quality signals:
confidence: high | medium | low   # how well-supported the claims are
contested: true                   # set when the page has unresolved contradictions
contradictions: [other-page-slug] # pages this one conflicts with
---
```
````
`confidence` and `contested` are optional but recommended for opinion-heavy or fast-moving topics. Lint surfaces `contested: true` and `confidence: low` pages for review so weak claims don't silently harden into accepted wiki fact.
Raw sources ALSO get a small frontmatter block so re-ingests can detect drift:
```yaml
---
source_url: https://example.com/article  # original URL, if applicable
ingested: YYYY-MM-DD
sha256: <hex digest of the raw content below the frontmatter>
---
```
The `sha256:` lets a future re-ingest of the same URL skip processing when content is unchanged, and flag drift when it has changed. Compute over the body only (everything after the closing `---`), not the frontmatter itself.
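A sketch of that convention in Python: the `body_sha256` and `detect_drift` helper names are illustrative, and the regex assumes the simple `---` delimiters shown above.

```python
import hashlib
import re

def body_sha256(raw_markdown: str) -> str:
    """Hash only the content below the closing frontmatter '---'."""
    m = re.match(r"^---\n.*?\n---\n", raw_markdown, flags=re.DOTALL)
    body = raw_markdown[m.end():] if m else raw_markdown
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def detect_drift(raw_markdown: str, fresh_content: str) -> bool:
    """True when freshly fetched content no longer matches the stored sha256."""
    m = re.search(r"^sha256:\s*([0-9a-f]{64})\s*$", raw_markdown, flags=re.MULTILINE)
    if not m:
        return True  # no stored hash: treat as changed and re-ingest
    return hashlib.sha256(fresh_content.encode("utf-8")).hexdigest() != m.group(1)
```

On re-ingest, `detect_drift` returning `False` means the source is unchanged and processing can be skipped.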
[Define 10-20 top-level tags for the domain. Add new tags here BEFORE using them.]
Example for AI/ML:
Rule: every tag on a page must appear in this taxonomy. If a new tag is needed, add it here first, then use it. This prevents tag sprawl.
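The taxonomy rule can be enforced mechanically during lint. A minimal sketch, assuming the flat `tags: [a, b]` frontmatter form from the schema; the helper names are hypothetical:

```python
import re

def page_tags(page_markdown: str) -> list[str]:
    """Pull the tags list out of a page's frontmatter, e.g. 'tags: [a, b]'."""
    m = re.search(r"^tags:\s*\[([^\]]*)\]", page_markdown, flags=re.MULTILINE)
    return [t.strip() for t in m.group(1).split(",") if t.strip()] if m else []

def tags_outside_taxonomy(page_markdown: str, taxonomy: set[str]) -> list[str]:
    """Tags used on the page that are missing from the SCHEMA.md taxonomy."""
    return [t for t in page_tags(page_markdown) if t not in taxonomy]
```

Any non-empty result means either the page should be retagged or the taxonomy in `SCHEMA.md` should be extended first.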
When a page is fully superseded: move it to `_archive/` and remove it from the index.

Entity pages: one page per notable entity. Include:
Concept pages: one page per concept or topic. Include:
Comparison pages: side-by-side analyses. Include:
When new information conflicts with existing content: keep both claims, mark the page `contested: true`, and record the conflicting page under `contradictions: [page-name]` in the frontmatter.

### index.md Template

The index is sectioned by type. Each entry is one line: wikilink + summary.

```markdown
# Wiki Index

> Content catalog. Every wiki page listed under its type with a one-line summary.
> Read this first to find relevant pages for any query.
> Last updated: YYYY-MM-DD | Total pages: N

## Entities
<!-- Alphabetical within section -->

## Concepts

## Comparisons

## Queries
```
Scaling rule: when any section exceeds 50 entries, split it into sub-sections by first letter or sub-domain. When the index exceeds 200 entries total, create a `_meta/topic-map.md` that groups pages by theme for faster navigation.
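The 50-entry threshold can be checked mechanically. A minimal sketch (helper names and the "one wikilink per entry line" assumption are illustrative):

```python
def index_section_counts(index_markdown: str) -> dict[str, int]:
    """Count wikilink entries under each '## Section' header of index.md."""
    counts, current = {}, None
    for line in index_markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            counts[current] = 0
        elif current and "[[" in line:
            counts[current] += 1
    return counts

def oversized_sections(index_markdown: str, limit: int = 50) -> list[str]:
    """Sections that have outgrown the entry limit and should be split."""
    return [s for s, n in index_section_counts(index_markdown).items() if n > limit]
```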
```markdown
# Wiki Log

> Chronological record of all wiki actions. Append-only.
> Format: `## [YYYY-MM-DD] action | subject`
> Actions: ingest, update, query, lint, create, archive, delete
> When this file exceeds 500 entries, rotate: rename to log-YYYY.md, start fresh.

## [YYYY-MM-DD] create | Wiki initialized
- Domain: [domain]
- Structure created with SCHEMA.md, index.md, log.md
```
When the user provides a source (URL, file, paste), integrate it into the wiki:
① Capture the raw source:
- URLs: use `web_extract` to get markdown, save to `raw/articles/`
- Papers: use `web_extract` (handles PDFs), save to `raw/papers/`
- Pastes and files: save to the appropriate `raw/` subdirectory
- Use descriptive filenames, e.g. `raw/articles/karpathy-llm-wiki-2026.md`
- Add source frontmatter (`source_url`, `ingested`, `sha256` of the body)

On re-ingest of the same URL: recompute the sha256, compare to the stored value — skip if identical, flag drift and update if different. This is cheap enough to do on every re-ingest and catches silent source changes.

② Discuss takeaways with the user — what's interesting, what matters for the domain. (Skip this in automated/cron contexts — proceed directly.)
③ Check what already exists — search `index.md` and use `search_files` to find existing pages for mentioned entities/concepts. This is the difference between a growing wiki and a pile of duplicates.
④ Write or update wiki pages:
- Create new pages or update existing ones; always bump the `updated` date.
- When new info contradicts existing content, follow the Update Policy.
- Add `[[wikilinks]]`. Check that existing pages link back.
- Append `^[raw/articles/source.md]` markers to paragraphs whose claims trace to a specific source.
- Set `confidence: medium` or `low` in frontmatter. Don't mark `high` unless the claim is well-supported across multiple sources.

⑤ Update navigation:
- Add each new page to `index.md` under the correct section, alphabetically
- Append to `log.md`: `## [YYYY-MM-DD] ingest | Source Title`

⑥ Report what changed — list every file created or updated to the user.
A single source can trigger updates across 5-15 wiki pages. This is normal and desired — it's the compounding effect.
When the user asks a question about the wiki's domain:
① Read `index.md` to identify relevant pages.
② For wikis with 100+ pages, also `search_files` across all .md files for key terms — the index alone may miss relevant content.
③ Read the relevant pages using read_file.
④ Synthesize an answer from the compiled knowledge. Cite the wiki pages
you drew from: "Based on [[page-a]] and [[page-b]]..."
⑤ File valuable answers back — if the answer is a substantial comparison,
deep dive, or novel synthesis, create a page in queries/ or comparisons/.
Don't file trivial lookups — only answers that would be painful to re-derive.
⑥ Update log.md with the query and whether it was filed.
When the user asks to lint, health-check, or audit the wiki:
① Orphan pages: Find pages with no inbound
[[wikilinks]] from other pages.
```python
# Use execute_code for this — programmatic scan across all wiki pages
import re
from pathlib import Path

wiki = Path("<WIKI_PATH>")
dirs = ("entities", "concepts", "comparisons", "queries")

# Scan all .md files, extract [[wikilinks]], build an inbound-link map
inbound = {p.stem: 0 for d in dirs for p in (wiki / d).glob("*.md")}
for d in dirs:
    for page in (wiki / d).glob("*.md"):
        for link in re.findall(r"\[\[([^\]|#]+)", page.read_text()):
            link = link.strip()
            if link in inbound and link != page.stem:
                inbound[link] += 1

# Pages with zero inbound links are orphans
orphans = sorted(slug for slug, n in inbound.items() if n == 0)
print(orphans)
② Broken wikilinks: Find
[[links]] that point to pages that don't exist.
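The broken-link check is the mirror image of the orphan scan: resolve every outbound `[[link]]` against the set of pages on disk. A minimal sketch (the `broken_wikilinks` helper and the directory list are assumptions matching the layout above):

```python
import re
from pathlib import Path

WIKI_DIRS = ("entities", "concepts", "comparisons", "queries")

def broken_wikilinks(wiki_root: str) -> dict[str, list[str]]:
    """Map each page to the [[links]] it contains that resolve to no existing page."""
    wiki = Path(wiki_root)
    slugs = {p.stem for d in WIKI_DIRS for p in (wiki / d).glob("*.md")}
    broken = {}
    for d in WIKI_DIRS:
        for page in (wiki / d).glob("*.md"):
            links = re.findall(r"\[\[([^\]|#]+)", page.read_text())
            missing = [l.strip() for l in links if l.strip() not in slugs]
            if missing:
                broken[str(page.relative_to(wiki))] = missing
    return broken
```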
③ Index completeness: Every wiki page should appear in
index.md. Compare
the filesystem against index entries.
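The filesystem-vs-index comparison can be sketched as a set difference (the `unindexed_pages` name is illustrative; it assumes index entries link pages by slug, as in the index template):

```python
import re
from pathlib import Path

def unindexed_pages(wiki_root: str) -> set[str]:
    """Wiki pages on disk whose slug never appears as a [[wikilink]] in index.md."""
    wiki = Path(wiki_root)
    indexed = {s.strip() for s in
               re.findall(r"\[\[([^\]|#]+)", (wiki / "index.md").read_text())}
    on_disk = {p.stem for d in ("entities", "concepts", "comparisons", "queries")
               for p in (wiki / d).glob("*.md")}
    return on_disk - indexed
```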
④ Frontmatter validation: Every wiki page must have all required fields (title, created, updated, type, tags, sources). Tags must be in the taxonomy.
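A minimal stdlib-only sketch of the required-fields check; it deliberately avoids a YAML parser and assumes the flat frontmatter shape shown in SCHEMA.md (top-level `key: value` lines between `---` delimiters):

```python
REQUIRED_FIELDS = ("title", "created", "updated", "type", "tags", "sources")

def missing_frontmatter_fields(page_markdown: str) -> list[str]:
    """Required frontmatter keys that a page fails to declare."""
    lines = page_markdown.splitlines()
    if not lines or lines[0] != "---":
        return list(REQUIRED_FIELDS)  # no frontmatter at all
    keys = set()
    for line in lines[1:]:
        if line == "---":
            break
        if ":" in line and not line.startswith((" ", "\t", "#")):
            keys.add(line.split(":", 1)[0].strip())
    return [f for f in REQUIRED_FIELDS if f not in keys]
```

Tag-taxonomy membership would be a separate check layered on top of this one.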
⑤ Stale content: Pages whose
updated date is >90 days older than the most
recent source that mentions the same entities.
⑥ Contradictions: Pages on the same topic with conflicting claims. Look for pages that share tags/entities but state different facts. Surface all pages with
contested: true or contradictions: frontmatter for user review.
⑦ Quality signals: List pages with
confidence: low and any page that cites
only a single source but has no confidence field set — these are candidates
for either finding corroboration or demoting to confidence: medium.
⑧ Source drift: For each file in
raw/ with a sha256: frontmatter, recompute
the hash and flag mismatches. Mismatches indicate the raw file was edited
(shouldn't happen — raw/ is immutable) or ingested from a URL that has since
changed. Not a hard error, but worth reporting.
⑨ Page size: Flag pages over 200 lines — candidates for splitting.
⑩ Tag audit: List all tags in use, flag any not in the SCHEMA.md taxonomy.
⑪ Log rotation: If log.md exceeds 500 entries, rotate it.
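The rotation rule above can be sketched as a small helper (the `rotate_log_if_needed` name and the fresh-log seed text are assumptions; it counts `## [YYYY-MM-DD]` entry headers per the log format):

```python
import re
from pathlib import Path

def rotate_log_if_needed(wiki_root: str, max_entries: int = 500) -> bool:
    """Rename log.md to log-YYYY.md and start fresh once it exceeds max_entries."""
    log = Path(wiki_root) / "log.md"
    text = log.read_text()
    entries = re.findall(r"^## \[\d{4}-\d{2}-\d{2}\]", text, flags=re.MULTILINE)
    if len(entries) <= max_entries:
        return False
    year = entries[-1][4:8]  # year of the most recent (last appended) entry
    log.rename(log.with_name(f"log-{year}.md"))
    log.write_text("# Wiki Log\n\n> Chronological record of all wiki actions. Append-only.\n")
    return True
```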
⑫ Report findings with specific file paths and suggested actions, grouped by severity (broken links > orphans > source drift > contested pages > stale content > style issues).
⑬ Append to log.md:
## [YYYY-MM-DD] lint | N issues found
```shell
# Find pages by content
search_files "transformer" path="$WIKI" file_glob="*.md"

# Find pages by filename
search_files "*.md" target="files" path="$WIKI"

# Find pages by tag
search_files "tags:.*alignment" path="$WIKI" file_glob="*.md"

# Recent activity
read_file "$WIKI/log.md" offset=<last 20 lines>
```
When ingesting multiple sources at once, batch the updates: capture all raw sources first, then make a single pass of wiki-page updates, and update `index.md` and `log.md` once at the end.
When content is fully superseded or the domain scope changes:
- Create the `_archive/` directory if it doesn't exist
- Move the page into `_archive/` with its original path (e.g., `_archive/entities/old-page.md`)
- Remove its entry from `index.md`

The wiki directory works as an Obsidian vault out of the box:
- `[[wikilinks]]` render as clickable links
- The `raw/assets/` folder holds images referenced via `![[image.png]]`

For best results:
- Keep images and diagrams in `raw/assets/`
- The frontmatter supports Dataview-style queries, e.g. `TABLE tags FROM "entities" WHERE contains(tags, "company")`

If using the Obsidian skill alongside this one, set `OBSIDIAN_VAULT_PATH` to the same directory as the wiki path.
On machines without a display, use
obsidian-headless instead of the desktop app.
It syncs vaults via Obsidian Sync without a GUI — perfect for agents running on
servers that write to the wiki while Obsidian desktop reads it on another device.
Setup:
```shell
# Requires Node.js 22+
npm install -g obsidian-headless

# Login (requires Obsidian account with Sync subscription)
ob login --email <email> --password '<password>'

# Create a remote vault for the wiki
ob sync-create-remote --name "LLM Wiki"

# Connect the wiki directory to the vault
cd ~/wiki
ob sync-setup --vault "<vault-id>"

# Initial sync
ob sync

# Continuous sync (foreground — use systemd for background)
ob sync --continuous
```
Continuous background sync via systemd:
```ini
# ~/.config/systemd/user/obsidian-wiki-sync.service
[Unit]
Description=Obsidian LLM Wiki Sync
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/path/to/ob sync --continuous
WorkingDirectory=/home/user/wiki
Restart=on-failure
RestartSec=10

[Install]
WantedBy=default.target
```
```shell
systemctl --user daemon-reload
systemctl --user enable --now obsidian-wiki-sync

# Enable linger so sync survives logout:
sudo loginctl enable-linger $USER
```
This lets the agent write to
~/wiki on a server while you browse the same
vault in Obsidian on your laptop/phone — changes appear within seconds.
- Never edit `raw/` — sources are immutable. Corrections go in wiki pages.
- When `log.md` grows too large, rotate it to `log-YYYY.md` and start fresh. The agent should check log size during lint.

`llm-wiki-compiler` is a Node.js CLI that compiles sources into a concept wiki with the same Karpathy inspiration. It's Obsidian-compatible, so users who want a scheduled/CLI-driven compile pipeline can point it at the same vault this skill maintains. Trade-offs: it owns page generation (replacing the agent's judgment on page creation) and is tuned for small corpora. Use this skill when you want agent-in-the-loop curation; use `llm-wiki-compiler` when you want a batch compile of a source directory.
MIT
```shell
mkdir -p ~/.hermes/skills/research/llm-wiki && curl -o ~/.hermes/skills/research/llm-wiki/SKILL.md https://raw.githubusercontent.com/NousResearch/hermes-agent/main/skills/research/llm-wiki/SKILL.md
```