Swarm
Cut your LLM costs by 200x. Offload parallel, batch, and research work to Gemini Flash workers instead of burning your expensive primary model.
Turn your expensive model into an affordable daily driver. Offload the boring stuff to Gemini Flash workers — parallel, batch, research — at a fraction of the cost.
| 30 tasks via | Time | Cost |
|---|---|---|
| Opus (sequential) | ~30s | ~$0.50 |
| Swarm (parallel) | ~1s | ~$0.003 |
Swarm is ideal for independent parallel prompts, multi-subject research, and batch work that doesn't need your primary model. Quick start:

```bash
# Check daemon (do this every session)
swarm status

# Start if not running
swarm start

# Parallel prompts
swarm parallel "What is X?" "What is Y?" "What is Z?"

# Research multiple subjects
swarm research "OpenAI" "Anthropic" "Mistral" --topic "AI safety"

# Discover capabilities
swarm capabilities
```
N prompts → N workers simultaneously. Best for independent tasks.

```bash
swarm parallel "prompt1" "prompt2" "prompt3"
```
Multi-phase: search → fetch → analyze. Uses Google Search grounding.

```bash
swarm research "Buildertrend" "Jobber" --topic "pricing 2026"
```
Data flows through multiple stages, each with a different perspective/filter. Stages run in sequence; tasks within a stage run in parallel.
Stage modes:

- `parallel` — N inputs → N workers (same perspective)
- `single` — merged input → 1 worker
- `fan-out` — 1 input → N workers with DIFFERENT perspectives
- `reduce` — N inputs → 1 synthesized output

Auto-chain — describe what you want, get an optimal pipeline:

```bash
curl -X POST http://localhost:9999/chain/auto \
  -d '{"task":"Find business opportunities","data":"...market data...","depth":"standard"}'
```
Manual chain:

```bash
swarm chain pipeline.json
# or
echo '{"stages":[...]}' | swarm chain --stdin
```
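A minimal sketch of what a hand-written pipeline might look like, built from the stage modes and perspectives documented below. The exact field names (`mode`, `perspective`, `prompt`) aren't spelled out here, so treat them as placeholders rather than the documented schema:

```bash
# Hypothetical 3-stage pipeline: field names are assumptions, not the documented schema
echo '{"stages":[
  {"mode":"parallel","perspective":"extractor","prompt":"Pull the key facts from each input."},
  {"mode":"fan-out","perspective":"analyst","prompt":"Analyze the facts from your assigned angle."},
  {"mode":"reduce","perspective":"synthesizer","prompt":"Merge all analyses into one brief."}
]}' | swarm chain --stdin
```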
Depth presets: `quick` (2 stages), `standard` (4), `deep` (6), `exhaustive` (8).
Built-in perspectives: extractor, filter, enricher, analyst, synthesizer, challenger, optimizer, strategist, researcher, critic
Preview without executing:

```bash
curl -X POST http://localhost:9999/chain/preview \
  -d '{"task":"...","depth":"standard"}'
```
Compare single vs parallel vs chain on the same task with LLM-as-judge scoring.
```bash
curl -X POST http://localhost:9999/benchmark \
  -d '{"task":"Analyze X","data":"...","depth":"standard"}'
```
Scores on 6 FLASK dimensions: accuracy (2x weight), depth (1.5x), completeness, coherence, actionability (1.5x), nuance.
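The docs don't spell out how the weights combine; a plausible reading is a weighted average, sketched here with hypothetical scores:

```js
// Assumed aggregation: weighted average over the 6 FLASK dimensions.
// Weights come from the docs above; the normalization is a guess.
const weights = { accuracy: 2, depth: 1.5, completeness: 1, coherence: 1, actionability: 1.5, nuance: 1 };

function flaskScore(scores) {
  const totalWeight = Object.values(weights).reduce((a, b) => a + b, 0); // 8.5
  const weighted = Object.entries(weights).reduce((sum, [dim, w]) => sum + w * scores[dim], 0);
  return weighted / totalWeight;
}

// Example: strong accuracy, weaker nuance
flaskScore({ accuracy: 9, depth: 7, completeness: 8, coherence: 8, actionability: 7, nuance: 6 }); // ≈ 7.2
```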
Lets the orchestrator discover what execution modes are available:
```bash
swarm capabilities
# or
curl http://localhost:9999/capabilities
```
LRU cache for LLM responses. 212x speedup on cache hits (parallel), 514x on chains.
Disable per task with `task.cache = false`.

```bash
# View cache stats
curl http://localhost:9999/cache

# Clear cache
curl -X DELETE http://localhost:9999/cache
```

Cache stats also show in `swarm status`.
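Where `task.cache = false` attaches isn't shown here; presumably it's a field on an individual task or stage object in a chain payload. Placement is an assumption:

```bash
# Assumed placement: a "cache": false field on the task/stage object
echo '{"stages":[{"mode":"single","prompt":"Time-sensitive question","cache":false}]}' | swarm chain --stdin
```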
If tasks fail within a chain stage, only the failed tasks get retried (not the whole stage). Default: 1 retry. Configurable per phase via `phase.retries` or globally via `options.stageRetries`.
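A sketch of where those knobs might live in a chain payload. The doc names `phase.retries` and `options.stageRetries` but not the surrounding structure, so this layout (and reading "phase" as a chain stage) is an assumption:

```bash
# Assumed layout: per-stage "retries" plus a global options.stageRetries fallback
echo '{
  "stages":[{"mode":"parallel","prompt":"...","retries":2}],
  "options":{"stageRetries":1}
}' | swarm chain --stdin
```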
All endpoints return cost data in their `complete` event:

- `session` — current daemon session totals
- `daily` — persisted across restarts, accumulates all day

```bash
swarm status    # Shows session + daily cost
swarm savings   # Monthly savings report
```
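A sketch of what the cost block in a `complete` event might look like; only `session` and `daily` are documented, so the surrounding shape is an assumption:

```js
// Hypothetical complete-event shape (only the session/daily cost fields are documented)
const event = {
  type: 'complete',
  cost: {
    session: 0.0042, // current daemon session total, USD
    daily: 0.031,    // persisted across restarts, accumulates all day
  },
};
```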
Workers search the live web via Google Search grounding (Gemini only, no extra cost).
```bash
# Research uses web search by default
swarm research "Subject" --topic "angle"

# Parallel with web search
curl -X POST http://localhost:9999/parallel \
  -d '{"prompts":["Current price of X?"],"options":{"webSearch":true}}'
```
```js
const { parallel, research } = require('~/clawd/skills/node-scaling/lib');
const { SwarmClient } = require('~/clawd/skills/node-scaling/lib/client');

// Simple parallel
const results = await parallel(['prompt1', 'prompt2', 'prompt3']);

// Client with streaming
const client = new SwarmClient();
for await (const event of client.parallel(prompts)) { /* handle event */ }
for await (const event of client.research(subjects, topic)) { /* handle event */ }

// Chain
const chained = await client.chainSync({ task, data, depth });
```
```bash
swarm start      # Start daemon (background)
swarm stop       # Stop daemon
swarm status     # Status, cost, cache stats
swarm restart    # Restart daemon
swarm savings    # Monthly savings report
swarm logs [N]   # Last N lines of daemon log
```
| Mode | Tasks | Time | Notes |
|---|---|---|---|
| Parallel (simple) | 5 | ~700ms | 142ms/task effective |
| Parallel (stress) | 10 | ~1.2s | 123ms/task effective |
| Chain (standard) | 5 | ~14s | 3-stage multi-perspective |
| Chain (quick) | 2 | ~3s | 2-stage extract+synthesize |
| Cache hit | any | ~3-5ms | 200-500x speedup |
| Research (web) | 2 | ~15s | Google grounding latency |
Location: `~/.config/clawdbot/node-scaling.yaml`
```yaml
node_scaling:
  enabled: true
  limits:
    max_nodes: 16
    max_concurrent_api: 16
  provider:
    name: gemini
    model: gemini-2.0-flash
  web_search:
    enabled: true
    parallel_default: false
  cost:
    max_daily_spend: 10.00
```
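After editing the config, a restart should make the daemon pick it up (assuming it reads the file on startup), and `swarm status` confirms:

```bash
swarm restart && swarm status
```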
| Issue | Fix |
|---|---|
| Daemon not running | Run `swarm start` |
| No API key | Set your Gemini API key in the environment |
| Rate limited | Lower `max_concurrent_api` in config |
| Web search not working | Ensure provider is `gemini` and `web_search.enabled: true` |
| Cache stale results | `curl -X DELETE http://localhost:9999/cache` |
| Chain too slow | Use `quick` depth or check context size |
Force JSON output with schema validation — zero parse failures on structured tasks.
```bash
# With built-in schema
curl -X POST http://localhost:9999/structured \
  -d '{"prompt":"Extract entities from: Tim Cook announced iPhone 17","schema":"entities"}'

# With custom schema
curl -X POST http://localhost:9999/structured \
  -d '{"prompt":"Classify this text","data":"...","schema":{"type":"object","properties":{"category":{"type":"string"}}}}'

# JSON mode (no schema, just force JSON)
curl -X POST http://localhost:9999/structured \
  -d '{"prompt":"Return a JSON object with name, age, city for a fictional person"}'

# List available schemas
curl http://localhost:9999/structured/schemas
```
Built-in schemas: `entities`, `summary`, `comparison`, `actions`, `classification`, `qa`.
Uses Gemini's native `response_mime_type: application/json` plus `responseSchema` for guaranteed JSON output. Includes schema validation on the response.
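For reference, a direct Gemini REST call using those fields might look like the sketch below. How the daemon actually constructs its requests isn't shown here, and the model name and key are placeholders:

```bash
# Direct Gemini REST call with forced JSON output (illustrative; not the daemon's code)
curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [{"parts": [{"text": "Classify this text: ..."}]}],
    "generationConfig": {
      "responseMimeType": "application/json",
      "responseSchema": {"type": "OBJECT", "properties": {"category": {"type": "STRING"}}}
    }
  }'
```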
Same prompt → N parallel executions → pick the best answer. Higher accuracy on factual/analytical tasks.
```bash
# Judge strategy (LLM picks best — most reliable)
curl -X POST http://localhost:9999/vote \
  -d '{"prompt":"What are the key factors in SaaS pricing?","n":3,"strategy":"judge"}'

# Similarity strategy (consensus — zero extra cost)
curl -X POST http://localhost:9999/vote \
  -d '{"prompt":"What year was Python released?","n":3,"strategy":"similarity"}'

# Longest strategy (heuristic — zero extra cost)
curl -X POST http://localhost:9999/vote \
  -d '{"prompt":"Explain recursion","n":3,"strategy":"longest"}'
```
Strategies:

- `judge` — LLM scores all candidates on accuracy/completeness/clarity/actionability, picks the winner (N+1 calls)
- `similarity` — Jaccard word-set similarity, picks the consensus answer (N calls, zero extra cost; see the sketch after the table below)
- `longest` — picks the longest response as a heuristic for thoroughness (N calls, zero extra cost)

When to use: factual questions, critical decisions, or any task where accuracy > speed.
| Strategy | Calls | Extra Cost | Quality |
|---|---|---|---|
| similarity | N | $0 | Good (consensus) |
| longest | N | $0 | Decent (heuristic) |
| judge | N+1 | ~$0.0001 | Best (LLM-scored) |
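A minimal sketch of the Jaccard consensus pick that the `similarity` strategy describes (illustrative only, not the daemon's actual implementation):

```js
// Jaccard word-set similarity: |A ∩ B| / |A ∪ B| over lowercase word sets
function jaccard(a, b) {
  const A = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const B = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  const intersection = [...A].filter((w) => B.has(w)).length;
  const union = new Set([...A, ...B]).size;
  return union === 0 ? 0 : intersection / union;
}

// Consensus pick: the candidate most similar, on average, to all the others
function pickConsensus(candidates) {
  let bestIdx = 0;
  let bestScore = -Infinity;
  candidates.forEach((c, i) => {
    const others = candidates.filter((_, j) => j !== i);
    const avg = others.reduce((sum, o) => sum + jaccard(c, o), 0) / Math.max(1, others.length);
    if (avg > bestScore) { bestScore = avg; bestIdx = i; }
  });
  return candidates[bestIdx];
}
```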
Optional critic pass after chain/skeleton output. Scores 5 dimensions, auto-refines if below threshold.
```bash
# Add reflect:true to any chain or skeleton request
curl -X POST http://localhost:9999/chain/auto \
  -d '{"task":"Analyze the AI chip market","data":"...","reflect":true}'

curl -X POST http://localhost:9999/skeleton \
  -d '{"task":"Write a market analysis","reflect":true}'
```
In testing, reflection improved weak output from a 5.0 to a 7.6 average score; skeleton + reflect scored 9.4/10.
Generate outline → expand each section in parallel → merge into coherent document. Best for long-form content.
```bash
curl -X POST http://localhost:9999/skeleton \
  -d '{"task":"Write a comprehensive guide to SaaS pricing","maxSections":6,"reflect":true}'
```
Performance: 14,478 chars in 21s (675 chars/sec) — 5.1x more content than chain at 2.9x higher throughput.
| Metric | Chain | Skeleton-of-Thought | Winner |
|---|---|---|---|
| Output size | 2,856 chars | 14,478 chars | SoT (5.1x) |
| Throughput | 234 chars/sec | 675 chars/sec | SoT (2.9x) |
| Duration | 12s | 21s | Chain (faster) |
| Quality (w/ reflect) | ~7-8/10 | 9.4/10 | SoT |
When to use what: parallel for independent tasks, chains for multi-stage analysis with different perspectives, skeleton for long-form content, vote when accuracy matters more than speed.
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /status | Detailed status + cost + cache |
| GET | /capabilities | Discover execution modes |
| POST | /parallel | Execute N prompts in parallel |
| POST | /research | Multi-phase web research |
| POST | /skeleton | Skeleton-of-Thought (outline → expand → merge) |
| POST | /chain | Manual chain pipeline |
| POST | /chain/auto | Auto-build + execute chain |
| POST | /chain/preview | Preview chain without executing |
| POST | /chain/template | Execute pre-built template |
| POST | /structured | Forced JSON with schema validation |
| GET | /structured/schemas | List built-in schemas |
| POST | /vote | Majority voting (best-of-N) |
| POST | /benchmark | Quality comparison test |
| GET | /templates | List chain templates |
| GET | /cache | Cache statistics |
| DELETE | /cache | Clear cache |
| Model | Cost per 1M tokens | Relative |
|---|---|---|
| Claude Opus 4 | ~$15 input / $75 output | 1x |
| GPT-4o | ~$2.50 input / $10 output | ~7x cheaper |
| Gemini Flash | ~$0.075 input / $0.30 output | 200x cheaper |
Cache hits are essentially free (~3-5ms, no API call).