Swarm
Cut your LLM costs by 200x. Offload parallel, batch, and research work to Gemini Flash workers instead of burning your expensive primary model.
Turn your expensive model into an affordable daily driver. Offload the boring stuff to Gemini Flash workers — parallel, batch, research — at a fraction of the cost.
| 30 tasks via | Time | Cost |
|---|---|---|
| Opus (sequential) | ~30s | ~$0.50 |
| Swarm (parallel) | ~1s | ~$0.003 |
Swarm is ideal for independent parallel prompts, multi-subject research, and batch work that doesn't need your primary model. Quick start:

```bash
# Check daemon (do this every session)
swarm status

# Start if not running
swarm start

# Parallel prompts
swarm parallel "What is X?" "What is Y?" "What is Z?"

# Research multiple subjects
swarm research "OpenAI" "Anthropic" "Mistral" --topic "AI safety"

# Discover capabilities
swarm capabilities
```
N prompts → N workers simultaneously. Best for independent tasks.

```bash
swarm parallel "prompt1" "prompt2" "prompt3"
```
Multi-phase: search → fetch → analyze. Uses Google Search grounding.

```bash
swarm research "Buildertrend" "Jobber" --topic "pricing 2026"
```
Data flows through multiple stages, each with a different perspective/filter. Stages run in sequence; tasks within a stage run in parallel.
Stage modes:

- `parallel` — N inputs → N workers (same perspective)
- `single` — merged input → 1 worker
- `fan-out` — 1 input → N workers with DIFFERENT perspectives
- `reduce` — N inputs → 1 synthesized output

Auto-chain — describe what you want, get an optimal pipeline:

```bash
curl -X POST http://localhost:9999/chain/auto \
  -d '{"task":"Find business opportunities","data":"...market data...","depth":"standard"}'
```
Manual chain:

```bash
swarm chain pipeline.json
# or
echo '{"stages":[...]}' | swarm chain --stdin
```
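A minimal sketch of what a hand-written pipeline might look like, built from the stage modes and perspectives documented below. The exact field names (`mode`, `perspective`, `prompt`) aren't spelled out here, so treat them as placeholders rather than the documented schema:

```bash
# Hypothetical 3-stage pipeline: field names are assumptions, not the documented schema
echo '{"stages":[
  {"mode":"parallel","perspective":"extractor","prompt":"Pull the key facts from each input."},
  {"mode":"fan-out","perspective":"analyst","prompt":"Analyze the facts from your assigned angle."},
  {"mode":"reduce","perspective":"synthesizer","prompt":"Merge all analyses into one brief."}
]}' | swarm chain --stdin
```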
Depth presets: `quick` (2 stages), `standard` (4), `deep` (6), `exhaustive` (8).
Built-in perspectives: extractor, filter, enricher, analyst, synthesizer, challenger, optimizer, strategist, researcher, critic
Preview without executing:

```bash
curl -X POST http://localhost:9999/chain/preview \
  -d '{"task":"...","depth":"standard"}'
```
Compare single vs parallel vs chain on the same task with LLM-as-judge scoring.
```bash
curl -X POST http://localhost:9999/benchmark \
  -d '{"task":"Analyze X","data":"...","depth":"standard"}'
```
Scores on 6 FLASK dimensions: accuracy (2x weight), depth (1.5x), completeness, coherence, actionability (1.5x), nuance.
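The docs don't spell out how the weights combine; a plausible reading is a weighted average, sketched here with hypothetical scores:

```js
// Assumed aggregation: weighted average over the 6 FLASK dimensions.
// Weights come from the docs above; the normalization is a guess.
const weights = { accuracy: 2, depth: 1.5, completeness: 1, coherence: 1, actionability: 1.5, nuance: 1 };

function flaskScore(scores) {
  const totalWeight = Object.values(weights).reduce((a, b) => a + b, 0); // 8.5
  const weighted = Object.entries(weights).reduce((sum, [dim, w]) => sum + w * scores[dim], 0);
  return weighted / totalWeight;
}

// Example: strong accuracy, weaker nuance
flaskScore({ accuracy: 9, depth: 7, completeness: 8, coherence: 8, actionability: 7, nuance: 6 }); // ≈ 7.2
```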
Lets the orchestrator discover what execution modes are available:
```bash
swarm capabilities
# or
curl http://localhost:9999/capabilities
```
LRU cache for LLM responses. 212x speedup on cache hits (parallel), 514x on chains.
Disable per task with `task.cache = false`.

```bash
# View cache stats
curl http://localhost:9999/cache

# Clear cache
curl -X DELETE http://localhost:9999/cache
```

Cache stats also show in `swarm status`.
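Where `task.cache = false` attaches isn't shown here; presumably it's a field on an individual task or stage object in a chain payload. Placement is an assumption:

```bash
# Assumed placement: a "cache": false field on the task/stage object
echo '{"stages":[{"mode":"single","prompt":"Time-sensitive question","cache":false}]}' | swarm chain --stdin
```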
If tasks fail within a chain stage, only the failed tasks get retried (not the whole stage). Default: 1 retry. Configurable per phase via `phase.retries` or globally via `options.stageRetries`.
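A sketch of where those knobs might live in a chain payload. The doc names `phase.retries` and `options.stageRetries` but not the surrounding structure, so this layout (and reading "phase" as a chain stage) is an assumption:

```bash
# Assumed layout: per-stage "retries" plus a global options.stageRetries fallback
echo '{
  "stages":[{"mode":"parallel","prompt":"...","retries":2}],
  "options":{"stageRetries":1}
}' | swarm chain --stdin
```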
All endpoints return cost data in their `complete` event:

- `session` — current daemon session totals
- `daily` — persisted across restarts, accumulates all day

```bash
swarm status    # Shows session + daily cost
swarm savings   # Monthly savings report
```
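A sketch of what the cost block in a `complete` event might look like; only `session` and `daily` are documented, so the surrounding shape is an assumption:

```js
// Hypothetical complete-event shape (only the session/daily cost fields are documented)
const event = {
  type: 'complete',
  cost: {
    session: 0.0042, // current daemon session total, USD
    daily: 0.031,    // persisted across restarts, accumulates all day
  },
};
```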
Workers search the live web via Google Search grounding (Gemini only, no extra cost).
```bash
# Research uses web search by default
swarm research "Subject" --topic "angle"

# Parallel with web search
curl -X POST http://localhost:9999/parallel \
  -d '{"prompts":["Current price of X?"],"options":{"webSearch":true}}'
```
```js
const { parallel, research } = require('~/clawd/skills/node-scaling/lib');
const { SwarmClient } = require('~/clawd/skills/node-scaling/lib/client');

// Simple parallel
const results = await parallel(['prompt1', 'prompt2', 'prompt3']);

// Client with streaming
const client = new SwarmClient();
for await (const event of client.parallel(prompts)) { /* handle event */ }
for await (const event of client.research(subjects, topic)) { /* handle event */ }

// Chain
const chained = await client.chainSync({ task, data, depth });
```
```bash
swarm start      # Start daemon (background)
swarm stop       # Stop daemon
swarm status     # Status, cost, cache stats
swarm restart    # Restart daemon
swarm savings    # Monthly savings report
swarm logs [N]   # Last N lines of daemon log
```
| Mode | Tasks | Time | Notes |
|---|---|---|---|
| Parallel (simple) | 5 | ~700ms | 142ms/task effective |
| Parallel (stress) | 10 | ~1.2s | 123ms/task effective |
| Chain (standard) | 5 | ~14s | 3-stage multi-perspective |
| Chain (quick) | 2 | ~3s | 2-stage extract+synthesize |
| Cache hit | any | ~3-5ms | 200-500x speedup |
| Research (web) | 2 | ~15s | Google grounding latency |
Location: `~/.config/clawdbot/node-scaling.yaml`
```yaml
node_scaling:
  enabled: true
  limits:
    max_nodes: 16
    max_concurrent_api: 16
  provider:
    name: gemini
    model: gemini-2.0-flash
  web_search:
    enabled: true
    parallel_default: false
  cost:
    max_daily_spend: 10.00
```
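After editing the config, a restart should make the daemon pick it up (assuming it reads the file on startup), and `swarm status` confirms:

```bash
swarm restart && swarm status
```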
| Issue | Fix |
|---|---|
| Daemon not running | Run `swarm start` |
| No API key | Set your Gemini API key in the environment |
| Rate limited | Lower `max_concurrent_api` in config |
| Web search not working | Ensure provider is `gemini` and `web_search.enabled: true` |
| Cache stale results | `curl -X DELETE http://localhost:9999/cache` |
| Chain too slow | Use `quick` depth or check context size |
Force JSON output with schema validation — zero parse failures on structured tasks.
```bash
# With built-in schema
curl -X POST http://localhost:9999/structured \
  -d '{"prompt":"Extract entities from: Tim Cook announced iPhone 17","schema":"entities"}'

# With custom schema
curl -X POST http://localhost:9999/structured \
  -d '{"prompt":"Classify this text","data":"...","schema":{"type":"object","properties":{"category":{"type":"string"}}}}'

# JSON mode (no schema, just force JSON)
curl -X POST http://localhost:9999/structured \
  -d '{"prompt":"Return a JSON object with name, age, city for a fictional person"}'

# List available schemas
curl http://localhost:9999/structured/schemas
```
Built-in schemas: `entities`, `summary`, `comparison`, `actions`, `classification`, `qa`.
Uses Gemini's native `response_mime_type: application/json` plus `responseSchema` for guaranteed JSON output. Includes schema validation on the response.
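For reference, a direct Gemini REST call using those fields might look like the sketch below. How the daemon actually constructs its requests isn't shown here, and the model name and key are placeholders:

```bash
# Direct Gemini REST call with forced JSON output (illustrative; not the daemon's code)
curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [{"parts": [{"text": "Classify this text: ..."}]}],
    "generationConfig": {
      "responseMimeType": "application/json",
      "responseSchema": {"type": "OBJECT", "properties": {"category": {"type": "STRING"}}}
    }
  }'
```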
Same prompt → N parallel executions → pick the best answer. Higher accuracy on factual/analytical tasks.
```bash
# Judge strategy (LLM picks best — most reliable)
curl -X POST http://localhost:9999/vote \
  -d '{"prompt":"What are the key factors in SaaS pricing?","n":3,"strategy":"judge"}'

# Similarity strategy (consensus — zero extra cost)
curl -X POST http://localhost:9999/vote \
  -d '{"prompt":"What year was Python released?","n":3,"strategy":"similarity"}'

# Longest strategy (heuristic — zero extra cost)
curl -X POST http://localhost:9999/vote \
  -d '{"prompt":"Explain recursion","n":3,"strategy":"longest"}'
```
Strategies:

- `judge` — LLM scores all candidates on accuracy/completeness/clarity/actionability, picks the winner (N+1 calls)
- `similarity` — Jaccard word-set similarity, picks the consensus answer (N calls, zero extra cost; see the sketch after the table below)
- `longest` — picks the longest response as a heuristic for thoroughness (N calls, zero extra cost)

When to use: factual questions, critical decisions, or any task where accuracy > speed.
| Strategy | Calls | Extra Cost | Quality |
|---|---|---|---|
| similarity | N | $0 | Good (consensus) |
| longest | N | $0 | Decent (heuristic) |
| judge | N+1 | ~$0.0001 | Best (LLM-scored) |
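A minimal sketch of the Jaccard consensus pick that the `similarity` strategy describes (illustrative only, not the daemon's actual implementation):

```js
// Jaccard word-set similarity: |A ∩ B| / |A ∪ B| over lowercase word sets
function jaccard(a, b) {
  const A = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const B = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  const intersection = [...A].filter((w) => B.has(w)).length;
  const union = new Set([...A, ...B]).size;
  return union === 0 ? 0 : intersection / union;
}

// Consensus pick: the candidate most similar, on average, to all the others
function pickConsensus(candidates) {
  let bestIdx = 0;
  let bestScore = -Infinity;
  candidates.forEach((c, i) => {
    const others = candidates.filter((_, j) => j !== i);
    const avg = others.reduce((sum, o) => sum + jaccard(c, o), 0) / Math.max(1, others.length);
    if (avg > bestScore) { bestScore = avg; bestIdx = i; }
  });
  return candidates[bestIdx];
}
```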
Optional critic pass after chain/skeleton output. Scores 5 dimensions, auto-refines if below threshold.
```bash
# Add reflect:true to any chain or skeleton request
curl -X POST http://localhost:9999/chain/auto \
  -d '{"task":"Analyze the AI chip market","data":"...","reflect":true}'

curl -X POST http://localhost:9999/skeleton \
  -d '{"task":"Write a market analysis","reflect":true}'
```
In testing, reflection improved weak output from a 5.0 to a 7.6 average score; skeleton + reflect scored 9.4/10.
Generate outline → expand each section in parallel → merge into coherent document. Best for long-form content.
```bash
curl -X POST http://localhost:9999/skeleton \
  -d '{"task":"Write a comprehensive guide to SaaS pricing","maxSections":6,"reflect":true}'
```
Performance: 14,478 chars in 21s (675 chars/sec) — 5.1x more content than chain at 2.9x higher throughput.
| Metric | Chain | Skeleton-of-Thought | Winner |
|---|---|---|---|
| Output size | 2,856 chars | 14,478 chars | SoT (5.1x) |
| Throughput | 234 chars/sec | 675 chars/sec | SoT (2.9x) |
| Duration | 12s | 21s | Chain (faster) |
| Quality (w/ reflect) | ~7-8/10 | 9.4/10 | SoT |
When to use what: parallel for independent tasks, chains for multi-stage analysis with different perspectives, skeleton for long-form content, vote when accuracy matters more than speed.
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /status | Detailed status + cost + cache |
| GET | /capabilities | Discover execution modes |
| POST | /parallel | Execute N prompts in parallel |
| POST | /research | Multi-phase web research |
| POST | /skeleton | Skeleton-of-Thought (outline → expand → merge) |
| POST | /chain | Manual chain pipeline |
| POST | /chain/auto | Auto-build + execute chain |
| POST | /chain/preview | Preview chain without executing |
| POST | /chain/template | Execute pre-built template |
| POST | /structured | Forced JSON with schema validation |
| GET | /structured/schemas | List built-in schemas |
| POST | /vote | Majority voting (best-of-N) |
| POST | /benchmark | Quality comparison test |
| GET | /templates | List chain templates |
| GET | /cache | Cache statistics |
| DELETE | /cache | Clear cache |
| Model | Cost per 1M tokens | Relative |
|---|---|---|
| Claude Opus 4 | ~$15 input / $75 output | 1x |
| GPT-4o | ~$2.50 input / $10 output | ~7x cheaper |
| Gemini Flash | ~$0.075 input / $0.30 output | 200x cheaper |
Cache hits are essentially free (~3-5ms, no API call).