Parallel Workstreams: Dispatching Multiple Claude Instances from One Hermes Host
Using Hermes's delegate_task() to run several Claude instances in parallel, each with isolated context. A concrete CVE-triage example and notes on rate limits.
The trick most agent frameworks miss is that interesting work is rarely serial. You do not want to investigate five security CVEs one at a time. You want five investigations running in parallel, each with its own focused context, and a final consolidation pass that merges the findings. Hermes's delegate_task() primitive is designed for exactly that shape.
This post is about the pattern: one Hermes host orchestrates several Claude instances simultaneously, each working on an isolated subtask. We will look at the mechanics, a concrete example, and the gotchas — rate limits, context isolation, result aggregation.
Key Takeaways
- delegate_task() is Hermes's primitive for spawning a subagent; each delegated task runs with its own context window, its own memory slice, and its own tool budget.
- Multiple delegate_task() calls can fan out concurrently; the orchestrator waits for all to return, then consolidates.
- Per-task context isolation matters: subagent A's mistakes or irrelevant context do not pollute subagent B's reasoning.
- Rate limits apply to the account (the organization behind the API key), not to each subagent. Ten parallel Claude calls share the same requests-per-minute budget.
- The orchestrator pattern maps to a lot of real work: multi-repo analysis, multi-source research, multi-angle code review.
- Result aggregation is where agent quality is actually made or lost — merge carefully.
Why Parallel Beats Serial
A single Claude conversation that tries to "investigate CVE-2024-1111, CVE-2024-2222, CVE-2024-3333, CVE-2024-4444, and CVE-2024-5555, then write a combined report" has three problems. The context window fills up with details of the first CVE, which crowds out careful thinking on the second. Token spend grows faster than linearly with conversation length, because every turn re-sends the accumulated history as input, so a 50-turn conversation on five items is more expensive than five 10-turn conversations. Worst of all, reasoning quality degrades as irrelevant detail accumulates.
Running each investigation as its own subagent solves all three. Each subagent has a fresh context window focused entirely on its CVE. Each can be capped independently with --max-turns. And the orchestrator is freed to do what it does well — routing and aggregation — without wading through details.
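To make the cost argument concrete, here is a back-of-the-envelope sketch. It assumes every turn re-sends the full history as input, every turn adds the same number of tokens (500 is an arbitrary illustrative figure), and no prompt caching is in play; real numbers will differ.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when each turn re-sends the whole history so far."""
    # Turn k sends k * tokens_per_turn tokens: the history plus the new message.
    return sum(k * tokens_per_turn for k in range(1, turns + 1))


# One long conversation covering all five CVEs.
one_long = cumulative_input_tokens(turns=50, tokens_per_turn=500)

# Five independent conversations of ten turns each.
five_short = 5 * cumulative_input_tokens(turns=10, tokens_per_turn=500)

print(one_long, five_short, round(one_long / five_short, 2))
```

Under these assumptions the single long conversation processes 637,500 input tokens against 137,500 for the five short ones, a ratio of about 4.6 to 1.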
The delegate_task() Shape
delegate_task() takes a prompt and a set of parameters (model, max turns, budget, memory scope) and returns the subagent's final output when it completes. A conceptual sketch of the usual pattern follows; see the docs for the exact signature:
    # Orchestrator agent logic, illustrative only (not the exact Hermes signature).
    cves = ["CVE-2024-1111", "CVE-2024-2222", "CVE-2024-3333",
            "CVE-2024-4444", "CVE-2024-5555"]

    # Fan out: one delegate_task() call per CVE, all dispatched together.
    results = parallel([
        delegate_task(
            prompt=f"Investigate {cve}: affected packages, severity, "
                   f"whether we are affected (check requirements.txt in our repos), "
                   f"and recommended mitigation.",
            model="claude-sonnet-4-6",
            max_turns=20,
            max_budget_usd=0.25,
            memory_scope="isolated",
        )
        for cve in cves
    ])

    # Fan in: merge the five subagent outputs into one report.
    consolidate(results)
Each of those five delegate_task() calls spawns its own Claude instance. They run concurrently. The orchestrator blocks until all five return, then runs a consolidation pass over the collected outputs.
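The fan-out-then-block behaviour can be tried without any agent framework at all. The sketch below uses plain asyncio; delegate_task_stub is a hypothetical stand-in that only sleeps, and the per-CVE durations are made up so the timing effect is visible. It is not Hermes's implementation.

```python
import asyncio
import time


async def delegate_task_stub(prompt: str, seconds: float) -> dict:
    """Hypothetical stand-in for a subagent call; it only sleeps."""
    await asyncio.sleep(seconds)
    return {"prompt": prompt, "status": "done"}


async def fan_out(jobs: dict) -> list:
    # gather() runs the coroutines concurrently and returns results
    # in the same order as the inputs, once all of them have finished.
    return await asyncio.gather(
        *(delegate_task_stub(f"Investigate {cve}", secs) for cve, secs in jobs.items())
    )


# Made-up durations, one per simulated subagent.
jobs = {"CVE-2024-1111": 0.05, "CVE-2024-2222": 0.10, "CVE-2024-3333": 0.20}

start = time.perf_counter()
results = asyncio.run(fan_out(jobs))
elapsed = time.perf_counter() - start

print(f"{len(results)} results in {elapsed:.2f}s (serial would be {sum(jobs.values()):.2f}s)")
```

The elapsed time tracks the slowest simulated subagent rather than the sum of all three.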
Context Isolation
The memory_scope="isolated" parameter is the important one. Options typically include:
- inherited: subagent sees the orchestrator's memory and can update it.
- isolated: subagent starts with a blank memory, and nothing it does affects the parent.
- scoped: subagent sees a named slice of the parent's memory.
For the CVE case you want isolated. You do not want subagent A's exploration of CVE-2024-1111's kernel details to bleed into subagent B's investigation of a totally unrelated PHP library bug. Clean rooms, separate conversations, merge at the end.
For other patterns — say, five subagents refactoring different files of the same codebase — inherited or scoped memory makes more sense, because they share context about the codebase itself.
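One way to picture the three scopes is as three different ways of handing a dictionary to a subagent. The sketch below is a toy model, not Hermes's memory implementation: memory_for_subagent is a hypothetical helper, and treating a scoped slice as a copy rather than a live view is an assumption made here for simplicity.

```python
import copy
from typing import Optional


def memory_for_subagent(parent: dict, scope: str, slice_name: Optional[str] = None) -> dict:
    """Return the memory a subagent starts with under each scope (toy model)."""
    if scope == "inherited":
        return parent  # same object, so subagent writes reach the parent
    if scope == "isolated":
        return {}  # blank slate; nothing flows back to the parent
    if scope == "scoped":
        if slice_name is None or slice_name not in parent:
            raise KeyError(f"unknown memory slice: {slice_name!r}")
        # Assumption: the slice is handed over as a copy, not a live view.
        return {slice_name: copy.deepcopy(parent[slice_name])}
    raise ValueError(f"unknown scope: {scope!r}")


parent = {"codebase": {"lang": "python"}, "notes": {"cve": "unrelated"}}

isolated = memory_for_subagent(parent, "isolated")
isolated["scratch"] = "kernel details"  # stays inside the subagent

scoped = memory_for_subagent(parent, "scoped", "codebase")

inherited = memory_for_subagent(parent, "inherited")
inherited["shared_finding"] = "uses requests"  # visible to the parent
```

After this runs, the parent has gained shared_finding from the inherited subagent but never sees the isolated subagent's scratch entry.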
Rate Limits: The Hidden Ceiling
The Anthropic API enforces requests-per-minute and tokens-per-minute limits, and every parallel Claude call from one Hermes host draws on the same account budget. A fan-out of twenty subagents can exhaust that budget on lower usage tiers, at which point requests start failing with 429 errors.
Mitigations:
- Batch the fan-out. If you have 50 tasks, dispatch 10 at a time, wait, dispatch the next 10. Hermes typically supports a concurrency limit — set it.
- Use the fallback model chain. Configure a fallback to a cheaper model (Haiku, GPT-4o-mini) for subagents so quota exhaustion degrades gracefully. See cost-control-hermes-max-turns-budget-fallback.
- Distribute across providers. If you have access to multiple providers, spread subagents across them. Hermes's model-agnostic design makes this natural.
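If the framework's own concurrency setting is unavailable, a cap is easy to sketch by hand. The example below uses an asyncio.Semaphore so that at most `limit` simulated subagents run at once; the worker only sleeps and is a hypothetical stand-in for a real subagent call. Note that a semaphore gives a sliding window (a new task starts as soon as a slot frees up) rather than the strict batch-then-wait dispatch described above.

```python
import asyncio


async def run_bounded(prompts: list, limit: int):
    """Run one simulated subagent per prompt, never more than `limit` at once."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def worker(prompt: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # wait here until a slot is free
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # hypothetical stand-in for a subagent call
            in_flight -= 1
            return f"done: {prompt}"

    results = await asyncio.gather(*(worker(p) for p in prompts))
    return results, peak


prompts = [f"task-{i}" for i in range(12)]
results, peak = asyncio.run(run_bounded(prompts, limit=4))
print(len(results), "tasks finished; peak concurrency", peak)
```

The recorded peak never exceeds the limit, which is the property that keeps a wide fan-out under a shared requests-per-minute budget.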
A Concrete Example: 5 CVEs, 5 Subagents, One Report
Prompt to the orchestrator:
"Investigate five security CVEs in parallel. For each, produce: affected packages, severity rating, whether our codebase is affected (check our requirements files), and a recommended action. When all five complete, consolidate into a single prioritized action list sorted by severity and our exposure."
The orchestrator parses the request and dispatches five delegate_task() calls. Each subagent:
- Reads the CVE description from NVD or GitHub advisory.
- Greps our repos (via the filesystem tool) for affected packages.
- Determines severity from the CVSS score and our exposure level.
- Writes a short report with a recommendation.
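The "are we affected" step in the list above boils down to matching package names against requirements files. A minimal sketch follows, assuming a plain requirements.txt format; affected_requirements is a hypothetical helper, the sample file contents are invented, and the check covers presence only (it does not compare installed versions against the vulnerable range).

```python
import re


def normalize(name: str) -> str:
    """Normalize a package name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def affected_requirements(requirements_text: str, affected_packages: set) -> list:
    """Return requirement lines whose package name is in affected_packages."""
    wanted = {normalize(p) for p in affected_packages}
    hits = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (simplification)
        if not line or line.startswith("-"):  # skip blanks and options like -r
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match and normalize(match.group(0)) in wanted:
            hits.append(line)
    return hits


# Invented sample file for illustration.
sample = """\
# pinned deps
requests==2.31.0
PyYAML>=6.0  # config parsing
-r base.txt
flask
"""

hits = affected_requirements(sample, {"pyyaml", "Flask"})
print(hits)
```

Name normalization matters here: advisories and requirements files often disagree on case and on hyphens versus underscores.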
When all five return, the orchestrator gets five structured outputs. It produces the consolidated view — sorted, deduplicated, with cross-cutting observations ("two of these affect the same dependency; upgrade once, fix both").
Total wall-clock time: roughly the duration of the slowest subagent, not the sum of all five. That is the value.
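The consolidated view described above can be produced in ordinary code when the subagent reports are structured. The sketch below assumes each report is a dict with cve, package, cvss, and affected fields; those field names, the package names, and the scores are all invented for illustration.

```python
from collections import defaultdict

# Invented sample reports; field names and values are illustrative only.
reports = [
    {"cve": "CVE-2024-1111", "package": "libfoo", "cvss": 9.8, "affected": True},
    {"cve": "CVE-2024-2222", "package": "libbar", "cvss": 5.3, "affected": True},
    {"cve": "CVE-2024-3333", "package": "libfoo", "cvss": 7.5, "affected": True},
    {"cve": "CVE-2024-4444", "package": "libbaz", "cvss": 8.1, "affected": False},
]


def prioritize(reports: list) -> list:
    """Exposed CVEs first, then highest CVSS score first within each group."""
    return sorted(reports, key=lambda r: (not r["affected"], -r["cvss"]))


def shared_dependencies(reports: list) -> dict:
    """Map each package hit by more than one exposed CVE to those CVE ids."""
    by_package = defaultdict(list)
    for report in reports:
        if report["affected"]:
            by_package[report["package"]].append(report["cve"])
    return {pkg: cves for pkg, cves in by_package.items() if len(cves) > 1}


ordered = [r["cve"] for r in prioritize(reports)]
shared = shared_dependencies(reports)
print(ordered)
print(shared)
```

The shared-dependency map is what feeds an observation like "upgrade once, fix both".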
Result Aggregation: The Quality Bottleneck
The aggregation step is where parallel agent workflows most often fail. Five subagents produce five good local reports, and the consolidation pass produces incoherent slop because the orchestrator is trying to merge too much detail.
Two patterns that work:
- Structured output. Require each subagent to return JSON or YAML with specified fields. The orchestrator can then merge the fields in ordinary code, which is deterministic, instead of asking a model to reconcile free-form prose.
- Progressive consolidation. If fanout is wide (say, 20 subagents), do tree-shaped consolidation — pair up results, consolidate pairs, then consolidate pairs of pairs. Keeps any single consolidation step small enough to reason about.
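The tree-shaped consolidation above can be sketched as a pairwise reduction. Here consolidate_pair is a hypothetical stand-in for one model-backed merge of two summaries; it only concatenates strings so the shape of the tree is visible in the output.

```python
def consolidate_pair(left: str, right: str) -> str:
    """Hypothetical stand-in for one merge step over two summaries."""
    return f"({left} + {right})"


def tree_consolidate(items: list) -> str:
    """Merge items pairwise, round by round, until one result remains."""
    if not items:
        raise ValueError("nothing to consolidate")
    level = list(items)
    while len(level) > 1:
        # Merge neighbours: (0, 1), (2, 3), ...
        merged = [
            consolidate_pair(level[i], level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            merged.append(level[-1])  # odd one out carries into the next round
        level = merged
    return level[0]


final = tree_consolidate(["a", "b", "c", "d", "e"])
print(final)
```

Each merge step sees only two inputs, so for n results the number of rounds grows with log2(n) while every individual step stays small.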
When Parallel Is Not The Answer
Not every task benefits. Three anti-patterns:
- Sequential dependencies. "Research topic A, then based on that, research topic B" cannot parallelize by definition.
- Shared-state edits. Five subagents writing to the same file will conflict. Use worktrees or serialize.
- Small total task. Overhead of spawning subagents is non-trivial. For two small queries, just do them in one conversation.
Closing Thought
delegate_task() turns Hermes from a chat interface into an orchestrator. That is the conceptual shift. You stop writing long prompts for one agent and start writing routing logic for many agents, each focused. For multi-source research, multi-repo analysis, and anything with a natural fanout, the parallel pattern is the right shape.
Sources
- Hermes GitHub: github.com/NousResearch/hermes-agent
- Hermes docs: hermes-agent.nousresearch.com/docs/
- Series: cost-control-hermes-max-turns-budget-fallback, spawning-claude-code-as-hermes-subagent, hermes-on-modal-serverless-claude-agents
- Anthropic rate limits: docs.anthropic.com/en/api/rate-limits
- Anthropic API docs: docs.anthropic.com