Red-Teaming Skills in Hermes: The Bundled Red-Teaming Skill Library Tour
Tour the red-teaming skill category that ships with Hermes Agent. What each category of skill does, responsible-use norms, and how skills chain in an engagement.
Hermes Agent ships with 100+ bundled skills spread across categories like software-development, devops, mlops, writing-communication, and — the one this post is about — red-teaming. The red-teaming category is for authorized security work: testing your own systems, running approved engagements, building defensive evals. It is not a toolkit for attacking things you do not own.
This post is a tour. What is in the category, what each kind of skill does, the responsible-use norms Nous Research documents alongside them, and a realistic example of chaining several skills across a single engagement.
Key Takeaways
- Hermes bundles a
red-teaming/skill category alongside devops, mlops, software-development, and others. - The skills are framed for defenders: prompt-injection probing, jailbreak analysis, safety evals, attack-surface mapping.
- Responsible-use rules apply: authorization, scope, non-destructive by default, no exfiltration.
- Skills are markdown files with SKILL.md front-matter; they are readable and auditable before you run them.
- A typical engagement chains four or five skills: scope the surface, probe, analyze, report.
- Persistence matters here — Hermes's memory keeps engagement context across days, which a per-session CLI cannot.
What Ships in red-teaming/
Categories inside red-teaming/ (at time of writing; the exact catalog evolves):
prompt-injection-probing/— systematic prompting patterns for surfacing instruction override, context hijack, and tool confusion issues against LLM-backed apps.jailbreak-analysis/— analysis-oriented skills for classifying and documenting jailbreak attempts against a target model or app.safety-evals/— running structured evaluations against a model or agent to produce a scored report on refusal, misuse potential, etc.attack-surface-mapping/— enumerating the public surface of an AI-backed product (endpoints, tools, MCP servers, prompts) as a prelude to testing.tool-abuse-testing/— skills focused on agent tool abuse (asking an agent to use its tools in harmful or unintended ways).report-writing/— templates for producing client-ready findings with severity ratings and remediations.
Because skills are just markdown with SKILL.md front-matter, you can read any of them before execution. That matters in red-team work, where you need to know exactly what a skill will cause the agent to do.
Responsible-Use Norms
The Hermes docs frame the red-teaming category tightly: these skills are for authorized security work. Concretely, the norms documented alongside the skills:
- Authorization: you have written permission to test the target.
- Scope: the skills respect an explicit scope (URLs, endpoints, accounts); they do not expand it.
- Non-destructive by default: no data mutation, no persistent changes, no denial-of-service patterns.
- No exfiltration: findings stay in the engagement workspace; nothing leaves without operator confirmation.
- Logging: every action goes to the Hermes memory store in
~/.hermes/so you can produce a clean audit trail.
This is ordinary pentesting discipline, but it matters more with agents because a sloppy agent can do a lot of damage quickly. The skills encode the norms; they do not replace human judgment.
Example Engagement: Chaining Four Skills
Here is a realistic flow for a one-week engagement testing an internal RAG-backed support agent that a customer has authorized you to assess.
Day 1: Attack-Surface Mapping
Start a Hermes session with the attack-surface-mapping skill and a scope file:
hermes chat --skill red-teaming/attack-surface-mapping
Feed the scope:
"Target is https://support-agent.example.internal. Authorized accounts are in ~/engagements/acme/creds.yaml. Scope is the chat endpoint and the two MCP servers it exposes (docs-search and ticket-lookup). Out of scope: anything else on the domain."
Output: a structured map of endpoints, tool surfaces, and observed system-prompt boundaries. Hermes stores this in memory under the engagement tag.
Day 2: Prompt-Injection Probing
Load the probing skill and reference the previous day's map:
hermes chat --skill red-teaming/prompt-injection-probing
"Using the attack-surface map from engagement
acme-2026-04, generate and run 30 injection probes against the chat endpoint. Respect the authorized accounts. Log every probe and response."
Probes run, results log to ~/.hermes/engagements/acme-2026-04/probes.md. Hermes's FTS5 memory makes every probe searchable for the rest of the week.
Day 3: Jailbreak Analysis
Take the raw probe results and run them through the analysis skill:
hermes chat --skill red-teaming/jailbreak-analysis
"Classify yesterday's probes by mechanism (context override, persona swap, encoded instruction, tool confusion). Flag any that succeeded and estimate severity."
Output: a classified table with severity, which feeds the final report. Because memory persists, you do not have to re-paste probe data — Hermes pulls it from storage.
Day 4: Safety Evals
Run the safety-evals skill against a known eval suite the customer cares about:
hermes chat --skill red-teaming/safety-evals --eval-suite ./evals/acme-specific.yaml
Produces scored outputs, stored alongside the engagement.
Day 5: Report Writing
Finally, the report-writing skill pulls everything together:
hermes chat --skill red-teaming/report-writing
"Write the engagement report for
acme-2026-04. Include executive summary, methodology, findings ranked by severity, remediations, and appendix with probe logs."
Because every prior day's artifacts are in Hermes memory, the report draft is grounded in real data, not the model's imagination.
Why Persistence Matters for Red Team Work
An interactive CLI agent loses context between sessions. A week-long engagement needs:
- Continuity across days (yesterday's findings inform today's probes).
- Searchable history of every probe and response.
- Reproducibility — the ability to re-run a probe and compare.
Hermes's FTS5-indexed memory handles all three. Claude Code can do this manually if you discipline yourself with CLAUDE.md and notes, but Hermes is the first runtime I have used where the memory model was clearly designed for multi-day work.
For how the memory system works, see Hermes memory deep dive: FTS5 markdown recall. For how these skills relate to the broader ecosystem, see the agentskills.io standard in Hermes and Claude Code.
Responsible Disclosure and Scope Discipline
One more norm worth calling out: if your probes surface a serious vulnerability, the report-writing skill's default template includes a responsible-disclosure section. Fill it out with the customer before sharing externally. Hermes does not call external services on its own during red-team engagements unless you wire it up — the defaults stay local to your engagement directory. That is a feature, not a limitation.
What This Category Is Not
The red-teaming/ category is not:
- A penetration-testing framework against arbitrary targets.
- A collection of exploits or malware.
- A toolkit for bypassing Anthropic's safety training at scale.
If you are looking for the first, use Metasploit or Burp. If you are looking for the second or third, Hermes is not for you; the skills are structured to keep red-team work scoped, documented, and lawful.
Sources
- GitHub: NousResearch/hermes-agent — see
skills/red-teaming/ - Hermes docs: hermes-agent.nousresearch.com/docs/
- Related: Hermes memory deep dive: FTS5 markdown recall
- Related: The agentskills.io standard across Hermes, Claude Code, and Cursor
- Related: Hermes for DevOps: bundled skills with Claude