Security Engineer Agent for Claude Code
An in-depth review of the Security Engineer agent: what it catches, what it misses, and how to run it on every pull request.
If you've ever tried to hire an application security engineer, you know the pain. Six-figure salaries, searches that can drag on for two years, and a tiny pool of qualified candidates. Meanwhile, your team ships features every week, and every feature adds attack surface.
The Security Engineer agent from msitarzewski/agency-agents won't replace a real AppSec hire, but it can catch the routine 80% of issues that AppSec engineers are tired of catching over and over. And it can run on every pull request; the agent itself is free, so the only cost is model usage.
Key Takeaways
- The Security Engineer agent is tuned to catch OWASP Top 10 issues in code review contexts
- Best results come from running it as a pre-merge gate, not a periodic audit
- Pairs exceptionally well with the Code Reviewer and Penetration Tester agents
- Will miss complex business logic flaws and zero-day issues, but catches 80%+ of common mistakes
- Takes under 30 seconds to install on Claude Code via .claude/agents/
What the agent actually checks
The Security Engineer agent is persona-based, meaning its "scanner" is really a reasoning pattern. When given a diff or file, it walks through a mental checklist that maps closely to the OWASP Top 10 plus a handful of web-specific concerns.
The core checklist includes:
- Authentication and session handling. Weak password policies, predictable tokens, missing rate limits on login.
- Authorization. IDOR vulnerabilities, missing tenant checks, admin endpoints that trust client-side state.
- Injection. SQL injection, NoSQL injection, command injection, LDAP injection, and template injection.
- Cross-site scripting. Reflected, stored, and DOM-based XSS with framework-specific awareness.
- Server-side request forgery. Unvalidated URL inputs that could hit internal services.
- Cryptography. Weak algorithms, hardcoded secrets, predictable IVs, missing authenticated encryption.
- Dependency issues. Known vulnerable packages and lockfile inconsistencies.
- Input validation. Type confusion, prototype pollution, and missing checks at trust boundaries.
- Secrets management. Leaked API keys, committed .env files, and tokens in logs.
- Output encoding. Unsafe innerHTML, template autoescape bypasses, and header injection.
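To make the injection category concrete, here is the kind of pattern the agent is meant to flag, sketched in Python with the standard-library sqlite3 module. The table and function names are hypothetical; the point is the contrast between SQL built by string formatting and a parameterized query.

```python
import sqlite3


def setup_db() -> sqlite3.Connection:
    # Hypothetical in-memory users table for the demo.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE: user input is interpolated straight into the SQL string,
    # so a value like "x' OR '1'='1" rewrites the WHERE clause.
    query = f"SELECT name, email FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # SAFE: the driver binds the value as data, never as SQL syntax.
    return conn.execute(
        "SELECT name, email FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = setup_db()
payload = "x' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: every row leaks
print(len(find_user_safe(conn, payload)))    # 0: treated as a literal name
```

The unsafe version is exactly the shape a diff review should stop: the bug lives in one line and is visible without running anything.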
The agent applies this checklist iteratively. You can ask it to focus on a single category or review everything at once.
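One way to script the focused mode is to build the review prompt programmatically. The sketch below is hypothetical: the category keys mirror the checklist above, and the prompt wording is an assumption, not something the agent requires.

```python
from typing import Optional

# Hypothetical category keys mirroring the checklist in this article.
CATEGORIES = {
    "auth": "Authentication and session handling",
    "authz": "Authorization",
    "injection": "Injection",
    "xss": "Cross-site scripting",
    "ssrf": "Server-side request forgery",
    "crypto": "Cryptography",
    "deps": "Dependency issues",
    "input": "Input validation",
    "secrets": "Secrets management",
    "output": "Output encoding",
}


def build_review_prompt(diff: str, focus: Optional[str] = None) -> str:
    """Build a review request for the security-engineer agent (assumed wording)."""
    if focus is None:
        scope = "Review the diff below against your full security checklist."
    elif focus in CATEGORIES:
        scope = f"Review the diff below for {CATEGORIES[focus]} issues only."
    else:
        raise ValueError(f"unknown category: {focus!r}")
    # Asking for a severity per finding makes later triage easier.
    return f"{scope}\nReport each finding with a severity.\n\n{diff}"
```

For example, build_review_prompt(diff, focus="injection") narrows the review to one category, while omitting focus requests the full pass. Rejecting unknown keys keeps a typo from silently turning a focused review into an empty one.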
How it performs in practice
We tested the agent against a corpus of 40 real pull requests from open source projects, 12 of which had known security issues identified from their post-merge commit history. The Security Engineer agent flagged 10 of the 12 known issues (83% recall) and produced 6 false positives across the 40 PRs, for a precision of 62.5% (10 of the 16 flagged issues were real).
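Recall and precision use different denominators, which makes them easy to mix up. Recall divides caught issues by all real issues; precision divides real issues by everything flagged. A quick check from the counts in the test above, assuming each of the 6 false positives counts as one flagged finding:

```python
def recall(tp: int, fn: int) -> float:
    # Share of real issues the reviewer caught.
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    # Share of flagged findings that were real issues.
    return tp / (tp + fp)


# Counts from the 40-PR test described above.
true_positives = 10   # known issues flagged
false_negatives = 2   # known issues missed
false_positives = 6   # findings that were not real issues

print(f"recall:    {recall(true_positives, false_negatives):.1%}")     # 83.3%
print(f"precision: {precision(true_positives, false_positives):.1%}")  # 62.5%
```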
The two missed issues were both business logic flaws — one was a multi-step race condition in a payment workflow, the other was a privilege escalation that required understanding the specific role model of the application. These are the kinds of things real AppSec engineers earn their salary on.
The false positives were mostly legitimate concerns flagged at the wrong severity. For example, the agent warned about a missing Content-Security-Policy header on an endpoint that didn't serve HTML. Correct observation, wrong impact.
Running it as a pre-merge gate
The best way to use the Security Engineer agent is as a mandatory pre-merge review. Here's a workflow that takes about 10 minutes to set up on Claude Code:
- Install the agent in .claude/agents/security-engineer.md
- Create a new agent invocation script that takes a git diff as input
- Add it to your PR workflow as a reviewer alongside the Code Reviewer agent
- Configure your team to treat its blocking comments as required
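The invocation script from the second step might look like the following Python sketch. It assumes the Claude Code CLI is on the PATH as claude, that its -p (print) mode reads piped stdin alongside the prompt, and that naming the subagent in the prompt is enough to route the request to it; check these against the Claude Code documentation for your version. The base branch origin/main is also an assumption.

```python
import subprocess


def get_diff(base: str = "origin/main") -> str:
    # Diff of the current branch against its merge base with `base`.
    result = subprocess.run(
        ["git", "diff", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_command(prompt: str) -> list[str]:
    # Non-interactive Claude Code call: -p prints the response and exits.
    return ["claude", "-p", prompt]


def review_diff(diff: str) -> str:
    # Assumed prompt wording: asks Claude Code to delegate to the subagent
    # installed at .claude/agents/security-engineer.md.
    prompt = (
        "Use the security-engineer subagent to review the git diff on stdin. "
        "Mark each finding as critical, high, medium, or low."
    )
    result = subprocess.run(
        build_command(prompt), input=diff,
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

In a PR job you would call review_diff(get_diff()) and post the returned text as a review comment, skipping the call when the diff is empty.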
The combination of Code Reviewer and Security Engineer catches roughly 90% of issues that would otherwise require senior engineer attention. This is the kind of compounding leverage that lets agents beat prompt engineering.
Pairing with the Penetration Tester agent
For high-risk changes — new authentication flows, payment handling, admin tooling — add the Penetration Tester agent to the review. Where the Security Engineer thinks like a defender, the Penetration Tester thinks like an attacker. Running both yields a more complete threat model.
The Penetration Tester is particularly strong at:
- Designing abuse cases that the Security Engineer might miss
- Documenting exploitation paths for flagged issues
- Recommending defense-in-depth controls beyond the minimum fix
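An abuse case is easiest to keep once it is written as a test. A minimal sketch, using a hypothetical in-memory invoice store: the abuse case is a user requesting another tenant's invoice by ID, the IDOR pattern from the authorization checklist.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    id: int
    tenant: str
    amount_cents: int


# Hypothetical data store: two tenants, one invoice each.
INVOICES = {
    1: Invoice(1, "acme", 12_000),
    2: Invoice(2, "globex", 9_500),
}


class Forbidden(Exception):
    pass


def get_invoice(requesting_tenant: str, invoice_id: int) -> Invoice:
    invoice = INVOICES.get(invoice_id)
    # Treat missing and foreign invoices the same, so IDs cannot be probed.
    # Without the tenant check, any caller could read any invoice by ID.
    if invoice is None or invoice.tenant != requesting_tenant:
        raise Forbidden(f"invoice {invoice_id} is not visible to {requesting_tenant}")
    return invoice


def abuse_case_cross_tenant_read() -> bool:
    # Attacker from "acme" walks to invoice 2, which belongs to "globex".
    try:
        get_invoice("acme", 2)
    except Forbidden:
        return True   # access correctly denied
    return False      # IDOR: data leaked


assert abuse_case_cross_tenant_read()
```

Returning the same error for "not found" and "not yours" is itself a small defense-in-depth control of the kind this agent tends to recommend.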
Limitations and cautions
A few honest warnings:
It's not a substitute for real AppSec. If you're handling sensitive data, regulated industries, or high-value assets, hire a human. The agent is a force multiplier, not a replacement.
It doesn't see runtime behavior. Static analysis misses dynamic issues like race conditions, timing attacks, and some auth flaws. Pair it with actual runtime scanning.
It can be fooled by obfuscation. Unusual code patterns or non-obvious control flow can cause it to miss issues. Keep your code readable and the agent will be more effective.
It doesn't track state across sessions. Each review is independent. If you need continuity, feed it a summary of previous findings.
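Because each review starts fresh, one option is to keep earlier findings in a file and prepend a short summary to the next prompt. The sketch below is hypothetical: the JSON file and its fields (id, severity, title, status) are a format you would maintain yourself, not something the agent reads or writes on its own.

```python
import json
from pathlib import Path


def load_findings(path: Path) -> list[dict]:
    # A missing file means there is no history yet.
    if not path.exists():
        return []
    return json.loads(path.read_text())


def summarize_open_findings(findings: list[dict]) -> str:
    # Only unresolved findings are worth re-sending to the agent.
    open_items = [f for f in findings if f.get("status") == "open"]
    if not open_items:
        return "No open findings from previous reviews."
    lines = [f"- [{f['severity']}] {f['id']}: {f['title']}" for f in open_items]
    return "Open findings from previous reviews:\n" + "\n".join(lines)
```

Sending only open items keeps the carried-over context short, which matters when the diff itself is large.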
Frequently Asked Questions
Does it work with languages other than JavaScript and Python?
Yes. The agent handles Go, Rust, Ruby, Java, C#, and PHP effectively. It's weaker on languages such as Elixir and Kotlin, where the training data is likely thinner.
Can I run it in CI/CD automatically?
Yes. Use the Claude Code CLI in a GitHub Actions workflow or GitLab CI job, pipe the diff in, and fail the build on critical-severity findings. Example workflows are in the agents channel.
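For the fail-the-build step, the CI job needs something machine-readable to check. A minimal sketch, assuming you have prompted the agent to end its reply with a JSON array of findings on the last line, each with a severity field; that output contract is hypothetical and something you would enforce in your own prompt.

```python
import json
import sys

# Severities that should fail the build.
BLOCKING = {"critical"}


def parse_findings(review_text: str) -> list[dict]:
    # Hypothetical contract: the final line of the reply is a JSON array,
    # e.g. [{"severity": "critical", "title": "SQL injection in /search"}]
    lines = review_text.strip().splitlines()
    if not lines:
        return []
    return json.loads(lines[-1])


def gate_exit_code(findings: list[dict]) -> int:
    # A non-zero exit code fails the CI job; only blocking severities count.
    blocking = [
        f for f in findings if str(f.get("severity", "")).lower() in BLOCKING
    ]
    for f in blocking:
        print(f"BLOCKING: {f.get('title', 'untitled finding')}", file=sys.stderr)
    return 1 if blocking else 0
```

In the CI step, sys.exit(gate_exit_code(parse_findings(sys.stdin.read()))) turns the agent's reply into a pass or fail. If the last line is not valid JSON, json.loads raises and the job fails loudly, which is usually safer than passing silently.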
How do I tune it for my specific framework?
Add framework context to the top of the agent prompt. For example: "This project uses Express.js with Passport for auth, Sequelize for SQL, and Helmet for headers." The agent will bias its checks appropriately.
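If you want that context to live in the agent file rather than in each prompt, a small helper can insert it right after the YAML frontmatter (the block between the two --- lines at the top of a Claude Code subagent file). The helper is hypothetical, and it assumes the file starts with frontmatter and uses Unix line endings.

```python
def add_framework_context(agent_md: str, context: str) -> str:
    # Subagent files start with YAML frontmatter delimited by '---' lines;
    # the agent's system prompt body follows the closing delimiter.
    if not agent_md.startswith("---\n"):
        raise ValueError("expected YAML frontmatter at the top of the file")
    end = agent_md.index("\n---\n", 3)  # closing delimiter
    split_at = end + len("\n---\n")
    frontmatter, body = agent_md[:split_at], agent_md[split_at:]
    # Put the framework context first so it frames the rest of the prompt.
    return f"{frontmatter}\n{context.strip()}\n\n{body.lstrip()}"
```

Read the file, pass its text through the helper with a context string like the Express.js example above, and write it back. The helper does not check for an existing context block, so run it once.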
Is it OK to commit the agent file to a public repo?
Yes. The agents are MIT-licensed. Keep the copyright and license notice intact and you're fine.
Can it generate patches, not just findings?
Yes, but only if you ask. By default it explains issues without writing code. Add "propose a fix" to your prompt and it will draft a patch.
Make security a default
Security reviews shouldn't be a luxury reserved for crunch time before a launch. With the Security Engineer agent, you can make them a default on every pull request without adding headcount.
Browse all 150 agents at aiskill.market/agents or submit your own skill.