Screenshot Skills for AI Agents
Build visual context capture skills for Claude Code. Take, annotate, and analyze screenshots to give AI agents the visual information they need.
Claude Code is a text-based tool operating in a terminal. It reads files, runs commands, and generates code. But when you are working on UI, there is a massive context gap: Claude cannot see what the interface actually looks like. It writes CSS based on class names, not visual output. It positions elements based on code structure, not rendered layout.
Screenshot skills bridge this gap. They capture what is on screen and feed it to Claude as visual context. Since Claude is multimodal and can analyze images, a screenshot is often worth a thousand lines of HTML description.
Key Takeaways
- Visual context dramatically improves Claude's UI suggestions because it can see layout issues, spacing problems, and visual inconsistencies directly
- Screenshots should be targeted, not full-screen -- capture the specific component or area you want Claude to analyze
- Annotated screenshots with arrows and highlights improve AI comprehension by directing attention to the specific issue
- Automated screenshot capture after code changes creates a visual feedback loop where Claude writes code, sees the result, and iterates
- The skill should support both manual capture and programmatic capture for different workflow needs
Why Visual Context Matters
Consider this interaction without visual context:
You: "The spacing between the cards on the browse page looks off."
Claude: "I can adjust the gap. The current grid uses gap-6. Should I increase or decrease it?"
You: "It's not the gap between cards. It's the padding inside the cards. The title is too close to the top edge."
Claude: "I see CardHeader has pb-4. Should I add pt-6 for more top padding?"
You: "That's still not right. Let me try to describe it better..."
Now with visual context:
You: takes screenshot of browse page, sends to Claude
Claude: "I can see the issue. The card title is crowding the top badge. The CardHeader needs pt-6 and the badge container needs mb-3 for separation. I can also see that the quality star icon is slightly misaligned with the title text -- I'll fix that too."
Claude identified two issues from one screenshot. The visual context let it see the relationship between elements that text descriptions could not convey.
Building the Screenshot Skill
Core Capture Utility
The screenshot skill needs a reliable way to capture screen content from the terminal. On macOS, the screencapture command works without additional dependencies.
#!/bin/bash
# scripts/capture.sh
OUTPUT_DIR="${TMPDIR:-/tmp}/claude-screenshots"
mkdir -p "$OUTPUT_DIR"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
FILENAME="screenshot_${TIMESTAMP}.png"
OUTPUT_PATH="${OUTPUT_DIR}/${FILENAME}"
case "$1" in
"full")
# Capture full screen
screencapture -x "$OUTPUT_PATH"
;;
"window")
# Capture a window (click to select it)
screencapture -x -w "$OUTPUT_PATH"
;;
"area")
# Interactive area selection
screencapture -x -s "$OUTPUT_PATH"
;;
"url")
# Capture a URL using a headless browser (for automated workflows)
npx playwright screenshot "$2" "$OUTPUT_PATH" --full-page
;;
*)
echo "Usage: capture.sh [full|window|area|url <url>]"
exit 1
;;
esac
echo "$OUTPUT_PATH"
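Since `screencapture` exists only on macOS, a portability shim can pick the right tool per platform. A minimal sketch, assuming `scrot` is available on Linux (the `capture_cmd` helper name is illustrative, not part of the skill):

```shell
# capture_cmd: map a kernel name (as printed by `uname -s`) to a
# full-screen capture command. The Linux fallback assumes scrot is
# installed; swap in gnome-screenshot if that suits your desktop.
capture_cmd() {
  case "$1" in
    Darwin) echo "screencapture -x" ;;
    Linux)  echo "scrot" ;;
    *)      echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Usage: build the command for the current platform, then append the output path
capture_cmd "$(uname -s)"
```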
The Skill Definition
---
description: "Capture and analyze screenshots for UI development. Use when
the user wants to show Claude what something looks like, compare visual
states, debug layout issues, or verify UI changes visually."
---
# Screenshot Capture
## When to Use
- User asks to "look at" or "see" a UI element
- After making CSS or layout changes, to verify the visual result
- When debugging visual bugs described by the user
- For comparing before/after states of UI changes
## Commands
### Capture Screenshots
- Full screen: `bash scripts/capture.sh full`
- Window (click to select): `bash scripts/capture.sh window`
- Select area: `bash scripts/capture.sh area`
- URL (headless): `bash scripts/capture.sh url http://localhost:3000/browse`
### Read Screenshots
After capturing, read the screenshot file to analyze it:
- The screenshot path is printed to stdout
- Use the Read tool to view the image file
## Workflow
1. Capture the current state (before changes)
2. Make the requested code changes
3. Capture the new state (after changes)
4. Compare and report what changed visually
5. Iterate if the visual result does not match the intent
## Guidelines
- Always capture at consistent viewport sizes for comparison
- Include enough surrounding context to understand the element's position
- For responsive testing, capture at 375px, 768px, and 1280px widths
- Store screenshots in the temp directory, not in the project
Annotation for Better AI Comprehension
Raw screenshots work, but annotated screenshots work better. Annotations direct Claude's attention to specific areas and reduce ambiguity.
Programmatic Annotation
// scripts/annotate.ts
import sharp from 'sharp'
interface Annotation {
type: 'circle' | 'arrow' | 'box'
x: number
y: number
width?: number
height?: number
color: string
label?: string
}
async function annotateScreenshot(
inputPath: string,
outputPath: string,
annotations: Annotation[]
): Promise<void> {
const image = sharp(inputPath)
const metadata = await image.metadata()
// Create SVG overlay with annotations
const svgAnnotations = annotations.map(a => {
  // Optional text label, drawn just above the shape
  const label = a.label
    ? `<text x="${a.x}" y="${a.y - 28}" fill="${a.color}"
        font-family="sans-serif" font-size="16">${a.label}</text>`
    : ''
  switch (a.type) {
    case 'box':
      return `<rect x="${a.x}" y="${a.y}" width="${a.width}" height="${a.height}"
        fill="none" stroke="${a.color}" stroke-width="3" rx="4"/>` + label
    case 'circle':
      return `<circle cx="${a.x}" cy="${a.y}" r="20"
        fill="none" stroke="${a.color}" stroke-width="3"/>` + label
    case 'arrow':
      // Simple arrow: a line running down-right to the target point
      return `<line x1="${a.x - 40}" y1="${a.y - 40}" x2="${a.x}" y2="${a.y}"
        stroke="${a.color}" stroke-width="3"/>` + label
    default:
      return ''
  }
}).join('\n')
const svg = `<svg width="${metadata.width}" height="${metadata.height}">
${svgAnnotations}
</svg>`
await image
.composite([{ input: Buffer.from(svg), top: 0, left: 0 }])
.toFile(outputPath)
}
Common Annotation Patterns
Red box around a bug: Highlights the specific element that has an issue.
Green box around correct behavior: Shows Claude what "good" looks like for reference.
Numbered circles: When multiple issues need attention, number them for sequential fixing.
Side-by-side comparison: Place before/after screenshots next to each other with labels.
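The numbered-circle pattern maps straight onto the `Annotation` interface from the annotation script. A minimal sketch, assuming issues arrive as a list of points with notes (the `numberedCircles` helper and `Point` type are illustrative, not part of the skill):

```typescript
// Mirrors the Annotation interface defined in annotate.ts
interface Annotation {
  type: 'circle' | 'arrow' | 'box'
  x: number
  y: number
  width?: number
  height?: number
  color: string
  label?: string
}

interface Point { x: number; y: number; note: string }

// One numbered circle per issue, labeled "1. ...", "2. ..." so each
// issue can be referenced unambiguously and fixed in order.
function numberedCircles(points: Point[], color = 'red'): Annotation[] {
  return points.map((p, i) => ({
    type: 'circle' as const,
    x: p.x,
    y: p.y,
    color,
    label: `${i + 1}. ${p.note}`,
  }))
}

const annotations = numberedCircles([
  { x: 120, y: 80, note: 'title crowds badge' },
  { x: 300, y: 210, note: 'star icon misaligned' },
])
```

The resulting array feeds directly into the annotation function, and the labels give Claude a stable way to report back per issue.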
Automated Visual Feedback Loop
The most powerful use of screenshot skills is an automated feedback loop: Claude changes code, captures a screenshot, analyzes the result, and iterates if needed.
## Automated Visual Iteration
When making UI changes:
1. Ensure the dev server is running on localhost:3000
2. Capture baseline screenshot with `bash scripts/capture.sh url http://localhost:3000/browse`
3. Make the requested code changes
4. Wait 2 seconds for HMR to update
5. Capture result screenshot
6. Read both screenshots and compare
7. If the result doesn't match the user's intent, make adjustments
8. Repeat steps 4-7 up to 3 times
9. Present the final screenshot to the user for approval
This loop means Claude can self-correct visual issues without human intervention. It writes CSS, sees the result, and adjusts -- the same process a human frontend developer follows, but faster.
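The steps above can be sketched as a small driver with pluggable capture and judgment callbacks. The three-attempt cap and the wait for HMR come from the workflow; the function names and signature are illustrative:

```typescript
interface IterationResult { attempts: number; matched: boolean; screenshot: string }

// Run the change -> wait -> capture -> judge cycle, capped so a change
// that never converges still terminates with the last screenshot.
async function visualIterate(
  applyChange: () => Promise<void>,                   // edit the code
  capture: () => Promise<string>,                     // returns screenshot path
  matchesIntent: (shot: string) => Promise<boolean>,  // visual judgment
  maxAttempts = 3,
  hmrWaitMs = 2000,
): Promise<IterationResult> {
  let screenshot = ''
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await applyChange()
    await new Promise(resolve => setTimeout(resolve, hmrWaitMs)) // let HMR settle
    screenshot = await capture()
    if (await matchesIntent(screenshot)) {
      return { attempts: attempt, matched: true, screenshot }
    }
  }
  return { attempts: maxAttempts, matched: false, screenshot }
}
```

On success the final screenshot is what gets presented to the user for approval; on failure it documents the best attempt.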
Responsive Testing With Screenshots
One screenshot is informative. Three screenshots at standard breakpoints give you the core of a responsive audit.
# Capture at standard breakpoints
npx playwright screenshot http://localhost:3000 /tmp/desktop.png --viewport-size=1280,800
npx playwright screenshot http://localhost:3000 /tmp/tablet.png --viewport-size=768,1024
npx playwright screenshot http://localhost:3000 /tmp/mobile.png --viewport-size=375,812
Claude can then analyze all three screenshots and identify responsive layout issues:
- Elements that overflow on mobile
- Text that is too small on narrow viewports
- Navigation that does not collapse correctly
- Images that do not resize proportionally
For more on browser automation with Claude Code, see our guide on browser-based terminals for AI agents.
Visual Regression Detection
Compare screenshots across code changes to detect unintended visual regressions.
// scripts/visual-diff.ts
import { readFile, writeFile } from 'node:fs/promises'
import { PNG } from 'pngjs'
import pixelmatch from 'pixelmatch'
async function compareScreenshots(
beforePath: string,
afterPath: string
): Promise<{ diffPercentage: number; diffPath: string }> {
const before = PNG.sync.read(await readFile(beforePath))
const after = PNG.sync.read(await readFile(afterPath))
// pixelmatch requires identical dimensions
if (before.width !== after.width || before.height !== after.height) {
  throw new Error('Screenshots differ in size; capture both at the same viewport')
}
const diff = new PNG({ width: before.width, height: before.height })
const mismatchedPixels = pixelmatch(
before.data, after.data, diff.data,
before.width, before.height,
{ threshold: 0.1 }
)
const totalPixels = before.width * before.height
const diffPercentage = (mismatchedPixels / totalPixels) * 100
const diffPath = '/tmp/visual-diff.png'
await writeFile(diffPath, PNG.sync.write(diff))
return { diffPercentage, diffPath }
}
If the diff exceeds a threshold (say, 5%), Claude can investigate what changed and whether it was intentional.
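That gate can be made explicit with two thresholds instead of one, so small rendering drift passes silently while moderate changes get flagged for review. A sketch with illustrative defaults (the verdict names and cutoffs are assumptions, not part of the diff utility):

```typescript
type DiffVerdict = 'pass' | 'investigate' | 'fail'

// Classify a visual diff percentage: below investigateAt is noise,
// between the two thresholds warrants a look at what changed, and
// failAt or more blocks until confirmed intentional.
function classifyDiff(
  diffPercentage: number,
  investigateAt = 1,
  failAt = 5,
): DiffVerdict {
  if (diffPercentage >= failAt) return 'fail'
  if (diffPercentage >= investigateAt) return 'investigate'
  return 'pass'
}
```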
Integration With the Skill Ecosystem
Screenshot skills compose well with other skills in the marketplace. Combine them with:
- Build watchers from our build watcher guide to automatically screenshot after build errors are fixed
- Testing skills from the testing guide to add visual verification to test suites
- Browser terminal skills for interactive testing scenarios
FAQ
Can Claude really understand screenshots?
Yes. Claude is a multimodal model that can analyze images. It can identify UI elements, read text in screenshots, detect layout issues, and describe visual relationships between elements. The analysis is not pixel-perfect but is sufficient for most UI development tasks.
What resolution should screenshots be?
Use your display's native resolution. For comparison consistency, always capture at the same resolution. Retina/HiDPI screenshots are larger files but provide better detail for Claude's analysis.
Does this work on Linux and Windows?
The screencapture command is macOS-specific. On Linux, use scrot or gnome-screenshot. On Windows, use NirCmd or a PowerShell script that calls .NET's Graphics.CopyFromScreen. The Playwright-based URL capture works on all platforms.
How large are screenshot files?
A full-screen PNG at 1920x1080 is about 1-3MB. For AI analysis, you can downsample to 50% resolution without losing meaningful detail. JPEG at 80% quality is a good compromise between file size and quality.
Can I use this for automated visual testing in CI?
The Playwright-based capture (not the macOS screencapture) works in CI environments. Combine it with the visual diff utility for automated visual regression testing in your pipeline. See installing your first skill for setup guidance.
Explore production-ready AI skills at aiskill.market/browse or submit your own skill to the marketplace.
Sources
- Playwright Screenshots - Headless browser screenshot documentation
- Claude Vision - Multimodal image understanding capabilities
- Pixelmatch - Pixel-level image comparison library