Screenshot Skills for AI Agents
Build visual context capture skills for Claude Code. Take, annotate, and analyze screenshots to give AI agents the visual information they need.
Claude Code is a text-based tool operating in a terminal. It reads files, runs commands, and generates code. But when you are working on UI, there is a massive context gap: Claude cannot see what the interface actually looks like. It writes CSS based on class names, not visual output. It positions elements based on code structure, not rendered layout.
Screenshot skills bridge this gap. They capture what is on screen and feed it to Claude as visual context. Since Claude is multimodal and can analyze images, a screenshot is often worth a thousand lines of HTML description.
Key Takeaways
- Visual context dramatically improves Claude's UI suggestions because it can see layout issues, spacing problems, and visual inconsistencies directly
- Screenshots should be targeted, not full-screen -- capture the specific component or area you want Claude to analyze
- Annotated screenshots with arrows and highlights improve AI comprehension by directing attention to the specific issue
- Automated screenshot capture after code changes creates a visual feedback loop where Claude writes code, sees the result, and iterates
- The skill should support both manual capture and programmatic capture for different workflow needs
Why Visual Context Matters
Consider this interaction without visual context:
You: "The spacing between the cards on the browse page looks off."
Claude: "I can adjust the gap. The current grid uses gap-6. Should I increase or decrease it?"
You: "It's not the gap between cards. It's the padding inside the cards. The title is too close to the top edge."
Claude: "I see CardHeader has pb-4. Should I add pt-6 for more top padding?"
You: "That's still not right. Let me try to describe it better..."
Now with visual context:
You: takes screenshot of browse page, sends to Claude
Claude: "I can see the issue. The card title is crowding the top badge. The CardHeader needs pt-6 and the badge container needs mb-3 for separation. I can also see that the quality star icon is slightly misaligned with the title text -- I'll fix that too."
Claude identified two issues from one screenshot. The visual context let it see the relationship between elements that text descriptions could not convey.
Building the Screenshot Skill
Core Capture Utility
The screenshot skill needs a reliable way to capture screen content from the terminal. On macOS, the screencapture command works without additional dependencies.
#!/bin/bash
# scripts/capture.sh
OUTPUT_DIR="${TMPDIR:-/tmp}/claude-screenshots"
mkdir -p "$OUTPUT_DIR"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
FILENAME="screenshot_${TIMESTAMP}.png"
OUTPUT_PATH="${OUTPUT_DIR}/${FILENAME}"
case "$1" in
"full")
# Capture full screen
screencapture -x "$OUTPUT_PATH"
;;
"window")
# Capture a window (click to select it)
screencapture -x -w "$OUTPUT_PATH"
;;
"area")
# Interactive area selection
screencapture -x -s "$OUTPUT_PATH"
;;
"url")
# Capture a URL using a headless browser (for automated workflows)
npx playwright screenshot "$2" "$OUTPUT_PATH" --full-page
;;
*)
echo "Usage: capture.sh [full|window|area|url <url>]"
exit 1
;;
esac
echo "$OUTPUT_PATH"
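Since `screencapture` exists only on macOS, a portability shim can pick the right tool per platform. A minimal sketch, assuming `scrot` is available on Linux (the `capture_cmd` helper name is illustrative, not part of the skill):

```shell
# capture_cmd: map a kernel name (as printed by `uname -s`) to a
# full-screen capture command. The Linux fallback assumes scrot is
# installed; swap in gnome-screenshot if that suits your desktop.
capture_cmd() {
  case "$1" in
    Darwin) echo "screencapture -x" ;;
    Linux)  echo "scrot" ;;
    *)      echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Usage: build the command for the current platform, then append the output path
capture_cmd "$(uname -s)"
```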
The Skill Definition
---
description: "Capture and analyze screenshots for UI development. Use when
the user wants to show Claude what something looks like, compare visual
states, debug layout issues, or verify UI changes visually."
---
# Screenshot Capture
## When to Use
- User asks to "look at" or "see" a UI element
- After making CSS or layout changes, to verify the visual result
- When debugging visual bugs described by the user
- For comparing before/after states of UI changes
## Commands
### Capture Screenshots
- Full screen: `bash scripts/capture.sh full`
- Window (click to select): `bash scripts/capture.sh window`
- Select area: `bash scripts/capture.sh area`
- URL (headless): `bash scripts/capture.sh url http://localhost:3000/browse`
### Read Screenshots
After capturing, read the screenshot file to analyze it:
- The screenshot path is printed to stdout
- Use the Read tool to view the image file
## Workflow
1. Capture the current state (before changes)
2. Make the requested code changes
3. Capture the new state (after changes)
4. Compare and report what changed visually
5. Iterate if the visual result does not match the intent
## Guidelines
- Always capture at consistent viewport sizes for comparison
- Include enough surrounding context to understand the element's position
- For responsive testing, capture at 375px, 768px, and 1280px widths
- Store screenshots in the temp directory, not in the project
Annotation for Better AI Comprehension
Raw screenshots work, but annotated screenshots work better. Annotations direct Claude's attention to specific areas and reduce ambiguity.
Programmatic Annotation
// scripts/annotate.ts
import sharp from 'sharp'
interface Annotation {
type: 'circle' | 'arrow' | 'box'
x: number
y: number
width?: number
height?: number
color: string
label?: string
}
async function annotateScreenshot(
inputPath: string,
outputPath: string,
annotations: Annotation[]
): Promise<void> {
const image = sharp(inputPath)
const metadata = await image.metadata()
// Create SVG overlay with annotations
const svgAnnotations = annotations.map(a => {
  // Optional text label, drawn just above the shape
  const label = a.label
    ? `<text x="${a.x}" y="${a.y - 28}" fill="${a.color}"
        font-family="sans-serif" font-size="16">${a.label}</text>`
    : ''
  switch (a.type) {
    case 'box':
      return `<rect x="${a.x}" y="${a.y}" width="${a.width}" height="${a.height}"
        fill="none" stroke="${a.color}" stroke-width="3" rx="4"/>` + label
    case 'circle':
      return `<circle cx="${a.x}" cy="${a.y}" r="20"
        fill="none" stroke="${a.color}" stroke-width="3"/>` + label
    case 'arrow':
      // Simple arrow: a line running down-right to the target point
      return `<line x1="${a.x - 40}" y1="${a.y - 40}" x2="${a.x}" y2="${a.y}"
        stroke="${a.color}" stroke-width="3"/>` + label
    default:
      return ''
  }
}).join('\n')
const svg = `<svg width="${metadata.width}" height="${metadata.height}">
${svgAnnotations}
</svg>`
await image
.composite([{ input: Buffer.from(svg), top: 0, left: 0 }])
.toFile(outputPath)
}
Common Annotation Patterns
Red box around a bug: Highlights the specific element that has an issue.
Green box around correct behavior: Shows Claude what "good" looks like for reference.
Numbered circles: When multiple issues need attention, number them for sequential fixing.
Side-by-side comparison: Place before/after screenshots next to each other with labels.
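The numbered-circle pattern maps straight onto the `Annotation` interface from the annotation script. A minimal sketch, assuming issues arrive as a list of points with notes (the `numberedCircles` helper and `Point` type are illustrative, not part of the skill):

```typescript
// Mirrors the Annotation interface defined in annotate.ts
interface Annotation {
  type: 'circle' | 'arrow' | 'box'
  x: number
  y: number
  width?: number
  height?: number
  color: string
  label?: string
}

interface Point { x: number; y: number; note: string }

// One numbered circle per issue, labeled "1. ...", "2. ..." so each
// issue can be referenced unambiguously and fixed in order.
function numberedCircles(points: Point[], color = 'red'): Annotation[] {
  return points.map((p, i) => ({
    type: 'circle' as const,
    x: p.x,
    y: p.y,
    color,
    label: `${i + 1}. ${p.note}`,
  }))
}

const annotations = numberedCircles([
  { x: 120, y: 80, note: 'title crowds badge' },
  { x: 300, y: 210, note: 'star icon misaligned' },
])
```

The resulting array feeds directly into the annotation function, and the labels give Claude a stable way to report back per issue.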
Automated Visual Feedback Loop
The most powerful use of screenshot skills is an automated feedback loop: Claude changes code, captures a screenshot, analyzes the result, and iterates if needed.
## Automated Visual Iteration
When making UI changes:
1. Ensure the dev server is running on localhost:3000
2. Capture baseline screenshot with `bash scripts/capture.sh url http://localhost:3000/browse`
3. Make the requested code changes
4. Wait 2 seconds for HMR to update
5. Capture result screenshot
6. Read both screenshots and compare
7. If the result doesn't match the user's intent, make adjustments
8. Repeat steps 4-7 up to 3 times
9. Present the final screenshot to the user for approval
This loop means Claude can self-correct visual issues without human intervention. It writes CSS, sees the result, and adjusts -- the same process a human frontend developer follows, but faster.
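The steps above can be sketched as a small driver with pluggable capture and judgment callbacks. The three-attempt cap and the wait for HMR come from the workflow; the function names and signature are illustrative:

```typescript
interface IterationResult { attempts: number; matched: boolean; screenshot: string }

// Run the change -> wait -> capture -> judge cycle, capped so a change
// that never converges still terminates with the last screenshot.
async function visualIterate(
  applyChange: () => Promise<void>,                   // edit the code
  capture: () => Promise<string>,                     // returns screenshot path
  matchesIntent: (shot: string) => Promise<boolean>,  // visual judgment
  maxAttempts = 3,
  hmrWaitMs = 2000,
): Promise<IterationResult> {
  let screenshot = ''
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await applyChange()
    await new Promise(resolve => setTimeout(resolve, hmrWaitMs)) // let HMR settle
    screenshot = await capture()
    if (await matchesIntent(screenshot)) {
      return { attempts: attempt, matched: true, screenshot }
    }
  }
  return { attempts: maxAttempts, matched: false, screenshot }
}
```

On success the final screenshot is what gets presented to the user for approval; on failure it documents the best attempt.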
Responsive Testing With Screenshots
One screenshot is informative. Three screenshots at standard breakpoints give you the core of a responsive audit.
# Capture at standard breakpoints
npx playwright screenshot http://localhost:3000 /tmp/desktop.png --viewport-size=1280,800
npx playwright screenshot http://localhost:3000 /tmp/tablet.png --viewport-size=768,1024
npx playwright screenshot http://localhost:3000 /tmp/mobile.png --viewport-size=375,812
Claude can then analyze all three screenshots and identify responsive layout issues:
- Elements that overflow on mobile
- Text that is too small on narrow viewports
- Navigation that does not collapse correctly
- Images that do not resize proportionally
For more on browser automation with Claude Code, see our guide on browser-based terminals for AI agents.
Visual Regression Detection
Compare screenshots across code changes to detect unintended visual regressions.
// scripts/visual-diff.ts
import { readFile, writeFile } from 'node:fs/promises'
import { PNG } from 'pngjs'
import pixelmatch from 'pixelmatch'
async function compareScreenshots(
beforePath: string,
afterPath: string
): Promise<{ diffPercentage: number; diffPath: string }> {
const before = PNG.sync.read(await readFile(beforePath))
const after = PNG.sync.read(await readFile(afterPath))
// pixelmatch requires identical dimensions
if (before.width !== after.width || before.height !== after.height) {
  throw new Error('Screenshots differ in size; capture both at the same viewport')
}
const diff = new PNG({ width: before.width, height: before.height })
const mismatchedPixels = pixelmatch(
before.data, after.data, diff.data,
before.width, before.height,
{ threshold: 0.1 }
)
const totalPixels = before.width * before.height
const diffPercentage = (mismatchedPixels / totalPixels) * 100
const diffPath = '/tmp/visual-diff.png'
await writeFile(diffPath, PNG.sync.write(diff))
return { diffPercentage, diffPath }
}
If the diff exceeds a threshold (say, 5%), Claude can investigate what changed and whether it was intentional.
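That gate can be made explicit with two thresholds instead of one, so small rendering drift passes silently while moderate changes get flagged for review. A sketch with illustrative defaults (the verdict names and cutoffs are assumptions, not part of the diff utility):

```typescript
type DiffVerdict = 'pass' | 'investigate' | 'fail'

// Classify a visual diff percentage: below investigateAt is noise,
// between the two thresholds warrants a look at what changed, and
// failAt or more blocks until confirmed intentional.
function classifyDiff(
  diffPercentage: number,
  investigateAt = 1,
  failAt = 5,
): DiffVerdict {
  if (diffPercentage >= failAt) return 'fail'
  if (diffPercentage >= investigateAt) return 'investigate'
  return 'pass'
}
```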
Integration With the Skill Ecosystem
Screenshot skills compose well with other skills in the marketplace. Combine them with:
- Build watchers from our build watcher guide to automatically screenshot after build errors are fixed
- Testing skills from the testing guide to add visual verification to test suites
- Browser terminal skills for interactive testing scenarios
FAQ
Can Claude really understand screenshots?
Yes. Claude is a multimodal model that can analyze images. It can identify UI elements, read text in screenshots, detect layout issues, and describe visual relationships between elements. The analysis is not pixel-perfect but is sufficient for most UI development tasks.
What resolution should screenshots be?
Use your display's native resolution. For comparison consistency, always capture at the same resolution. Retina/HiDPI screenshots are larger files but provide better detail for Claude's analysis.
Does this work on Linux and Windows?
The screencapture command is macOS-specific. On Linux, use scrot or gnome-screenshot. On Windows, use NirCmd or a PowerShell script that calls .NET's Graphics.CopyFromScreen. The Playwright-based URL capture works on all platforms.
How large are screenshot files?
A full-screen PNG at 1920x1080 is about 1-3MB. For AI analysis, you can downsample to 50% resolution without losing meaningful detail. JPEG at 80% quality is a good compromise between file size and quality.
Can I use this for automated visual testing in CI?
The Playwright-based capture (not the macOS screencapture) works in CI environments. Combine it with the visual diff utility for automated visual regression testing in your pipeline. See installing your first skill for setup guidance.
Explore production-ready AI skills at aiskill.market/browse or submit your own skill to the marketplace.
Sources
- Playwright Screenshots - Headless browser screenshot documentation
- Claude Vision - Multimodal image understanding capabilities
- Pixelmatch - Pixel-level image comparison library