PDF Processing Skills Compared: Which One Should You Use?
PDFs are everywhere. Contracts, research papers, financial statements, technical documentation, invoices, reports. They're also notoriously difficult to work with programmatically. The PDF format prioritizes precise visual rendering over data extraction, which makes extracting meaningful information from PDFs one of the more challenging document processing tasks.
The Claude Code ecosystem has responded with multiple PDF processing skills, each taking a different approach to this challenge. In this comparison, we'll examine the major PDF skills available, understand their architectural differences, and help you choose the right one for your specific needs.
The PDF Processing Landscape
PDF processing falls into several categories, and different skills excel at different types:
- Text Extraction: Pulling plain text from PDFs. Simple in theory, complicated by multi-column layouts, headers/footers, and embedded fonts.
- Structured Data Extraction: Extracting tables, forms, and structured content. Requires understanding document layout, not just text.
- Document Analysis: Understanding what a document is about, summarizing content, answering questions about the document.
- PDF Generation: Creating new PDFs or modifying existing ones. A different problem entirely from extraction.
Most Claude Code PDF skills focus on extraction and analysis, as these integrate naturally with the AI assistance model.
Skill 1: pdf-reader (Basic Extraction)
The simplest approach to PDF processing uses readily available command-line tools orchestrated by Claude Code.
How It Works
```markdown
# PDF Reader
Extract text content from PDF files for analysis.

## Requirements
- pdftotext (from poppler-utils)
  - macOS: `brew install poppler`
  - Ubuntu: `apt install poppler-utils`

## Process
1. Run `pdftotext -layout input.pdf output.txt`
2. Read the extracted text file
3. Analyze content as needed

## Options
- `-layout`: Preserve original layout
- `-raw`: Keep text in content stream order
- `-table`: Optimize for tabular data
```
Strengths
- Minimal dependencies: only poppler, which is packaged for all major platforms
- Fast: Native tools are highly optimized
- Reliable: Decades of development in pdftotext
Limitations
- Layout destruction: Complex layouts often become garbled
- No intelligence: Just extraction, no understanding
- Tables: Tabular data rarely survives extraction intact
- Scanned PDFs: Requires OCR preprocessing
Best For
Simple PDFs with flowing text: articles, books, plain documentation. Not suitable for structured documents, forms, or complex layouts.
Example Usage
# User: "Extract text from this research paper"
pdftotext paper.pdf -layout -
# Claude Code reads and analyzes the extracted text
Skill 2: pdf-extractor-pro (Layout-Aware)
This skill takes a more sophisticated approach by using layout analysis libraries.
How It Works
```markdown
# PDF Extractor Pro
Intelligent PDF extraction preserving document structure.

## Requirements
- Python 3.8+
- pdfplumber (`pip install pdfplumber`)

## Process
1. Open PDF with pdfplumber
2. For each page:
   - Detect tables
   - Extract tables as structured data
   - Extract remaining text
3. Combine into structured output

## Output Format
Structured JSON:

{
  "pages": [
    {
      "number": 1,
      "tables": [...],
      "text": "...",
      "metadata": {...}
    }
  ]
}
```
Implementation
The skill typically wraps a Python script:
```python
import pdfplumber
import json

def extract_pdf(path):
    with pdfplumber.open(path) as pdf:
        result = {"pages": []}
        for i, page in enumerate(pdf.pages):
            page_data = {
                "number": i + 1,
                "tables": [],
                "text": page.extract_text() or ""
            }
            for table in page.extract_tables():
                page_data["tables"].append(table)
            result["pages"].append(page_data)
    return json.dumps(result, indent=2)
```
Strengths
- Table detection: Reliably identifies and extracts tabular data
- Structure preservation: Maintains relationship between elements
- Metadata access: Can read PDF metadata, annotations
Limitations
- Python dependency: Requires Python environment setup
- Performance: Slower than native tools
- Complex layouts: Still struggles with multi-column, nested structures
Best For
Business documents with tables: financial statements, invoices, reports with structured data. Also good for forms with clear field boundaries.
Example Usage
# User: "Extract the revenue table from this quarterly report"
# Claude Code uses pdfplumber to identify and extract just the table
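Under the hood, the generated code might look something like this minimal sketch (the file name and the "Revenue" keyword are assumptions; in practice they would come from the user's request):

```python
import pdfplumber

# Hypothetical file name; in practice this comes from the user's request
with pdfplumber.open("quarterly_report.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text() or ""
        if "Revenue" in text:  # naive keyword match to find the right page
            for table in page.extract_tables():
                for row in table:  # each table is a list of rows of cell strings
                    print(row)
            break
```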
Skill 3: pdf-vision-analyzer (AI-Powered)
This skill takes a fundamentally different approach: convert pages to images and use vision models for understanding.
How It Works
```markdown
# PDF Vision Analyzer
Use vision AI to understand PDF content.

## Requirements
- pdf2image (`pip install pdf2image`)
- poppler installed (pdf2image depends on it)
- Vision model access (Claude, GPT-4V, etc.)

## Process
1. Convert PDF pages to images
2. For each page image:
   - Send to vision model
   - Request specific extraction/analysis
3. Combine vision model outputs

## Modes

### Extraction Mode
"Extract all text from this document page"

### Analysis Mode
"Summarize the key points on this page"

### Structured Extraction
"Extract the table on this page as JSON"

### Q&A Mode
"Based on this document, what is the total revenue?"
```
Implementation Pattern
```python
import base64
import io

from pdf2image import convert_from_path
import anthropic

def analyze_pdf(path, prompt):
    images = convert_from_path(path, dpi=150)
    client = anthropic.Anthropic()
    results = []
    for img in images:
        # Encode the page image as base64 PNG for the vision model
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        data = base64.b64encode(buf.getvalue()).decode("ascii")
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,  # required by the Messages API
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": data,
                    }},
                    {"type": "text", "text": prompt},
                ],
            }],
        )
        results.append(response.content[0].text)
    return results
```
Strengths
- Universal compatibility: Works with any PDF, including scanned documents
- Semantic understanding: Can answer questions about content, not just extract text
- Complex layouts: Vision models handle multi-column, mixed content well
- Handwriting: Can often read handwritten notes on documents
Limitations
- Cost: Vision API calls add up, especially for large documents
- Speed: Significantly slower than text extraction
- Accuracy: Vision models can hallucinate or misread text
- Page limit: Large documents become expensive and slow
Best For
Complex documents requiring understanding, not just extraction. Legal contracts where you need to find specific clauses. Scanned documents. Mixed-media content with charts and diagrams.
Example Usage
# User: "What are the key terms in this contract?"
# Claude Code converts pages to images, sends to vision model with analysis prompt
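Assuming the analyze_pdf sketch above, a contract-review call might look like this (the file name and prompt are illustrative):

```python
# One focused question per page; answers come back page by page
answers = analyze_pdf("contract.pdf", "List the key terms defined on this page.")
for page_num, answer in enumerate(answers, start=1):
    print(f"--- Page {page_num} ---\n{answer}")
```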
Skill 4: pdf-mcp-server (Model Context Protocol)
This approach packages PDF capabilities as an MCP server, making them available as tools.
How It Works
```json
{
  "mcpServers": {
    "pdf": {
      "command": "npx",
      "args": ["-y", "@mcp/pdf-server"]
    }
  }
}
```
The MCP server exposes tools:
- `pdf_extract_text(path, options)`
- `pdf_extract_tables(path)`
- `pdf_get_metadata(path)`
- `pdf_search(path, query)`
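If you use the Claude Code CLI, you can register the same server with `claude mcp add` instead of editing the JSON by hand (the package name here mirrors the config above and is illustrative):

```bash
# Registers the server under the name "pdf"; everything after -- is the launch command
claude mcp add pdf -- npx -y @mcp/pdf-server
```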
Strengths
- Integration: Appears as native Claude Code tools
- Composability: Tools can be combined in workflows
- Abstraction: Hides implementation complexity
- Multiple backends: Can switch backends without changing usage
Limitations
- Setup complexity: Requires MCP server configuration
- Debugging: Harder to troubleshoot than direct tool use
- Customization: Less flexible than direct library access
Best For
Teams standardizing on PDF processing workflows. Situations where you want PDF capabilities "just there" without manual invocation.
Skill Comparison Matrix
| Feature | pdf-reader | pdf-extractor-pro | pdf-vision-analyzer | pdf-mcp-server |
|---|---|---|---|---|
| Setup Complexity | Low | Medium | Medium | High |
| Speed | Fast | Medium | Slow | Varies |
| Cost | Free | Free | API costs | Varies |
| Table Extraction | Poor | Good | Good | Good |
| Scanned PDFs | No | No | Yes | Depends |
| Semantic Analysis | No | No | Yes | Depends |
| Layout Handling | Poor | Good | Excellent | Varies |
| Integration | CLI | Python | Python + API | MCP |
Decision Framework
Use this flowchart to choose the right PDF skill:
Question 1: Is the PDF scanned (image-based), or does it contain handwriting?
Yes -> Use pdf-vision-analyzer (it's the only option that works)
No -> Continue to Question 2
Question 2: Do you need semantic understanding (summarization, Q&A)?
Yes -> Use pdf-vision-analyzer
No -> Continue to Question 3
Question 3: Does the PDF contain tables you need to extract?
Yes -> Use pdf-extractor-pro
No -> Continue to Question 4
Question 4: Is this for team-wide use or individual use?
Team-wide -> Consider pdf-mcp-server for standardization
Individual -> Continue to Question 5
Question 5: Is the PDF simple text (like an article or book)?
Yes -> Use pdf-reader (simplest solution that works)
No -> Use pdf-extractor-pro (handles more complex layouts)
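If you want to encode this policy in a routing script, it translates directly into a small helper (a sketch; the booleans mirror the five questions above):

```python
def choose_pdf_skill(scanned_or_handwritten, needs_semantics, has_tables,
                     team_wide, simple_text):
    """Mirror the five-question decision flow above."""
    if scanned_or_handwritten or needs_semantics:
        return "pdf-vision-analyzer"
    if has_tables:
        return "pdf-extractor-pro"
    if team_wide:
        return "pdf-mcp-server"
    return "pdf-reader" if simple_text else "pdf-extractor-pro"
```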
Combining Skills
Often the best approach combines multiple skills:
Pattern: Fast First, Smart Fallback
```markdown
## PDF Processing Workflow
1. Try pdf-reader for initial extraction
2. If output is garbled or contains tables:
   - Fall back to pdf-extractor-pro
3. If semantic understanding needed:
   - Send extracted text to Claude for analysis
4. If extraction fails entirely:
   - Use pdf-vision-analyzer as last resort
```
This pattern optimizes for speed and cost while maintaining capability for difficult cases.
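A minimal sketch of steps 1 and 2 in Python, assuming pdftotext is on the PATH (the garbling heuristic and its thresholds are assumptions you would tune for your documents):

```python
import subprocess
import pdfplumber

def extract_with_fallback(path):
    # Step 1: fast native extraction; "-" sends output to stdout
    text = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Crude garbling check: very short or low-alphabetic output is suspect
    alpha_ratio = sum(c.isalpha() for c in text) / max(len(text), 1)
    if len(text.strip()) > 100 and alpha_ratio > 0.5:
        return text
    # Step 2: fall back to layout-aware extraction
    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)
```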
Pattern: Hybrid Extraction
```markdown
## Hybrid PDF Extraction
For large documents with mixed content:
1. Use pdf-extractor-pro for bulk text extraction
2. Identify pages with complex content (charts, diagrams)
3. Use pdf-vision-analyzer only on complex pages
4. Combine results
```
This reduces vision API costs while maintaining quality on difficult pages.
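One way to flag complex pages (step 2) is to check whether pdfplumber reports embedded images or very little extractable text; a sketch, with thresholds as assumptions:

```python
import pdfplumber

def split_pages_by_complexity(path):
    """Return (simple_pages, complex_pages) as 1-based page numbers."""
    simple, complex_pages = [], []
    with pdfplumber.open(path) as pdf:
        for i, page in enumerate(pdf.pages, start=1):
            text = page.extract_text() or ""
            # Pages dominated by images, or with little text, go to the vision path
            if page.images or len(text) < 200:
                complex_pages.append(i)
            else:
                simple.append(i)
    return simple, complex_pages
```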
Practical Recommendations
For Invoice Processing
Use pdf-extractor-pro. Invoices have predictable structure (vendor, line items, totals) that table extraction handles well. Add a custom post-processing step to validate extracted amounts.
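The validation step can be as simple as checking that line-item amounts sum to the stated total (a sketch that assumes amounts sit in each row's last column):

```python
def validate_invoice_total(rows, stated_total):
    """Check extracted line items (last column = amount) against the stated total."""
    amounts = []
    for row in rows:
        cell = (row[-1] or "").replace("$", "").replace(",", "").strip()
        try:
            amounts.append(float(cell))
        except ValueError:
            continue  # skip headers and non-numeric rows
    return abs(sum(amounts) - stated_total) < 0.01
```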
For Contract Analysis
Use pdf-vision-analyzer with focused prompts. Contracts require understanding context, not just extraction. Prompt the vision model for specific clauses rather than full extraction.
For Research Paper Ingestion
Use pdf-reader for initial extraction, then Claude for analysis. Research papers are mostly flowing text. Let Claude Code read the extracted text and summarize, cite, or answer questions.
For Form Processing
Use pdf-extractor-pro with custom table configuration. Forms are essentially tables. Configure pdfplumber for the specific form layout you're processing.
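pdfplumber exposes this through table_settings. A sketch for a form with ruled field boundaries (the specific values are assumptions to tune per form):

```python
import pdfplumber

# Use ruling lines, not text alignment, to find cell boundaries
TABLE_SETTINGS = {
    "vertical_strategy": "lines",
    "horizontal_strategy": "lines",
    "snap_tolerance": 5,  # merge nearly-aligned rules
}

with pdfplumber.open("form.pdf") as pdf:
    fields = pdf.pages[0].extract_tables(table_settings=TABLE_SETTINGS)
```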
For Scanned Documents
Use pdf-vision-analyzer exclusively; it's the only one of these skills that can read image-based pages. If you're processing many similar documents, consider OCR preprocessing so that cheaper text-based extraction can take over, reducing vision API costs.
Performance Considerations
Large Documents
For PDFs with 100+ pages:
- pdf-reader: Seconds
- pdf-extractor-pro: Minutes
- pdf-vision-analyzer: Many minutes + significant cost
Consider processing large documents in chunks or extracting only needed sections.
Batch Processing
When processing many PDFs:
- pdf-reader: Parallelize freely
- pdf-extractor-pro: Python parallelization helps
- pdf-vision-analyzer: Rate limits may apply
Consider queuing and rate limiting for vision-based processing.
Memory Usage
- pdf-reader: Minimal
- pdf-extractor-pro: Moderate (entire PDF in memory)
- pdf-vision-analyzer: High (images in memory)
For memory-constrained environments, process pages individually.
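For the vision path, pdf2image can render a few pages at a time via first_page/last_page rather than holding every page image at once (a sketch; the chunk size is an assumption):

```python
from pdf2image import convert_from_path, pdfinfo_from_path

def iter_page_images(path, chunk_size=5, dpi=150):
    """Yield page images a few at a time instead of rendering the whole PDF."""
    total = pdfinfo_from_path(path)["Pages"]
    for start in range(1, total + 1, chunk_size):
        yield from convert_from_path(
            path, dpi=dpi,
            first_page=start,
            last_page=min(start + chunk_size - 1, total),
        )
```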
Building Your Own PDF Skill
If none of the existing skills fit your needs, consider building your own. Key decisions:
1. Choose Your Extraction Layer
- pdftotext: Fast, reliable, limited layout handling
- pdfplumber: Python, good table extraction
- PyMuPDF (fitz): Python, fast, full PDF access
- pdf.js: JavaScript, browser-compatible
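As a point of reference, PyMuPDF gets from file to text in a few lines (a minimal sketch):

```python
import fitz  # PyMuPDF

def extract_text_pymupdf(path):
    with fitz.open(path) as doc:
        # "text" mode returns plain text; "blocks", "dict", and "html"
        # expose progressively more layout detail
        return "\n".join(page.get_text("text") for page in doc)
```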
2. Design Your Output Format
Decide what format serves your downstream needs:
- Raw text: For simple analysis
- Structured JSON: For programmatic processing
- Markdown: For human readability and LLM processing
3. Handle Edge Cases
Plan for:
- Password-protected PDFs
- Corrupted PDFs
- Extremely large PDFs
- PDFs with unusual encodings
4. Consider Caching
PDF extraction can be slow. Cache extracted content when the PDF hasn't changed.
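A cache keyed on the file's content hash is enough for most workflows (a sketch; the cache location is an assumption):

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path(".pdf_cache")  # hypothetical location

def cached_extract(path, extract_fn):
    """Reuse extracted text unless the PDF's bytes have changed."""
    CACHE_DIR.mkdir(exist_ok=True)
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    cache_file = CACHE_DIR / f"{digest}.txt"
    if cache_file.exists():
        return cache_file.read_text()
    text = extract_fn(path)  # any of the extractors above
    cache_file.write_text(text)
    return text
```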
Conclusion
PDF processing in Claude Code isn't a solved problem; it's a spectrum of tradeoffs. Simple extraction tools are fast but limited. Layout-aware libraries handle more cases but require more setup. Vision-based approaches handle everything but cost more and run slower.
The right choice depends on your specific documents and requirements:
- Simple text PDFs: pdf-reader
- Tables and structured data: pdf-extractor-pro
- Scanned or complex layouts: pdf-vision-analyzer
- Standardized team workflows: pdf-mcp-server
Most real-world solutions combine approaches, using fast extraction for most cases and falling back to sophisticated methods for difficult documents.
Start with the simplest approach that works for your most common cases. Add sophistication only where needed. The PDF skills ecosystem gives you options at every level of complexity.
Working with other document types? Check out our Documentation Skills Roundup for generating docs, or explore Scientific Skills: Bioinformatics for research paper processing workflows.