PDF Processing Skills Compared: Which One Should You Use?
Compare PDF processing skills for Claude Code. From simple extraction to complex document analysis, find the right PDF skill for your workflow needs.
PDFs are everywhere. Contracts, research papers, financial statements, technical documentation, invoices, reports. They're also notoriously difficult to work with programmatically. The PDF format prioritizes precise visual rendering over data extraction, which makes extracting meaningful information from PDFs one of the more challenging document processing tasks.
The Claude Code ecosystem has responded with multiple PDF processing skills, each taking a different approach to this challenge. In this comparison, we'll examine the major PDF skills available, understand their architectural differences, and help you choose the right one for your specific needs.
PDF processing falls into several categories, and different skills excel at different types:
Text Extraction: Pulling plain text from PDFs. Simple in theory, complicated by multi-column layouts, headers/footers, and embedded fonts.
Structured Data Extraction: Extracting tables, forms, and structured content. Requires understanding document layout, not just text.
Document Analysis: Understanding what a document is about, summarizing content, answering questions about the document.
PDF Generation: Creating new PDFs or modifying existing ones. A different problem entirely from extraction.
Most Claude Code PDF skills focus on extraction and analysis, as these integrate naturally with the AI assistance model.
The simplest approach to PDF processing uses readily available command-line tools orchestrated by Claude Code.
# PDF Reader
Extract text content from PDF files for analysis.
## Requirements
- pdftotext (from poppler-utils)
- macOS: `brew install poppler`
- Ubuntu: `apt install poppler-utils`
## Process
1. Run `pdftotext -layout input.pdf output.txt` (options precede the file name)
2. Read the extracted text file
3. Analyze content as needed
## Options
- `-layout`: Preserve original layout
- `-raw`: Extract text in content-stream order (may differ from visual reading order)
- `-table`: Optimize for tabular data
Best for simple PDFs with flowing text: articles, books, plain documentation. Not suitable for structured documents, forms, or complex layouts.
# User: "Extract text from this research paper"
pdftotext -layout paper.pdf -
# Claude Code reads and analyzes the extracted text
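When scripting this skill, the `pdftotext` call is easy to wrap in a small helper. This is a minimal sketch assuming poppler-utils is installed; the `build_cmd` and `extract_text` names are illustrative, not part of any skill:

```python
import subprocess

def build_cmd(pdf_path, layout=True):
    """Build the pdftotext command; '-' sends extracted text to stdout."""
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # preserve the original visual layout
    cmd.extend([pdf_path, "-"])
    return cmd

def extract_text(pdf_path):
    """Run pdftotext and return the extracted text, raising on failure."""
    result = subprocess.run(build_cmd(pdf_path), capture_output=True, text=True)
    result.check_returncode()
    return result.stdout
```

Writing to stdout rather than a temp file keeps the output in memory, where Claude Code can read it directly.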
The next skill, PDF Extractor Pro, takes a more sophisticated approach by using layout-analysis libraries.
# PDF Extractor Pro
Intelligent PDF extraction preserving document structure.
## Requirements
- Python 3.8+
- pdfplumber (`pip install pdfplumber`)
## Process
1. Open PDF with pdfplumber
2. For each page:
- Detect tables
- Extract tables as structured data
- Extract remaining text
3. Combine into structured output
## Output Format
Structured JSON:
{
  "pages": [
    {
      "number": 1,
      "tables": [...],
      "text": "...",
      "metadata": {...}
    }
  ]
}
The skill typically wraps a Python script:
import pdfplumber
import json

def extract_pdf(path):
    with pdfplumber.open(path) as pdf:
        result = {"pages": []}
        for i, page in enumerate(pdf.pages):
            page_data = {
                "number": i + 1,
                "tables": [],
                "text": page.extract_text() or "",
                "metadata": {"width": page.width, "height": page.height},
            }
            for table in page.extract_tables():
                page_data["tables"].append(table)
            result["pages"].append(page_data)
    return json.dumps(result, indent=2)
Best for business documents with tables: financial statements, invoices, reports with structured data. Also good for forms with clear field boundaries.
# User: "Extract the revenue table from this quarterly report"
# Claude Code uses pdfplumber to identify and extract just the table
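Once pdfplumber returns candidate tables, a small filter can pick out the one whose header mentions the term you care about. The `find_table_by_header` helper below is an illustrative sketch, not part of pdfplumber's API:

```python
def find_table_by_header(tables, keyword):
    """Return the first extracted table whose header row mentions keyword.

    `tables` is the list-of-row-lists structure that pdfplumber's
    extract_tables() produces; cells may be None, so guard before
    lowercasing.
    """
    keyword = keyword.lower()
    for table in tables:
        if not table:
            continue
        header = table[0]
        if any(cell and keyword in cell.lower() for cell in header):
            return table
    return None
```

A keyword match on the header row is crude but works well for reports that label their tables consistently.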
PDF Vision Analyzer takes a fundamentally different approach: convert pages to images and use vision models for understanding.
# PDF Vision Analyzer
Use vision AI to understand PDF content.
## Requirements
- pdf2image (`pip install pdf2image`)
- Vision model access (Claude, GPT-4V, etc.)
## Process
1. Convert PDF pages to images
2. For each page image:
- Send to vision model
- Request specific extraction/analysis
3. Combine vision model outputs
## Modes
### Extraction Mode
"Extract all text from this document page"
### Analysis Mode
"Summarize the key points on this page"
### Structured Extraction
"Extract the table on this page as JSON"
### Q&A Mode
"Based on this document, what is the total revenue?"
import base64
import io

from pdf2image import convert_from_path
import anthropic

def analyze_pdf(path, prompt):
    images = convert_from_path(path, dpi=150)
    client = anthropic.Anthropic()
    results = []
    for img in images:
        # Encode each page image as base64 PNG for the vision model
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        data = base64.b64encode(buf.getvalue()).decode("ascii")
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": data,
                    }},
                    {"type": "text", "text": prompt},
                ],
            }],
        )
        results.append(response.content[0].text)
    return results
Best for complex documents requiring understanding, not just extraction: legal contracts where you need to find specific clauses, scanned documents, and mixed-media content with charts and diagrams.
# User: "What are the key terms in this contract?"
# Claude Code converts pages to images, sends to vision model with analysis prompt
The final approach, pdf-mcp-server, packages PDF capabilities as an MCP server, making them available as tools.
{
  "mcpServers": {
    "pdf": {
      "command": "npx",
      "args": ["-y", "@mcp/pdf-server"]
    }
  }
}
The MCP server exposes tools:
- `pdf_extract_text(path, options)`
- `pdf_extract_tables(path)`
- `pdf_get_metadata(path)`
- `pdf_search(path, query)`

Best for teams standardizing on PDF processing workflows, and for situations where you want PDF capabilities "just there" without manual invocation.
| Feature | pdf-reader | pdf-extractor-pro | pdf-vision-analyzer | pdf-mcp-server |
|---|---|---|---|---|
| Setup Complexity | Low | Medium | Medium | High |
| Speed | Fast | Medium | Slow | Varies |
| Cost | Free | Free | API costs | Varies |
| Table Extraction | Poor | Good | Good | Good |
| Scanned PDFs | No | No | Yes | Depends |
| Semantic Analysis | No | No | Yes | Depends |
| Layout Handling | Poor | Good | Excellent | Varies |
| Integration | CLI | Python | Python + API | MCP |
Use this flowchart to choose the right PDF skill:
Question 1: Is the document scanned (image-only, with no text layer)?
Yes -> Use pdf-vision-analyzer (it's the only option that works)
No -> Continue to Question 2
Question 2: Do you need semantic understanding (summaries, Q&A, finding clauses)?
Yes -> Use pdf-vision-analyzer
No -> Continue to Question 3
Question 3: Does the document contain tables or other structured data?
Yes -> Use pdf-extractor-pro
No -> Continue to Question 4
Question 4: Is this a team-wide or individual workflow?
Team-wide -> Consider pdf-mcp-server for standardization
Individual -> Continue to Question 5
Question 5: Is the layout simple flowing text?
Yes -> Use pdf-reader (simplest solution that works)
No -> Use pdf-extractor-pro (handles more complex layouts)
Often the best approach combines multiple skills:
## PDF Processing Workflow
1. Try pdf-reader for initial extraction
2. If output is garbled or contains tables:
- Fall back to pdf-extractor-pro
3. If semantic understanding needed:
- Send extracted text to Claude for analysis
4. If extraction fails entirely:
- Use pdf-vision-analyzer as last resort
This pattern optimizes for speed and cost while maintaining capability for difficult cases.
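The fallback chain above can be sketched in a few lines of Python. The garbling heuristic and the injected callables (`reader`, `extractor_pro`, `vision`) are illustrative stand-ins for the skills described earlier:

```python
def looks_garbled(text, min_alpha_ratio=0.6):
    """Crude quality check: flag output that is empty or mostly non-letters."""
    if not text.strip():
        return True
    alpha = sum(ch.isalpha() or ch.isspace() for ch in text)
    return alpha / len(text) < min_alpha_ratio

def extract_with_fallback(path, reader, extractor_pro, vision):
    """Try the cheap extractor first, escalating only when output is poor.

    The three arguments are callables wrapping the skills above,
    injected so each stage is independently swappable.
    """
    text = reader(path)
    if not looks_garbled(text):
        return text
    text = extractor_pro(path)
    if not looks_garbled(text):
        return text
    return vision(path)  # last resort: slow and incurs API costs
```

A ratio-of-letters heuristic is deliberately simple; in practice you might also check for expected keywords or a minimum word count.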
## Hybrid PDF Extraction
For large documents with mixed content:
1. Use pdf-extractor-pro for bulk text extraction
2. Identify pages with complex content (charts, diagrams)
3. Use pdf-vision-analyzer only on complex pages
4. Combine results
This reduces vision API costs while maintaining quality on difficult pages.
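The routing decision in step 2 can be made from signals pdfplumber already exposes (`page.images`, `page.extract_text()`). This sketch assumes pages are summarized as `(image_count, text)` tuples; the helper names are illustrative:

```python
def needs_vision(image_count, extracted_text):
    """Route a page to the vision analyzer when it carries images/charts
    or yields almost no extractable text (likely a diagram or scan)."""
    return image_count > 0 or len(extracted_text.strip()) < 50

def split_pages(pages):
    """Partition pages into (simple, complex) index lists.

    `pages` is a list of (image_count, text) tuples, e.g. built from
    pdfplumber's page.images and page.extract_text().
    """
    simple, complex_ = [], []
    for i, (image_count, text) in enumerate(pages):
        (complex_ if needs_vision(image_count, text) else simple).append(i)
    return simple, complex_
```

The thresholds here are guesses; tune them against a sample of your own documents before relying on the split.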
Invoices: Use pdf-extractor-pro. Invoices have a predictable structure (vendor, line items, totals) that table extraction handles well. Add a custom post-processing step to validate extracted amounts.
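That validation step might simply check that the extracted line items sum to the stated total. `parse_amount` and `validate_invoice_total` are hypothetical helpers, shown as one way to sanity-check extraction output:

```python
from decimal import Decimal

def parse_amount(cell):
    """Turn an extracted cell like '$1,234.50' into a Decimal."""
    return Decimal(cell.replace("$", "").replace(",", "").strip())

def validate_invoice_total(line_amounts, stated_total, tolerance="0.01"):
    """Check that extracted line items sum to the stated invoice total."""
    total = sum(parse_amount(a) for a in line_amounts)
    return abs(total - parse_amount(stated_total)) <= Decimal(tolerance)
```

Using `Decimal` instead of `float` avoids rounding artifacts when comparing currency amounts.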
Legal contracts: Use pdf-vision-analyzer with focused prompts. Contracts require understanding context, not just extraction. Prompt the vision model for specific clauses rather than full extraction.
Research papers: Use pdf-reader for initial extraction, then Claude for analysis. Research papers are mostly flowing text. Let Claude Code read the extracted text and summarize, cite, or answer questions.
Forms: Use pdf-extractor-pro with custom table configuration. Forms are essentially tables. Configure pdfplumber's table settings for the specific form layout you're processing.
Scanned documents: Use pdf-vision-analyzer; it's the only option that works. Consider OCR preprocessing if you're processing many similar documents to reduce vision API costs.
For PDFs with 100+ pages:
Consider processing large documents in chunks or extracting only needed sections.
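Chunked processing only needs a small range generator; with pdfplumber you can then slice `pdf.pages[start:end]` and handle one chunk at a time. The `page_chunks` name is illustrative:

```python
def page_chunks(total_pages, chunk_size=25):
    """Yield (start, end) page ranges (0-based, end exclusive) so a large
    PDF can be processed a chunk at a time instead of all at once."""
    for start in range(0, total_pages, chunk_size):
        yield start, min(start + chunk_size, total_pages)
```

Processing chunk by chunk keeps memory bounded and lets you stop early once the section you need has been found.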
When processing many PDFs:
Consider queuing and rate limiting for vision-based processing.
For memory-constrained environments, process pages individually.
If none of the existing skills fit your needs, consider building a custom skill. Key decisions:
Output format: Decide what format serves your downstream needs: plain text for analysis, structured JSON for programmatic use, or Markdown for readability.
Error handling: Plan for encrypted PDFs, corrupt files, and pages that yield no text.
Caching: PDF extraction can be slow. Cache extracted content when the PDF hasn't changed.
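A simple way to implement that cache is to key it on a hash of the PDF's bytes, so entries invalidate automatically when the file changes. The helper names and `.pdf_cache` directory are illustrative choices:

```python
import hashlib
from pathlib import Path

def file_digest(path):
    """Hash the PDF bytes so the cache key changes only when the file does."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def cached_extract(path, extract, cache_dir=".pdf_cache"):
    """Return cached extraction output, re-extracting only on a cache miss.

    `extract` is any callable that turns a PDF path into text, e.g. a
    pdftotext or pdfplumber wrapper.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    entry = cache / f"{file_digest(path)}.txt"
    if entry.exists():
        return entry.read_text()
    text = extract(path)
    entry.write_text(text)
    return text
```

Hashing content rather than relying on modification times makes the cache robust to files being copied or re-downloaded unchanged.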
PDF processing in Claude Code isn't a solved problem; it's a spectrum of tradeoffs. Simple extraction tools are fast but limited. Layout-aware libraries handle more cases but require more setup. Vision-based approaches handle everything but cost more and run slower.
The right choice depends on your specific documents and requirements:
Most real-world solutions combine approaches, using fast extraction for most cases and falling back to sophisticated methods for difficult documents.
Start with the simplest approach that works for your most common cases. Add sophistication only where needed. The PDF skills ecosystem gives you options at every level of complexity.
Working with other document types? Check out our guides on Documentation Skills Roundup for generating docs, or explore Scientific Skills: Bioinformatics for research paper processing workflows.