PDF Processing Advanced Reference
This document contains advanced PDF processing features, detailed examples, and additional libraries not covered in the main skill instructions.
pypdfium2 Library (Apache/BSD License)
Overview
pypdfium2 is a Python binding for PDFium (Chromium's PDF library). It's excellent for fast PDF rendering, image generation, and serves as a PyMuPDF replacement.
Render PDF to Images
import pypdfium2 as pdfium
from PIL import Image
# Load PDF
pdf = pdfium.PdfDocument("document.pdf")
# Render page to image
page = pdf[0] # First page
bitmap = page.render(
scale=2.0, # Higher resolution
rotation=0 # No rotation
)
# Convert to PIL Image
img = bitmap.to_pil()
img.save("page_1.png", "PNG")
# Process multiple pages
for i, page in enumerate(pdf):
bitmap = page.render(scale=1.5)
img = bitmap.to_pil()
img.save(f"page_{i+1}.jpg", "JPEG", quality=90)
Extract Text with pypdfium2
import pypdfium2 as pdfium
## Examples
- Example usage 1
- Example usage 2