Links to PDFs
Scrape documents from Notion, DocSend, PDFs, and other sources into local PDF files. Use when the user needs to download, archive, or convert web documents to PDF format. Supports authentication flows
CLI tool that scrapes documents from various sources into local PDF files using browser automation.
npm install -g docs-scraper
Scrape any document URL to PDF:
docs-scraper scrape https://example.com/document
Returns local path:
~/.docs-scraper/output/1706123456-abc123.pdf
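Because each file name begins with a Unix timestamp, the newest output sorts last lexicographically. A minimal sketch for picking up the latest PDF in a script, assuming the naming pattern above:

```shell
# Newest PDF = last entry after sorting the timestamp-prefixed names
latest=$(ls ~/.docs-scraper/output/*.pdf 2>/dev/null | sort | tail -n 1)
echo "$latest"
```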
Scrape with daemon (recommended, keeps browser warm):
docs-scraper scrape <url>
Scrape with named profile (for authenticated sites):
docs-scraper scrape <url> -p <profile-name>
Scrape with pre-filled data (e.g., email for DocSend):
docs-scraper scrape <url> -D email=user@example.com
Direct mode (single-shot, no daemon):
docs-scraper scrape <url> --no-daemon
When a document requires authentication (login, email verification, passcode):
Initial scrape returns a job ID:
docs-scraper scrape https://docsend.com/view/xxx
# Output: Scrape blocked
# Job ID: abc123
Retry with data:
docs-scraper update abc123 -D email=user@example.com
# or with password
docs-scraper update abc123 -D email=user@example.com -D password=1234
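A wrapper script can automate this retry loop by parsing the job ID out of the blocked-scrape output. A sketch, using a literal sample of the "Job ID:" line shown above rather than a live docs-scraper call:

```shell
# Hypothetical sample of the blocked-scrape output shown above
output="Scrape blocked
Job ID: abc123"

# Pull out the job ID so a wrapper script can retry automatically,
# e.g. with: docs-scraper update "$job_id" -D email=...
job_id=$(printf '%s\n' "$output" | sed -n 's/^Job ID: *//p')
echo "$job_id"
```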
Profiles store session cookies for authenticated sites.
docs-scraper profiles list # List saved profiles
docs-scraper profiles clear # Clear all profiles
docs-scraper scrape <url> -p myprofile # Use a profile
The daemon keeps browser instances warm for faster scraping.
docs-scraper daemon status # Check status
docs-scraper daemon start # Start manually
docs-scraper daemon stop # Stop daemon
Note: Daemon auto-starts when running scrape commands.
PDFs are stored in ~/.docs-scraper/output/. The daemon automatically cleans up files older than 1 hour.
Manual cleanup:
docs-scraper cleanup # Delete all PDFs
docs-scraper cleanup --older-than 1h # Delete PDFs older than 1 hour
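If the daemon is not running, the same one-hour retention can be approximated with standard tools. A sketch using find, not a documented docs-scraper feature:

```shell
# Delete PDFs older than 60 minutes, mirroring the daemon's cleanup
out_dir="$HOME/.docs-scraper/output"
mkdir -p "$out_dir"
find "$out_dir" -name '*.pdf' -mmin +60 -delete
```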
docs-scraper jobs list # List blocked jobs awaiting auth
Each scraper accepts specific -D data fields. Use the appropriate fields based on the URL type.
Handles: URLs ending in .pdf
Data fields: None (downloads directly)
Example:
docs-scraper scrape https://example.com/document.pdf
Handles: docsend.com/view/*, docsend.com/v/*, and subdomains (e.g., org-a.docsend.com)
URL patterns:
https://docsend.com/view/{id} or https://docsend.com/v/{id}
https://docsend.com/view/s/{id}
https://{subdomain}.docsend.com/view/{id}
Data fields:
| Field | Type | Description |
|---|---|---|
| email | email | Email address for document access |
| password | password | Passcode/password for protected documents |
| name | text | Your name (required for NDA-gated documents) |
Examples:
# Pre-fill email for DocSend
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com

# With password protection
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D password=secret123
# With NDA name requirement
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D name="John Doe"
# Retry blocked job
docs-scraper update abc123 -D email=user@example.com -D password=secret123
Handles: notion.so/*, *.notion.site/*
Data fields:
| Field | Type | Description |
|---|---|---|
| email | email | Notion account email |
| password | password | Notion account password |
Examples:
# Public page (no auth needed)
docs-scraper scrape https://notion.so/Public-Page-abc123

# Private page with login
docs-scraper scrape https://notion.so/Private-Page-abc123 -D email=user@example.com -D password=mypassword

# Custom domain
docs-scraper scrape https://docs.company.notion.site/Page-abc123
Handles: Any URL not matched by other scrapers (automatic fallback)
Data fields: Dynamic - determined by Claude analyzing the page
The LLM scraper uses Claude to analyze the page HTML and detect which input fields the page requires.
Common dynamic fields:
| Field | Type | Description |
|---|---|---|
| email | email | Login email (if detected) |
| password | password | Login password (if detected) |
| username | text | Username (if login uses username) |
Examples:
# Generic webpage (no auth)
docs-scraper scrape https://example.com/article

# Webpage requiring login
docs-scraper scrape https://members.example.com/article -D email=user@example.com -D password=secret

# When blocked, check the job for required fields
docs-scraper jobs list

# Then retry with the fields the scraper detected
docs-scraper update abc123 -D username=myuser -D password=secret
Notes: Requires the ANTHROPIC_API_KEY environment variable.

| Scraper | email | password | name | Other |
|---|---|---|---|---|
| DirectPdf | - | - | - | - |
| DocSend | ✓ | ✓ | ✓ | - |
| Notion | ✓ | ✓ | - | - |
| LLM Fallback | ✓* | ✓* | - | Dynamic* |
*Fields detected dynamically from page analysis
Only needed for LLM fallback scraper:
export ANTHROPIC_API_KEY=your_key
Optional browser settings:
export BROWSER_HEADLESS=true # Set false for debugging
Archive a Notion page:
docs-scraper scrape https://notion.so/My-Page-abc123
Download protected DocSend:
docs-scraper scrape https://docsend.com/view/xxx
# If blocked:
docs-scraper update <job-id> -D email=user@example.com -D password=1234
Batch scraping with profiles:
docs-scraper scrape https://site.com/doc1 -p mysite
docs-scraper scrape https://site.com/doc2 -p mysite
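For larger batches, the per-document commands generalize to a loop over a URL list. A sketch with a hypothetical urls.txt, shown with echo standing in for the docs-scraper call so it runs anywhere:

```shell
# urls.txt is a hypothetical file: one document URL per line
printf '%s\n' "https://site.com/doc1" "https://site.com/doc2" > urls.txt

while IFS= read -r url; do
  # In practice, replace echo with: docs-scraper scrape "$url" -p mysite
  echo "scraping $url"
done < urls.txt
```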
Success: Local file path (e.g., ~/.docs-scraper/output/1706123456-abc123.pdf)
Blocked: Job ID + required credential types
docs-scraper daemon stop && docs-scraper daemon start # Restart the daemon
docs-scraper jobs list # Check pending jobs
docs-scraper cleanup # Remove old PDFs