PaddleOCR Document Parsing
Use this skill to extract structured Markdown/JSON from PDFs and document images—tables with cell-level precision, formulas as LaTeX, figures, seals, charts,...
Trigger keywords (routing): Bilingual trigger terms (Chinese and English) are listed in the YAML description above—use that field for discovery and routing.
Use this skill for:
Do not use for:
Scripts declare their dependencies inline (PEP 723). No separate install step is needed — uv resolves dependencies automatically:
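A PEP 723 header sits at the top of each script as a comment block. A minimal sketch of what such a header looks like (the dependency names below are illustrative, not this skill's actual requirements):

```python
# /// script
# requires-python = ">=3.9"
# dependencies = [
#     "requests",
# ]
# ///
# The comment block above is inline script metadata (PEP 723).
# `uv run` reads it and resolves the listed dependencies in an
# ephemeral environment before executing the script.
```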
uv run scripts/layout_caller.py --help
Working directory: All uv run scripts/... commands below should be run from this skill's root directory (the directory containing this SKILL.md file).
User provides URL: use the --file-url parameter.
User provides local file path: use the --file-path parameter.
Or for local files: uv run scripts/layout_caller.py --file-path "file path" --pretty
Optional: explicitly set file type: uv run scripts/layout_caller.py --file-url "URL provided by user" --file-type 0 --pretty
--file-type 0: PDF
--file-type 1: image
If omitted, the type is auto-detected from the file extension. For local files, a recognized extension (.pdf, .png, .jpg, .jpeg, .bmp, .tiff, .tif, .webp) is required; otherwise pass --file-type explicitly. For URLs with unrecognized extensions, the service attempts inference.
Performance note: Parsing time scales with document complexity. Single-page images typically complete in 1-5 seconds; large PDFs (50+ pages) may take several minutes. Allow adequate time before assuming a timeout.
Default behavior: save raw JSON to a temp file:
If --output is omitted, the script saves automatically under the system temp directory
Default path pattern:
Check the ok field: true means success, false means error.
The output contains complete document data: text, tables, formulas (LaTeX), figures, seals, headers/footers, and reading order.
Use the appropriate field based on what the user needs:
text — full document text across all pages
result.result.layoutParsingResults[n].markdown.text — page-level markdown
result.result.layoutParsingResults[n].prunedResult — structured layout data with positions and confidence
Handle errors: If ok is false, display error.message
Display content based on what the user requested (see "Complete Output Display" below). If the content is empty, the document may contain no extractable text. In save mode, always tell the user the saved file path and that the full raw JSON is available there.
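The handling steps above can be sketched as follows. The envelope shape matches the ok/text/result/error fields described in this document; handle_envelope is a hypothetical helper, not part of the skill's scripts:

```python
import json

def handle_envelope(raw: str) -> str:
    """Check ok, surface errors, and return the full extracted text."""
    envelope = json.loads(raw)
    if not envelope["ok"]:
        # Display the error and stop -- no silent fallback.
        raise RuntimeError(envelope["error"]["message"])
    text = envelope["text"]
    if not text:
        # An empty text field means no extractable text was found.
        return "(no extractable text found in document)"
    return text

ok_response = '{"ok": true, "text": "Hello", "result": null, "error": null}'
print(handle_envelope(ok_response))  # Hello
```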
Common next steps once you have the structured output:
Display the COMPLETE extracted content based on what the user asked for. The parsed output is only useful if the user receives all of it — truncation silently drops data.
Example - Correct:
User: "Extract all the text from this document"
Agent: I've parsed the complete document. Here's all the extracted text:
[Display entire text field or concatenated regions in reading order]
Document Statistics:
- Total regions: 25
- Text blocks: 15
- Tables: 3
- Formulas: 2
Quality: Excellent (confidence: 0.92)
Example - Incorrect:
User: "Extract all the text"
Agent: "I found a document with multiple sections. Here's the beginning: 'Introduction...' (content truncated for brevity)"
The script returns an envelope with ok, text, result, and error. Use text for the full document content; navigate result.result.layoutParsingResults[n] for per-page structured data.
For the complete schema and field-level details, see references/output_schema.md.
Raw result location (default): the temp-file path printed by the script on stderr
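Per-page navigation can be sketched like this. The field paths follow result.result.layoutParsingResults[n] as described above; the sample data is fabricated for illustration:

```python
def concat_markdown(envelope: dict) -> str:
    """Join page-level markdown from layoutParsingResults in page order."""
    pages = envelope["result"]["result"]["layoutParsingResults"]
    return "\n\n".join(page["markdown"]["text"] for page in pages)

# Fabricated sample envelope fragment for demonstration only.
sample = {
    "result": {
        "result": {
            "layoutParsingResults": [
                {"markdown": {"text": "# Page 1"}},
                {"markdown": {"text": "# Page 2"}},
            ]
        }
    }
}
print(concat_markdown(sample))
```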
Example 1: Extract Full Document Text
uv run scripts/layout_caller.py \
  --file-url "https://example.com/paper.pdf" \
  --pretty
Then use the text field for the full document content.
Example 2: Extract Structured Page Data
uv run scripts/layout_caller.py \
  --file-path "./financial_report.pdf" \
  --pretty
Then navigate result.result.layoutParsingResults[n].prunedResult for per-page structured data.
Example 3: Print JSON to stdout (without saving to file)
uv run scripts/layout_caller.py \
  --file-url "URL" \
  --stdout \
  --pretty
By default the script writes JSON to a temp file and prints the path to stderr. Add --stdout to print the full JSON directly to stdout instead. Use this when you need to inspect the result inline or pipe it to another tool.
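When piping --stdout output onward, a small filter can pull out just the text field. A hypothetical filter, assuming the ok/text envelope shape described in this document:

```python
import json

def text_from_stdout_json(raw: str) -> str:
    """Extract the text field from piped --stdout JSON; empty on failure."""
    envelope = json.loads(raw)
    return envelope["text"] if envelope["ok"] else ""

# Fabricated sample standing in for real piped output.
demo = '{"ok": true, "text": "Hello from stdout", "result": null, "error": null}'
print(text_from_stdout_json(demo))  # Hello from stdout
```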
When API is not configured, the script outputs:
{
  "ok": false,
  "text": "",
  "result": null,
  "error": {
    "code": "CONFIG_ERROR",
    "message": "PADDLEOCR_DOC_PARSING_API_URL not configured. Get your API at: https://paddleocr.com"
  }
}
Configuration workflow:
PADDLEOCR_DOC_PARSING_API_URL — full endpoint URL ending with /layout-parsing
PADDLEOCR_ACCESS_TOKEN — 40-character alphanumeric string
Optionally configure PADDLEOCR_DOC_PARSING_TIMEOUT for request timeout. Recommend using the host application's standard configuration method rather than pasting credentials in chat.
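One way to set these variables, assuming a POSIX shell and placeholder values (your host application may provide its own configuration method, which is preferable):

```shell
# Placeholder values -- substitute your actual endpoint and token.
export PADDLEOCR_DOC_PARSING_API_URL="https://example-host/layout-parsing"
export PADDLEOCR_ACCESS_TOKEN="your-40-character-token"
# Optional request timeout, in seconds.
export PADDLEOCR_DOC_PARSING_TIMEOUT="120"
```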
User configured via the host UI: ask the user to confirm, then retry.
User pastes credentials in chat: warn that they may be stored in conversation history, help the user persist them using the host's standard configuration method, then retry.
For PDFs, the maximum is 100 pages per request.
For large image files, compress before uploading — this reduces upload time and can improve processing stability:
uv run scripts/optimize_file.py input.png output.jpg --quality 85
uv run scripts/layout_caller.py --file-path "output.jpg" --pretty
--quality controls JPEG/WebP lossy compression (1-100, default 85); it has no effect on PNG output. Use --target-size (in MB, default 20) to set the max file size — the script iteratively downscales until the target is met.
For very large local files, prefer --file-url over --file-path to avoid base64 encoding overhead:
uv run scripts/layout_caller.py --file-url "https://your-server.com/large_file.pdf"
If you only need certain pages from a large PDF, extract them first:
# Extract pages 1-5
uv run scripts/split_pdf.py large.pdf pages_1_5.pdf --pages "1-5"
# Mixed ranges are supported
uv run scripts/split_pdf.py large.pdf selected_pages.pdf --pages "1-5,8,10-12"
# Then process the smaller file
uv run scripts/layout_caller.py --file-path "pages_1_5.pdf"
All errors return JSON with ok: false. Show the error message and stop — do not fall back to your own vision capabilities. Identify the issue from error.code and error.message:
Authentication failed (403) — error.message contains "Authentication failed"
Quota exceeded (429) — error.message contains "API rate limit exceeded"
Unsupported format — error.message contains "Unsupported file format"
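Dispatching on these errors can be sketched by matching the message strings listed above (describe_error is a hypothetical helper; error.code values other than CONFIG_ERROR are not specified in this document):

```python
def describe_error(error: dict) -> str:
    """Map an error object to a short, user-facing explanation."""
    message = error.get("message", "")
    if "Authentication failed" in message:
        return "403: check PADDLEOCR_ACCESS_TOKEN."
    if "API rate limit exceeded" in message:
        return "429: quota exceeded; retry later."
    if "Unsupported file format" in message:
        return "Convert the file to a supported format first."
    return f"Unhandled error: {message}"

print(describe_error({"code": "AUTH_ERROR", "message": "Authentication failed"}))
```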
No content detected:
If parsing quality is poor:
Note: Model version and capabilities are determined by your API endpoint (PADDLEOCR_DOC_PARSING_API_URL).
To verify the skill is working properly:
uv run scripts/smoke_test.py
uv run scripts/smoke_test.py --skip-api-test
uv run scripts/smoke_test.py --test-url "https://..."
The first form tests configuration and API connectivity. --skip-api-test checks configuration only. --test-url overrides the default sample document URL.