PaddleOCR Text Recognition
Use this skill whenever the user wants text extracted from images, photos, scans, screenshots, or scanned PDFs. Returns exact machine-readable strings with l...
Trigger keywords (routing): Bilingual trigger terms (Chinese and English) are listed in the YAML
description above—use that field for discovery and routing.
Use this skill for:
Do not use for:
Scripts declare their dependencies inline (PEP 723). No separate install step is needed — uv resolves dependencies automatically:
uv run scripts/ocr_caller.py --help
Working directory: All uv run scripts/... commands below should be run from this skill's root directory (the directory containing this SKILL.md file).
Identify the input source:
- URL: use the --file-url parameter
- Local file: use the --file-path parameter
Execute OCR:
uv run scripts/ocr_caller.py --file-url "URL provided by user" --pretty
Or for local files:
uv run scripts/ocr_caller.py --file-path "file path" --pretty
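The choice between the two flags can be sketched in Python. This is a minimal illustration, not part of the skill: the helper name build_ocr_command is hypothetical, and treating any http(s) string as a URL is an assumption. Only the flags --file-url, --file-path, and --pretty come from the commands above.

```python
from urllib.parse import urlparse

SCRIPT = "scripts/ocr_caller.py"


def build_ocr_command(source: str, pretty: bool = True) -> list:
    """Return the argv list for one OCR call on a URL or a local path."""
    # Assumption: an http or https scheme means URL; anything else is a local path.
    is_url = urlparse(source).scheme in ("http", "https")
    flag = "--file-url" if is_url else "--file-path"
    command = ["uv", "run", SCRIPT, flag, source]
    if pretty:
        command.append("--pretty")
    return command


print(build_ocr_command("https://example.com/invoice.jpg"))
print(build_ocr_command("./document.pdf"))
```

Building an argv list instead of a shell string avoids quoting problems when a path contains spaces.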
Performance note: Parsing time scales with document complexity. Single-page images typically complete in 1-3 seconds; large PDFs (50+ pages) may take several minutes. Allow adequate time before assuming a timeout.
Default behavior: save raw JSON to a temp file:
- If --output is omitted, the script saves automatically under the system temp directory
- Default path pattern: <system-temp>/paddleocr/text-recognition/results/result_<timestamp>_<id>.json
- If --output is provided, it overrides the default temp-file destination
- If --stdout is provided, JSON is printed to stdout and no file is saved
- On save, the script prints the saved path on stderr: Result saved to: /absolute/path/...
- Use --stdout only when you explicitly want to skip file persistence
Parse JSON response:
- Check the ok field: true means success, false means error
- The text field contains all recognized text
- If --stdout is used, parse the stdout JSON directly
- If ok is false, display error.message
Present results to user:
Common next steps once you have the recognized text:
- Save the text field to a .txt or .md file
- The text field is clean plain text, ready for downstream processing
Always display the COMPLETE recognized text to the user. The user typically needs the full content for downstream use; truncation silently loses data they may not notice is missing.
- Display the entire text field, no matter how long
Example - Correct:
User: "Extract the text from this image"
Agent: I've extracted the text from the image. Here's the complete content:
[Display the entire text here]
Example - Incorrect:
User: "Extract the text from this image"
Agent: I found some text in the image. Here's a preview: "The quick brown fox..." (truncated)
The script returns a JSON envelope with ok, text, result, and error fields. Use text for the recognized content; result contains the raw API response for debugging.
For the full schema and field-level details, see references/output_schema.md.
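As a rough sketch of how a caller might consume the envelope, the snippet below reads ok, returns text on success, and raises with error.code and error.message on failure. The names OcrError and recognized_text, and the "UNKNOWN" fallback, are illustrative assumptions; only the four envelope fields come from this document.

```python
import json


class OcrError(Exception):
    """Raised when the envelope reports ok: false (hypothetical helper type)."""


def recognized_text(raw_json: str) -> str:
    """Return the complete text field, or raise OcrError with the reported error."""
    envelope = json.loads(raw_json)
    if not envelope.get("ok"):
        error = envelope.get("error") or {}
        # "UNKNOWN" and "no message" are fallbacks assumed here, not script behaviour.
        code = error.get("code", "UNKNOWN")
        message = error.get("message", "no message")
        raise OcrError(f"{code}: {message}")
    # Return the whole string; the caller should not truncate it.
    return envelope.get("text", "")


sample_ok = json.dumps({"ok": True, "text": "Hello\nWorld", "result": {}, "error": None})
print(recognized_text(sample_ok))
```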
Raw result location (default): the temp-file path printed by the script on stderr
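One way to recover that path from captured stderr is sketched below. It assumes the script emits a line beginning with "Result saved to: " exactly as shown earlier in this document; the function name saved_result_path is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumption: the prefix matches the "Result saved to: ..." line quoted in this document.
PREFIX = "Result saved to: "


def saved_result_path(stderr_text: str) -> Optional[Path]:
    """Return the saved JSON path found in stderr, or None if no such line exists."""
    for line in stderr_text.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            return Path(line[len(PREFIX):].strip())
    return None


captured = "some log line\nResult saved to: /tmp/paddleocr/result_1.json\n"
print(saved_result_path(captured))
```

Returning None when the line is absent covers the --stdout case, where no file is saved.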
Example 1: URL OCR
uv run scripts/ocr_caller.py --file-url "https://example.com/invoice.jpg" --pretty
Example 2: Local File OCR
uv run scripts/ocr_caller.py --file-path "./document.pdf" --pretty
Example 3: OCR With Explicit File Type
uv run scripts/ocr_caller.py --file-url "https://example.com/input" --file-type 1 --pretty
- --file-type 0: PDF
- --file-type 1: image
- For local files, a recognized file extension (.pdf, .png, .jpg, .jpeg, .bmp, .tiff, .tif, .webp) is required; otherwise pass --file-type explicitly. For URLs with unrecognized extensions, the service attempts inference.
Example 4: Print JSON Without Saving
uv run scripts/ocr_caller.py --file-url "https://example.com/input" --stdout --pretty
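The file-type rule from Example 3 can be approximated client-side to decide when --file-type needs to be passed explicitly. This is a sketch under assumptions: infer_file_type is a hypothetical helper, and the real script may resolve extensions differently. The extension list and the 0/1 values are taken from the list above.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif", ".webp"}


def infer_file_type(source: str) -> Optional[int]:
    """Return 0 for PDF, 1 for image, or None when the extension is not recognized."""
    # For URLs, look only at the path component so query strings are ignored.
    path = urlparse(source).path or source
    suffix = PurePosixPath(path).suffix.lower()
    if suffix == ".pdf":
        return 0
    if suffix in IMAGE_EXTENSIONS:
        return 1
    return None


for candidate in ("./document.pdf", "https://example.com/scan.JPG?x=1", "https://example.com/input"):
    print(candidate, infer_file_type(candidate))
```

A None result for a local file signals that --file-type should be supplied by the caller.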
When API is not configured, the script outputs:
{ "ok": false, "text": "", "result": null, "error": { "code": "CONFIG_ERROR", "message": "PADDLEOCR_OCR_API_URL not configured. Get your API at: https://paddleocr.com" } }
Configuration workflow:
Show the exact error message to the user.
Guide the user to obtain credentials: Visit the PaddleOCR website, click API, select the PP-OCRv5 model, select the language, then copy the API_URL and Token. They map to these environment variables:
- PADDLEOCR_OCR_API_URL: full endpoint URL ending with /ocr
- PADDLEOCR_ACCESS_TOKEN: 40-character alphanumeric string
Optionally configure PADDLEOCR_OCR_TIMEOUT for the request timeout. Prefer the host application's standard configuration method over pasting credentials in chat.
Apply credentials — one of:
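Whichever method is used, a quick local sanity check of the two required variables might look like the sketch below. It only encodes the formats stated in this workflow (URL ending with /ocr, 40-character alphanumeric token); the function name config_problems is hypothetical, and the skill's own validation may be stricter or looser.

```python
import os
import re


def config_problems(env: dict) -> list:
    """Return human-readable problems with the PaddleOCR variables (empty list if none)."""
    problems = []
    url = env.get("PADDLEOCR_OCR_API_URL", "")
    token = env.get("PADDLEOCR_ACCESS_TOKEN", "")
    if not url:
        problems.append("PADDLEOCR_OCR_API_URL is not set")
    elif not url.endswith("/ocr"):
        problems.append("PADDLEOCR_OCR_API_URL should end with /ocr")
    if not token:
        problems.append("PADDLEOCR_ACCESS_TOKEN is not set")
    elif not re.fullmatch(r"[A-Za-z0-9]{40}", token):
        # Report the format issue without echoing the token itself.
        problems.append("PADDLEOCR_ACCESS_TOKEN should be 40 alphanumeric characters")
    return problems


print(config_problems(dict(os.environ)))
```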
All errors return JSON with ok: false. Show the error message and stop; do not fall back to your own vision capabilities. Identify the issue from error.code and error.message:
Authentication failed (403): error.message contains "Authentication failed"
Quota exceeded (429): error.message contains "API rate limit exceeded"
Unsupported format: error.message contains "Unsupported file format"
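The substring checks above could be expressed as a small lookup. The labels "auth", "quota", "format", and "other" are illustrative names chosen here, not values the script returns; only the three message fragments come from this document.

```python
# (message fragment, hypothetical label) pairs taken from the error list above.
ERROR_HINTS = [
    ("Authentication failed", "auth"),
    ("API rate limit exceeded", "quota"),
    ("Unsupported file format", "format"),
]


def classify_error(envelope: dict) -> str:
    """Map an ok: false envelope to a coarse label based on error.message."""
    message = (envelope.get("error") or {}).get("message", "")
    for fragment, label in ERROR_HINTS:
        if fragment in message:
            return label
    return "other"


failed = {"ok": False, "text": "", "result": None,
          "error": {"code": "API_ERROR", "message": "API rate limit exceeded (429)"}}
print(classify_error(failed))
```

The label is only for routing the explanation; the original error.message should still be shown to the user.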
No text detected:
- The text field is empty
If recognition quality is poor:
- The raw result (result.result.ocrResults[n].prunedResult.rec_scores) shows per-line confidence scores; low values identify uncertain regions worth reviewing
references/output_schema.md: Full output schema, field descriptions, and command examples
Note: Model version, capabilities, and supported file formats are determined by your API endpoint (PADDLEOCR_OCR_API_URL) and its official API documentation.
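To act on the per-line confidence scores mentioned above, a reviewer could flag entries below a cutoff. The sketch follows the path result.result.ocrResults[n].prunedResult.rec_scores given in this document; the 0.8 threshold and the function name low_confidence_lines are assumptions for illustration.

```python
def low_confidence_lines(envelope: dict, threshold: float = 0.8) -> list:
    """Return (entry index, line index, score) for every score below the threshold."""
    flagged = []
    # envelope["result"] is the raw API response, which nests its own "result" key.
    raw = (envelope.get("result") or {}).get("result") or {}
    for entry_index, entry in enumerate(raw.get("ocrResults", [])):
        scores = (entry.get("prunedResult") or {}).get("rec_scores", [])
        for line_index, score in enumerate(scores):
            if score < threshold:
                flagged.append((entry_index, line_index, score))
    return flagged


sample = {"ok": True, "text": "...", "error": None, "result": {"result": {"ocrResults": [
    {"prunedResult": {"rec_scores": [0.99, 0.42]}},
    {"prunedResult": {"rec_scores": [0.75]}},
]}}}
print(low_confidence_lines(sample))
```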
To verify the skill is working properly:
uv run scripts/smoke_test.py
uv run scripts/smoke_test.py --skip-api-test
uv run scripts/smoke_test.py --test-url "https://..."
The first form tests configuration and API connectivity. --skip-api-test checks configuration only. --test-url overrides the default sample image URL.