PDF Text Extractor
Extract text from PDFs with OCR support. Perfect for digitizing documents, processing invoices, or analyzing content. Zero dependencies required.
Extract text from PDFs with OCR support. Perfect for digitizing documents, processing invoices, or analyzing content. Zero dependencies required.
Real data. Real impact.
Emerging
Developers
Per week
Open source
Skills give you superpowers. Install in 30 seconds.
Vernox Utility Skill - Perfect for document digitization.
PDF-Text-Extractor is a zero-dependency tool for extracting text content from PDF files. Supports both embedded text extraction (for text-based PDFs) and OCR (for scanned documents).
clawhub install pdf-text-extractor
const result = await extractText({ pdfPath: './document.pdf', options: { outputFormat: 'text', ocr: true, language: 'eng' } });console.log(result.text); console.log(); console.log(Pages: ${result.pages});Words: ${result.wordCount}
const results = await extractBatch({ pdfFiles: [ './document1.pdf', './document2.pdf', './document3.pdf' ], options: { outputFormat: 'json', ocr: true } });console.log();Extracted ${results.length} PDFs
const result = await extractText({ pdfPath: './scanned-document.pdf', options: { ocr: true, language: 'eng', ocrQuality: 'high' } });// OCR will be used (scanned document detected)
extractTextExtract text content from a single PDF file.
Parameters:
pdfPath (string, required): Path to PDF fileoptions (object, optional): Extraction options
outputFormat (string): 'text' | 'json' | 'markdown' | 'html'ocr (boolean): Enable OCR for scanned docslanguage (string): OCR language code ('eng', 'spa', 'fra', 'deu')preserveFormatting (boolean): Keep headings/structureminConfidence (number): Minimum OCR confidence score (0-100)Returns:
text (string): Extracted text contentpages (number): Number of pages processedwordCount (number): Total word countcharCount (number): Total character countlanguage (string): Detected languagemetadata (object): PDF metadata (title, author, creation date)method (string): 'text' or 'ocr' (extraction method)extractBatchExtract text from multiple PDF files at once.
Parameters:
pdfFiles (array, required): Array of PDF file pathsoptions (object, optional): Same as extractTextReturns:
results (array): Array of extraction resultstotalPages (number): Total pages across all PDFssuccessCount (number): Successfully extractedfailureCount (number): Failed extractionserrors (array): Error details for failurescountWordsCount words in extracted text.
Parameters:
text (string, required): Text to countoptions (object, optional):
minWordLength (number): Minimum characters per word (default: 3)excludeNumbers (boolean): Don't count numbers as wordscountByPage (boolean): Return word count per pageReturns:
wordCount (number): Total word countcharCount (number): Total character countpageCounts (array): Word count per pageaverageWordsPerPage (number): Average words per pagedetectLanguageDetect the language of extracted text.
Parameters:
text (string, required): Text to analyzeminConfidence (number): Minimum confidence for detectionReturns:
language (string): Detected language codelanguageName (string): Full language nameconfidence (number): Confidence score (0-100)config.json:{ "ocr": { "enabled": true, "defaultLanguage": "eng", "quality": "medium", "languages": ["eng", "spa", "fra", "deu"] }, "output": { "defaultFormat": "text", "preserveFormatting": true, "includeMetadata": true }, "batch": { "maxConcurrent": 3, "timeoutSeconds": 30 } }
const invoice = await extractText('./invoice.pdf'); console.log(invoice.text); // "INVOICE #12345 Date: 2026-02-04..."
const contract = await extractText('./scanned-contract.pdf', { ocr: true, language: 'eng', ocrQuality: 'high' }); console.log(contract.text); // "AGREEMENT This contract between..."
const docs = await extractBatch([ './doc1.pdf', './doc2.pdf', './doc3.pdf', './doc4.pdf' ]); console.log(`Processed ${docs.successCount}/${docs.results.length} documents`);
MIT
Extract text from PDFs. Fast, accurate, zero dependencies. 🔮
No automatic installation available. Please visit the source repository for installation instructions.
View Installation Instructions1,500+ AI skills, agents & workflows. Install in 30 seconds. Part of the Torly.ai family.
© 2026 Torly.ai. All rights reserved.