pdf2pdfocr
PDF extractor
A tool to extract text from PDFs and add a searchable layer to them
A free tool to OCR a PDF and add a text "layer" to the original file, making it a searchable PDF. Uses only open source tools. Please tip!
279 stars
12 watching
35 forks
Language: Python
last commit: 11 months ago
Linked from 1 awesome list
docker, ocr, pdf, pdftk, python, tesseract
Related projects:
Repository | Description | Stars |
---|---|---|
steelthread/mimeograph | A CoffeeScript library to extract text and create searchable PDF files using OCR when necessary. | 28 |
unidoc/unidoc | A Go library for extracting text from PDF files, particularly invoices. | 708 |
tabulapdf/tabula-java | Extracts tables from PDF files using Java | 1,859 |
uglytoad/pdfpig | A C# library for extracting and analyzing text from PDF files | 1,771 |
ckorzen/pdf-text-extraction-benchmark | Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles | 65 |
aeksco/aws-pdf-textract-pipeline | A data pipeline for extracting structured data from PDFs using AWS Textract and cloud-based services | 164 |
jesparza/peepdf | A Python tool for analyzing PDF files to identify potential security risks and malicious content. | 1,319 |
malfrats/xeuledoc | A tool to fetch information about public Google documents from various services | 856 |
docraptor/docraptor-ruby | A Ruby client library for converting HTML to PDF using the DocRaptor API. | 33 |
hiddenillusion/analyzepdf | A tool to analyze PDF files by examining their characteristics to determine if they are malicious or benign. | 178 |
pdf-archiver/pdf-archiver | A tool for digitizing and organizing paper documents by scanning and tagging files for easy searching. | 308 |
jonmagic/grim | A tool for extracting pages from PDFs and converting them to images and text strings. | 216 |
bikash/documentunderstanding | Research and development of tools and techniques for extracting information from images and PDFs using deep learning and graph neural networks. | 96 |
philsturgeon/codeigniter-unzip | A CodeIgniter extension that extracts ZIP files without requiring PECL extensions | 78 |
enferex/pdfresurrect | Analyzes and extracts previous versions of a PDF document to reconstruct its modification history | 81 |