pdf2pdfocr
PDF extractor
A tool that runs OCR on PDFs and adds a searchable text layer to them
A free tool to OCR a PDF and add a text "layer" to the original file, making it a searchable PDF. Uses only open source tools. Please tip!
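The listing does not say how pdf2pdfocr builds its text layer internally, although its topics suggest Tesseract and pdftk are involved. As a general illustration of what a searchable "layer" is, the Python sketch below writes a one-page PDF whose words are drawn in PDF text rendering mode 3 (`3 Tr`, invisible). A viewer can search and select that text, but nothing is painted, so such a page can sit over a scanned image without changing how it looks. This is not pdf2pdfocr's code: the function name `build_text_layer_pdf` and its `(text, x, y)` word tuples (standing in for OCR output) are hypothetical, and the sketch assumes the standard Helvetica font and Latin-1 text.

```python
def _escape_pdf_string(text: str) -> str:
    # Backslashes and parentheses must be escaped inside a PDF literal string.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_text_layer_pdf(words, width=612, height=792, font_size=12):
    """Return the bytes of a one-page PDF whose words are drawn invisibly.

    Hypothetical helper: `words` is a list of (text, x, y) tuples in PDF
    points, standing in for the word positions an OCR engine might report.
    Assumes Latin-1 text and the built-in Helvetica font.
    """
    # 3 Tr selects text rendering mode 3: no fill, no stroke (invisible).
    ops = ["BT", f"/F1 {font_size} Tf", "3 Tr"]
    for text, x, y in words:
        # Tm sets an absolute text position for each word.
        ops.append(f"1 0 0 1 {x} {y} Tm ({_escape_pdf_string(text)}) Tj")
    ops.append("ET")
    content = "\n".join(ops).encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
         "/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>").encode("ascii"),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length " + str(len(content)).encode("ascii") + b" >>\nstream\n"
        + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []  # byte offset of each "N 0 obj" header, for the xref table
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += f"{number} 0 obj\n".encode("ascii") + body + b"\nendobj\n"

    # Cross-reference table: every entry is exactly 20 bytes long.
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode("ascii")
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_start}\n%%EOF\n").encode("ascii")
    return bytes(out)
```

For example, `build_text_layer_pdf([("Invoice", 72, 720)])` returns bytes that can be written to a file opened with `open(path, "wb")`. Merging that invisible page onto the original scanned page is a separate step not shown here.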
279 stars
12 watching
35 forks
Language: Python
Last commit: about 1 year ago
Linked from 1 awesome list
Topics: docker, ocr, pdf, pdftk, python, tesseract
Related projects:
| Repository | Description | Stars |
|---|---|---|
|  | A CoffeeScript library for extracting text from PDF files and creating searchable documents with OCR capabilities | 28 |
|  | A Go library for extracting text from PDF files, particularly invoices | 708 |
|  | Extracts tables from PDF files using Java | 1,859 |
|  | A C# library for extracting and analyzing text from PDF files | 1,794 |
|  | Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles | 65 |
|  | A data pipeline for extracting structured data from PDFs using AWS Textract and cloud-based services | 164 |
|  | A Python tool for analyzing PDF files to identify potential security risks and malicious content | 1,319 |
|  | A tool to fetch information about public Google documents from various services | 856 |
|  | A Ruby client library for converting HTML to PDF using the DocRaptor API | 33 |
|  | A tool to analyze PDF files by examining their characteristics to determine if they are malicious or benign | 178 |
|  | A tool for digitizing and organizing paper documents by scanning and tagging files for easy searching | 308 |
|  | A tool for extracting pages from PDFs and converting them to images and text strings | 216 |
|  | Research and development of tools and techniques for extracting information from images and PDFs using deep learning and graph neural networks | 96 |
|  | A CodeIgniter extension that extracts ZIP files without requiring PECL extensions | 78 |
|  | Analyzes and extracts previous versions of a PDF document to reconstruct its modification history | 81 |