OCRmyPDF
PDF OCR tool
A tool that adds an OCR text layer to scanned PDF files so that their text can be searched, copied, and pasted.
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
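As a rough illustration of the workflow described above, the sketch below assembles an `ocrmypdf` command line and runs it with `subprocess`. It assumes the `ocrmypdf` executable is installed and on `PATH`. The helper names `build_ocrmypdf_command` and `add_text_layer` and the file names are hypothetical, introduced only for this example. The `--language` and `--deskew` options are part of the OCRmyPDF command-line interface.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    languages: Sequence[str] = ("eng",),
    deskew: bool = False,
) -> List[str]:
    """Return the argument list for an ocrmypdf call (hypothetical helper)."""
    if not languages:
        raise ValueError("at least one Tesseract language code is required")
    # ocrmypdf accepts several languages joined with "+", e.g. "eng+deu".
    command = ["ocrmypdf", "--language", "+".join(languages)]
    if deskew:
        # --deskew straightens pages that were scanned at a slight angle.
        command.append("--deskew")
    command += [input_pdf, output_pdf]
    return command


def add_text_layer(input_pdf: str, output_pdf: str, **options) -> None:
    """Run ocrmypdf on input_pdf; assumes the executable is on PATH."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf executable not found on PATH")
    subprocess.run(
        build_ocrmypdf_command(input_pdf, output_pdf, **options),
        check=True,  # raise CalledProcessError if ocrmypdf exits non-zero
    )


if __name__ == "__main__":
    # File names are placeholders; this only prints the command.
    print(build_ocrmypdf_command("scan.pdf", "searchable.pdf", ("eng", "deu"), deskew=True))
```

Keeping the command construction separate from the subprocess call makes the argument list easy to inspect or log before anything is executed.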
14k stars
137 watching
1k forks
Language: Python
last commit: 5 days ago
Linked from 4 awesome lists
Tags: image-processing, ocr, pdf, python, tesseract
Related projects:
Repository | Description | Stars |
---|---|---|
py-pdf/pypdf | A Python library for manipulating and extracting data from PDF files | 8,524 |
mindee/doctr | A deep learning-based OCR library that enables efficient text parsing and recognition from documents | 4,011 |
jsvine/pdfplumber | A tool for extracting detailed information from PDFs | 6,898 |
ocropus-archive/dup-ocropy | A collection of tools for document analysis and OCR | 3,426 |
pdfcpu/pdfcpu | A Go-based PDF processing library with both API and CLI support for various operations on PDF files | 7,091 |
librepdf/openpdf | A Java library for creating and editing PDF files with advanced features and dual licensing options | 3,645 |
questpdf/questpdf | A modern C# library for generating PDF documents with a concise and discoverable API | 12,169 |
johnwhitington/camlpdf | A PDF file format and manipulation library written in OCaml | 202 |
pdfarranger/pdfarranger | An application for manipulating PDF documents by merging, splitting, and rearranging pages | 3,653 |
vikparuchuri/marker | Converts PDF documents to text formats with high accuracy and support for various document types | 18,452 |
leofcardoso/pdf2pdfocr | A tool to extract text from PDFs and add a searchable layer to them | 279 |
tesseract-ocr/tesseract | An OCR engine capable of recognizing text in images from various languages and formats. | 63,142 |
jorisschellekens/borb | A Python library for creating and manipulating PDF documents in a JSON-like data structure. | 3,413 |
jbaiter/pdiiif | Library to create PDFs from IIIF manifests with client-side generation and server-based fallback for unsupported browsers | 31 |
fmtlib/fmt | A formatting library providing a safe and fast alternative to C and C++ input/output functions | 20,980 |