OCRmyPDF

PDF OCR tool

A tool that adds an OCR text layer to scanned PDF files, so that their text can be searched and copied.
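As a rough illustration of how the tool is typically driven, the sketch below assembles an `ocrmypdf` command line from Python. The helper names `build_ocrmypdf_command` and `run_ocr` and the file names are hypothetical. The flags used (`--language`, `--deskew`, `--skip-text`) are assumed to match the installed OCRmyPDF version, so check `ocrmypdf --help` before relying on them.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocrmypdf_command(src, dst, language="eng", deskew=False, skip_text=True):
    """Return the argv list for one ocrmypdf run (hypothetical helper)."""
    cmd = ["ocrmypdf", "--language", language]
    if deskew:
        # Straighten crooked scans before recognition.
        cmd.append("--deskew")
    if skip_text:
        # Leave pages that already contain text untouched.
        cmd.append("--skip-text")
    # Input file first, output file last.
    cmd += [str(Path(src)), str(Path(dst))]
    return cmd


def run_ocr(src, dst, **options):
    """Run ocrmypdf as a subprocess; assumes it is installed and on PATH."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf executable not found on PATH")
    subprocess.run(build_ocrmypdf_command(src, dst, **options), check=True)
```

Calling `run_ocr("scan.pdf", "scan_ocr.pdf", deskew=True)` would write a searchable copy of the input, provided OCRmyPDF and the Tesseract language data for `eng` are installed. Building the argument list separately from running it keeps the command easy to inspect or log.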

GitHub

14k stars
137 watching
1k forks
Language: Python
last commit: 5 days ago
Linked from 4 awesome lists

image-processing, ocr, pdf, python, tesseract

Related projects:

| Repository | Description | Stars |
|---|---|---|
| py-pdf/pypdf | A Python library for manipulating and extracting data from PDF files | 8,524 |
| mindee/doctr | A deep learning-based OCR library that enables efficient text parsing and recognition from documents | 4,011 |
| jsvine/pdfplumber | A tool for extracting detailed information from PDFs | 6,898 |
| ocropus-archive/dup-ocropy | A collection of tools for document analysis and OCR | 3,426 |
| pdfcpu/pdfcpu | A Go-based PDF processing library with both API and CLI support for various operations on PDF files | 7,091 |
| librepdf/openpdf | A Java library for creating and editing PDF files with advanced features and dual licensing options | 3,645 |
| questpdf/questpdf | A modern C# library for generating PDF documents with a concise and discoverable API | 12,169 |
| johnwhitington/camlpdf | A PDF file format and manipulation library written in OCaml | 202 |
| pdfarranger/pdfarranger | An application that allows users to manipulate PDF documents by merging, splitting, and rearranging pages | 3,653 |
| vikparuchuri/marker | Converts PDF documents to text formats with high accuracy and support for various document types | 18,452 |
| leofcardoso/pdf2pdfocr | A tool to extract text from PDFs and add a searchable layer to them | 279 |
| tesseract-ocr/tesseract | An OCR engine capable of recognizing text in images from various languages and formats | 63,142 |
| jorisschellekens/borb | A Python library for creating and manipulating PDF documents in a JSON-like data structure | 3,413 |
| jbaiter/pdiiif | Library to create PDFs from IIIF manifests with client-side generation and server-based fallback for unsupported browsers | 31 |
| fmtlib/fmt | A formatting library providing a safe and fast alternative to C and C++ input/output functions | 20,980 |