mimeograph

PDF extractor

A CoffeeScript library for PDF text extraction and OCR, and for creating searchable files.

28 stars
5 watching
2 forks
Language: CoffeeScript
last commit: about 12 years ago
Linked from 1 awesome list

Related projects:

leofcardoso/pdf2pdfocr (274 stars): A tool to extract text from PDFs and add a searchable layer to them.
jonmagic/grim (216 stars): A tool for extracting pages from PDFs and converting them to images and text strings.
uglytoad/pdfpig (1,733 stars): A C# library for extracting and analyzing text from PDF files.
ckorzen/pdf-text-extraction-benchmark (65 stars): Evaluates how well PDF extraction tools extract meaningful text from scientific articles.
aeksco/aws-pdf-textract-pipeline (164 stars): A data pipeline for extracting structured data from PDFs using AWS Textract and other cloud services.
gunnarmorling/quarkus-pdf-extract (24 stars): A Quarkus-based microservice to extract text from PDF files.
michaelrsweet/pdfio (198 stars): A C library that provides read and write access to PDF files.
aymericbeaumet/squeeze (17 stars): A tool to extract relevant information from text.
malfrats/xeuledoc (846 stars): A tool to fetch information about public Google documents from various services.
mihaelisaev/wkhtmltopdf (38 stars): A Swift library for generating PDF files from templates and web pages using wkhtmltopdf.
j-f-liu/lopdf (1,653 stars): A Rust library for working with PDF documents.
unidoc/unidoc (708 stars): A Go library for extracting text from PDF files, particularly invoices.
sowcow/blank_slate_pdf (18 stars): A Rust-based PDF generator for creating customizable, structured PDFs with experimental features.
philipjkim/goreadability (69 stars): Extracts readable content from web pages using Open Graph and traditional readability rules.
bikash/documentunderstanding (96 stars): Research and development of tools and techniques for extracting information from images and PDFs using deep learning and graph neural networks.