mimeograph

PDF extractor

A CoffeeScript library that extracts text from PDF files and uses OCR to create searchable documents

GitHub

28 stars

5 watching

2 forks

Language: CoffeeScript

Last commit: almost 14 years ago

Linked from 1 awesome list

steelthread.github.com/mimeograph/

Backlinks from these awesome lists:

uhub/awesome-coffeescript

Related projects:

Repository	Description	Stars
leofcardoso/pdf2pdfocr	A tool to extract text from PDFs and add a searchable layer to them	279
jonmagic/grim	A tool for extracting pages from PDFs and converting them to images and text strings	216
uglytoad/pdfpig	A C# library for extracting and analyzing text from PDF files	1,794
ckorzen/pdf-text-extraction-benchmark	Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles	65
aeksco/aws-pdf-textract-pipeline	A data pipeline for extracting structured data from PDFs using AWS Textract and cloud-based services	164
gunnarmorling/quarkus-pdf-extract	A Quarkus-based microservice to extract text from PDF files	24
michaelrsweet/pdfio	A C library that provides read and write access to PDF files	204
aymericbeaumet/squeeze	A tool to extract relevant information from text	17
malfrats/xeuledoc	A tool to fetch information about public Google documents from various services	856
mihaelisaev/wkhtmltopdf	A Swift library for generating PDF files from templates and web pages using wkhtmltopdf	38
j-f-liu/lopdf	A Rust library for working with PDF documents	1,680
unidoc/unidoc	A Go library for extracting text from PDF files, particularly invoices	708
sowcow/blank_slate_pdf	A Rust-based framework for generating customizable PDFs with flexible layouts and content structures	18
philipjkim/goreadability	Extracts readable content from web pages using Open Graph and traditional readability rules	69
bikash/documentunderstanding	Research and development of tools and techniques for extracting information from images and PDFs using deep learning and graph neural networks	96