mimeograph
PDF extractor
A CoffeeScript library for PDF text extraction and OCR that creates searchable files
28 stars
5 watching
2 forks
Language: CoffeeScript
Last commit: about 12 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
leofcardoso/pdf2pdfocr | A tool to extract text from PDFs and add a searchable layer to them | 274 |
jonmagic/grim | A tool for extracting pages from PDFs and converting them to images and text strings | 216 |
uglytoad/pdfpig | A C# library for extracting and analyzing text from PDF files | 1,733 |
ckorzen/pdf-text-extraction-benchmark | Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles | 65 |
aeksco/aws-pdf-textract-pipeline | A data pipeline for extracting structured data from PDFs using AWS Textract and cloud-based services | 164 |
gunnarmorling/quarkus-pdf-extract | A Quarkus-based microservice to extract text from PDF files | 24 |
michaelrsweet/pdfio | A C library that provides read and write access to PDF files | 198 |
aymericbeaumet/squeeze | A tool to extract relevant information from text | 17 |
malfrats/xeuledoc | A tool to fetch information about public Google documents from various services | 846 |
mihaelisaev/wkhtmltopdf | A Swift library for generating PDF files from templates and web pages using wkhtmltopdf | 38 |
j-f-liu/lopdf | A Rust library for working with PDF documents | 1,653 |
unidoc/unidoc | A Go library for extracting text from PDF files, particularly invoices | 708 |
sowcow/blank_slate_pdf | A Rust-based PDF generator for creating customizable, structured PDFs with experimental features | 18 |
philipjkim/goreadability | Extracts readable content from web pages using Open Graph and traditional readability rules | 69 |
bikash/documentunderstanding | Research and development of tools and techniques for extracting information from images and PDFs using deep learning and graph neural networks | 96 |