grim
Page extractor
A tool for extracting pages from PDFs and converting them to images and text strings.
216 stars
7 watching
51 forks
Language: Ruby
Last commit: about 1 year ago
Linked from 2 awesome lists
Tags: ghostscript, imagemagick, pdf, ruby
Related projects:
Repository | Description | Stars |
---|---|---|
philipjkim/goreadability | Extracts readable content from web pages using Open Graph and traditional readability rules. | 69 |
steelthread/mimeograph | A CoffeeScript library for extracting text from PDFs and creating searchable files. | 28 |
docraptor/docraptor-ruby | A Ruby client library for converting HTML to PDF using the DocRaptor API. | 33 |
yomurb/yomu | A Ruby library for extracting text and metadata from various file formats. | 499 |
gunnarmorling/quarkus-pdf-extract | A Quarkus-based microservice to extract text from PDF files. | 24 |
cantino/ruby-readability | A tool for extracting readable content from web pages written in Ruby. | 925 |
003random/getjs | A tool to efficiently extract JavaScript sources from URLs and web pages. | 708 |
plainas/tq | A tool that extracts content from HTML documents based on CSS selectors. | 236 |
lucianopereira86/quasar-nodejs-google-vision | A tool that extracts text from images using the Google Vision API and Node.js. | 18 |
leofcardoso/pdf2pdfocr | A tool to extract text from PDFs and add a searchable layer to them. | 274 |
robotools/extractor | A tool for extracting data from font binaries into UFO objects. | 52 |
xyntopia/pydoxtools | A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines. | 77 |
aymericbeaumet/squeeze | A tool to extract relevant information from text. | 17 |
gettalong/hexapdf | A versatile Ruby library for creating and manipulating PDF files with advanced features such as layout, encryption, and image embedding. | 1,247 |
eset-la/lord-of-the-strings | A tool to extract and classify relevant strings from binary files. | 9 |