grim

Page extractor

A tool for extracting pages from PDFs and converting them to images and text strings.

GitHub

216 stars
7 watching
51 forks
Language: Ruby
Last commit: about 1 year ago
Linked from 2 awesome lists

ghostscript, imagemagick, pdf, ruby

Related projects:

Repository | Description | Stars
philipjkim/goreadability | Extracts readable content from web pages using Open Graph and traditional readability rules. | 69
steelthread/mimeograph | A CoffeeScript library for extracting text from PDFs and creating searchable files | 28
docraptor/docraptor-ruby | A Ruby client library for converting HTML to PDF using the DocRaptor API. | 33
yomurb/yomu | A Ruby library for extracting text and metadata from various file formats. | 499
gunnarmorling/quarkus-pdf-extract | A Quarkus-based microservice to extract text from PDF files | 24
cantino/ruby-readability | A tool for extracting readable content from web pages written in Ruby. | 925
003random/getjs | A tool to extract JavaScript sources from URLs and web pages efficiently | 708
plainas/tq | Tool that extracts content from HTML documents based on CSS selectors | 236
lucianopereira86/quasar-nodejs-google-vision | A tool that extracts text from images using Google Vision API and NodeJS | 18
leofcardoso/pdf2pdfocr | A tool to extract text from PDFs and add a searchable layer to them | 274
robotools/extractor | A tool for extracting data from font binaries into UFO objects. | 52
xyntopia/pydoxtools | A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines. | 77
aymericbeaumet/squeeze | A tool to extract relevant information from text | 17
gettalong/hexapdf | A versatile Ruby library for creating and manipulating PDF files with advanced features such as layout, encryption, and image embedding. | 1,247
eset-la/lord-of-the-strings | A tool to extract and classify relevant strings from binary files | 9