grim

Page extractor

A tool for extracting pages from PDFs and converting them to images and text strings.


GitHub

216 stars

7 watching

51 forks

Language: Ruby

Last commit: almost 3 years ago

Linked from 2 awesome lists

ghostscript, imagemagick, pdf, ruby


Related projects:

Repository	Description	Stars
philipjkim/goreadability	Extracts readable content from web pages using Open Graph and traditional readability rules.	69
steelthread/mimeograph	A CoffeeScript library for extracting text from PDF files and creating searchable documents with OCR capabilities	28
docraptor/docraptor-ruby	A Ruby client library for converting HTML to PDF using the DocRaptor API.	33
yomurb/yomu	A Ruby library for extracting text and metadata from various file formats.	498
gunnarmorling/quarkus-pdf-extract	A Quarkus-based microservice to extract text from PDF files	24
cantino/ruby-readability	A Ruby port of a readability tool that extracts primary content from web pages.	927
003random/getjs	A tool to extract JavaScript sources from URLs and web pages efficiently	732
plainas/tq	Tool that extracts content from HTML documents based on CSS selectors	236
lucianopereira86/quasar-nodejs-google-vision	A tool that extracts text from images using Google Vision API and NodeJS	18
leofcardoso/pdf2pdfocr	A tool to extract text from PDFs and add a searchable layer to them	279
robotools/extractor	A tool for extracting data from font binaries into UFO objects.	53
xyntopia/pydoxtools	A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines.	78
aymericbeaumet/squeeze	A tool to extract relevant information from text	17
gettalong/hexapdf	A versatile Ruby library for creating and manipulating PDF files with advanced features such as layout, encryption, and image embedding.	1,253
eset-la/lord-of-the-strings	A tool to extract and classify relevant strings from binary files	9