yomu

File extractor

A Ruby library for extracting text and metadata from files and documents (.doc, .docx, .pages, .odt, .rtf, .pdf).
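A minimal sketch of what reading text and metadata might look like. It assumes the gem's documented `Yomu.new`, `#text`, and `#metadata` calls, and that a Java runtime is available, since the gem wraps Apache Tika. The `SUPPORTED_EXTENSIONS` list and the `extractable?` and `extract` helpers are hypothetical names introduced for illustration; the list only mirrors the formats named above and is not a statement of everything the library accepts.

```ruby
# Formats named in the description above (an assumption, not an exhaustive list).
SUPPORTED_EXTENSIONS = %w[.doc .docx .pages .odt .rtf .pdf].freeze

# Hypothetical helper: true when the file extension is one of the listed formats.
def extractable?(path)
  SUPPORTED_EXTENSIONS.include?(File.extname(path).downcase)
end

# Hypothetical helper: returns the document's text and metadata as a hash.
def extract(path)
  raise ArgumentError, "unsupported file type: #{path}" unless extractable?(path)

  require 'yomu' # loaded lazily so extractable? works without the gem installed
  document = Yomu.new(path)
  { text: document.text, metadata: document.metadata }
end

# Command-line use: ruby extract.rb sample.pdf
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  result = extract(ARGV[0])
  puts result[:metadata]['Content-Type']
  puts result[:text][0, 200] # first 200 characters of the extracted text
end
```

Checking the extension first is only a convenience for failing early; the library itself decides what it can actually parse.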

GitHub

499 stars
12 watching
125 forks
Language: Ruby
Last commit: over 1 year ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
jonmagic/grim | A tool for extracting pages from PDFs and converting them to images and text strings | 216
jimm/midilib | A Ruby library for reading and writing MIDI files | 181
exiftool-rb/exiftool.rb | A Ruby library that wraps ExifTool to extract metadata from images and videos | 71
yohasebe/lemmatizer | A Ruby library that provides a lemmatizer for English text | 108
geemus/formatador | A library for formatting text output, including tables and progress bars | 451
tmm1/emoji-extractor | A Ruby script that extracts high-resolution emoji images from Apple's font files | 558
cantino/ruby-readability | A Ruby tool for extracting readable content from web pages | 925
recrm/archivetools | A collection of tools for extracting and analyzing data from web archives | 69
jkongie/mobi | A Ruby gem to extract metadata from MOBI files | 38
rom-rb/rom-yaml | Provides YAML-based data mapping and serialization support for Ruby objects | 28
robotools/extractor | A tool for extracting data from font binaries into UFO objects | 52
gunnarmorling/quarkus-pdf-extract | A Quarkus-based microservice to extract text from PDF files | 24
yoshoku/rumale | A Ruby machine learning library providing interfaces to various algorithms | 785
gomoob/php-metadata-extractor | A PHP wrapper to call the Java metadata-extractor library | 9
coleifer/micawber | A library for extracting metadata and content from URLs | 636