wombat

Web scraper library

A lightweight Ruby web crawler and scraper with an elegant DSL that extracts structured data from pages.

GitHub

1k stars
51 watching
129 forks
Language: Ruby
Last commit: 10 months ago
Linked from 3 awesome lists

Tags: crawler, dsl, ruby, scraper

Related projects:

Repository | Description | Stars
bplawler/crawler | A Scala-based DSL for programmatically accessing and interacting with web pages | 148
benibela/xidel | A tool to extract data from web pages using various query languages and selectors | 687
postmodern/spidr | A Ruby web crawling library that provides flexible and customizable methods to crawl websites | 808
jaimeiniesta/metainspector | A Ruby gem for web scraping and extracting metadata from web pages | 1,037
ruippeixotog/scala-scraper | A Scala library providing a DSL for loading and extracting content from HTML pages | 717
archiveteam/wpull | Downloads and crawls web pages, allowing for the archiving of websites | 557
miyagawa/web-scraper | A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface | 104
joseconstela/webparsy | A Node.js library and CLI for scraping websites using Puppeteer and YAML definitions | 44
medialab/minet | A command line tool and Python library for extracting data from various web sources | 289
oscarotero/embed | A PHP library to extract metadata and embeddable code from any web page using various protocols and scraping techniques | 2,095
slotix/dataflowkit | A framework for extracting structured data from web pages using CSS selectors | 662
spider-rs/spider | A web crawler and scraper built on top of Rust, designed to extract data from the web in a flexible and configurable manner | 1,185
jjelosua/doga_scraper | A tool that extracts and converts Galician Official journal documents to different formats based on input year | 0
s0rg/crawley | A utility for systematically extracting URLs from web pages and printing them to the console | 265
fimad/scalpel | A web scraping library providing a declarative interface on top of an HTML parsing library to extract data from HTML pages | 323