wombat
Web scraper library
A lightweight Ruby web crawler and scraper with an elegant DSL that extracts structured data from pages.
1k stars
51 watching
129 forks
Language: Ruby
last commit: 10 months ago
Linked from 3 awesome lists
Tags: crawler, dsl, ruby, scraper
Related projects:
Repository | Description | Stars |
---|---|---|
bplawler/crawler | A Scala-based DSL for programmatically accessing and interacting with web pages | 148 |
benibela/xidel | A tool to extract data from web pages using various query languages and selectors. | 687 |
postmodern/spidr | A Ruby web crawling library that provides flexible and customizable methods to crawl websites | 808 |
jaimeiniesta/metainspector | A Ruby gem for web scraping and extracting metadata from web pages. | 1,037 |
ruippeixotog/scala-scraper | A Scala library providing a DSL for loading and extracting content from HTML pages | 717 |
archiveteam/wpull | Downloads and crawls web pages, allowing for the archiving of websites. | 557 |
miyagawa/web-scraper | A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface. | 104 |
joseconstela/webparsy | A Node.js library and CLI for scraping websites using Puppeteer and YAML definitions | 44 |
medialab/minet | A command line tool and Python library for extracting data from various web sources. | 289 |
oscarotero/embed | A PHP library to extract metadata and embeddable code from any web page using various protocols and scraping techniques. | 2,095 |
slotix/dataflowkit | A framework for extracting structured data from web pages using CSS selectors. | 662 |
spider-rs/spider | A web crawler and scraper written in Rust, designed to extract data from the web in a flexible and configurable manner. | 1,185 |
jjelosua/doga_scraper | A tool that extracts Galician Official Journal documents for a given input year and converts them to different formats. | 0 |
s0rg/crawley | A utility for systematically extracting URLs from web pages and printing them to the console. | 265 |
fimad/scalpel | A web scraping library providing a declarative interface on top of an HTML parsing library to extract data from HTML pages | 323 |