web-scraper
HTML scraper
A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface.
Perl web scraping toolkit
104 stars
11 watching
31 forks
Language: Perl
Last commit: over 7 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
fimad/scalpel | A web scraping library providing a declarative interface on top of an HTML parsing library to extract data from HTML pages. | 323 |
slotix/dataflowkit | A framework for extracting structured data from web pages using CSS selectors. | 662 |
scrapy/scrapely | A pure-Python library for extracting structured data from HTML pages. | 1,863 |
benibela/xidel | A tool to extract data from web pages using various query languages and selectors. | 686 |
rust-scraper/scraper | A Rust library for parsing and querying HTML documents using CSS selectors. | 1,937 |
jakopako/goskyr | A tool to simplify web scraping of list-like structured data from web pages. | 35 |
medialab/minet | A command line tool and Python library for extracting data from various web sources. | 286 |
propublica/upton | A web scraping framework that handles repetitive tasks and provides options for efficient data retrieval. | 1,613 |
ruippeixotog/scala-scraper | A Scala library that provides a domain-specific language (DSL) for parsing and extracting content from HTML pages. | 717 |
jjelosua/doga_scraper | A tool that extracts Galician Official Journal documents for a given input year and converts them to different formats. | 0 |
the-markup/blacklight-collector | A tool for scraping website content and analyzing browser behavior. | 202 |
felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
meilisearch/docs-scraper | Automates scraping and indexing of documentation content into a search engine. | 290 |
spider-rs/spider | A flexible, configurable web crawler and scraper written in Rust. | 1,140 |
zhuyingda/webster | A framework for automating web scraping and crawling tasks using Node.js. | 515 |