web-scraper
HTML scraper
A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface.
Perl web scraping toolkit
104 stars
11 watching
31 forks
Language: Perl
Last commit: over 7 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
fimad/scalpel | A web scraping library providing a declarative interface on top of an HTML parsing library to extract data from HTML pages | 325 |
slotix/dataflowkit | A framework for extracting structured data from web pages using CSS selectors. | 667 |
scrapy/scrapely | A pure-Python library for extracting structured data from HTML pages. | 1,865 |
benibela/xidel | A tool to extract data from web pages using various query languages and selectors. | 690 |
rust-scraper/scraper | A Rust library for parsing and querying HTML documents using CSS selectors. | 1,961 |
jakopako/goskyr | A tool to simplify web scraping of list-like structured data from web pages | 36 |
medialab/minet | A command line tool and Python library for extracting data from various web sources. | 293 |
propublica/upton | A web scraping framework that handles repetitive tasks and provides options for efficient data retrieval | 1,612 |
ruippeixotog/scala-scraper | A Scala library providing a DSL for loading and extracting content from HTML pages | 717 |
jjelosua/doga_scraper | A tool that extracts documents from the Galician Official Journal for a given year and converts them to different formats. | 0 |
the-markup/blacklight-collector | A tool for scraping website content and analyzing browser behavior | 205 |
felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
meilisearch/docs-scraper | Automates scraping and indexing of documentation content into a search engine | 297 |
spider-rs/spider | A tool for web data extraction and processing using Rust | 1,234 |
zhuyingda/webster | A framework for automating web scraping and crawling tasks using Node.js | 518 |