web-scraper

HTML scraper

A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface.
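As a rough illustration of the DSL-like interface mentioned above, the following minimal sketch uses the `scraper` and `process` keywords exported by the `Web::Scraper` module. The HTML fragment, the CSS selectors, and the key names (`books`, `title`, `link`, `price`) are hypothetical and chosen only for this example; it assumes `Web::Scraper` is installed from CPAN.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical HTML fragment, invented for illustration only.
my $html = <<'HTML';
<html><body>
<ul>
  <li class="book"><a href="/books/1">Programming Perl</a> <span class="price">39.99</span></li>
  <li class="book"><a href="/books/2">Learning Perl</a> <span class="price">29.99</span></li>
</ul>
</body></html>
HTML

# Declare what to extract: each "li.book" element becomes one entry
# in the "books" array, described by a nested scraper.
my $books = scraper {
    process 'li.book', 'books[]' => scraper {
        process 'a',      title => 'TEXT';    # text content of the link
        process 'a',      link  => '@href';   # value of the href attribute
        process '.price', price => 'TEXT';
    };
};

# Passing a scalar reference scrapes an in-memory HTML string
# instead of fetching a URI.
my $result = $books->scrape(\$html);

for my $book (@{ $result->{books} }) {
    printf "%s (%s): %s\n", $book->{title}, $book->{price}, $book->{link};
}
```

The result is a plain nested Perl data structure (a hash reference holding an array of hashes here), so it can be passed directly to a serializer or template. Passing a `URI` object to `scrape` instead of a string reference would have the module fetch the page itself.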

104 stars

11 watching

31 forks

Language: Perl

Last commit: over 9 years ago

Linked from 1 awesome list

search.cpan.org/dist/Web-Scraper

Backlinks from these awesome lists:

brucedone/awesome-crawler

Related projects:

Repository	Description	Stars
fimad/scalpel	A web scraping library providing a declarative interface on top of an HTML parsing library to extract data from HTML pages.	325
slotix/dataflowkit	A framework for extracting structured data from web pages using CSS selectors.	667
scrapy/scrapely	A pure-Python library for extracting structured data from HTML pages.	1,865
benibela/xidel	A tool to extract data from web pages using various query languages and selectors.	690
rust-scraper/scraper	A Rust library for parsing and querying HTML documents using CSS selectors.	1,961
jakopako/goskyr	A tool to simplify web scraping of list-like structured data from web pages.	36
medialab/minet	A command line tool and Python library for extracting data from various web sources.	293
propublica/upton	A web scraping framework that handles repetitive tasks and provides options for efficient data retrieval.	1,612
ruippeixotog/scala-scraper	A Scala library providing a DSL for loading and extracting content from HTML pages.	717
jjelosua/doga_scraper	A tool that extracts Galician official journal documents for a given year and converts them to different formats.	0
the-markup/blacklight-collector	A tool for scraping website content and analyzing browser behavior.	205
felipecsl/wombat	A Ruby-based web crawler and data extraction tool with an elegant DSL.	1,315
meilisearch/docs-scraper	A tool that automates scraping and indexing of documentation content into a search engine.	297
spider-rs/spider	A tool for web data extraction and processing using Rust.	1,234
zhuyingda/webster	A framework for automating web scraping and crawling tasks using Node.js.	518