web-scraper

HTML scraper

A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface.

GitHub

104 stars
11 watching
31 forks
Language: Perl
Last commit: over 7 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
|---|---|---|
| fimad/scalpel | A web scraping library providing a declarative interface on top of an HTML parsing library to extract data from HTML pages. | 323 |
| slotix/dataflowkit | A framework for extracting structured data from web pages using CSS selectors. | 662 |
| scrapy/scrapely | A pure-Python library for extracting structured data from HTML pages. | 1,863 |
| benibela/xidel | A tool to extract data from web pages using various query languages and selectors. | 686 |
| rust-scraper/scraper | A Rust library for parsing and querying HTML documents using CSS selectors. | 1,937 |
| jakopako/goskyr | A tool to simplify web scraping of list-like structured data from web pages. | 35 |
| medialab/minet | A command line tool and Python library for extracting data from various web sources. | 286 |
| propublica/upton | A web scraping framework that simplifies the process by handling repetitive tasks and provides options for efficient data retrieval. | 1,613 |
| ruippeixotog/scala-scraper | A Scala library that provides a domain-specific language (DSL) for parsing and extracting content from HTML pages. | 717 |
| jjelosua/doga_scraper | A tool that extracts and converts Galician Official journal documents to different formats based on input year. | 0 |
| the-markup/blacklight-collector | A tool for scraping website content and analyzing browser behavior. | 202 |
| felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
| meilisearch/docs-scraper | Automates scraping and indexing of documentation content into a search engine. | 290 |
| spider-rs/spider | A web crawler and scraper built on top of Rust, designed to extract data from the web in a flexible and configurable manner. | 1,140 |
| zhuyingda/webster | A framework for automating web scraping and crawling tasks using Node.js. | 515 |