xidel

Web scraper

A tool to extract data from web pages using various query languages and selectors.

A command-line tool to download and extract data from HTML/XML pages or JSON APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq, or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
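To illustrate what query-based extraction means in practice, here is a minimal Python sketch that pulls a title and a set of links out of a page using path expressions. It is not xidel's code or interface: it uses Python's standard `xml.etree.ElementTree`, which supports only a small XPath subset and requires well-formed markup. The sample page, the `extract` helper, and the `repo` class name are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page; not taken from any real site.
PAGE = """<html>
  <head><title>Example catalog</title></head>
  <body>
    <a class="repo" href="https://example.org/a">First</a>
    <a class="repo" href="https://example.org/b">Second</a>
    <a href="https://example.org/c">Other</a>
  </body>
</html>"""


def extract(page: str, path: str) -> list[ET.Element]:
    """Return all elements matching an ElementTree XPath-subset expression."""
    root = ET.fromstring(page)  # root is the <html> element
    return root.findall(path)


# Select the page title by its position in the tree.
title = extract(PAGE, "./head/title")[0].text

# Select only the links carrying class="repo", at any depth.
links = [a.get("href") for a in extract(PAGE, ".//a[@class='repo']")]

print(title)
print(links)
```

A dedicated scraper additionally handles downloading, malformed HTML, and richer query languages, which this sketch leaves out.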

GitHub

686 stars
27 watching
42 forks
Language: Pascal
last commit: 7 months ago
Linked from 1 awesome list

cli, command-line, css-selector, curl, data-processing, datascraping, html, http, httpie, json, rest, scraper, web, webscraper, webscraping, wget, xml, xmlstarlet, xpath, xquery


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
| the-markup/blacklight-collector | A tool for scraping website content and analyzing browser behavior. | 202 |
| miyagawa/web-scraper | A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface. | 104 |
| joseconstela/webparsy | A Node.js library and CLI for scraping websites using Puppeteer and YAML definitions. | 44 |
| spekulatius/phpscraper | A web scraping utility for PHP that simplifies the process of extracting information from websites. | 536 |
| slotix/dataflowkit | A framework for extracting structured data from web pages using CSS selectors. | 662 |
| medialab/minet | A command line tool and Python library for extracting data from various web sources. | 286 |
| jaimeiniesta/metainspector | A Ruby gem for web scraping and extracting metadata from web pages. | 1,036 |
| bplawler/crawler | A Scala-based DSL for programmatically accessing and interacting with web pages. | 148 |
| oscarotero/embed | A PHP library to extract metadata and embeddable code from any web page using various protocols and scraping techniques. | 2,091 |
| zhuyingda/webster | A framework for automating web scraping and crawling tasks using Node.js. | 515 |
| spider-rs/spider | A web crawler and scraper built on top of Rust, designed to extract data from the web in a flexible and configurable manner. | 1,140 |
| gushonorato/mechanize | A web scraping and automation tool for Elixir. | 30 |
| meilisearch/docs-scraper | Automates scraping and indexing of documentation content into a search engine. | 290 |
| jakopako/goskyr | A tool to simplify web scraping of list-like structured data from web pages. | 35 |