xidel
Web scraper
A tool to extract data from web pages using various query languages and selectors.
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
686 stars
27 watching
42 forks
Language: Pascal
last commit: 7 months ago
Linked from 1 awesome list
Tags: cli, command-line, css-selector, curl, data-processing, datascraping, html, http, httpie, json, rest, scraper, web, webscraper, webscraping, wget, xml, xmlstarlet, xpath, xquery
Related projects:
Repository | Description | Stars |
---|---|---|
felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
the-markup/blacklight-collector | A tool for scraping website content and analyzing browser behavior. | 202 |
miyagawa/web-scraper | A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface. | 104 |
joseconstela/webparsy | A Node.js library and CLI for scraping websites using Puppeteer and YAML definitions. | 44 |
spekulatius/phpscraper | A web scraping utility for PHP that simplifies the process of extracting information from websites. | 536 |
slotix/dataflowkit | A framework for extracting structured data from web pages using CSS selectors. | 662 |
medialab/minet | A command line tool and Python library for extracting data from various web sources. | 286 |
jaimeiniesta/metainspector | A Ruby gem for web scraping and extracting metadata from web pages. | 1,036 |
bplawler/crawler | A Scala-based DSL for programmatically accessing and interacting with web pages. | 148 |
oscarotero/embed | A PHP library to extract metadata and embeddable code from any web page using various protocols and scraping techniques. | 2,091 |
zhuyingda/webster | A framework for automating web scraping and crawling tasks using Node.js. | 515 |
spider-rs/spider | A web crawler and scraper written in Rust, designed to extract data from the web in a flexible and configurable manner. | 1,140 |
gushonorato/mechanize | A web scraping and automation tool for Elixir. | 30 |
meilisearch/docs-scraper | A tool that automates scraping and indexing of documentation content into a search engine. | 290 |
jakopako/goskyr | A tool to simplify web scraping of list-like structured data from web pages. | 35 |