upton

Web scraper library

A web scraping framework that handles the repetitive parts of scraping and provides options for efficient data retrieval.

A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)

GitHub

2k stars
79 watching
112 forks
Language: HTML
Last commit: almost 6 years ago
Linked from 3 awesome lists


Related projects:

| Repository | Description | Stars |
|---|---|---|
| miyagawa/web-scraper | A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface. | 104 |
| slotix/dataflowkit | A framework for extracting structured data from web pages using CSS selectors. | 662 |
| fimad/scalpel | A web scraping library providing a declarative interface on top of an HTML parsing library to extract data from HTML pages. | 323 |
| spekulatius/phpscraper | A web scraping utility for PHP that simplifies the process of extracting information from websites. | 536 |
| oscarotero/embed | A PHP library to extract metadata and embeddable code from any web page using various protocols and scraping techniques. | 2,091 |
| felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
| the-markup/blacklight-collector | A tool for scraping website content and analyzing browser behavior. | 202 |
| benibela/xidel | A tool to extract data from web pages using various query languages and selectors. | 681 |
| zhuyingda/webster | A framework for automating web scraping and crawling tasks using Node.js. | 515 |
| rust-scraper/scraper | A Rust library for parsing and querying HTML documents using CSS selectors. | 1,937 |
| tidyverse/rvest | A package for extracting data from web pages using HTML parsing and CSS/XPath selectors. | 1,492 |
| joseconstela/webparsy | A Node.js library and CLI for scraping websites using Puppeteer and YAML definitions. | 44 |
| jakopako/goskyr | A tool to simplify web scraping of list-like structured data from web pages. | 35 |
| yhat/scrape | A collection of utility functions and tools to simplify web scraping in Go. | 1,513 |
| archiveteam/wpull | Downloads and crawls web pages, allowing for the archiving of websites. | 556 |