x-ray
Web scraper
A flexible web scraping framework for extracting data from websites with customizable selectors and pagination support.
The next web scraper. See through the noise.
6k stars
110 watching
349 forks
Language: JavaScript
Last commit: about 1 month ago
Linked from 3 awesome lists
Related projects:
Repository | Description | Stars |
---|---|---|
ruipgil/scraperjs | A versatile web scraping module with two scrapers for static and dynamic content extraction. | 3,714 |
apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 16,081 |
ionicabizau/scrape-it | A Node.js library and CLI tool for automating web page scraping and parsing. | 4,024 |
spatie/crawler | A powerful web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently. | 2,552 |
yujiosaka/headless-chrome-crawler | A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites. | 5,534 |
s0md3v/photon | A fast and flexible web crawler designed to gather information from the internet. | 11,122 |
unclecode/crawl4ai | A web crawling tool designed to extract structured data from the web for use in AI applications. | 18,541 |
benibela/xidel | A tool to extract data from web pages using various query languages and selectors. | 690 |
rchipka/node-osmosis | A fast and flexible web scraping library using native libxml C bindings. | 4,115 |
bda-research/node-crawler | A Node.js-based web crawler and spider that extracts data from websites. | 6,718 |
gocolly/colly | A framework for extracting structured data from websites in a fast and elegant way. | 23,444 |
justanotherarchivist/snscrape | A Python-based social media scraper that extracts data from various platforms. | 4,557 |
feng19/spider_man | A high-level web crawling and scraping framework for Elixir. | 23 |
philipjkim/goreadability | Extracts readable content from web pages using Open Graph and traditional readability rules. | 69 |
samuelclay/newsblur | A personal news reader application utilizing multiple technologies to fetch, parse, and store news articles. | 6,937 |