x-ray
Web scraper
A flexible web scraping framework for extracting data from websites with customizable selectors and pagination support.
The next web scraper. See through the noise.
6k stars
110 watching
348 forks
Language: JavaScript
Last commit: 23 days ago
Linked from 3 awesome lists
Related projects:
Repository | Description | Stars |
---|---|---|
ruipgil/scraperjs | A versatile web scraping module with two scrapers for static and dynamic content extraction. | 3,710 |
apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 15,604 |
ionicabizau/scrape-it | A Node.js library and CLI tool for automating web page scraping and parsing. | 4,012 |
spatie/crawler | A powerful web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently. | 2,537 |
yujiosaka/headless-chrome-crawler | A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites. | 5,527 |
s0md3v/photon | A fast and flexible web crawler designed to gather information from the internet. | 11,067 |
unclecode/crawl4ai | A tool for web crawling and data extraction, designed to work with large language models. | 16,180 |
benibela/xidel | A tool to extract data from web pages using various query languages and selectors. | 681 |
rchipka/node-osmosis | A fast and flexible web scraping library using native libxml C bindings. | 4,116 |
bda-research/node-crawler | A Node.js-based web crawler and spider that extracts data from websites. | 6,704 |
gocolly/colly | A framework for extracting structured data from websites in a fast and elegant way. | 23,317 |
justanotherarchivist/snscrape | A Python-based social media scraper that extracts data from various platforms. | 4,490 |
feng19/spider_man | A high-level web crawling and scraping framework for Elixir. | 23 |
philipjkim/goreadability | Extracts readable content from web pages using Open Graph and traditional readability rules. | 69 |
samuelclay/newsblur | A personal news reader application utilizing multiple technologies to fetch, parse, and store news articles. | 6,907 |