crawler
A high-performance web crawling and scraping solution with customizable settings and worker pooling.
A high-performance web crawler and scraper written in Elixir.
945 stars
32 watching
91 forks
Language: Elixir
Last commit: 5 months ago
Linked from 1 awesome list
Topics: crawler, elixir, files, offline, scraper, scraper-engine, spider
Related projects:
Repository | Description | Stars |
---|---|---|
elixir-crawly/crawly | A framework for extracting structured data from websites | 987 |
feng19/spider_man | A high-level web crawling and scraping framework for Elixir. | 23 |
fmpwizard/owlcrawler | A distributed web crawler that coordinates crawling tasks across multiple worker processes using a message bus. | 55 |
spider-rs/spider | A web crawler and scraper built on top of Rust, designed to extract data from the web in a flexible and configurable manner. | 1,140 |
webrecorder/browsertrix-crawler | A containerized browser-based crawler system for capturing web content in a high-fidelity and customizable manner. | 652 |
hu17889/go_spider | A modular, concurrent web crawler framework written in Go. | 1,826 |
vida-nyu/ache | A web crawler designed to efficiently collect and prioritize relevant content from the web. | 454 |
turnersoftware/infinitycrawler | A web crawling library for .NET that allows customizable crawling and throttling of websites. | 248 |
elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,035 |
puerkitobio/gocrawl | A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,038 |
chenjiandongx/github-spider | A Python-based web crawler for scraping GitHub user and repository data. | 264 |
a11ywatch/crawler | A high-performance web page crawler. | 49 |
puerkitobio/fetchbot | A flexible web crawler that follows robots.txt policies and crawl delays. | 786 |
antchfx/antch | A framework for building fast and efficient web crawlers and scrapers in Go. | 260 |
helgeho/web2warc | A web crawler that creates custom archives in WARC/CDX format. | 24 |