Crawler

A high-performance web crawling and scraping solution with customizable settings and worker pooling.

A high-performance web crawler and scraper written in Elixir.
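The summary above mentions worker pooling. As a rough illustration of the general idea, bounded concurrency over a list of URLs, here is a minimal Elixir sketch built on the standard library's `Task.async_stream/3` with the `:max_concurrency` option. The module `PoolSketch` and the function `fake_fetch/1` are hypothetical. This is not how the Crawler library implements its pool, and the sketch makes no real HTTP requests.

```elixir
defmodule PoolSketch do
  @moduledoc """
  Hypothetical illustration of bounded worker pooling for a crawler.
  This is not the Crawler library's implementation.
  """

  # Hypothetical stand-in for an HTTP fetch, so the sketch runs offline.
  # It returns the URL paired with its byte size.
  def fake_fetch(url), do: {url, byte_size(url)}

  # Processes `urls` with at most `workers` tasks running at once.
  # Results come back in input order, because async_stream is ordered by default.
  # A task that exits would yield {:exit, reason}. That case is not handled here.
  def crawl_all(urls, workers \\ 4) do
    urls
    |> Task.async_stream(&fake_fetch/1, max_concurrency: workers, timeout: 5_000)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

# Usage: two URLs processed by a pool of at most 2 concurrent workers.
IO.inspect(PoolSketch.crawl_all(["http://a.test", "http://b.test/page"], 2))
# prints [{"http://a.test", 13}, {"http://b.test/page", 18}]
```

Capping `:max_concurrency` is what gives the pool its bound. Raising it increases throughput but also increases load on the target site, which is the usual trade-off that a crawler's worker setting exposes.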

945 stars

32 watching

91 forks

Language: Elixir

Last commit: about 2 years ago

Linked from 1 awesome list

Topics: crawler, elixir, files, offline, scraper, scraper-engine, spider

Backlinks from these awesome lists:

h4cc/awesome-elixir

Related projects:

Repository	Description	Stars
elixir-crawly/crawly	A framework for extracting structured data from websites.	994
feng19/spider_man	A high-level web crawling and scraping framework for Elixir.	23
fmpwizard/owlcrawler	A distributed web crawler that coordinates crawling tasks across multiple worker processes using a message bus.	55
spider-rs/spider	A tool for web data extraction and processing, written in Rust.	1,234
webrecorder/browsertrix-crawler	A containerized browser-based crawler system for capturing web content in a high-fidelity and customizable manner.	677
hu17889/go_spider	A modular, concurrent web crawler framework written in Go.	1,827
vida-nyu/ache	A web crawler designed to efficiently collect and prioritize relevant content from the web.	459
turnersoftware/infinitycrawler	A web crawling library for .NET that allows customizable crawling and throttling of websites.	248
elliotgao2/gain	A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites.	2,037
puerkitobio/gocrawl	A concurrent web crawler written in Go that allows flexible and polite crawling of websites.	2,036
chenjiandongx/github-spider	A Python-based web crawler for scraping GitHub user and repository data.	264
a11ywatch/crawler	Crawls web pages at high performance.	51
puerkitobio/fetchbot	A flexible web crawler that follows robots.txt policies and crawl delays.	787
antchfx/antch	A framework for building fast and efficient web crawlers and scrapers in Go.	261
helgeho/web2warc	A web crawler that creates custom archives in WARC/CDX format.	25