crawler

A high-performance web crawling and scraping solution with customizable settings and worker pooling.

A high-performance web crawler and scraper written in Elixir.

945 stars
32 watching
91 forks
Language: Elixir
Last commit: 5 months ago
Linked from 1 awesome list

crawler, elixir, files, offline, scraper, scraper-engine, spider

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| elixir-crawly/crawly | A framework for extracting structured data from websites. | 987 |
| feng19/spider_man | A high-level web crawling and scraping framework for Elixir. | 23 |
| fmpwizard/owlcrawler | A distributed web crawler that coordinates crawling tasks across multiple worker processes using a message bus. | 55 |
| spider-rs/spider | A web crawler and scraper built on top of Rust, designed to extract data from the web in a flexible and configurable manner. | 1,140 |
| webrecorder/browsertrix-crawler | A containerized browser-based crawler system for capturing web content in a high-fidelity and customizable manner. | 652 |
| hu17889/go_spider | A modular, concurrent web crawler framework written in Go. | 1,826 |
| vida-nyu/ache | A web crawler designed to efficiently collect and prioritize relevant content from the web. | 454 |
| turnersoftware/infinitycrawler | A web crawling library for .NET that allows customizable crawling and throttling of websites. | 248 |
| elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,035 |
| puerkitobio/gocrawl | A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,038 |
| chenjiandongx/github-spider | A Python-based web crawler for scraping GitHub user and repository data. | 264 |
| a11ywatch/crawler | Performs web page crawling at high performance. | 49 |
| puerkitobio/fetchbot | A flexible web crawler that follows robots.txt policies and crawl delays. | 786 |
| antchfx/antch | A framework for building fast and efficient web crawlers and scrapers in Go. | 260 |
| helgeho/web2warc | A web crawler that creates custom archives in WARC/CDX format. | 24 |