 ebot
 crawler
 An Erlang-based web crawler designed to be scalable and highly configurable
Ebot is an open-source web crawler built on top of a NoSQL database (Apache CouchDB, Riak), an AMQP message broker (RabbitMQ), Webmachine, and MochiWeb. It is written in Erlang and is a scalable, distributed, and highly configurable web crawler. See the wiki pages for more details.
330 stars
 27 watching
 55 forks
 
Language: Erlang 
Last commit: over 14 years ago
Linked from 1 awesome list
 Related projects:
| Repository | Description | Stars | 
|---|---|---|
|  | A high-level web crawling and scraping framework for Elixir. | 23 | 
|  | A flexible web crawler that follows robots.txt policies and crawl delays. | 787 | 
|  | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 380 | 
|  | A framework for extracting structured data from websites | 994 | 
|  | A high-performance web crawling and scraping solution with customizable settings and worker pooling. | 945 | 
|  | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,037 | 
|  | A Ruby web crawling library that provides flexible and customizable methods to crawl websites | 809 | 
|  | An OSINT bot that crawls pastebin sites to search for sensitive data leaks | 634 | 
|  | A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,036 | 
|  | A modular, concurrent web crawler framework written in Go. | 1,827 | 
|  | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 | 
|  | A distributed web crawler that coordinates crawling tasks across multiple worker processes using a message bus. | 55 | 
|  | A web crawling library for .NET that allows customizable crawling and throttling of websites. | 248 | 
|  | A web crawler designed to efficiently collect and prioritize relevant content from the web | 459 | 
|  | A Ruby-based tool for web crawling and data extraction, aiming to be a replacement for paid software in the SEO space. | 143 |