ebot
crawler
An Erlang-based web crawler designed to be scalable and highly configurable
Ebot is an open-source web crawler built on top of a NoSQL database (Apache CouchDB, Riak), an AMQP message broker (RabbitMQ), Webmachine, and MochiWeb. Written in Erlang, it is designed to be scalable, distributed, and highly configurable. See the wiki pages for more details.
330 stars
27 watching
55 forks
Language: Erlang
last commit: over 13 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
feng19/spider_man | A high-level web crawling and scraping framework for Elixir. | 23 |
puerkitobio/fetchbot | A flexible web crawler that follows robots.txt policies and crawl delays. | 786 |
brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 378 |
elixir-crawly/crawly | A framework for extracting structured data from websites. | 987 |
fredwu/crawler | A high-performance web crawling and scraping solution with customizable settings and worker pooling. | 945 |
elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,035 |
postmodern/spidr | A Ruby web crawling library that provides flexible and customizable methods to crawl websites. | 806 |
rndinfosecguy/scavenger | An OSINT bot that crawls pastebin sites to search for sensitive data leaks. | 629 |
puerkitobio/gocrawl | A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,038 |
hu17889/go_spider | A modular, concurrent web crawler framework written in Go. | 1,826 |
felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
fmpwizard/owlcrawler | A distributed web crawler that coordinates crawling tasks across multiple worker processes using a message bus. | 55 |
turnersoftware/infinitycrawler | A web crawling library for .NET that allows customizable crawling and throttling of websites. | 248 |
vida-nyu/ache | A web crawler designed to efficiently collect and prioritize relevant content from the web. | 454 |
joenorton/rubyretriever | A Ruby-based tool for web crawling and data extraction, aiming to be a replacement for paid software in the SEO space. | 143 |