ebot

crawler

An Erlang-based web crawler designed to be scalable and highly configurable

Ebot is an open-source web crawler built on top of a NoSQL database (Apache CouchDB or Riak), an AMQP message broker (RabbitMQ), Webmachine, and MochiWeb. Ebot is written in Erlang and is a very scalable, distributed, and highly configurable web crawler. See the wiki pages for more details.
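The architecture described above boils down to a queue-driven crawl loop: URLs to visit flow through a message queue (RabbitMQ in ebot's case), and fetched pages land in a document store (CouchDB or Riak). The following is an illustrative sketch of that pipeline, not ebot's actual Erlang code; it uses stdlib stand-ins (a `deque` for the AMQP queue, a `dict` for the document store), and the `fetch` and `extract_links` callbacks are hypothetical placeholders for real HTTP and HTML-parsing components.

```python
from collections import deque

def crawl(seed_urls, fetch, extract_links, max_pages=100):
    """Queue-driven crawl loop, mirroring ebot's queue + store design."""
    queue = deque(seed_urls)   # stands in for the RabbitMQ URL queue
    store = {}                 # stands in for the CouchDB/Riak document store
    while queue and len(store) < max_pages:
        url = queue.popleft()
        if url in store:       # skip URLs that were already crawled
            continue
        body = fetch(url)
        store[url] = body      # persist the fetched page
        for link in extract_links(body):
            if link not in store:
                queue.append(link)  # enqueue newly discovered URLs
    return store
```

In the real system the queue and store are separate network services, so many crawler workers can consume from the same queue concurrently; that separation is what makes the design distributed and scalable.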

GitHub

330 stars
27 watching
55 forks
Language: Erlang
last commit: over 13 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
|---|---|---|
| feng19/spider_man | A high-level web crawling and scraping framework for Elixir. | 23 |
| puerkitobio/fetchbot | A flexible web crawler that follows robots.txt policies and crawl delays. | 786 |
| brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 378 |
| elixir-crawly/crawly | A framework for extracting structured data from websites. | 987 |
| fredwu/crawler | A high-performance web crawling and scraping solution with customizable settings and worker pooling. | 945 |
| elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,035 |
| postmodern/spidr | A Ruby web crawling library that provides flexible and customizable methods to crawl websites. | 806 |
| rndinfosecguy/scavenger | An OSINT bot that crawls pastebin sites to search for sensitive data leaks. | 629 |
| puerkitobio/gocrawl | A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,038 |
| hu17889/go_spider | A modular, concurrent web crawler framework written in Go. | 1,826 |
| felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
| fmpwizard/owlcrawler | A distributed web crawler that coordinates crawling tasks across multiple worker processes using a message bus. | 55 |
| turnersoftware/infinitycrawler | A web crawling library for .NET that allows customizable crawling and throttling of websites. | 248 |
| vida-nyu/ache | A web crawler designed to efficiently collect and prioritize relevant content from the web. | 454 |
| joenorton/rubyretriever | A Ruby-based tool for web crawling and data extraction, aiming to be a replacement for paid software in the SEO space. | 143 |