scrala

Web crawler framework

A web crawling framework written in Scala that lets users define a start URL and parse the response fetched from it.

Unmaintained. A Scala crawler (spider) framework, inspired by scrapy, created by @gaocegege.
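The "define a start URL and parse the response" model described above can be sketched in plain Scala. This is a minimal illustration only, not scrala's actual API: the names `Response`, `Spider`, `Crawler`, `LinkSpider`, and `Demo` are hypothetical, the `example.test` URLs are made up, and pages are served from an in-memory map so the sketch runs without network access.

```scala
// Hypothetical names throughout; none of this is taken from scrala itself.
final case class Response(url: String, body: String)

trait Spider {
  // Where the crawl begins.
  def startUrl: String
  // Inspect one fetched page and return the URLs to follow next.
  def parse(response: Response): Seq[String]
}

object Crawler {
  // `fetch` is injected so the sketch can run against fake pages.
  // Returns the URLs that were fetched successfully, in visit order.
  def run(spider: Spider,
          fetch: String => Option[String],
          maxPages: Int = 100): Vector[String] = {
    @annotation.tailrec
    def loop(queue: List[String],
             seen: Set[String],
             visited: Vector[String]): Vector[String] =
      queue match {
        case Nil                           => visited
        case _ if visited.size >= maxPages => visited
        case url :: rest if seen(url)      => loop(rest, seen, visited)
        case url :: rest =>
          fetch(url) match {
            case None => loop(rest, seen + url, visited)
            case Some(body) =>
              val next = spider.parse(Response(url, body))
              loop(rest ++ next, seen + url, visited :+ url)
          }
      }
    loop(List(spider.startUrl), Set.empty, Vector.empty)
  }
}

// A spider that follows every single-quoted href it finds.
object LinkSpider extends Spider {
  private val Href = """href='([^']+)'""".r
  val startUrl = "http://example.test/"
  def parse(response: Response): Seq[String] =
    Href.findAllMatchIn(response.body).map(_.group(1)).toSeq
}

object Demo {
  // Fake site used in place of real HTTP requests.
  val pages: Map[String, String] = Map(
    "http://example.test/"  -> "<a href='http://example.test/a'>a</a> <a href='http://example.test/b'>b</a>",
    "http://example.test/a" -> "<a href='http://example.test/'>home</a>",
    "http://example.test/b" -> "no links"
  )

  def main(args: Array[String]): Unit =
    Crawler.run(LinkSpider, pages.get).foreach(println)
}
```

Running `Demo` visits the start page first, then `/a`, then `/b`; the link back to the start page is skipped because it has already been seen. A real framework would replace the injected `fetch` with an HTTP client and, in an actor-based design, hand each URL to a worker instead of looping sequentially.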

GitHub

113 stars
12 watching
23 forks
Language: Scala
Last commit: over 5 years ago
Linked from 1 awesome list

actor-model docker scala scrapy spider

Related projects:

Repository | Description | Stars
howie6879/ruia | An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling | 1,753
hu17889/go_spider | A modular, concurrent web crawler framework written in Go | 1,827
wspl/creeper | A framework for building cross-platform web crawlers using Go | 780
bplawler/crawler | A Scala-based DSL for programmatically accessing and interacting with web pages | 149
postmodern/spidr | A Ruby web crawling library that provides flexible and customizable methods to crawl websites | 809
zhegexiaohuozi/seimicrawler | A distributed crawler framework that simplifies the process of building crawlers using Spring Boot and Redis | 1,980
elixir-crawly/crawly | A framework for extracting structured data from websites | 994
codesofun/web-bee | A Java framework for building web-based crawlers with features like distributed crawling and proxy support | 189
stewartmckee/cobweb | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner | 226
feng19/spider_man | A high-level web crawling and scraping framework for Elixir | 23
veliovgroup/spiderable-middleware | Intercepts requests from web crawlers and proxies them to a prerendering service for rendering HTML | 39
untwisted/sukhoi | A minimalist web crawler framework built on top of miners and structure-based data extraction | 879
spider-rs/spider | A tool for web data extraction and processing using Rust | 1,234
xianhu/pspider | A Python web crawler framework with support for multi-threading and proxy usage | 1,828
rivermont/spidy | A simple command-line web crawler that automatically extracts links from web pages and can be run in parallel for efficient crawling | 340