scrala

Web crawler framework

A web crawling framework written in Scala that lets users define the start URL and parse the response returned from it

Unmaintained. A Scala crawler (spider) framework inspired by scrapy, created by @gaocegege
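The "define a start URL, then parse the response" model described above can be sketched in a few lines of plain Scala. This is an illustration of the pattern only, not scrala's actual API: the names Spider, startUrls, parse, fetch, begin, and TitleSpider are hypothetical, and the https://example.com/ URL is a placeholder.

```scala
// Hypothetical sketch of a scrapy-style spider; NOT scrala's real API.
import scala.collection.mutable.ListBuffer
import scala.io.Source

trait Spider {
  /** URLs the crawl starts from (assumed name). */
  def startUrls: List[String]

  /** Callback invoked with the body fetched from each start URL. */
  def parse(url: String, body: String): Unit

  /** Fetch step using only the standard library; override to stub it out. */
  def fetch(url: String): String = {
    val src = Source.fromURL(url, "UTF-8")
    try src.mkString finally src.close()
  }

  /** Fetch every start URL in order and hand each response to parse. */
  def begin(): Unit = startUrls.foreach(u => parse(u, fetch(u)))
}

// Example spider: collects the <title> text of each start page.
class TitleSpider extends Spider {
  val titles: ListBuffer[String] = ListBuffer.empty[String]
  private val TitleTag = "(?is)<title[^>]*>(.*?)</title>".r

  def startUrls: List[String] = List("https://example.com/") // placeholder URL

  def parse(url: String, body: String): Unit =
    TitleTag.findFirstMatchIn(body).foreach(m => titles += m.group(1).trim)
}

object Main {
  def main(args: Array[String]): Unit = {
    val spider = new TitleSpider
    spider.begin()
    spider.titles.foreach(println)
  }
}
```

Keeping fetch as an overridable method means the parsing logic can be exercised without network access. A real framework would add scheduling, link following, and concurrency, none of which this sketch attempts.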

GitHub

113 stars
12 watching
23 forks
Language: Scala
Last commit: about 5 years ago
Linked from 1 awesome list

Topics: actor-model, docker, scala, scrapy, spider

Related projects:

Repository | Description | Stars
howie6879/ruia | An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling | 1,752
hu17889/go_spider | A modular, concurrent web crawler framework written in Go | 1,826
wspl/creeper | A framework for building cross-platform web crawlers using Go | 780
bplawler/crawler | A Scala-based DSL for programmatically accessing and interacting with web pages | 148
postmodern/spidr | A Ruby web crawling library that provides flexible and customizable methods to crawl websites | 806
zhegexiaohuozi/seimicrawler | An agile and distributed crawler framework designed to simplify and speed up web scraping with Spring Boot support | 1,980
elixir-crawly/crawly | A framework for extracting structured data from websites | 987
codesofun/web-bee | A Java framework for building web-based crawlers with features like distributed crawling and proxy support | 189
stewartmckee/cobweb | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner | 226
feng19/spider_man | A high-level web crawling and scraping framework for Elixir | 23
veliovgroup/spiderable-middleware | Intercepts requests from web crawlers and proxies them to a prerendering service for rendering HTML | 38
untwisted/sukhoi | A minimalist web crawler framework built on top of miners and structure-based data extraction | 881
spider-rs/spider | A web crawler and scraper built on top of Rust, designed to extract data from the web in a flexible and configurable manner | 1,140
xianhu/pspider | A Python web crawler framework with support for multi-threading and proxy usage | 1,827
rivermont/spidy | A simple command-line web crawler that automatically extracts links from web pages and can be run in parallel for efficient crawling | 340