scrala
Web crawler framework
An unmaintained Scala crawler (spider) framework inspired by Scrapy, created by @gaocegege. Users define a start URL and a callback to parse the response from it.
113 stars
12 watching
23 forks
Language: Scala
Last commit: about 5 years ago
Linked from 1 awesome list
Topics: actor-model, docker, scala, scrapy, spider
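The Scrapy-inspired pattern described above (declare start URLs, supply a parse callback, let the engine drive the fetch loop) can be sketched in plain Scala. This is a hypothetical illustration of the general design, not scrala's actual API; the names `Spider`, `Response`, and `Engine` are invented, and the downloader is stubbed so the example runs without network access.

```scala
// Hypothetical sketch of the Scrapy-style crawler pattern.
// Names (Spider, Response, Engine) are illustrative, not scrala's real API.

case class Response(url: String, body: String)

trait Spider {
  def startUrls: List[String]          // where crawling begins
  def parse(response: Response): Unit  // callback invoked per response
}

object Engine {
  // Stub "downloader" so the example is self-contained (no real HTTP).
  def fetch(url: String): Response =
    Response(url, s"<html>stub body for $url</html>")

  // Fetch every start URL and hand each response to the spider's callback.
  def crawl(spider: Spider): Unit =
    spider.startUrls.map(fetch).foreach(spider.parse)
}

object Demo extends App {
  val spider = new Spider {
    def startUrls = List("https://example.com")
    def parse(response: Response): Unit =
      println(s"parsed ${response.url}: ${response.body.length} chars")
  }
  Engine.crawl(spider)
}
```

In the real framework an actor-based engine (note the actor-model topic tag) would schedule fetches concurrently, but the contract a user implements is the same: start URLs in, parse callback out.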
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| howie6879/ruia | An async web scraping micro-framework built on asyncio and aiohttp to simplify URL crawling | 1,752 |
| hu17889/go_spider | A modular, concurrent web crawler framework written in Go | 1,826 |
| wspl/creeper | A framework for building cross-platform web crawlers in Go | 780 |
| bplawler/crawler | A Scala-based DSL for programmatically accessing and interacting with web pages | 148 |
| postmodern/spidr | A Ruby web crawling library with flexible, customizable crawl methods | 806 |
| zhegexiaohuozi/seimicrawler | An agile, distributed crawler framework designed to simplify and speed up web scraping, with Spring Boot support | 1,980 |
| elixir-crawly/crawly | A framework for extracting structured data from websites | 987 |
| codesofun/web-bee | A Java framework for building web crawlers, with distributed crawling and proxy support | 189 |
| stewartmckee/cobweb | A flexible web crawler for extracting data from websites in a scalable, efficient manner | 226 |
| feng19/spider_man | A high-level web crawling and scraping framework for Elixir | 23 |
| veliovgroup/spiderable-middleware | Intercepts requests from web crawlers and proxies them to a prerendering service for rendering HTML | 38 |
| untwisted/sukhoi | A minimalist web crawler framework built on miners and structure-based data extraction | 881 |
| spider-rs/spider | A web crawler and scraper built in Rust, designed to extract data from the web in a flexible, configurable manner | 1,140 |
| xianhu/pspider | A Python web crawler framework with support for multi-threading and proxy usage | 1,827 |
| rivermont/spidy | A simple command-line web crawler that automatically extracts links from web pages and can run in parallel for efficient crawling | 340 |