scrala
Web crawler framework
An unmaintained Scala crawler (spider) framework inspired by Scrapy, created by @gaocegege. Users define a start URL and a callback to parse the response from it.
113 stars
12 watching
23 forks
Language: Scala
Last commit: about 5 years ago
Linked from 1 awesome list
Topics: actor-model, docker, scala, scrapy, spider
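The Scrapy-inspired pattern described above (declare start URLs, supply a parse callback, let the engine drive the fetch loop) can be sketched in plain Scala. This is a hypothetical illustration of the general design, not scrala's actual API; the names `Spider`, `Response`, and `Engine` are invented, and the downloader is stubbed so the example runs without network access.

```scala
// Hypothetical sketch of the Scrapy-style crawler pattern.
// Names (Spider, Response, Engine) are illustrative, not scrala's real API.

case class Response(url: String, body: String)

trait Spider {
  def startUrls: List[String]          // where crawling begins
  def parse(response: Response): Unit  // callback invoked per response
}

object Engine {
  // Stub "downloader" so the example is self-contained (no real HTTP).
  def fetch(url: String): Response =
    Response(url, s"<html>stub body for $url</html>")

  // Fetch every start URL and hand each response to the spider's callback.
  def crawl(spider: Spider): Unit =
    spider.startUrls.map(fetch).foreach(spider.parse)
}

object Demo extends App {
  val spider = new Spider {
    def startUrls = List("https://example.com")
    def parse(response: Response): Unit =
      println(s"parsed ${response.url}: ${response.body.length} chars")
  }
  Engine.crawl(spider)
}
```

In the real framework an actor-based engine (note the actor-model topic tag) would schedule fetches concurrently, but the contract a user implements is the same: start URLs in, parse callback out.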
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| howie6879/ruia | An async web scraping micro-framework built on asyncio and aiohttp to simplify URL crawling | 1,752 |
| hu17889/go_spider | A modular, concurrent web crawler framework written in Go | 1,826 |
| wspl/creeper | A framework for building cross-platform web crawlers in Go | 780 |
| bplawler/crawler | A Scala-based DSL for programmatically accessing and interacting with web pages | 148 |
| postmodern/spidr | A Ruby web crawling library with flexible, customizable crawl methods | 806 |
| zhegexiaohuozi/seimicrawler | An agile, distributed crawler framework designed to simplify and speed up web scraping, with Spring Boot support | 1,980 |
| elixir-crawly/crawly | A framework for extracting structured data from websites | 987 |
| codesofun/web-bee | A Java framework for building web crawlers, with distributed crawling and proxy support | 189 |
| stewartmckee/cobweb | A flexible web crawler for extracting data from websites in a scalable, efficient manner | 226 |
| feng19/spider_man | A high-level web crawling and scraping framework for Elixir | 23 |
| veliovgroup/spiderable-middleware | Intercepts requests from web crawlers and proxies them to a prerendering service for rendering HTML | 38 |
| untwisted/sukhoi | A minimalist web crawler framework built on miners and structure-based data extraction | 881 |
| spider-rs/spider | A web crawler and scraper built in Rust, designed to extract data from the web in a flexible, configurable manner | 1,140 |
| xianhu/pspider | A Python web crawler framework with support for multi-threading and proxy usage | 1,827 |
| rivermont/spidy | A simple command-line web crawler that automatically extracts links from web pages and can run in parallel for efficient crawling | 340 |