crawler

Web scraper

A Scala-based DSL for programmatically accessing and interacting with web pages

Scala DSL for web crawling

149 stars

14 watching

40 forks

Language: Scala

Last commit: almost 10 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

brucedone/awesome-crawler

Related projects:

Repository	Description	Stars
ruippeixotog/scala-scraper	A Scala library providing a DSL for loading and extracting content from HTML pages	717
felipecsl/wombat	A Ruby-based web crawler and data extraction tool with an elegant DSL.	1,315
postmodern/spidr	A Ruby web crawling library that provides flexible and customizable methods to crawl websites	809
brendonboshell/supercrawler	A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages.	380
dyweb/scrala	A web crawling framework written in Scala that allows users to define the start URL and parse response from it	113
internetarchive/brozzler	A distributed web crawler that fetches and extracts links from websites using a real browser.	678
benibela/xidel	A tool to extract data from web pages using various query languages and selectors.	690
fimad/scalpel	A web scraping library providing a declarative interface on top of an HTML parsing library to extract data from HTML pages	325
miyagawa/web-scraper	A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface.	104
webrecorder/browsertrix-crawler	A containerized browser-based crawler system for capturing web content in a high-fidelity and customizable manner.	677
lambdaworks/scurl-detector	Detects and extracts URLs from written text	16
apiel/test-crawler	A tool for end-to-end testing of web applications by crawling and comparing screenshots.	33
stewartmckee/cobweb	A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner	226
archiveteam/wpull	Downloads and crawls web pages, allowing for the archiving of websites.	556
the-markup/blacklight-collector	A tool for scraping website content and analyzing browser behavior	205