crawler
Web scraper
A Scala-based DSL for programmatically accessing and interacting with web pages
Scala DSL for web crawling
149 stars
14 watching
40 forks
Language: Scala
Last commit: over 8 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
ruippeixotog/scala-scraper | A Scala library providing a DSL for loading and extracting content from HTML pages | 717 |
felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
postmodern/spidr | A Ruby web crawling library that provides flexible and customizable methods to crawl websites | 809 |
brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 380 |
dyweb/scrala | A web crawling framework written in Scala that allows users to define a start URL and parse the response from it | 113 |
internetarchive/brozzler | A distributed web crawler that fetches and extracts links from websites using a real browser. | 678 |
benibela/xidel | A tool to extract data from web pages using various query languages and selectors. | 690 |
fimad/scalpel | A web scraping library providing a declarative interface on top of an HTML parsing library to extract data from HTML pages | 325 |
miyagawa/web-scraper | A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface. | 104 |
webrecorder/browsertrix-crawler | A containerized browser-based crawler system for capturing web content in a high-fidelity and customizable manner. | 677 |
lambdaworks/scurl-detector | Detects and extracts URLs from written text | 16 |
apiel/test-crawler | A tool for end-to-end testing of web applications by crawling and comparing screenshots. | 33 |
stewartmckee/cobweb | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner | 226 |
archiveteam/wpull | Downloads and crawls web pages so that websites can be archived. | 556 |
the-markup/blacklight-collector | A tool for scraping website content and analyzing browser behavior | 205 |