crawler
Web scraper
A Scala-based DSL for programmatically accessing and interacting with web pages
Scala DSL for web crawling
148 stars
14 watching
40 forks
Language: Scala
Last commit: over 8 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
ruippeixotog/scala-scraper | A Scala library that provides a domain-specific language (DSL) for parsing and extracting content from HTML pages. | 717 |
felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
postmodern/spidr | A Ruby web crawling library that provides flexible and customizable methods to crawl websites. | 806 |
brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 378 |
dyweb/scrala | A web crawling framework written in Scala that lets users define a start URL and parse the response from it. | 113 |
internetarchive/brozzler | A distributed web crawler that fetches and extracts links from websites using a real browser. | 671 |
benibela/xidel | A tool to extract data from web pages using various query languages and selectors. | 681 |
fimad/scalpel | A web scraping library providing a declarative interface on top of an HTML parsing library to extract data from HTML pages. | 323 |
miyagawa/web-scraper | A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface. | 104 |
webrecorder/browsertrix-crawler | A containerized browser-based crawler system for capturing web content in a high-fidelity and customizable manner. | 652 |
lambdaworks/scurl-detector | Detects and extracts URLs from written text. | 16 |
apiel/test-crawler | A tool for end-to-end testing of web applications by crawling and comparing screenshots. | 32 |
stewartmckee/cobweb | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner. | 226 |
archiveteam/wpull | Downloads and crawls web pages, allowing for the archiving of websites. | 556 |
the-markup/blacklight-collector | A tool for scraping website content and analyzing browser behavior. | 202 |