 aut
 Archive analyzer
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives using Apache Spark.
138 stars
 15 watching
 33 forks
 
Language: Scala 
Last commit: over 1 year ago
Linked from 2 awesome lists
Topics: analysis, apache-spark, big-data, big-data-analytics, dataframe, digital-humanities, hadoop, network-graphing, pyspark, python3, scala, spark, text-extraction, webarchives
 Related projects:
| Repository | Description | Stars | 
|---|---|---|
|  | Analyzes line-oriented JSON data from Twitter APIs using Apache Spark | 9 | 
|  | Provides tools and examples for working with web archives using the Archives Unleashed Toolkit | 23 | 
|  | A toolkit for analyzing and extracting data from legacy web archives in a structured format suitable for further analysis or reuse | 3 | 
|  | A tool to automate archiving of web resources into public archives. | 409 | 
|  | Downloads and crawls web pages, allowing for the archiving of websites. | 556 | 
|  | A graphical user interface layer for preserving and replaying web pages using multiple archiving tools. | 353 | 
|  | A .NET implementation of the Atom Asar archive format, allowing extraction and manipulation of archived files. | 36 | 
|  | A distributed compute analysis system for web archive collections | 15 | 
|  | Tools for indexing and discovering archived web content | 117 | 
|  | A toolkit for processing and analyzing web archive data | 89 | 
|  | A collection of tools for extracting and analyzing data from web archives | 71 | 
|  | Automates archiving of online content from various sources into local storage or cloud services | 585 | 
|  | A plugin that automates the extraction of archive files from various formats. | 19 | 
|  | A tool and package for extracting web archive data from popular sources like Wayback Machine and Common Crawl using the Go programming language. | 148 | 
|  | A framework for efficient data processing and extraction from archival collections, enabling the transformation of raw data into more accessible formats. | 145 |