aut
Archive analyzer
The Archives Unleashed Toolkit (aut) is an open-source toolkit for analyzing web archives using Apache Spark.
138 stars
15 watching
33 forks
Language: Scala
Last commit: 12 months ago
Linked from 2 awesome lists
Topics: analysis, apache-spark, big-data, big-data-analytics, dataframe, digital-humanities, hadoop, network-graphing, pyspark, python3, scala, spark, text-extraction, webarchives
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Analyzes line-oriented JSON data from Twitter APIs using Apache Spark | 9 |
| | Provides tools and examples for working with web archives using the Archives Unleashed Toolkit | 23 |
| | A toolkit for analyzing and extracting data from legacy web archives in a structured format suitable for further analysis or reuse | 3 |
| | A tool to automate archiving of web resources into public archives | 409 |
| | Downloads and crawls web pages, allowing for the archiving of websites | 556 |
| | A graphical user interface layer for preserving and replaying web pages using multiple archiving tools | 353 |
| | A .NET implementation of the Atom Asar archive format, allowing extraction and manipulation of archived files | 36 |
| | A distributed compute analysis system for web archive collections | 15 |
| | Tools for indexing and discovering archived web content | 117 |
| | A toolkit for processing and analyzing web archive data | 89 |
| | A collection of tools for extracting and analyzing data from web archives | 71 |
| | Automates archiving of online content from various sources into local storage or cloud services | 585 |
| | A plugin that automates the extraction of archive files from various formats | 19 |
| | A tool and package for extracting web archive data from popular sources like Wayback Machine and Common Crawl using the Go programming language | 148 |
| | A framework for efficient data processing and extraction from archival collections, enabling the transformation of raw data into more accessible formats | 145 |