ArchiveTools

Data extractor

A collection of tools for extracting and analyzing data from web archives

A collection of tools for archiving and analysing the internet.

GitHub

69 stars
6 watching
15 forks
Language: Python
Last commit: over 2 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| raulfraile/distill | A tool that extracts files from compressed archives using various methods and strategies to optimize bandwidth or decompression speed. | 224 |
| karust/gogetcrawl | A tool and package for extracting web archive data from popular sources like Wayback Machine and Common Crawl using the Go programming language. | 147 |
| chatnoir-eu/chatnoir-resiliparse | A toolkit for processing and analyzing web archive data | 84 |
| rmendels/rerddapxtracto | A package for accessing and extracting environmental data from remote ERDDAP servers. | 14 |
| anonyfox/elixir-scrape | A tool for extracting structured data from web resources using information-retrieval techniques. | 328 |
| pxyup/fitter | A utility for extracting and processing data from various sources, including APIs, websites, and static text | 119 |
| eset-la/lord-of-the-strings | A tool to extract and classify relevant strings from binary files | 9 |
| oduwsdl/archivenow | A tool to automate archiving of web resources into public archives. | 410 |
| eyurtsev/kor | Extracts structured data from unstructured text using large language models | 1,629 |
| le0me55i/zsh-extract | A plugin that automates the extraction of archive files from various formats. | 19 |
| keydet89/regripper3.0 | A tool designed to extract and analyze data from Windows registry files | 557 |
| jiiks/asar.net | A .NET implementation of the Atom Asar archive format, allowing extraction and manipulation of archived files. | 35 |
| deviantech/rack-referrals | Extracts information about referring search engines from HTTP requests. | 17 |
| thetic/extract | A plugin that allows users to extract files from various archive formats without specifying the extraction command. | 9 |
| pbiecek/archivist | Manages and stores data analysis results in a centralized archive | 74 |