twarc

Data archiver

A command-line tool (and Python library) for archiving Twitter JSON data via the Twitter API

1k stars

35 watching

255 forks

Language: Python

Last commit: over 2 years ago

Linked from 2 awesome lists

twarc-project.readthedocs.io

Related projects:

Repository	Description	Stars
archivesunleashed/twut	Analyzes line-oriented JSON data from Twitter APIs using Apache Spark	9
fisadev/twistorpy	A tool to back up a Twitter user's tweets to a JSON file	4
simonlindgren/2wttr	Collects and processes tweets from the Twitter API using Academic access	20
dapivei/tweetple	A Python library that provides a simple interface to stream information from Twitter's Full-Archive Search Endpoint.	12
janezkranjc/twitter-tap	A tool for collecting tweets from Twitter's search API and storing them in a MongoDB database	80
twitter/elephant-bird	A collection of input formats and utilities for working with compressed data files in various formats.	1,137
nla/httrack2warc	Converts HTTrack crawls to WARC files by reconstructing requests and responses from logs	32
peterk/warcworker	A web archiving tool that archives websites with high-fidelity preservation capabilities.	57
eldraco/twitter-stats	A tool to retrieve and display Twitter account statistics.	4
shohil-kishore/twitter-data-toolkit	A Node.js application allowing users to collect and aggregate Twitter data through its v2 API	7
n0tan3rd/squidwarc	An archival crawler built on top of Chrome or Chromium to preserve the web in a high-fidelity, user-scriptable manner	170
ryanmcgrath/twython	Provides access to Twitter data and functionality via a Python interface	1,854
archiveteam/grab-site	A web crawler designed to back up websites by recursively crawling and writing WARC files.	1,406
webrecorder/har2warc	Converts HTTP Archive format to Web Archive format	48
chfoo/warcat	Tool for handling Web Archive files	152