HadoopConcatGz

A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz

GitHub

9 stars
2 watching
3 forks
Language: Java
last commit: over 6 years ago
Linked from 1 awesome list

hadoop spark warc web-archiving webarchive

Backlinks from these awesome lists: