HadoopConcatGz
A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz
9 stars
2 watching
3 forks
Language: Java
last commit: over 6 years ago
Linked from 1 awesome list
Tags: hadoop, spark, warc, web-archiving, webarchive
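
Why a concatenated GZIP file can be split at all: each member is a complete gzip stream, so a reader that knows (or finds) a member's start offset can decompress from there without touching earlier bytes. The sketch below illustrates that idea with the JDK's java.util.zip classes only. It is not taken from this project's source; the class name ConcatGzSketch and all of its helper methods are hypothetical, and the magic-byte scan with trial decompression is just one assumed way to locate a member boundary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; not part of HadoopConcatGz.
public class ConcatGzSketch {

    // Compress one record into a standalone gzip member.
    static byte[] gzipMember(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    // Byte-wise concatenation of two members, as in a *.warc.gz file.
    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // Decompress everything from the given offset to the end of the data.
    // GZIPInputStream reads through all members that follow the offset.
    static String gunzipFrom(byte[] data, int offset) throws IOException {
        ByteArrayInputStream in =
                new ByteArrayInputStream(data, offset, data.length - offset);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(in)) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Find the first offset >= from that starts a decodable gzip member.
    // The bytes 1f 8b 08 (magic number plus deflate method) can also occur
    // inside compressed data, so each candidate is checked by trial
    // decompression. Returns -1 if no member start is found.
    static int nextMemberOffset(byte[] data, int from) {
        for (int i = Math.max(from, 0); i + 2 < data.length; i++) {
            boolean magic = (data[i] & 0xff) == 0x1f
                    && (data[i + 1] & 0xff) == 0x8b
                    && data[i + 2] == 8;
            if (!magic) {
                continue;
            }
            try {
                gunzipFrom(data, i);
                return i;
            } catch (IOException falsePositive) {
                // Not a real member start; keep scanning.
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] first = gzipMember("first record\n");
        byte[] second = gzipMember("second record\n");
        byte[] file = concat(first, second);

        // A "split" that begins one byte into the file skips to the next member.
        int offset = nextMemberOffset(file, 1);
        System.out.print(gunzipFrom(file, offset));
    }
}
```

A real InputFormat would stream from HDFS rather than hold the file in memory and would need a cheaper validation step than decompressing to the end of the data; the sketch only shows that a reader starting mid-file can resynchronize on a member boundary.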