heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
3k stars
187 watching
757 forks
Language: Java
last commit: 22 days ago
Linked from 3 awesome lists
heritrix java warc webcrawling