tikalinkextract

URL extractor

Extracts URLs from files using a Tika client.

Tika-based link (URL) extractor for httpreserve.
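The general technique described above can be sketched as follows. This is a minimal illustration, not the project's own implementation: it assumes an Apache Tika server is running locally on its default port (9998), sends a file to the server's /tika endpoint to obtain plain text, and then pulls URLs out of that text with a deliberately simple regular expression. The names tika_extract_text, extract_urls, TIKA_TEXT_ENDPOINT and URL_PATTERN are hypothetical and chosen for this example only.

```python
import re
import urllib.request

# Assumption: a Tika server is reachable locally on its default port.
TIKA_TEXT_ENDPOINT = "http://localhost:9998/tika"

# Deliberately simple pattern (an assumption, not the project's own rule):
# an http(s) scheme followed by characters that are not whitespace,
# angle brackets, quotes, or parentheses.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'()]+")


def tika_extract_text(path, endpoint=TIKA_TEXT_ENDPOINT):
    """Send a file to a Tika server and return the plain text it extracts."""
    with open(path, "rb") as fh:
        request = urllib.request.Request(
            endpoint,
            data=fh.read(),
            method="PUT",
            headers={"Accept": "text/plain"},
        )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.finditer(text):
        # Drop trailing sentence punctuation that the pattern swallows.
        url = match.group(0).rstrip(".,;:!?")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

With a Tika server running, extract_urls(tika_extract_text("report.pdf")) would return the links found in a hypothetical file report.pdf. Keeping the text extraction and the URL matching in separate functions means the matching step can be checked without a server.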


10 stars
4 watching
1 fork
Language: HTML
Last commit: 4 months ago
Linked from 1 awesome list

Tags: archives, code4lib, digitalpreservation, httpreserve, iipc, tika, tika-wrapper, url-extractor, webarchiving

Related projects:

Repository | Description | Stars
arbazkiraak/linksdumper | A tool that extracts and filters links from web responses | 86
httpreserve/linkstat | A command-line tool to test links and retrieve Internet Archive replacements | 10
xnl-h4ck3r/xnlinkfinder | An automated tool to discover and extract links from web applications | 1,216
jiiks/asar.net | A .NET implementation of the Atom Asar archive format, allowing extraction and manipulation of archived files | 36
jjjake/internetarchive | A command-line and Python interface to access Archive.org's services | 1,643
internetarchive/warctools | Tools for working with archived web content | 153
knowitall/reverb | Extracts binary relationships from English sentences at scale | 543
hakky54/certificate-ripper | A tool to extract and format SSL/TLS certificates from servers | 718
karust/gogetcrawl | A tool and package for extracting web archive data from popular sources like Wayback Machine and Common Crawl using the Go programming language | 148
nodeca/url-unshort | Expands URLs from shortened links to their original addresses | 117
mvdan/xurls | A tool to extract URLs from text using regular expressions in the Go programming language | 1,193
kbrw/plug_forwarded_peer | Simplifies access to the client IP address in HTTP requests with X-Forwarded headers | 24
xnl-h4ck3r/waymore | A tool that aggregates links from various web archives and crawlers to help find more links, including the ability to download archived responses for further searching | 1,790
archiveteam/wpull | Downloads and crawls web pages, allowing for the archiving of websites | 556
richardlehane/webarchive | Provides tools for reading and parsing web archive formats used in digital preservation | 20