tikalinkextract
URL extractor
Extracts URLs from files using a Tika client
Tika-based link (URL) extractor for httpreserve
9 stars
4 watching
1 fork
Language: HTML
last commit: 14 days ago
Linked from 1 awesome list
archives, code4lib, digitalpreservation, httpreserve, iipc, tika, tika-wrapper, url-extractor, webarchiving
Related projects:
Repository | Description | Stars |
---|---|---|
arbazkiraak/linksdumper | A tool that extracts and filters links from web responses | 86 |
httpreserve/linkstat | A command-line tool to test links and retrieve Internet Archive replacements. | 9 |
xnl-h4ck3r/xnlinkfinder | A Python tool used to automatically discover and extract endpoints, parameters, and wordlists from target websites. | 1,204 |
jiiks/asar.net | A .NET implementation of the Atom Asar archive format, allowing extraction and manipulation of archived files. | 35 |
jjjake/internetarchive | A command-line and Python interface to access Archive.org's services | 1,625 |
internetarchive/warctools | Tools for working with archived web content | 152 |
knowitall/reverb | Extracts binary relationships from English sentences at scale | 543 |
hakky54/certificate-ripper | Extracts server certificates from URLs using a fast and easy-to-use CLI tool | 713 |
karust/gogetcrawl | A tool and package for extracting web archive data from popular sources like Wayback Machine and Common Crawl using the Go programming language. | 147 |
nodeca/url-unshort | Expands URLs from shortened links to their original addresses | 116 |
mvdan/xurls | A tool to extract URLs from text using regular expressions in the Go programming language. | 1,187 |
kbrw/plug_forwarded_peer | Simplifies access to the client IP address in HTTP requests with X-Forwarded headers | 24 |
xnl-h4ck3r/waymore | A tool that aggregates links from multiple web archiving sources to facilitate bug bounty and research efforts. | 1,739 |
archiveteam/wpull | Downloads and crawls web pages, allowing for the archiving of websites. | 556 |
richardlehane/webarchive | Provides tools for reading and parsing web archive formats used in digital preservation. | 20 |