tikalinkextract
URL extractor
Extracts URLs from files using a Tika client
Tika-based link (URL) extractor for httpreserve
10 stars
4 watching
1 fork
Language: HTML
last commit: 4 months ago
Linked from 1 awesome list
archives, code4lib, digitalpreservation, httpreserve, iipc, tika, tika-wrapper, url-extractor, webarchiving
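As a rough illustration of the idea described above (pull plain text out of a file with Tika, then scan it for links), here is a minimal Python sketch. It is not the project's own code: the function names, the regular expression, and the `TIKA_URL` default are assumptions made for this example. It assumes a Tika server is reachable at `http://localhost:9998` and uses its `PUT /tika` endpoint to obtain plain text.

```python
import re
import urllib.request

# Assumed location of a running Tika server (9998 is Tika server's default port).
TIKA_URL = "http://localhost:9998/tika"

# Illustrative pattern only: http(s) URLs up to whitespace, quotes, or closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'\)\]]+", re.IGNORECASE)


def text_from_tika(path, tika_url=TIKA_URL):
    """Send a file to the Tika server and return the extracted plain text."""
    with open(path, "rb") as fh:
        request = urllib.request.Request(
            tika_url,
            data=fh.read(),
            method="PUT",
            headers={"Accept": "text/plain"},
        )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    found = []
    for match in URL_PATTERN.findall(text):
        # Drop sentence punctuation that commonly trails a URL in prose.
        url = match.rstrip(".,;:!?")
        if url and url not in found:
            found.append(url)
    return found
```

With a server running, `extract_urls(text_from_tika("report.pdf"))` would list the links in a hypothetical `report.pdf`. `extract_urls` can also be used on its own with any string.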
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A tool that extracts and filters links from web responses | 86 |
| | A command-line tool to test links and retrieve Internet Archive replacements. | 10 |
| | An automated tool to discover and extract links from web applications | 1,216 |
| | A .NET implementation of the Atom Asar archive format, allowing extraction and manipulation of archived files. | 36 |
| | A command-line and Python interface to access Archive.org's services | 1,643 |
| | Tools for working with archived web content | 153 |
| | Extracts binary relationships from English sentences at scale | 543 |
| | A tool to extract and format SSL/TLS certificates from servers | 718 |
| | A tool and package for extracting web archive data from popular sources like Wayback Machine and Common Crawl using the Go programming language. | 148 |
| | Expands URLs from shortened links to their original addresses | 117 |
| | A tool to extract URLs from text using regular expressions in the Go programming language. | 1,193 |
| | Simplifies access to the client IP address in HTTP requests with X-Forwarded headers | 24 |
| | A tool that aggregates links from various web archives and crawlers to help find more links, including the ability to download archived responses for further searching. | 1,790 |
| | Downloads and crawls web pages, allowing for the archiving of websites. | 556 |
| | Provides tools for reading and parsing web archive formats used in digital preservation. | 20 |