crawley

The unix-way web crawler

A utility for systematically extracting URLs from web pages and printing them to the console.


265 stars · 2 watching · 13 forks
Language: Go
Last commit: 14 days ago
Linked from 4 awesome lists

Tags: cli, crawler, go, golang, golang-application, pentest, pentest-tool, pentesting, unix-way, web-crawler, web-scraping, web-spider
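"Unix-way" here means the crawler behaves like a filter: discovered URLs go to stdout, one per line, so they can be piped into grep, sort, xargs, and friends. A minimal Go sketch of that idea (purely illustrative, not crawley's actual implementation, which offers many more options) could look like this:

```go
// A minimal illustration of the core idea: fetch one page, resolve every
// <a href> against the page URL, and print the results to stdout, one
// URL per line. This is a sketch, not crawley's actual implementation.
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/url"
	"os"

	"golang.org/x/net/html"
)

func main() {
	if len(os.Args) != 2 {
		log.Fatal("usage: urls <page-url>")
	}
	base, err := url.Parse(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}

	resp, err := http.Get(base.String())
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	doc, err := html.Parse(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	// Depth-first walk over the parsed HTML tree.
	var walk func(*html.Node)
	walk = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			for _, attr := range n.Attr {
				if attr.Key == "href" {
					// Resolve relative links against the page URL.
					if u, err := base.Parse(attr.Val); err == nil {
						fmt.Println(u)
					}
				}
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			walk(c)
		}
	}
	walk(doc)
}
```

Because the output is plain lines on stdout, a pipeline like `go run . https://example.com | sort -u` already gives a deduplicated link list.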

Related projects:

dwisiswant0/galer (253 stars): A tool that extracts URLs from HTML attributes by crawling pages and evaluating JavaScript.
mvdan/xurls (1,187 stars): A Go library and command-line tool that extracts URLs from plain text using regular expressions (see the sketch after this list).
karust/gogetcrawl (147 stars): A Go tool and package for extracting web-archive data from sources such as the Wayback Machine and Common Crawl.
003random/getjs (712 stars): A tool that efficiently extracts JavaScript sources from URLs and web pages.
foolin/pagser (105 stars): A tool for automatically extracting structured data from HTML pages.
jakopako/goskyr (35 stars): A tool that simplifies scraping of list-like structured data from web pages.
eloopwoo/chrome-url-dumper (34 stars): A tool that extracts and dumps URLs from Chrome's stored databases.
go-shiori/obelisk (263 stars): Archives a web page as a single HTML file with embedded resources.
archiveteam/grab-site (1,402 stars): A web crawler designed to back up websites by recursively crawling them and writing WARC files.
slotix/dataflowkit (662 stars): A framework for extracting structured data from web pages using CSS selectors.
iamstoxe/urlgrab (330 stars): A tool that crawls websites by recursively exploring links, with support for JavaScript rendering.
archiveteam/wpull (556 stars): A downloader and crawler for archiving web pages and entire websites.
puerkitobio/gocrawl (2,038 stars): A concurrent web crawler written in Go that supports flexible and polite crawling of websites.
stewartmckee/cobweb (226 stars): A flexible web crawler for extracting data from websites in a scalable and efficient way.
rivermont/spidy (340 stars): A simple command-line web crawler that automatically extracts links from web pages and can run crawls in parallel.
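For contrast with crawling-based extraction, mvdan/xurls pulls URLs out of arbitrary text with a precompiled regular expression. A short example using its public v2 API (`xurls.Strict()` returns a `*regexp.Regexp`; the module path is assumed to be `mvdan.cc/xurls/v2`):

```go
// Regex-based URL extraction with mvdan/xurls: Strict() compiles a
// regular expression matching only URLs that carry a scheme, and
// FindAllString collects every match in the input string.
package main

import (
	"fmt"

	"mvdan.cc/xurls/v2"
)

func main() {
	rx := xurls.Strict()
	text := "Docs live at https://example.com/docs and a mirror at ftp://mirror.example.org/pub"
	for _, u := range rx.FindAllString(text, -1) {
		fmt.Println(u)
	}
}
```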