crawley

The unix-way web crawler

A utility for systematically extracting URLs from web pages and printing them to the console.


265 stars · 2 watching · 13 forks
Language: Go
Last commit: 14 days ago
Linked from 4 awesome lists

Tags: cli, crawler, go, golang, golang-application, pentest, pentest-tool, pentesting, unix-way, web-crawler, web-scraping, web-spider
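"Unix-way" here means the crawler behaves like a filter: discovered URLs go to stdout, one per line, so they can be piped into grep, sort, xargs, and friends. A minimal Go sketch of that idea (purely illustrative, not crawley's actual implementation, which offers many more options) could look like this:

```go
// A minimal illustration of the core idea: fetch one page, resolve every
// <a href> against the page URL, and print the results to stdout, one
// URL per line. This is a sketch, not crawley's actual implementation.
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/url"
	"os"

	"golang.org/x/net/html"
)

func main() {
	if len(os.Args) != 2 {
		log.Fatal("usage: urls <page-url>")
	}
	base, err := url.Parse(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}

	resp, err := http.Get(base.String())
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	doc, err := html.Parse(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	// Depth-first walk over the parsed HTML tree.
	var walk func(*html.Node)
	walk = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			for _, attr := range n.Attr {
				if attr.Key == "href" {
					// Resolve relative links against the page URL.
					if u, err := base.Parse(attr.Val); err == nil {
						fmt.Println(u)
					}
				}
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			walk(c)
		}
	}
	walk(doc)
}
```

Because the output is plain lines on stdout, a pipeline like `go run . https://example.com | sort -u` already gives a deduplicated link list.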

Related projects:

dwisiswant0/galer (253 stars): A tool that extracts URLs from HTML attributes by crawling pages and evaluating JavaScript.
mvdan/xurls (1,187 stars): A Go library and command-line tool that extracts URLs from plain text using regular expressions (see the sketch after this list).
karust/gogetcrawl (147 stars): A Go tool and package for extracting web-archive data from sources such as the Wayback Machine and Common Crawl.
003random/getjs (712 stars): A tool that efficiently extracts JavaScript sources from URLs and web pages.
foolin/pagser (105 stars): A tool for automatically extracting structured data from HTML pages.
jakopako/goskyr (35 stars): A tool that simplifies scraping of list-like structured data from web pages.
eloopwoo/chrome-url-dumper (34 stars): A tool that extracts and dumps URLs from Chrome's stored databases.
go-shiori/obelisk (263 stars): Archives a web page as a single HTML file with embedded resources.
archiveteam/grab-site (1,402 stars): A web crawler designed to back up websites by recursively crawling them and writing WARC files.
slotix/dataflowkit (662 stars): A framework for extracting structured data from web pages using CSS selectors.
iamstoxe/urlgrab (330 stars): A tool that crawls websites by recursively exploring links, with support for JavaScript rendering.
archiveteam/wpull (556 stars): A downloader and crawler for archiving web pages and entire websites.
puerkitobio/gocrawl (2,038 stars): A concurrent web crawler written in Go that supports flexible and polite crawling of websites.
stewartmckee/cobweb (226 stars): A flexible web crawler for extracting data from websites in a scalable and efficient way.
rivermont/spidy (340 stars): A simple command-line web crawler that automatically extracts links from web pages and can run crawls in parallel.
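For contrast with crawling-based extraction, mvdan/xurls pulls URLs out of arbitrary text with a precompiled regular expression. A short example using its public v2 API (`xurls.Strict()` returns a `*regexp.Regexp`; the module path is assumed to be `mvdan.cc/xurls/v2`):

```go
// Regex-based URL extraction with mvdan/xurls: Strict() compiles a
// regular expression matching only URLs that carry a scheme, and
// FindAllString collects every match in the input string.
package main

import (
	"fmt"

	"mvdan.cc/xurls/v2"
)

func main() {
	rx := xurls.Strict()
	text := "Docs live at https://example.com/docs and a mirror at ftp://mirror.example.org/pub"
	for _, u := range rx.FindAllString(text, -1) {
		fmt.Println(u)
	}
}
```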