haul

Image crawler

A tool to extract images from web pages and URLs

An Extensible Image Crawler

GitHub

158 stars
11 watching
38 forks
Language: Python
Last commit: almost 8 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
|---|---|---|
| archiveteam/grab-site | A web crawler designed to backup websites by recursively crawling and writing WARC files. | 1,400 |
| sananth12/imagescraper | Downloads images from a webpage in parallel using multiple threads and saves them to a specified directory | 763 |
| webrecorder/browsertrix-crawler | A containerized browser-based crawler system for capturing web content in a high-fidelity and customizable manner. | 663 |
| vida-nyu/ache | A web crawler designed to efficiently collect and prioritize relevant content from the web | 456 |
| vicktornl/wagtail-stock-images | A tool to search and add stock images to the Wagtail content management system. | 10 |
| fredwu/crawler | A high-performance web crawling and scraping solution with customizable settings and worker pooling. | 945 |
| puerkitobio/fetchbot | A flexible web crawler that follows robots.txt policies and crawl delays. | 787 |
| jmg/crawley | A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options. | 187 |
| evyatarmeged/stegextract | A tool to extract hidden data from images by detecting embedded files and strings. | 114 |
| mapbox/robosat | An end-to-end pipeline for extracting features from aerial and satellite imagery using convolutional neural networks | 2,025 |
| internetarchive/brozzler | A distributed web crawler that fetches and extracts links from websites using a real browser. | 673 |
| archiveteam/wpull | Downloads and crawls web pages, allowing for the archiving of websites. | 557 |
| azubieta/appimages.scraper | A tool to extract AppImage release data from web pages | 11 |
| elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,035 |
| stewartmckee/cobweb | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner | 226 |