haul
Image crawler
A tool to extract images from web pages and URLs
An Extensible Image Crawler
158 stars
11 watching
37 forks
Language: Python
Last commit: almost 8 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
archiveteam/grab-site | A web crawler designed to back up websites by recursively crawling and writing WARC files. | 1,398 |
sananth12/imagescraper | Downloads images from a webpage in parallel using multiple threads and saves them to a specified directory. | 763 |
webrecorder/browsertrix-crawler | A containerized browser-based crawler system for capturing web content in a high-fidelity and customizable manner. | 652 |
vida-nyu/ache | A web crawler designed to efficiently collect and prioritize relevant content from the web. | 454 |
vicktornl/wagtail-stock-images | A tool to search and add stock images to the Wagtail content management system. | 10 |
fredwu/crawler | A high-performance web crawling and scraping solution with customizable settings and worker pooling. | 945 |
puerkitobio/fetchbot | A flexible web crawler that follows robots.txt policies and crawl delays. | 786 |
jmg/crawley | A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options. | 186 |
evyatarmeged/stegextract | A tool to extract hidden data from images by detecting embedded files and strings. | 114 |
mapbox/robosat | An end-to-end pipeline for extracting features from aerial and satellite imagery using convolutional neural networks. | 2,024 |
internetarchive/brozzler | A distributed web crawler that fetches and extracts links from websites using a real browser. | 671 |
archiveteam/wpull | Downloads and crawls web pages, allowing for the archiving of websites. | 556 |
azubieta/appimages.scraper | A tool to extract AppImage release data from web pages. | 11 |
elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,035 |
stewartmckee/cobweb | A flexible, scalable web crawler for extracting data from websites efficiently. | 226 |