crawl4ai
Web crawler
A web crawling tool designed to extract structured data from the web for use in AI applications
🚀🤖 Crawl4AI: Crawl Smarter, Faster, Freely. For AI.
19k stars
109 watching
1k forks
Language: HTML
last commit: 1 day ago

Related projects:
Repository | Description | Stars |
---|---|---|
code4craft/webmagic | A framework for building scalable web crawlers in Java. | 11,456 |
apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 16,081 |
spatie/crawler | A powerful web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently. | 2,552 |
gocolly/colly | A framework for extracting structured data from websites in a fast and elegant way. | 23,444 |
yasserg/crawler4j | A Java-based web crawler for extracting and processing web page content. | 4,563 |
yujiosaka/headless-chrome-crawler | A distributed crawling framework that uses Headless Chrome to scrape dynamic websites. | 5,534 |
s0md3v/photon | A fast and flexible web crawler designed to gather information from the internet. | 11,122 |
bda-research/node-crawler | A Node.js-based web crawler and spider that extracts data from websites. | 6,718 |
s0rg/crawley | A utility for systematically extracting URLs from web pages and printing them to the console. | 268 |
ionicabizau/scrape-it | A Node.js library and CLI tool for automating web page scraping and parsing. | 4,024 |
howie6879/ruia | An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling. | 1,753 |
internetarchive/heritrix3 | A web crawler designed to collect and preserve digital artifacts while respecting site policies and load constraints. | 2,857 |
elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,037 |
stewartmckee/cobweb | A flexible web crawler for extracting data from websites in a scalable and efficient manner. | 226 |
archiveteam/grab-site | A web crawler designed to backup websites by recursively crawling and writing WARC files. | 1,406 |