crawl4ai

Web crawler

A web crawling tool designed to extract structured data from the web for use in AI applications.

🚀🤖 Crawl4AI: Crawl Smarter, Faster, Freely. For AI.
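This page does not show crawl4ai's own API. The block below is a minimal sketch that uses only Python's standard library and is not crawl4ai code. It illustrates what "extracting structured data" from a single page can look like: raw HTML becomes a record holding the page title and its absolute link URLs. The names `PageExtractor` and `extract` and the record layout are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Hypothetical helper: collects <title> text and absolute link URLs."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative hrefs against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks; concatenate them.
        if self._in_title:
            self.title += data


def extract(html: str, base_url: str) -> dict:
    """Return a JSON-friendly record with 'url', 'title' and 'links' keys."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return {"url": base_url, "title": parser.title.strip(), "links": parser.links}


if __name__ == "__main__":
    sample = (
        '<html><head><title> Example </title></head><body>'
        '<a href="/docs">Docs</a> <a href="https://example.org/">Org</a>'
        '</body></html>'
    )
    print(extract(sample, "https://example.com/index.html"))
```

A full crawler would also fetch pages over HTTP, render JavaScript where needed, and follow the collected links. This sketch covers only the parsing step, which is the part that turns a page into structured output.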

19k stars

109 watching

1k forks

Language: HTML

last commit: over 1 year ago

Screenshot of unclecode/crawl4ai website

Related projects:

Repository	Description	Stars
code4craft/webmagic	A framework for building scalable web crawlers in Java.	11,456
apify/crawlee	A tool for building reliable web scraping and browser automation pipelines in Node.js.	16,081
spatie/crawler	A powerful web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently.	2,552
gocolly/colly	A framework for extracting structured data from websites in a fast and elegant way.	23,444
yasserg/crawler4j	A Java-based web crawler for extracting and processing web page content.	4,563
yujiosaka/headless-chrome-crawler	A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites.	5,534
s0md3v/photon	A fast and flexible web crawler designed to gather information from the internet.	11,122
bda-research/node-crawler	A Node.js-based web crawler and spider that extracts data from websites.	6,718
s0rg/crawley	A utility for systematically extracting URLs from web pages and printing them to the console.	268
ionicabizau/scrape-it	A Node.js library and CLI tool for automating web page scraping and parsing.	4,024
howie6879/ruia	An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling.	1,753
internetarchive/heritrix3	A web crawler designed to collect and preserve digital artifacts while respecting site policies and load constraints.	2,857
elliotgao2/gain	A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites.	2,037
stewartmckee/cobweb	A flexible web crawler for extracting data from websites in a scalable and efficient manner.	226
archiveteam/grab-site	A web crawler designed to back up websites by recursively crawling and writing WARC files.	1,406