crawl4ai

Web crawler

A web crawling tool designed to extract structured data from the web for use in AI applications.

🚀🤖 Crawl4AI: Crawl Smarter, Faster, Freely. For AI.

GitHub

19k stars
109 watching
1k forks
Language: HTML
Last commit: 1 day ago

Related projects:

Repository | Description | Stars
code4craft/webmagic | A framework for building scalable web crawlers in Java. | 11,456
apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 16,081
spatie/crawler | A web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently. | 2,552
gocolly/colly | A framework for extracting structured data from websites quickly and cleanly. | 23,444
yasserg/crawler4j | A Java-based web crawler for extracting and processing web page content. | 4,563
yujiosaka/headless-chrome-crawler | A distributed crawling framework that uses headless Chrome to scrape dynamic websites. | 5,534
s0md3v/photon | A fast and flexible web crawler designed to gather information from the internet. | 11,122
bda-research/node-crawler | A Node.js-based web crawler and spider that extracts data from websites. | 6,718
s0rg/crawley | A utility for systematically extracting URLs from web pages and printing them to the console. | 268
ionicabizau/scrape-it | A Node.js library and CLI tool for automating web page scraping and parsing. | 4,024
howie6879/ruia | An async web scraping micro-framework built on asyncio and aiohttp to simplify URL crawling. | 1,753
internetarchive/heritrix3 | A web crawler designed to collect and preserve digital artifacts while respecting site policies and load constraints. | 2,857
elliotgao2/gain | A Python web crawling framework that uses asyncio and aiohttp for efficient data extraction from websites. | 2,037
stewartmckee/cobweb | A flexible web crawler for extracting data from websites in a scalable, efficient manner. | 226
archiveteam/grab-site | A web crawler designed to back up websites by crawling them recursively and writing WARC files. | 1,406