dom-crawler

DOM parser

A PHP component for navigating and manipulating HTML and XML documents programmatically.

Eases DOM navigation for HTML and XML documents

GitHub

4k stars

27 watching

123 forks

Language: PHP

Last commit: over 1 year ago

Linked from 1 awesome list

Topics: component, php, symfony, symfony-component

symfony.com/dom-crawler

Backlinks from these awesome lists:

brucedone/awesome-crawler

Related projects:

Repository	Description	Stars
turnersoftware/infinitycrawler	A web crawling library for .NET that allows customizable crawling and throttling of websites.	248
meilisearch/docs-scraper	Automates scraping and indexing of documentation content into a search engine	297
brendonboshell/supercrawler	A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages.	380
yujiosaka/headless-chrome-crawler	A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites	5,534
dyweb/scrala	A web crawling framework written in Scala that allows users to define the start URL and parse response from it	113
hominee/dyer	A fast and flexible web crawling tool with features like asynchronous I/O and event-driven design.	135
amoilanen/js-crawler	A Node.js module for crawling web sites and scraping their content	254
naufalardhani/domhttpx	A tool to discover and extract information from web pages using HTTP requests and Google search queries.	68
feng19/spider_man	A high-level web crawling and scraping framework for Elixir.	23
symfony/html-sanitizer	Provides an object-oriented API to sanitize untrusted HTML input	241
iamstoxe/urlgrab	A tool to crawl websites by exploring links recursively with support for JavaScript rendering.	331
symfony/finder	A PHP library that provides an intuitive interface to find files and directories in a file system.	8,415
webrecorder/browsertrix-crawler	A containerized browser-based crawler system for capturing web content in a high-fidelity and customizable manner.	677
symfony/process	Executes commands in sub-processes	7,440
cocrawler/cocrawler	A versatile web crawler built with modern tools and concurrency to handle various crawl tasks	188