dom-crawler

DOM parser

A PHP component for programmatically navigating and querying HTML and XML documents.

Eases DOM navigation for HTML and XML documents


4k stars
27 watching
123 forks
Language: PHP
last commit: 8 days ago
Linked from 1 awesome list

Topics: component, php, symfony, symfony-component


Related projects:

turnersoftware/infinitycrawler (248 stars): A web crawling library for .NET that allows customizable crawling and throttling of websites.
meilisearch/docs-scraper (290 stars): Automates scraping and indexing of documentation content into a search engine.
brendonboshell/supercrawler (378 stars): A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages.
yujiosaka/headless-chrome-crawler (5,527 stars): A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites.
dyweb/scrala (113 stars): A web crawling framework written in Scala that lets users define a start URL and parse the responses it returns.
hominee/dyer (133 stars): A fast and flexible web crawling tool with features like asynchronous I/O and an event-driven design.
amoilanen/js-crawler (253 stars): A Node.js module for crawling websites and scraping their content.
naufalardhani/domhttpx (68 stars): A tool to discover and extract information from web pages using HTTP requests and Google search queries.
feng19/spider_man (23 stars): A high-level web crawling and scraping framework for Elixir.
symfony/html-sanitizer (238 stars): Provides an object-oriented API to sanitize untrusted HTML input.
iamstoxe/urlgrab (330 stars): A tool to crawl websites by exploring links recursively, with support for JavaScript rendering.
symfony/finder (8,404 stars): A PHP library that provides an intuitive interface to find files and directories in a file system.
webrecorder/browsertrix-crawler (652 stars): A containerized, browser-based crawler system for capturing web content in a high-fidelity and customizable manner.
symfony/process (7,431 stars): Executes commands in sub-processes.
cocrawler/cocrawler (187 stars): A versatile web crawler built with modern tools and concurrency to handle various crawl tasks.