dom-crawler

DOM parser

A PHP component for navigating and manipulating HTML and XML documents programmatically.

Eases DOM navigation for HTML and XML documents
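The basic workflow the description refers to is: load markup, select nodes, then read their text or attributes. Below is a minimal sketch of that workflow. It assumes symfony/dom-crawler has been installed with Composer (`composer require symfony/dom-crawler`) and that PHP is 7.4 or later, which the arrow function needs. The HTML fragment and variable names are hypothetical. It uses filterXPath() rather than CSS selectors because filter() also needs the separate symfony/css-selector package.

```php
<?php
// Minimal sketch, assuming symfony/dom-crawler was installed via Composer
// and vendor/autoload.php exists next to this script.
use Symfony\Component\DomCrawler\Crawler;

require __DIR__ . '/vendor/autoload.php';

// Hypothetical HTML fragment used only for illustration
$html = <<<'HTML'
<html>
  <body>
    <p class="message">Hello World!</p>
    <p>Hello Crawler!</p>
    <a href="/docs">Docs</a>
  </body>
</html>
HTML;

$crawler = new Crawler($html);

// Select all <p> children of <body> with XPath (no extra package required)
$paragraphs = $crawler->filterXPath('//body/p');

// each() runs the closure once per matched node and collects the return values
$texts = $paragraphs->each(fn (Crawler $node) => $node->text());

// Read attributes from the first node of a selection
$firstClass = $paragraphs->first()->attr('class');
$href = $crawler->filterXPath('//a')->attr('href');

print_r($texts);
echo $firstClass, PHP_EOL, $href, PHP_EOL;
```

filterXPath() returns a new Crawler holding only the matched nodes, so calls such as first() and attr() can be chained on the result while the original $crawler keeps the whole document.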

GitHub

4k stars
27 watching
123 forks
Language: PHP
last commit: about 2 months ago
Linked from 1 awesome list

Topics: component, php, symfony, symfony-component


Related projects:

Repository | Description | Stars
turnersoftware/infinitycrawler | A web crawling library for .NET that allows customizable crawling and throttling of websites. | 248
meilisearch/docs-scraper | Automates scraping and indexing of documentation content into a search engine. | 297
brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 380
yujiosaka/headless-chrome-crawler | A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites. | 5,534
dyweb/scrala | A web crawling framework written in Scala that lets users define a start URL and parse the responses it returns. | 113
hominee/dyer | A fast and flexible web crawling tool with asynchronous I/O and an event-driven design. | 135
amoilanen/js-crawler | A Node.js module for crawling websites and scraping their content. | 254
naufalardhani/domhttpx | A tool to discover and extract information from web pages using HTTP requests and Google search queries. | 68
feng19/spider_man | A high-level web crawling and scraping framework for Elixir. | 23
symfony/html-sanitizer | Provides an object-oriented API to sanitize untrusted HTML input. | 241
iamstoxe/urlgrab | A tool to crawl websites by exploring links recursively, with support for JavaScript rendering. | 331
symfony/finder | A PHP library that provides an intuitive interface to find files and directories in a file system. | 8,415
webrecorder/browsertrix-crawler | A containerized, browser-based crawler system for capturing web content in a high-fidelity and customizable manner. | 677
symfony/process | Executes commands in sub-processes. | 7,440
cocrawler/cocrawler | A versatile web crawler built with modern tools and concurrency to handle various crawl tasks. | 188