HtmlPageParser

HTML parser

An HTML parsing library that converts web pages into structured data and then generates Markdown from that data.
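The two-stage flow described above (HTML to structured data, then structured data to Markdown) can be illustrated with a minimal sketch. This is not HtmlPageParser's actual API, which is not shown here; the names `BlockCollector`, `html_to_blocks`, and `blocks_to_markdown` are hypothetical, and the sketch uses only Python's standard `html.parser` module and handles just headings and paragraphs.

```python
from html.parser import HTMLParser


class BlockCollector(HTMLParser):
    """Hypothetical helper: collects headings and paragraphs as dict blocks."""

    BLOCK_TAGS = {"h1", "h2", "h3", "p"}

    def __init__(self):
        super().__init__()
        self.blocks = []   # structured data: [{"tag": ..., "text": ...}, ...]
        self._tag = None   # block-level tag currently open, if any
        self._text = []    # text fragments seen inside the open block

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._tag = tag
            self._text = []

    def handle_data(self, data):
        # Inline tags such as <b> are ignored; only their text is kept.
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._text).split())
            if text:
                self.blocks.append({"tag": tag, "text": text})
            self._tag = None


def html_to_blocks(html):
    """Stage 1 (assumed shape): parse HTML into a list of blocks."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks


def blocks_to_markdown(blocks):
    """Stage 2 (assumed shape): render the blocks as Markdown."""
    lines = []
    for block in blocks:
        if block["tag"].startswith("h"):
            level = int(block["tag"][1])
            lines.append("#" * level + " " + block["text"])
        else:
            lines.append(block["text"])
    return "\n\n".join(lines)


if __name__ == "__main__":
    page = "<h1>Title</h1><p>Hello <b>world</b>.</p>"
    print(blocks_to_markdown(html_to_blocks(page)))
```

Keeping the intermediate list of blocks separate from the Markdown renderer means the same structured data could be inspected or exported in another format without re-parsing the page.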

A generic HTML parser

GitHub

1 star
1 watching
0 forks
Language: Python
Last commit: over 1 year ago

Related projects:

Repository | Description | Stars
scrapy/scrapely | A pure-Python library for extracting structured data from HTML pages. | 1,863
kovidgoyal/html5-parser | A fast HTML parser written in C, optimized for performance. | 682
imangazaliev/didom | A fast and simple HTML parser with support for CSS selectors and XPath expressions. | 2,200
html5lib/html5lib-python | A standards-compliant Python library for parsing and serializing HTML documents and fragments. | 1,128
skevo18/pyeditorjs | A Python package for parsing and rendering content from Editor.js JSON data in HTML format. | 19
bupt1987/html-parser | A fast and efficient HTML parser for PHP. | 525
servo/html5ever | A high-performance HTML parser written in Rust. | 2,148
ndmitchell/tagsoup | A Haskell library for parsing and extracting information from HTML/XML documents. | 233
terrier989/universal_html | A cross-platform Dart package for parsing and manipulating HTML, XML, and CSS documents. | 0
iabudiab/htmlkit | An Objective-C framework for parsing and serializing HTML documents. | 240
choru-k/react-native-html-parser | A JavaScript library for parsing HTML and XML documents across multiple platforms, including React Native and Titanium. | 84
egonschiele/handsomesoup | A Haskell library that simplifies HTML parsing by providing CSS selectors and attribute extraction functions. | 124
kennethreitz/requests-html | A Pythonic HTML parsing library providing intuitive and asynchronous web scraping capabilities. | 303
lexborisov/myhtml | A fast HTML parsing library written in C. | 1,655
rust-scraper/scraper | A Rust library for parsing and querying HTML documents using CSS selectors. | 1,937