html5lib-python

HTML parser

A standards-compliant Python library for parsing and serializing HTML documents and fragments.
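
As a rough illustration of what "parsing and serializing HTML documents and fragments" can look like in practice, the sketch below uses html5lib's top-level `parse`, `parseFragment`, and `serialize` functions. It assumes html5lib is installed (for example via `pip install html5lib`); the input strings and variable names are made up for the example.

```python
import html5lib

# Deliberately malformed input: <p> and <b> are never closed.
broken = "<title>Demo</title><p>Hello <b>world"

# Parse a full document into an xml.etree.ElementTree element.
# The parser supplies the missing html/head/body elements.
# namespaceHTMLElements=False keeps tag names free of the XHTML namespace.
document = html5lib.parse(
    broken, treebuilder="etree", namespaceHTMLElements=False
)
title = document.find("head/title").text

# Parse a fragment (no html/head/body wrapper is added).
fragment = html5lib.parseFragment(
    "<li>one<li>two", treebuilder="etree", namespaceHTMLElements=False
)
items = [child.text for child in fragment]

# Serialize the parsed tree back to an HTML string.
html_out = html5lib.serialize(document, tree="etree")

print(title)
print(items)
print(html_out)
```

The `etree` tree builder is only one option; which tree type suits a project depends on what the rest of the code already works with.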


GitHub

1k stars
50 watching
284 forks
Language: Python
Last commit: 9 months ago
Linked from 2 awesome lists



Related projects:

| Repository | Description | Stars |
|---|---|---|
| kovidgoyal/html5-parser | A fast HTML parser written in C, optimized for performance. | 682 |
| scrapy/scrapely | A pure-Python library for extracting structured data from HTML pages. | 1,863 |
| bupt1987/html-parser | A fast and efficient HTML parser for PHP. | 525 |
| servo/html5ever | A high-performance HTML parser written in Rust. | 2,148 |
| lexborisov/myhtml | A fast HTML parsing library written in C. | 1,655 |
| snjyor/htmlpageparser | An HTML parsing library that converts web pages to structured data and then generates Markdown content from that data. | 1 |
| rotatef/cl-html5-parser | An HTML5 parser for Common Lisp. | 55 |
| kennethreitz/requests-html | A Pythonic HTML parsing library providing intuitive and asynchronous web scraping capabilities. | 303 |
| cclib/cclib | A Python library for parsing and analyzing output files from computational chemistry packages. | 336 |
| imangazaliev/didom | A fast and simple HTML parser with support for CSS selectors and XPath expressions. | 2,200 |
| qmlweb/qmlweb-parser | A JavaScript library that parses QML and JavaScript files at runtime. | 27 |
| iabudiab/htmlkit | An Objective-C framework for parsing and serializing HTML documents. | 240 |
| r1chardj0n3s/parse | A library that parses strings using a specification based on the Python format() syntax. | 1,713 |
| ndmitchell/tagsoup | A Haskell library for parsing and extracting information from HTML/XML documents. | 233 |
| thephpleague/uri | A PHP library for manipulating and parsing URIs according to various standards. | 1,034 |