python-readability
HTML parser
Extracts and cleans main body text and title from an HTML document
A fast Python port of arc90's Readability tool, updated to match the latest readability.js.
3k stars
95 watching
348 forks
Language: Python
Last commit: about 1 month ago
Linked from 2 awesome lists
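To make the description above concrete, here is a minimal sketch of what "extracting the main body text and title" can mean in practice. It uses only the standard library and a deliberately naive heuristic (keep `<p>` text, skip navigation, footer, script, and style content). It is not python-readability's algorithm or API; the names `MainTextExtractor`, `extract`, and `SAMPLE` are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class MainTextExtractor(HTMLParser):
    """Toy extractor: collects the <title> and the text of <p> elements.

    Hypothetical helper for illustration; NOT python-readability's algorithm.
    """

    # Containers whose content is treated as non-article boilerplate (assumption).
    SKIP = {"script", "style", "nav", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._skip_depth = 0
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "p" and self._skip_depth == 0:
            # Start buffering a new paragraph outside any skipped container.
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            # Collapse runs of whitespace before storing the paragraph.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p and self._skip_depth == 0:
            self._buf.append(data)


def extract(html):
    """Return (title, body_text) for an HTML string."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), "\n\n".join(parser.paragraphs)


SAMPLE = """<html><head><title>Example Article</title>
<style>p { color: red; }</style></head>
<body><nav><p>Home | About</p></nav>
<article><p>First   paragraph of the story.</p>
<p>Second paragraph<script>track()</script> continues here.</p></article>
<footer><p>Copyright notice</p></footer></body></html>"""

title, body = extract(SAMPLE)
print(title)
print(body)
```

A real extractor such as the one listed here has to cope with far messier markup, so this sketch should be read only as an illustration of the input and output shape, not as a substitute for the library.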
Related projects:
Repository | Description | Stars |
---|---|---|
remarkjs/remark-rehype | Transforms markdown into HTML to support HTML processing plugins | 271 |
wcz-txp/unicode-url-for-textpattern | Automatically converts non-ASCII characters in text links to UTF-8 URLs for improved SEO and readability | 4 |
apostrophecms/sanitize-html | A JavaScript library for cleaning up and sanitizing user-submitted HTML, removing unwanted content while preserving whitelisted elements and attributes. | 3,833 |
webreflection/hyperhtml | A lightweight virtual DOM alternative built on top of HTML template literals | 3,070 |
remarkjs/remark | Tools for processing and transforming markdown text into various formats. | 7,703 |
haml/haml | A templating engine for HTML written in Ruby, designed to simplify and beautify HTML document generation. | 3,766 |
kkos/oniguruma | A modern and flexible regular expressions library for text pattern matching | 2,310 |
overbryd/myhtmlex | Erlang/Elixir bindings for parsing and processing HTML documents | 14 |
rehypejs/rehype-remark | Transforms HTML into Markdown syntax tree to support remark | 82 |
jhy/jsoup | A Java library for parsing and manipulating HTML and XML using DOM methods and CSS selectors | 10,949 |
markdown-it/linkify-it | Automatically converts plain text links into clickable URLs with full unicode support | 669 |
lexborisov/myhtml | A fast HTML parsing library written in C | 1,655 |
github/markup | Converts raw markup to HTML for rendering on GitHub.com | 5,870 |
archakov06/codex-to-html | Converts JSON-blocks from EditorJS to HTML markup | 15 |
zzzprojects/html-agility-pack | An HTML parsing library that allows developers to parse and manipulate malformed HTML documents | 2,652 |