python-readability

HTML parser

Extracts and cleans the main body text and title from an HTML document.

Fast Python port of arc90's Readability tool, updated to match the latest readability.js!
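The description says the tool finds a page's main body text. A Readability-style extractor usually does this by scoring text blocks and keeping the container with the highest score. The sketch below uses only the Python standard library and is loosely modeled on the arc90 scoring idea. Each paragraph earns points for its length and its commas, and those points go to the paragraph's parent element (in full) and grandparent (half). The names `ParagraphScorer` and `extract_main_text`, the 25-character cutoff, and the exact point values are illustrative assumptions. This is not python-readability's implementation, which applies further heuristics such as link-density and class/id weighting. In the published package (readability-lxml), the README drives everything through `Document(html)` with its `.title()` and `.summary()` methods.

```python
from html.parser import HTMLParser
from typing import Tuple

# Void elements never get an end tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ParagraphScorer(HTMLParser):
    """Hypothetical, simplified arc90-style scorer (not python-readability's code).

    Each <p> with at least 25 characters of text earns
    1 point + 1 per comma + 1 per full 100 characters (capped at 3).
    Its parent element receives the full score and its grandparent half.
    Assumes reasonably well-formed markup (no nested <p> handling).
    """

    def __init__(self):
        super().__init__()
        self.stack = []        # open elements as (node_id, tag)
        self.next_id = 0
        self.scores = {}       # node_id -> accumulated score
        self.paragraphs = {}   # node_id -> paragraph texts credited to it
        self.current_p = None  # text chunks of the <p> being read, if any
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append((self.next_id, tag))
        self.next_id += 1
        if tag == "p":
            self.current_p = []
        elif tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        # Ignore void tags and stray end tags that match nothing open.
        if tag in VOID_TAGS or all(t != tag for _, t in self.stack):
            return
        # Pop up to and including the matching element. A <p> popped here
        # is scored against the new stack top, which is its parent.
        while True:
            _, open_tag = self.stack.pop()
            if open_tag == "p":
                self._finish_paragraph()
            elif open_tag == "title":
                self.in_title = False
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        if self.current_p is not None:
            self.current_p.append(data)

    def _finish_paragraph(self):
        if self.current_p is None:
            return
        text = " ".join("".join(self.current_p).split())  # collapse whitespace
        self.current_p = None
        if len(text) < 25:  # too short to count as body text
            return
        score = 1 + text.count(",") + min(len(text) // 100, 3)
        # Credit the parent with the full score and the grandparent with half.
        for depth, share in ((1, 1.0), (2, 0.5)):
            if len(self.stack) >= depth:
                node_id = self.stack[-depth][0]
                self.scores[node_id] = self.scores.get(node_id, 0) + score * share
                self.paragraphs.setdefault(node_id, []).append(text)


def extract_main_text(html: str) -> Tuple[str, str]:
    """Return (title, paragraphs of the highest-scoring container)."""
    scorer = ParagraphScorer()
    scorer.feed(html)
    scorer.close()
    title = scorer.title.strip()
    if not scorer.scores:
        return title, ""
    best = max(scorer.scores, key=scorer.scores.get)
    return title, "\n\n".join(scorer.paragraphs[best])


if __name__ == "__main__":
    page = """<html><head><title>Example article</title></head><body>
<div class="nav"><p>Home | About</p></div>
<div class="article">
<p>Readability-style extractors look for the block of the page that holds most of the prose, which is usually the article body.</p>
<p>They score paragraphs by length and punctuation, then credit those scores to the enclosing elements, so a container full of long paragraphs wins.</p>
</div>
</body></html>"""
    title, text = extract_main_text(page)
    print(title)
    print(text)
```

In this example, the short navigation paragraph falls below the length cutoff and is ignored. The article `<div>` collects the full score of both paragraphs, while `<body>` gets only half of each, so the article container is selected.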

GitHub

3k stars
95 watching
350 forks
Language: Python
last commit: 3 months ago
Linked from 2 awesome lists

Related projects:

remarkjs/remark-rehype (275 stars): Transforms Markdown into HTML to support HTML processing plugins.
wcz-txp/unicode-url-for-textpattern (4 stars): Automatically converts non-ASCII characters in text links to UTF-8 URLs for improved SEO and readability.
apostrophecms/sanitize-html (3,867 stars): A JavaScript library for cleaning up and sanitizing user-submitted HTML, removing unwanted content while preserving whitelisted elements and attributes.
webreflection/hyperhtml (3,071 stars): A lightweight virtual DOM alternative built on JavaScript template literals.
remarkjs/remark (7,778 stars): Tools for processing Markdown text and transforming it into various formats.
haml/haml (3,766 stars): A templating engine that uses a concise, indentation-based syntax to simplify writing and rendering HTML documents.
kkos/oniguruma (2,331 stars): A flexible, modern regular expression library with support for various character encodings and APIs.
overbryd/myhtmlex (14 stars): Erlang/Elixir bindings for parsing and processing HTML documents.
rehypejs/rehype-remark (82 stars): Transforms an HTML syntax tree into a Markdown syntax tree so it can be processed with remark.
jhy/jsoup (10,985 stars): A Java library for parsing and manipulating HTML and XML, with CSS selector support.
markdown-it/linkify-it (670 stars): A library for recognizing and normalizing links, with full Unicode support.
lexborisov/myhtml (1,657 stars): A fast HTML parsing library written in C.
github/markup (5,876 stars): Converts raw markup to HTML for rendering on GitHub.com.
archakov06/codex-to-html (15 stars): Converts JSON blocks from EditorJS to HTML markup.
zzzprojects/html-agility-pack (2,665 stars): A .NET library for parsing and manipulating HTML documents, including malformed ones.