python-readability

HTML parser

Extracts and cleans the main body text and title from an HTML document.

Fast Python port of arc90's Readability tool, updated to match the latest readability.js.
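Readability-style extraction works, roughly, by scoring candidate blocks on how much paragraph prose they hold, discounted by how much of that text sits inside links. The sketch below illustrates that idea with only the standard library. It is a hypothetical, heavily simplified heuristic, not python-readability's actual algorithm or API (the library itself exposes a `Document` class with `title()` and `summary()` methods). The names `ContentScorer`, `score`, and `extract`, along with the tag sets, are illustrative assumptions.

```python
from html.parser import HTMLParser

# Hypothetical, simplified Readability-style heuristic (NOT python-readability's code):
# attribute <p> text to the innermost candidate block, then prefer the block with
# the most text that is not inside <a> links.
CANDIDATE_TAGS = {"div", "article", "section", "main"}
IGNORED_TAGS = {"script", "style", "nav", "aside", "footer"}


class ContentScorer(HTMLParser):
    """Collects the page <title> and per-block paragraph/link character counts."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.candidates = []  # closed candidate blocks
        self._open = []       # stack of currently open candidate blocks
        self._ignore = 0      # nesting depth inside ignored tags
        self._in_title = False
        self._in_p = False
        self._in_link = 0

    def handle_starttag(self, tag, attrs):
        if tag in IGNORED_TAGS:
            self._ignore += 1
        elif tag == "title":
            self._in_title = True
        elif tag in CANDIDATE_TAGS:
            self._in_p = False  # a new block starts; no paragraph is open in it yet
            self._open.append({"paras": [], "chars": 0, "link_chars": 0})
        elif tag == "p" and self._open:
            self._in_p = True
            self._open[-1]["paras"].append("")
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in IGNORED_TAGS:
            self._ignore = max(0, self._ignore - 1)
        elif tag == "title":
            self._in_title = False
        elif tag in CANDIDATE_TAGS and self._open:
            self._in_p = False  # an unclosed <p> ends with its block
            self.candidates.append(self._open.pop())
        elif tag == "p":
            self._in_p = False
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)

    def handle_data(self, data):
        if self._ignore:
            return
        if self._in_title:
            self.title += data
        elif self._in_p and self._open:
            block = self._open[-1]
            block["paras"][-1] += data
            n = len(data.strip())
            block["chars"] += n
            if self._in_link:
                block["link_chars"] += n


def score(block):
    """Paragraph text length, discounted by the block's link density."""
    if block["chars"] == 0:
        return 0.0
    link_density = block["link_chars"] / block["chars"]
    return block["chars"] * (1.0 - link_density)


def extract(html):
    """Return (title, main_text) chosen by the hypothetical heuristic above."""
    parser = ContentScorer()
    parser.feed(html)
    parser.close()
    if not parser.candidates:
        return parser.title.strip(), ""
    best = max(parser.candidates, key=score)
    paras = [" ".join(p.split()) for p in best["paras"]]
    return parser.title.strip(), "\n".join(p for p in paras if p)


page = """
<html><head><title>Example article</title></head><body>
<div class="menu"><p><a href="/">Home</a> <a href="/about">About us</a></p></div>
<div class="story">
  <p>Readability-style tools look for the block with the most prose.</p>
  <p>Link-heavy menus score poorly, so they are dropped.</p>
</div>
<script>var ads = "not content";</script>
</body></html>
"""
title, text = extract(page)
print(title)  # → Example article
print(text)   # the two story paragraphs; the menu and script text are dropped
```

The menu block scores zero because all of its paragraph text is link text, so the story block wins. The real library layers further signals on top of this kind of density measure, so treat this only as an illustration of the general approach.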

GitHub

3k stars
95 watching
348 forks
Language: Python
last commit: about 1 month ago
Linked from 2 awesome lists

Related projects:

remarkjs/remark-rehype (271 stars): Transforms Markdown into HTML to support HTML-processing plugins
wcz-txp/unicode-url-for-textpattern (4 stars): Automatically converts non-ASCII characters in text links to UTF-8 URLs for improved SEO and readability
apostrophecms/sanitize-html (3,833 stars): A JavaScript library for cleaning up and sanitizing user-submitted HTML, removing unwanted content while preserving whitelisted elements and attributes
webreflection/hyperhtml (3,070 stars): A lightweight virtual DOM alternative built on top of HTML template literals
remarkjs/remark (7,703 stars): Tools for processing and transforming Markdown text into various formats
haml/haml (3,766 stars): A templating engine for HTML written in Ruby, designed to simplify and beautify HTML document generation
kkos/oniguruma (2,310 stars): A modern and flexible regular expression library for text pattern matching
overbryd/myhtmlex (14 stars): Erlang/Elixir bindings for parsing and processing HTML documents
rehypejs/rehype-remark (82 stars): Transforms HTML into a Markdown syntax tree to support remark
jhy/jsoup (10,949 stars): A Java library for parsing and manipulating HTML and XML, with CSS-selector queries
markdown-it/linkify-it (669 stars): Detects links in plain text, with full Unicode support
lexborisov/myhtml (1,655 stars): A fast HTML parsing library written in C
github/markup (5,870 stars): Converts raw markup to HTML for rendering on GitHub.com
archakov06/codex-to-html (15 stars): Converts JSON blocks from Editor.js to HTML markup
zzzprojects/html-agility-pack (2,652 stars): An HTML parsing library that allows developers to parse and manipulate malformed HTML documents