jsoup

HTML parser

A Java library for parsing and manipulating HTML and XML, with CSS selector and XPath support

jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.

GitHub

11k stars
395 watching
2k forks
Language: Java
last commit: 17 days ago
Topics: css, css-selectors, dom, html, java, java-html-parser, jsoup, parser, xml, xpath

Related projects:

Repository | Description | Stars
zhegexiaohuozi/jsoupxpath | An HTML parser implementing W3C XPATH 1.0 syntax for Java | 452
fcannizzaro/jsoup-annotations | A Java library that provides annotations to simplify HTML scraping and processing with Jsoup | 239
egonschiele/handsomesoup | A Haskell library that simplifies HTML parsing by providing CSS selectors and attribute extraction functions | 124
tjatse/node-readability | Automates web page scraping and text extraction to make any webpage readable | 343
cheeriojs/cheerio | A fast and flexible HTML parser and DOM manipulator with jQuery-like API | 28,692
jsdom/jsdom | A pure-JavaScript implementation of various web standards for use with Node.js | 20,560
ericchiang/pup | A command line tool for parsing and manipulating HTML | 8,116
ndmitchell/tagsoup | A Haskell library for parsing and extracting information from HTML/XML documents | 233
imangazaliev/didom | A fast and simple HTML parser with support for CSS selectors and XPath expressions | 2,200
fb55/htmlparser2 | A fast and forgiving HTML parser with a focus on minimal allocations | 4,451
lexborisov/myhtml | A fast HTML parsing library written in C | 1,655
snjyor/htmlpageparser | An HTML parsing library that converts web pages to structured data and then generates Markdown content from that data | 1
js-devtools/rehype-url-inspector | A plugin to inspect and manipulate URLs in HTML documents | 19
javve/list.js | A JavaScript library for adding search, sort, filters and flexibility to tables and lists in HTML elements | 11,204
aantron/lambdasoup | A functional HTML scraping and manipulation library | 383