lambdasoup
HTML scraper
A functional HTML scraping and manipulation library in OCaml
Functional HTML scraping and rewriting with CSS in OCaml
384 stars
12 watching
31 forks
Language: OCaml
last commit: 5 days ago
Linked from 1 awesome list
Tags: css, html, ocaml, scraping, soup
Related projects:
Repository | Description | Stars |
---|---|---|
aantron/markup.ml | A streaming HTML5 and XML parser that detects character encodings, emits signals, and provides error recovery. | 146 |
fcannizzaro/jsoup-annotations | A Java library that provides annotations to simplify HTML scraping and processing with Jsoup | 239 |
miyagawa/web-scraper | A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface. | 104 |
meilisearch/docs-scraper | Automates scraping and indexing of documentation content into a search engine | 290 |
pharo-contributions/soup | An HTML parsing and scraping library for Pharo | 6 |
egonschiele/handsomesoup | A Haskell library that simplifies HTML parsing by providing CSS selectors and attribute extraction functions. | 124 |
oscarotero/embed | A PHP library to extract metadata and embeddable code from any web page using various protocols and scraping techniques. | 2,091 |
felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
bendeaton/abaqus-documentation-scraper | Extracts keywords and parameters from Abaqus documentation for a syntax highlighting plugin | 3 |
jakopako/goskyr | A tool to simplify web scraping of list-like structured data from web pages | 35 |
the-markup/blacklight-collector | A tool for scraping website content and analyzing browser behavior | 202 |
malfrats/xeuledoc | A tool to fetch information about public Google documents from various services | 846 |
scrapy/scrapely | A pure-python library for extracting structured data from HTML pages. | 1,863 |
michaelhelmick/lassie | Library for retrieving basic content from websites | 613 |
laramies/metagoofil | Extracts metadata from public documents available on websites | 1,028 |