pagser
HTML extractor
A tool for automatically extracting structured data from HTML pages
Pagser is a simple, extensible, and configurable library for Go crawlers that parses an HTML page and deserializes it into a struct, based on goquery and struct tags.
105 stars
3 watching
8 forks
Language: Go
Last commit: over 1 year ago
Linked from 2 awesome lists
colly, crawler, deserialization, go, golang, goquery, html, page, parser, scrapy
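The description above says pages are deserialized into structs by way of struct tags. The following is a minimal sketch of that general idea using only Go's standard library: it is not pagser's API or implementation. It swaps in `encoding/xml`, which only handles well-formed markup, where pagser uses goquery CSS selectors. The `Page` and `Link` types, the `parsePage` helper, and the `sample` document are all hypothetical names chosen for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Link is one anchor element; each tag says where the field's value comes from.
type Link struct {
	Href string `xml:"href,attr"` // the href attribute of <a>
	Text string `xml:",chardata"` // the text inside <a>
}

// Page maps element paths in the document onto struct fields.
// (Hypothetical type: pagser would use its own tag syntax with CSS selectors.)
type Page struct {
	Title string `xml:"head>title"`
	Links []Link `xml:"body>ul>li>a"`
}

// parsePage decodes a well-formed XHTML document into a Page.
func parsePage(doc string) (Page, error) {
	var p Page
	err := xml.Unmarshal([]byte(doc), &p)
	return p, err
}

// sample is a small, well-formed document invented for this example.
const sample = `<html><head><title>Example</title></head><body><ul><li><a href="/docs">Docs</a></li><li><a href="/blog">Blog</a></li></ul></body></html>`

func main() {
	page, err := parsePage(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(page.Title)
	for _, l := range page.Links {
		fmt.Println(l.Text, l.Href)
	}
}
```

The point of the pattern is that the extraction rules live next to the fields they fill, so adding a field to the struct is the only change needed to extract one more value. Real-world HTML is rarely well-formed XML, which is why a selector-based parser such as goquery is the more practical foundation for a crawler.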
Related projects:
| Description | Stars |
|---|---|
| A utility for systematically extracting URLs from web pages and printing them to the console. | 268 |
| A tool that extracts URLs from HTML attributes by crawling pages and evaluating JavaScript. | 255 |
| A pure-Python library for extracting structured data from HTML pages. | 1,865 |
| A lightweight, CSS-based selector for extracting structured data from HTML documents. | 273 |
| Extracts readable content from web pages using Open Graph and traditional readability rules. | 69 |
| An HTML parsing library that converts web pages to structured data and generates Markdown from it. | 1 |
| Tool that extracts content from HTML documents based on CSS selectors | 236 |
| A library to extract specific properties from complex JSON structures into Go structs with minimal code changes. | 22 |
| An article extraction tool that retrieves metadata and main content from web articles | 840 |
| A tool to simplify web scraping of list-like structured data from web pages | 36 |
| A tool to crawl websites by exploring links recursively with support for JavaScript rendering. | 331 |
| A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface. | 104 |
| A utility for extracting and processing data from various sources, including APIs, websites, and static text | 120 |
| A framework for extracting structured data from web pages using CSS selectors. | 667 |
| A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options. | 188 |