pagser
HTML extractor
A tool for automatically extracting structured data from HTML pages
Pagser is a simple, extensible, and configurable Go library that parses an HTML page and deserializes it into a struct, based on goquery and struct tags. It is intended for use in Go crawlers.
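The struct-tag approach described above can be illustrated with a minimal, standard-library-only sketch. It does not use pagser or goquery and is not their API: the `sel` tag key, the `Page` type, and the `Fill` and `firstText` helpers are all hypothetical names invented for this illustration, and a regular expression stands in for a real selector engine, so only simple element-name lookups on unnested markup are handled.

```go
package main

import (
	"fmt"
	"reflect"
	"regexp"
	"strings"
)

// Page is a hypothetical target struct. The `sel` tag key is invented for
// this sketch; it names the HTML element whose text fills the field.
type Page struct {
	Title   string `sel:"title"`
	Heading string `sel:"h1"`
}

// firstText returns the trimmed inner text of the first <name>...</name>
// element. A regexp stands in for goquery here (an assumption made for
// brevity) and only copes with simple, unnested markup.
func firstText(html, name string) string {
	q := regexp.QuoteMeta(name)
	re := regexp.MustCompile(`(?is)<` + q + `\b[^>]*>(.*?)</` + q + `>`)
	m := re.FindStringSubmatch(html)
	if m == nil {
		return ""
	}
	return strings.TrimSpace(m[1])
}

// Fill walks the string fields of the struct that out points to and sets
// each one from the element named in its `sel` tag.
func Fill(out interface{}, html string) error {
	v := reflect.ValueOf(out)
	if v.Kind() != reflect.Ptr || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("fill: want non-nil pointer to struct, got %T", out)
	}
	v = v.Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		name, ok := t.Field(i).Tag.Lookup("sel")
		// Skip untagged, non-string, and unexported fields.
		if !ok || t.Field(i).Type.Kind() != reflect.String || !v.Field(i).CanSet() {
			continue
		}
		v.Field(i).SetString(firstText(html, name))
	}
	return nil
}

func main() {
	const doc = `<html><head><title>Pagser Example</title></head>
<body><h1> Hello </h1></body></html>`
	var p Page
	if err := Fill(&p, doc); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p) // prints {Title:Pagser Example Heading:Hello}
}
```

Keeping the extraction rules in struct tags places the mapping next to the data model, which is the main appeal of this style for crawlers. A real library would swap the regexp lookup for a proper CSS selector engine.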
105 stars
3 watching
7 forks
Language: Go
last commit: about 1 year ago
Linked from 2 awesome lists
Topics: colly, crawler, deserialization, go, golang, goquery, html, page, parser, scrapy
Related projects:
Repository | Description | Stars |
---|---|---|
s0rg/crawley | A utility for systematically extracting URLs from web pages and printing them to the console. | 263 |
dwisiswant0/galer | A tool to extract URLs from HTML attributes by crawling pages and evaluating JavaScript | 253 |
scrapy/scrapely | A pure-python library for extracting structured data from HTML pages. | 1,863 |
feichao93/temme | A lightweight, CSS-based selector for extracting structured data from HTML documents. | 273 |
philipjkim/goreadability | Extracts readable content from web pages using Open Graph and traditional readability rules. | 69 |
snjyor/htmlpageparser | An HTML parsing library that converts web pages to structured data and then generates Markdown content from that data | 1 |
plainas/tq | Tool that extracts content from HTML documents based on CSS selectors | 236 |
limiu82214/gojmapr | A library to extract specific properties from complex JSON structures into Go structs with minimal code changes. | 22 |
goose3/goose3 | An article extraction tool that retrieves metadata and main content from web articles | 830 |
jakopako/goskyr | A tool to simplify web scraping of list-like structured data from web pages | 35 |
iamstoxe/urlgrab | A tool to crawl websites by exploring links recursively with support for JavaScript rendering. | 330 |
miyagawa/web-scraper | A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface. | 104 |
pxyup/fitter | A utility for extracting and processing data from various sources, including APIs, websites, and static text | 119 |
slotix/dataflowkit | A framework for extracting structured data from web pages using CSS selectors. | 662 |
jmg/crawley | A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options. | 186 |