pagser

HTML extractor

A tool for automatically extracting structured data from HTML pages

Pagser is a simple, extensible, configurable Go library that parses HTML pages and deserializes them into structs, based on goquery and struct tags, for use in Go crawlers.
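The struct-tag approach described above can be illustrated with a small, self-contained sketch. It is not pagser's API. To stay within Go's standard library, it uses `encoding/xml` in lenient HTML mode as a stand-in for goquery, a hypothetical `extract` struct-tag key, and bare element names instead of full CSS selectors. Only exported `string` fields are filled, each from the text of the first matching element.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"reflect"
	"strings"
)

// Page is a hypothetical target struct; the hypothetical `extract` tag
// names the HTML element whose text fills each field.
type Page struct {
	Title   string `extract:"title"`
	Heading string `extract:"h1"`
}

// firstText returns the trimmed text inside the first element named tag,
// or "" if the document has no such element.
func firstText(doc, tag string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	dec.Strict = false                // tolerate HTML quirks
	dec.AutoClose = xml.HTMLAutoClose // void elements such as <br> and <meta>
	dec.Entity = xml.HTMLEntity       // resolve HTML entities like &nbsp;
	depth := 0                        // > 0 while inside the matched element
	var sb strings.Builder
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return "", nil // no matching element
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if depth > 0 {
				depth++ // nested element inside the match
			} else if strings.EqualFold(t.Name.Local, tag) {
				depth = 1
			}
		case xml.EndElement:
			if depth > 0 {
				depth--
				if depth == 0 {
					return strings.TrimSpace(sb.String()), nil
				}
			}
		case xml.CharData:
			if depth > 0 {
				sb.Write(t)
			}
		}
	}
}

// Unmarshal fills the exported string fields of the struct pointed to by v,
// using each field's `extract` tag as the element name to look up.
func Unmarshal(doc string, v any) error {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Pointer || rv.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("unmarshal: want pointer to struct, got %T", v)
	}
	rv = rv.Elem()
	rt := rv.Type()
	for i := 0; i < rt.NumField(); i++ {
		sel, ok := rt.Field(i).Tag.Lookup("extract")
		if !ok || rt.Field(i).Type.Kind() != reflect.String || !rv.Field(i).CanSet() {
			continue // untagged, non-string, or unexported field
		}
		text, err := firstText(doc, sel)
		if err != nil {
			return err
		}
		rv.Field(i).SetString(text)
	}
	return nil
}

func main() {
	html := `<html><head><title>Hello</title></head><body><h1>Welcome</h1></body></html>`
	var p Page
	if err := Unmarshal(html, &p); err != nil {
		panic(err)
	}
	fmt.Println(p.Title, p.Heading) // prints: Hello Welcome
}
```

Re-parsing the document once per field keeps the sketch short; a production extractor would more likely parse the page once and evaluate every field's selector against that single parsed document.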

GitHub

105 stars
3 watching
7 forks
Language: Go
Last commit: about 1 year ago
Linked from 2 awesome lists

Topics: colly, crawler, deserialization, go, golang, goquery, html, page, parser, scrapy


Related projects:

s0rg/crawley: A utility for systematically extracting URLs from web pages and printing them to the console. (263 stars)
dwisiswant0/galer: A tool that extracts URLs from HTML attributes by crawling pages and evaluating their JavaScript. (253 stars)
scrapy/scrapely: A pure-Python library for extracting structured data from HTML pages. (1,863 stars)
feichao93/temme: A lightweight, CSS-based selector for extracting structured data from HTML documents. (273 stars)
philipjkim/goreadability: Extracts readable content from web pages using Open Graph and traditional readability rules. (69 stars)
snjyor/htmlpageparser: An HTML parsing library that converts web pages to structured data and then generates Markdown content from that data. (1 star)
plainas/tq: A tool that extracts content from HTML documents based on CSS selectors. (236 stars)
limiu82214/gojmapr: A library to extract specific properties from complex JSON structures into Go structs with minimal code changes. (22 stars)
goose3/goose3: An article extraction tool that retrieves metadata and main content from web articles. (830 stars)
jakopako/goskyr: A tool to simplify web scraping of list-like structured data from web pages. (35 stars)
iamstoxe/urlgrab: A tool to crawl websites by exploring links recursively, with support for JavaScript rendering. (330 stars)
miyagawa/web-scraper: A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface. (104 stars)
pxyup/fitter: A utility for extracting and processing data from various sources, including APIs, websites, and static text. (119 stars)
slotix/dataflowkit: A framework for extracting structured data from web pages using CSS selectors. (662 stars)
jmg/crawley: A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options. (186 stars)