pagser

HTML extractor

A tool for automatically extracting structured data from HTML pages

Pagser is a simple, extensible, and configurable Go library that parses an HTML page and deserializes it into a struct. It is built on goquery and struct tags and is intended for use in Go crawlers.
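To illustrate the general idea of struct-tag-driven deserialization, here is a minimal, standard-library-only sketch. It is not Pagser's API: the `sel` tag key, the `Page` type, and the `Fill` and `firstText` helpers are all hypothetical names invented for this example, and a regular expression stands in for a real HTML parser such as goquery.

```go
package main

import (
	"fmt"
	"reflect"
	"regexp"
	"strings"
)

// Page is a hypothetical target struct. The `sel` tag key is invented for
// this sketch; its value names the HTML element whose text fills the field.
type Page struct {
	Title   string `sel:"title"`
	Heading string `sel:"h1"`
}

// firstText returns the trimmed inner text of the first <tag>...</tag>
// element, or "" if there is none. A regexp is a stand-in here; a real
// implementation would use an HTML parser and CSS selectors.
func firstText(html, tag string) string {
	q := regexp.QuoteMeta(tag)
	re := regexp.MustCompile(`(?is)<` + q + `\b[^>]*>(.*?)</` + q + `>`)
	m := re.FindStringSubmatch(html)
	if m == nil {
		return ""
	}
	return strings.TrimSpace(m[1])
}

// Fill walks the fields of the struct that dst points to and sets every
// settable string field that carries a `sel` tag.
func Fill(dst interface{}, html string) error {
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Ptr || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("dst must be a non-nil pointer to a struct")
	}
	v = v.Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		tag, ok := t.Field(i).Tag.Lookup("sel")
		if !ok || v.Field(i).Kind() != reflect.String || !v.Field(i).CanSet() {
			continue // skip untagged, non-string, or unexported fields
		}
		v.Field(i).SetString(firstText(html, tag))
	}
	return nil
}

func main() {
	const doc = `<html><head><title>Pagser demo</title></head><body><h1> Hello </h1></body></html>`
	var p Page
	if err := Fill(&p, doc); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)
}
```

The design point is that the extraction rules live next to the fields they populate, so adding a field to the struct is the only change needed to extract one more value.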

GitHub

105 stars

3 watching

8 forks

Language: Go

last commit: almost 2 years ago

Linked from 2 awesome lists

colly, crawler, deserialization, go, golang, goquery, html, page, parser, scrapy

Related projects:

Repository	Description	Stars
s0rg/crawley	A utility for systematically extracting URLs from web pages and printing them to the console.	268
dwisiswant0/galer	A tool to extract URLs from HTML attributes by crawling pages and evaluating JavaScript.	255
scrapy/scrapely	A pure-python library for extracting structured data from HTML pages.	1,865
feichao93/temme	A lightweight, CSS-based selector for extracting structured data from HTML documents.	273
philipjkim/goreadability	Extracts readable content from web pages using Open Graph and traditional readability rules.	69
snjyor/htmlpageparser	An HTML parsing library that converts web pages to structured data and then generates Markdown content from that data.	1
plainas/tq	A tool that extracts content from HTML documents based on CSS selectors.	236
limiu82214/gojmapr	A library to extract specific properties from complex JSON structures into Go structs with minimal code changes.	22
goose3/goose3	An article extraction tool that retrieves metadata and main content from web articles.	840
jakopako/goskyr	A tool to simplify web scraping of list-like structured data from web pages.	36
iamstoxe/urlgrab	A tool to crawl websites by exploring links recursively with support for JavaScript rendering.	331
miyagawa/web-scraper	A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface.	104
pxyup/fitter	A utility for extracting and processing data from various sources, including APIs, websites, and static text.	120
slotix/dataflowkit	A framework for extracting structured data from web pages using CSS selectors.	667
jmg/crawley	A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options.	188