python-goose

Article Scraper

An HTML content extractor and web scraper that pulls article metadata and images from web pages

HTML content / article extractor, a web scraping library in Python

GitHub

4k stars

202 watching

786 forks

Language: HTML

Last commit: over 4 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

jobbole/awesome-python-cn

Related projects:

Repository	Description	Stars
goose3/goose3	An article extraction tool that retrieves metadata and main content from web articles	840
laramies/metagoofil	Extracts metadata from public documents found on websites, useful for brute-force attacks.	1,050
foolin/pagser	A tool for automatically extracting structured data from HTML pages	105
j6k4m8/goosepaper	A utility that generates and delivers a daily newspaper to an e-ink tablet based on RSS feeds, news articles, and weather data.	274
unclecode/crawl4ai	A web crawling tool designed to extract structured data from the web for use in AI applications	18,541
getpelican/pelican	A tool for creating and publishing static websites using Markdown and reStructuredText syntax in Python.	12,636
jsvine/pdfplumber	A tool for extracting detailed information from PDFs	6,898
gocolly/colly	A framework for extracting structured data from websites in a fast and elegant way	23,444
needmorecowbell/giggity	A tool to scrape and store hierarchical data about GitHub organizations, users, or repositories.	127
apify/crawlee	A tool for building reliable web scraping and browser automation pipelines in Node.js.	16,081
geeks-of-data/knowledge-gpt	Extracts and stores information from various sources using AI models to generate answers.	283
xyntopia/pydoxtools	A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines.	78
elliotgao2/gain	A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites.	2,037
gee-community/geetools	A collection of tools and extensions to the Google Earth Engine Python API for geospatial processing	531
armbues/ioc_parser	Extracts indicators of compromise from PDF security reports	430