python-goose
Article Scraper
An HTML content extractor and web scraper that pulls article metadata and images from web pages
HTML Content / Article Extractor, web scraping lib in Python
4k stars
202 watching
786 forks
Language: HTML
last commit: about 3 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
goose3/goose3 | An article extraction tool that retrieves metadata and main content from web articles | 840 |
laramies/metagoofil | Extracts metadata from public documents found on websites, useful for brute-force attacks. | 1,050 |
foolin/pagser | A tool for automatically extracting structured data from HTML pages | 105 |
j6k4m8/goosepaper | A utility that generates and delivers a daily newspaper to an e-ink tablet based on RSS feeds, news articles, and weather data. | 274 |
unclecode/crawl4ai | A web crawling tool designed to extract structured data from the web for use in AI applications | 18,541 |
getpelican/pelican | A tool for creating and publishing static websites using Markdown and reStructuredText syntax in Python. | 12,636 |
jsvine/pdfplumber | A tool for extracting detailed information from PDFs | 6,898 |
gocolly/colly | A framework for extracting structured data from websites in a fast and elegant way | 23,444 |
needmorecowbell/giggity | A tool to scrape and store hierarchical data about GitHub organizations, users, or repositories. | 127 |
apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 16,081 |
geeks-of-data/knowledge-gpt | Extracts and stores information from various sources using AI models to generate answers. | 283 |
xyntopia/pydoxtools | A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines. | 78 |
elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,037 |
gee-community/geetools | A collection of tools and extensions to the Google Earth Engine Python API for geospatial processing | 531 |
armbues/ioc_parser | Extracts indicators of compromise from PDF security reports | 430 |