python-goose
Article Scraper
An HTML content extractor and web scraper for extracting article metadata and images from web pages
HTML content / article extractor, web scraping library in Python
4k stars
202 watching
786 forks
Language: HTML
Last commit: almost 4 years ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | An article extraction tool that retrieves metadata and main content from web articles | 840 |
| | Extracts metadata from public documents found on websites, useful for brute-force attacks. | 1,050 |
| | A tool for automatically extracting structured data from HTML pages | 105 |
| | A utility that generates and delivers a daily newspaper to an e-ink tablet based on RSS feeds, news articles, and weather data. | 274 |
| | A web crawling tool designed to extract structured data from the web for use in AI applications | 18,541 |
| | A tool for creating and publishing static websites using Markdown and reStructuredText syntax in Python. | 12,636 |
| | A tool for extracting detailed information from PDFs | 6,898 |
| | A framework for extracting structured data from websites in a fast and elegant way | 23,444 |
| | A tool to scrape and store hierarchical data about GitHub organizations, users, or repositories. | 127 |
| | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 16,081 |
| | Extracts and stores information from various sources using AI models to generate answers. | 283 |
| | A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines. | 78 |
| | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,037 |
| | A collection of tools and extensions to the Google Earth Engine Python API for geospatial processing | 531 |
| | Extracts indicators of compromise from PDF security reports | 430 |