python-goose
Article Scraper
An HTML content extractor and web scraper for extracting article metadata and images from web pages
HTML content / article extractor, a web scraping library in Python
4k stars
202 watching
786 forks
Language: HTML
Last commit: about 3 years ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | An article extraction tool that retrieves metadata and main content from web articles | 840 |
| | Extracts metadata from public documents found on websites, useful for brute-force attacks. | 1,050 |
| | A tool for automatically extracting structured data from HTML pages | 105 |
| | A utility that generates and delivers a daily newspaper to an e-ink tablet based on RSS feeds, news articles, and weather data. | 274 |
| | A web crawling tool designed to extract structured data from the web for use in AI applications | 18,541 |
| | A tool for creating and publishing static websites using Markdown and reStructuredText syntax in Python. | 12,636 |
| | A tool for extracting detailed information from PDFs | 6,898 |
| | A framework for extracting structured data from websites in a fast and elegant way | 23,444 |
| | A tool to scrape and store hierarchical data about GitHub organizations, users, or repositories. | 127 |
| | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 16,081 |
| | Extracts and stores information from various sources using AI models to generate answers. | 283 |
| | A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines. | 78 |
| | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,037 |
| | A collection of tools and extensions to the Google Earth Engine Python API for geospatial processing | 531 |
| | Extracts indicators of compromise from PDF security reports | 430 |