metagoofil

Document scraper

Extracts metadata from public documents available on websites

Metadata harvester

GitHub

1k stars
58 watching
205 forks
Language: Python
last commit: 8 months ago
Linked from 2 awesome lists

Related projects:

| Repository | Description | Stars |
|---|---|---|
| gomoob/php-metadata-extractor | A PHP wrapper to call the Java metadata-extractor library. | 9 |
| meilisearch/docs-scraper | Automates scraping and indexing of documentation content into a search engine. | 290 |
| jaimeiniesta/metainspector | A Ruby gem for web scraping and extracting metadata from web pages. | 1,036 |
| needmorecowbell/giggity | A tool to scrape and store hierarchical data about GitHub organizations, users, or repositories. | 126 |
| erikriver/opengraph | A Python module to extract and parse metadata from web pages using the Open Graph Protocol. | 228 |
| unkl4b/gitminer | An automated tool for gathering code information from GitHub repositories. | 2,092 |
| neon-jungle/wagtail-metadata | A tool to help with metadata for search engines and social media platforms. | 116 |
| barasher/go-exiftool | A Go wrapper around ExifTool to extract metadata from various file types. | 252 |
| davemolk/gogetjs | Tools for extracting and analyzing JavaScript files from web pages. | 40 |
| pachterlab/ffq | A tool to fetch and display metadata from various public databases. | 551 |
| jkongie/mobi | A Ruby gem to extract metadata from MOBI files. | 38 |
| michaelhelmick/lassie | A library for retrieving basic content from websites. | 613 |
| jgomezdans/get_modis | Downloads MODIS data from the USGS repository using a standardized interface. | 62 |
| aantron/lambdasoup | A functional HTML scraping and manipulation library. | 383 |
| holgerd77/django-dynamic-scraper | An app that allows you to manage Scrapy spiders through a Django admin interface. | 1,153 |