MAGIC

Visual Guidance Model

Enables language models to generate text conditioned on visual inputs and to caption images without requiring explicit training or labeled data.

Language Models Can See: Plugging Visual Controls in Text Generation

GitHub

254 stars
11 watching
27 forks
Language: Python
Last commit: over 2 years ago
clip, gpt-2, image-captioning, multimodal, plug-and-play-language-models, story-generation, text-generation, unsupervised-learning, zero-shot