MiniGPT-4
Vision-Language Model
Enabling vision-language understanding by fine-tuning large language models on visual data.
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
25k stars
219 watching
3k forks
Language: Python
last commit: 6 months ago
Linked from 3 awesome lists
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines. | 2,439 |
| | An open-source framework for training large language models with vision capabilities. | 3,229 |
| | A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs. | 12,870 |
| | Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax | 10,620 |
| | A system that uses large language and vision models to generate and process visual instructions | 20,683 |
| | A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,179 |
| | A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text. | 3,613 |
| | A comprehensive multimodal system for long-term streaming video and audio interactions with capabilities including text-image comprehension and composition | 2,616 |
| | Generates images from text prompts using a variant of the DALL-E model | 14,756 |
| | Image restoration toolbox with training and testing codes for various deep learning-based methods | 2,994 |
| | A deep learning framework for generating videos from text inputs and visual features. | 3,071 |
| | Develops large language models capable of processing multiple data types and modalities | 6,394 |
| | A comprehensive PyTorch-based framework for computer vision tasks | 2,249 |
| | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,299 |
| | A Python implementation of an image-to-image translation model for generating diverse images across multiple domains. | 3,513 |