MiniGPT-4
Vision-Language Model
Enabling vision-language understanding by aligning a frozen visual encoder with a frozen large language model through a lightweight trainable projection layer (a minimal sketch of this idea follows the project stats below).
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
25k stars
218 watching
3k forks
Language: Python
Last commit: 3 months ago
Linked from 3 awesome lists
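How the alignment works, in brief: the pretrained vision encoder and the language model both stay frozen, and only a small projection layer is trained to map visual features into the language model's embedding space, where they are prepended to the text prompt. The snippet below is a minimal, illustrative PyTorch sketch of that idea; the class name, dimensions, and tensor shapes are assumptions for exposition and do not reflect the repository's actual module names or API.

```python
# Minimal sketch (assumed names/shapes): a frozen vision encoder's features are
# mapped by a single trainable linear projection into the embedding space of a
# frozen language model, then prepended to the text token embeddings.
import torch
import torch.nn as nn

class VisionLanguageBridge(nn.Module):
    def __init__(self, vision_dim=1408, llm_dim=4096):
        super().__init__()
        # The only trainable component: projects vision features into LLM space.
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, image_features, text_embeddings):
        # image_features: (batch, num_visual_tokens, vision_dim) from a frozen encoder
        # text_embeddings: (batch, num_text_tokens, llm_dim) from the frozen LLM's embedding table
        visual_tokens = self.proj(image_features)
        # Concatenate projected visual tokens in front of the prompt embeddings;
        # the combined sequence is then fed to the frozen LLM for generation.
        return torch.cat([visual_tokens, text_embeddings], dim=1)

# Illustrative shapes only:
bridge = VisionLanguageBridge()
img = torch.randn(1, 32, 1408)    # e.g. 32 query tokens from a frozen Q-Former
txt = torch.randn(1, 16, 4096)    # 16 prompt token embeddings
inputs_embeds = bridge(img, txt)  # (1, 48, 4096), passed to the LLM
```

In the actual project, the frozen components are a BLIP-2-style ViT/Q-Former encoder and a Vicuna (or, for MiniGPT-v2, LLaMA-2) language model, and only the projection is trained on image-text data; treat the sketch above as a conceptual outline rather than the repository's implementation.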
Related projects:
| Repository | Description | Stars |
|---|---|---|
| google-research/big_vision | Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines | 2,334 |
| dvlab-research/mgm | An open-source framework for training large language models with vision capabilities | 3,211 |
| openbmb/minicpm-v | A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs | 12,619 |
| google-research/vision_transformer | Provides pre-trained models and code for training Vision Transformers and MLP-Mixers using JAX/Flax | 10,450 |
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,232 |
| qwenlm/qwen-vl | A large vision-language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,045 |
| qwenlm/qwen2-vl | A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text | 3,093 |
| internlm/internlm-xcomposer | A large vision-language model that can understand and generate text from visual inputs, with support for long-context input and output, high-resolution understanding, fine-grained video understanding, and multi-turn multi-image dialogue | 2,521 |
| borisdayma/dalle-mini | Generates images from text prompts using a variant of the DALL-E model | 14,751 |
| cszn/kair | Image restoration toolbox with training and testing code for various deep learning-based methods | 2,957 |
| pku-yuangroup/video-llava | Enables large language models to perform visual reasoning on both images and videos by learning a united visual representation before projection | 2,990 |
| opengvlab/internvl | A pioneering open-source alternative to commercial multimodal models, offering a family of large-scale language and vision models | 6,014 |
| donnyyou/torchcv | A comprehensive PyTorch-based framework for computer vision tasks | 2,250 |
| nvlabs/prismer | A deep learning framework for training multimodal models with vision and language capabilities | 1,298 |
| clovaai/stargan-v2 | A Python implementation of an image-to-image translation model for generating diverse images across multiple domains | 3,500 |