MiniGPT-4

Vision-Language Model

Enabling vision-language understanding by fine-tuning large language models on visual data.

Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

25k stars

219 watching

3k forks

Language: Python

Last commit: 11 months ago

Linked from 3 awesome lists

Screenshot of Vision-CAIR/MiniGPT-4 website

Related projects:

Repository	Description	Stars
google-research/big_vision	Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines.	2,439
dvlab-research/mgm	An open-source framework for training large language models with vision capabilities.	3,229
openbmb/minicpm-v	A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs.	12,870
google-research/vision_transformer	Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax.	10,620
haotian-liu/llava	A system that uses large language and vision models to generate and process visual instructions.	20,683
qwenlm/qwen-vl	A large vision-language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks.	5,179
qwenlm/qwen2-vl	A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text.	3,613
internlm/internlm-xcomposer	A comprehensive multimodal system for long-term streaming video and audio interactions with capabilities including text-image comprehension and composition	2,616
borisdayma/dalle-mini	Generates images from text prompts using a variant of the DALL-E model.	14,756
cszn/kair	Image restoration toolbox with training and testing code for various deep learning-based methods.	2,994
pku-yuangroup/video-llava	A vision-language model that learns a unified visual representation for understanding both images and videos.	3,071
opengvlab/internvl	Develops large language models capable of processing multiple data types and modalities.	6,394
donnyyou/torchcv	A comprehensive PyTorch-based framework for computer vision tasks.	2,249
nvlabs/prismer	A deep learning framework for training multi-modal models with vision and language capabilities.	1,299
clovaai/stargan-v2	A Python implementation of an image-to-image translation model for generating diverse images across multiple domains.	3,513