MGM

Vision-LM Framework

An open-source framework for training large language models with vision capabilities.

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

GitHub

3k stars

28 watching

281 forks

Language: Python

Last commit: about 2 years ago

Topics: generation, large-language-models, vision-language-model

Related projects:

Repository	Description	Stars
haotian-liu/llava	A system that uses large language and vision models to generate and process visual instructions	20,683
alpha-vllm/llama2-accessory	An open-source toolkit for pretraining and fine-tuning large language models	2,732
opengvlab/llama-adapter	An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy	5,775
llava-vl/llava-next	Develops large multimodal models for various computer vision tasks including image and video analysis	3,099
borisdayma/dalle-mini	Generates images from text prompts using a variant of the DALL-E model	14,756
vision-cair/minigpt-4	Enabling vision-language understanding by fine-tuning large language models on visual data.	25,490
pku-yuangroup/video-llava	A large vision-language model that understands both images and videos through a unified visual representation aligned before projection into the language model.	3,071
openbmb/minicpm-v	A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs.	12,870
nvlabs/eagle	Develops high-resolution multimodal LLMs by combining vision encoders and various input resolutions	549
eleutherai/lm-evaluation-harness	Provides a unified framework to test generative language models on various evaluation tasks.	7,200
amazon-science/mm-cot	An implementation of multimodal chain-of-thought reasoning in language models using a decoupled training framework for rationale generation and answer inference.	3,833
open-mmlab/mmcv	Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops.	5,948
yfzhang114/llava-align	Debiasing techniques to minimize hallucinations in large visual language models	75
qwenlm/qwen-vl	A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks	5,179
luodian/otter	A multi-modal AI model developed for improved instruction-following and in-context learning, utilizing large-scale architectures and various training datasets.	3,570