MGM
Vision-LM Framework
An open-source framework for training large language models with vision capabilities.
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
3k stars
28 watching
278 forks
Language: Python
last commit: 7 months ago
Tags: generation, large-language-models, vision-language-model
Related projects:
Repository | Description | Stars |
---|---|---|
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,232 |
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720 |
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,754 |
llava-vl/llava-next | Develops large multimodal models for various computer vision tasks including image and video analysis | 2,872 |
borisdayma/dalle-mini | Generates images from text prompts using a variant of the DALL-E model | 14,751 |
vision-cair/minigpt-4 | Enables vision-language understanding by fine-tuning large language models on visual data | 25,422 |
pku-yuangroup/video-llava | Enables large language models to perform visual reasoning on images and videos simultaneously by learning a united visual representation before projection | 2,990 |
openbmb/minicpm-v | A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs. | 12,619 |
nvlabs/eagle | Develops high-resolution multimodal LLMs by combining vision encoders and various input resolutions | 539 |
eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks. | 6,970 |
amazon-science/mm-cot | An implementation of multimodal chain-of-thought reasoning in language models using a decoupled training framework for rationale generation and answer inference. | 3,810 |
open-mmlab/mmcv | Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops. | 5,906 |
yfzhang114/llava-align | Debiasing techniques to minimize hallucinations in large visual language models | 71 |
qwenlm/qwen-vl | A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,045 |
luodian/otter | A multi-modal AI model developed for improved instruction-following and in-context learning, utilizing large-scale architectures and various training datasets. | 3,563 |