Multi-Modality-Arena
Model arena
An evaluation platform for comparing multi-modality models on visual question-answering tasks.
Chatbot Arena meets multi-modality! Multi-Modality Arena lets you benchmark vision-language models side by side, with images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
478 stars · 6 watching · 36 forks
Language: Python
Last commit: 9 months ago
Topics: chat, chatbot, chatgpt, gradio, large-language-models, llms, multi-modality, vision-language-model, vqa
Related projects:
Repository | Description | Stars |
---|---|---|
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,478 |
opengvlab/lamm | A framework and benchmark for training and evaluating multi-modal large language models, aimed at building AI agents that interact seamlessly with humans | 305 |
mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
openbmb/viscpm | A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages | 1,098 |
mbzuai-oryx/video-chatgpt | A video conversation model that generates meaningful conversations about videos using large vision and language models | 1,246 |
nvlabs/eagle | Develops high-resolution multimodal LLMs by combining vision encoders with various input resolutions | 549 |
xverse-ai/xverse-v-13b | A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences | 78 |
multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 15 |
open-compass/mmbench | A collection of benchmarks to evaluate the multi-modal understanding capability of large vision-language models | 168 |
tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks | 296 |
mlpc-ucsd/bliva | A multimodal LLM designed to handle text-rich visual questions | 270 |
ailab-cvc/seed | An implementation of a multimodal language model with capabilities for comprehension and generation | 585 |
ucsc-vlaa/sight-beyond-text | An implementation of a multimodal LLM training paradigm to enhance truthfulness and ethics in language models | 19 |
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 302 |
mlo-lab/muvi | A software framework for multi-view latent variable modeling with domain-informed structured sparsity | 27 |