Multi-Modality-Arena

Model arena

An evaluation platform for comparing multi-modality models on visual question-answering tasks

Chatbot Arena meets multi-modality! Multi-Modality Arena lets you benchmark vision-language models side by side on image inputs. It supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more.
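The core pattern is simple: the same image and question are sent to two vision-language models, and their answers are displayed side by side for comparison. Below is a minimal, hypothetical sketch of that pattern using Gradio and Hugging Face transformers; it is not the repository's actual launch code, and the `Salesforce/blip2-opt-2.7b` checkpoint, the function names, and the placeholder second model are illustrative assumptions.

```python
# Sketch (not the repo's code) of a two-panel VQA "arena": one image, one
# question, two answers shown side by side. Model A uses BLIP-2 via Hugging
# Face transformers; Model B is a placeholder for any other VLM.
import gradio as gr
import torch
from transformers import Blip2ForConditionalGeneration, Blip2Processor

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32

# Model A: BLIP-2 (OPT-2.7B), loaded once at startup.
blip2_processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
blip2_model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=DTYPE
).to(DEVICE)


def answer_blip2(image, question):
    """Ask BLIP-2 a question about a PIL image and return its answer."""
    prompt = f"Question: {question} Answer:"
    inputs = blip2_processor(images=image, text=prompt, return_tensors="pt").to(DEVICE, DTYPE)
    output_ids = blip2_model.generate(**inputs, max_new_tokens=30)
    return blip2_processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()


def answer_other_model(image, question):
    """Placeholder for a second model (MiniGPT-4, LLaVA, ...)."""
    return "(plug a second vision-language model in here)"


def compare(image, question):
    # Run both models on the same input so the answers can be judged side by side.
    return answer_blip2(image, question), answer_other_model(image, question)


with gr.Blocks() as demo:
    gr.Markdown("Two-model VQA arena: one image, one question, two answers.")
    with gr.Row():
        image = gr.Image(type="pil", label="Image")
        question = gr.Textbox(label="Question")
    ask = gr.Button("Ask both models")
    with gr.Row():
        answer_a = gr.Textbox(label="Model A (BLIP-2)")
        answer_b = gr.Textbox(label="Model B")
    ask.click(compare, inputs=[image, question], outputs=[answer_a, answer_b])

demo.launch()
```

Swapping `answer_other_model` for a wrapper around MiniGPT-4, LLaVA, or another supported model turns the sketch into a genuine head-to-head comparison.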

GitHub

467 stars
6 watching
35 forks
Language: Python
Last commit: 7 months ago
Topics: chat, chatbot, chatgpt, gradio, large-language-models, llms, multi-modality, vision-language-model, vqa

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,477 |
| opengvlab/lamm | A framework and benchmark for training and evaluating multi-modal large language models, enabling the development of AI agents capable of seamless interaction between humans and machines | 301 |
| mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks | 781 |
| openbmb/viscpm | A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages | 1,089 |
| mbzuai-oryx/video-chatgpt | A video conversation model that generates meaningful conversations about videos using large vision and language models | 1,213 |
| nvlabs/eagle | Develops high-resolution multimodal LLMs by combining vision encoders and various input resolutions | 539 |
| xverse-ai/xverse-v-13b | A large multimodal model for visual question answering, trained on 2.1B image-text pairs and 8.2M instruction sequences | 77 |
| multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 14 |
| open-compass/mmbench | A collection of benchmarks to evaluate the multi-modal understanding capability of large vision-language models | 163 |
| tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks | 288 |
| mlpc-ucsd/bliva | A multimodal LLM designed to handle text-rich visual questions | 269 |
| ailab-cvc/seed | An implementation of a multimodal language model with capabilities for comprehension and generation | 576 |
| ucsc-vlaa/sight-beyond-text | Official implementation of a research paper exploring multi-modal training as a way to improve language models' truthfulness and ethics | 19 |
| wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 294 |
| mlo-lab/muvi | A software framework for multi-view latent variable modeling with domain-informed structured sparsity | 29 |