Multi-Modality-Arena
Model arena
An evaluation platform for comparing multi-modality models on visual question-answering tasks
Chatbot Arena meets multi-modality! Multi-Modality Arena lets you benchmark vision-language models side by side on image inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
467 stars
6 watching
35 forks
Language: Python
Last commit: 7 months ago
Topics: chat, chatbot, chatgpt, gradio, large-language-models, llms, multi-modality, vision-language-model, vqa
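To make the arena idea concrete, here is a minimal sketch of a side-by-side VQA comparison UI built with Gradio (the UI library the project tags). The `answer_with_*` functions are hypothetical placeholders, not the project's actual API; plug in real model inference code to use it.

```python
# Minimal side-by-side VQA comparison demo using Gradio.
# The answer_with_* functions below are hypothetical stand-ins for
# real model wrappers (e.g. MiniGPT-4, LLaVA) -- replace with actual
# inference calls to reproduce an arena-style comparison.
import gradio as gr

def answer_with_model_a(image, question):
    # Placeholder: call the first model's inference API here.
    return f"[model A] answer to: {question}"

def answer_with_model_b(image, question):
    # Placeholder: call the second model's inference API here.
    return f"[model B] answer to: {question}"

def compare(image, question):
    # Run both models on the same image/question pair so their
    # answers can be judged side by side.
    return (
        answer_with_model_a(image, question),
        answer_with_model_b(image, question),
    )

with gr.Blocks() as demo:
    image = gr.Image(type="pil", label="Input image")
    question = gr.Textbox(label="Question about the image")
    submit = gr.Button("Compare")
    answer_a = gr.Textbox(label="Model A answer")
    answer_b = gr.Textbox(label="Model B answer")
    submit.click(compare, inputs=[image, question], outputs=[answer_a, answer_b])

demo.launch()
```

Running the script serves a local web page where both models' answers to the same image/question pair appear next to each other, mirroring the arena-style comparison the project describes.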
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,477 |
| opengvlab/lamm | A framework and benchmark for training and evaluating multi-modal large language models, enabling AI agents that interact seamlessly with humans | 301 |
| mbzuai-oryx/groundinglmm | An end-to-end trained model that generates natural language responses integrated with object segmentation masks | 781 |
| openbmb/viscpm | A family of large multimodal models supporting multimodal conversation and text-to-image generation in multiple languages | 1,089 |
| mbzuai-oryx/video-chatgpt | A video conversation model that generates meaningful dialogue about videos using large vision and language models | 1,213 |
| nvlabs/eagle | Develops high-resolution multimodal LLMs by combining multiple vision encoders and input resolutions | 539 |
| xverse-ai/xverse-v-13b | A large multimodal model for visual question answering, trained on 2.1B image-text pairs and 8.2M instruction sequences | 77 |
| multimodal-art-projection/omnibench | Benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 14 |
| open-compass/mmbench | A collection of benchmarks for evaluating the multi-modal understanding of large vision-language models | 163 |
| tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities, using image and question benchmarks | 288 |
| mlpc-ucsd/bliva | A multimodal LLM designed to handle text-rich visual questions | 269 |
| ailab-cvc/seed | An implementation of a multimodal language model with both comprehension and generation capabilities | 582 |
| ucsc-vlaa/sight-beyond-text | Official implementation of a paper exploring multi-modal training to improve language models' truthfulness and ethics | 19 |
| wisconsinaivision/vip-llava | A system that enables large multimodal models to understand arbitrary visual prompts | 294 |
| mlo-lab/muvi | A software framework for multi-view latent variable modeling with domain-informed structured sparsity | 29 |