MMVP
Visual model evaluation
An evaluation framework for assessing the visual capabilities of multimodal language models on benchmarks of paired images and questions.
296 stars
10 watching
7 forks
Language: Python
last commit: about 1 year ago
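For orientation, here is a minimal sketch of what an image-and-question evaluation loop for such a framework might look like. The JSON record schema, file layout, and the `query_model` callable are illustrative assumptions, not the repository's actual API.

```python
# Minimal sketch of an image-and-question evaluation loop.
# The record schema ({"image", "question", "answer"}) and the
# `query_model` callable are assumptions for illustration only.
import json
from pathlib import Path
from typing import Callable


def evaluate(
    benchmark_path: Path,
    image_dir: Path,
    query_model: Callable[[Path, str], str],
) -> float:
    """Return the accuracy of `query_model` over (image, question, answer) triples."""
    records = json.loads(benchmark_path.read_text())
    correct = 0
    for rec in records:
        prediction = query_model(image_dir / rec["image"], rec["question"])
        # Simple exact-match scoring; real benchmarks often use
        # multiple-choice options or model-based grading instead.
        correct += prediction.strip().lower() == rec["answer"].strip().lower()
    return correct / len(records)
```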
Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs | 33 |
| | A framework for efficiently evaluating and benchmarking large models | 308 |
| | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics | 274 |
| | A benchmarking framework for evaluating large multimodal models by providing rigorous metrics and an efficient evaluation pipeline | 22 |
| | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 15 |
| | A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages | 1,098 |
| | A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences | 78 |
| | An evaluation framework for machine learning models and datasets, providing standardized metrics and tools for comparing model performance | 2,063 |
| | Evaluates and compares the performance of multimodal large language models on various tasks | 56 |
| | An open-source benchmarking framework for evaluating the cross-style visual capability of large multimodal models | 84 |
| | A software framework for multi-view latent variable modeling with domain-informed structured sparsity | 27 |
| | An end-to-end image captioning system that uses large multimodal models and provides tools for training, inference, and demo usage | 1,849 |
| | Tools and an evaluation framework for accelerating the development of large multimodal models by providing an efficient way to assess their performance | 2,164 |
| | An AI-powered system that leverages multimodal reasoning and action to analyze visual data and provide insights | 940 |
| | An evaluation platform for comparing multimodal models on visual question-answering tasks | 478 |