MM-Vet
Model evaluator
Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
267 stars
2 watching
11 forks
Language: Python
last commit: 17 days ago

Related projects:
Repository | Description | Stars |
---|---|---|
zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks. | 288 |
yuliang-liu/monkey | A toolkit for building conversational AI models that can process images and text inputs. | 1,825 |
chenllliang/mmevalpro | A benchmarking framework for evaluating Large Multimodal Models by providing rigorous metrics and an efficient evaluation pipeline. | 22 |
mshukor/evalign-icl | Evaluates and improves large multimodal models through in-context learning. | 20 |
freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 55 |
allenai/olmo-eval | An evaluation framework for large language models. | 310 |
haozhezhao/mic | Develops a multimodal vision-language model to enable machines to understand complex relationships between instructions and images in various tasks. | 334 |
evolvinglmms-lab/lmms-eval | Tools and evaluation suite for large multimodal models. | 2,058 |
yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 137 |
fuxiaoliu/mmc | Develops a large-scale dataset and benchmark for training multimodal chart understanding models using large language models. | 84 |
mikegu721/xiezhibenchmark | An evaluation suite to assess language models' performance on multiple-choice questions. | 91 |
yuliang-liu/multimodalocr | An evaluation benchmark for OCR capabilities in large multimodal models. | 471 |
tiger-ai-lab/uniir | Trains and evaluates a universal multimodal retrieval model to perform various information retrieval tasks. | 110 |
felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset. | 87 |