SEED-Bench
Multimodal LLM test suite
A benchmark for evaluating large language models' ability to process multimodal input
(CVPR 2024) A benchmark for evaluating multimodal LLMs using multiple-choice questions.
315 stars
4 watching
12 forks
Language: Python
Last commit: 4 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
ailab-cvc/seed | An implementation of a multimodal language model with capabilities for comprehension and generation | 576 |
multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 14 |
aifeg/benchlmm | An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models | 83 |
mshukor/evalign-icl | Evaluating and improving large multimodal models through in-context learning | 20 |
freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 55 |
khanrc/honeybee | An implementation of a multimodal language model using locality-enhanced projection techniques | 432 |
yuliang-liu/multimodalocr | An evaluation benchmark for OCR capabilities in large multimodal models. | 471 |
jvalegre/robert | Automated machine learning protocols for cheminformatics using Python | 38 |
damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 92 |
ys-zong/vl-icl | A benchmarking suite for multimodal in-context learning models | 28 |
mlpc-ucsd/bliva | A multimodal LLM designed to handle text-rich visual questions | 269 |
fuxiaoliu/mmc | Develops a large-scale dataset and benchmark for training multimodal chart understanding models using large language models. | 84 |
qcri/llmebench | A benchmarking framework for large language models | 80 |
cloud-cv/evalai | A platform for comparing and evaluating AI and machine learning algorithms at scale | 1,771 |
junyangwang0410/amber | An LLM-free benchmark suite for evaluating hallucination in MLLMs across various tasks and dimensions | 93 |