lmms-eval

Model evaluation toolkit

An evaluation framework and set of tools that accelerates the development of large multimodal models by providing an efficient, standardized way to assess their performance.

Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
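As a rough idea of what a one-click evaluation looks like in practice, the sketch below shells out to the lmms-eval command-line interface from Python. The `llava` model backend, the `mme` task name, and the flag spellings are assumptions based on typical lm-evaluation-harness-style usage; consult the repository's README for the options supported by your installed version.

```python
# Minimal sketch of launching an lmms-eval run (assumption: the package is
# installed and exposes the `lmms_eval` module as a CLI entry point; flag
# names follow common lm-evaluation-harness-style conventions and may differ
# across releases -- verify against the repository's README).
import subprocess

subprocess.run(
    [
        "python", "-m", "lmms_eval",
        "--model", "llava",                                   # model backend to evaluate
        "--model_args", "pretrained=liuhaotian/llava-v1.5-7b",
        "--tasks", "mme",                                     # benchmark task(s) to run
        "--batch_size", "1",
        "--output_path", "./logs/",                           # where results are written
    ],
    check=True,
)
```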

GitHub

2k stars
3 watching
168 forks
Language: Python
Last commit: 1 day ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 56 |
| chenllliang/mmevalpro | A benchmarking framework for evaluating large multimodal models with rigorous metrics and an efficient evaluation pipeline | 22 |
| mlgroupjlu/llm-eval-survey | A repository of papers and resources for evaluating large language models | 1,450 |
| allenai/olmo-eval | A framework for evaluating language models on NLP tasks | 326 |
| mlabonne/llm-autoeval | A tool that automates the evaluation of large language models in Google Colab using various benchmarks and custom parameters | 566 |
| mshukor/evalign-icl | Evaluating and improving large multimodal models through in-context learning | 21 |
| open-compass/vlmevalkit | An evaluation toolkit for large vision-language models | 1,514 |
| declare-lab/instruct-eval | An evaluation framework for large language models trained with instruction-tuning methods | 535 |
| prometheus-eval/prometheus-eval | An open-source framework that enables language model evaluation using Prometheus and GPT-4 | 820 |
| esmvalgroup/esmvaltool | A community-developed tool for evaluating climate models and providing diagnostic metrics | 230 |
| h2oai/h2o-llm-eval | An evaluation framework for large language models with an Elo rating system and A/B testing capabilities | 50 |
| maluuba/nlg-eval | A toolset for evaluating and comparing natural language generation models | 1,350 |
| huggingface/lighteval | An all-in-one toolkit for evaluating large language models (LLMs) across multiple backends | 879 |
| evolvinglmms-lab/longva | An open-source project that transfers language understanding to vision capabilities through long-context processing | 347 |
| modelscope/evalscope | A framework for efficiently evaluating and benchmarking large models | 308 |