lmms-eval

Model evaluation toolkit

An evaluation framework and tools that accelerate the development of large multimodal models by providing an efficient way to assess their performance.

Accelerating the development of large multimodal models (LMMs) with a one-click evaluation module - lmms-eval.

GitHub

2k stars

3 watching

168 forks

Language: Python

Last commit: over 1 year ago

Linked from 1 awesome list

lmms-lab.framer.ai/

Backlinks from these awesome lists:

ethicalml/awesome-production-machine-learning

Related projects:

Repository	Description	Stars
freedomintelligence/mllm-bench	Evaluates and compares the performance of multimodal large language models on various tasks	56
chenllliang/mmevalpro	A benchmarking framework for evaluating Large Multimodal Models by providing rigorous metrics and an efficient evaluation pipeline.	22
mlgroupjlu/llm-eval-survey	A repository of papers and resources for evaluating large language models.	1,450
allenai/olmo-eval	A framework for evaluating language models on NLP tasks	326
mlabonne/llm-autoeval	A tool to automate the evaluation of large language models in Google Colab using various benchmarks and custom parameters.	566
mshukor/evalign-icl	Evaluating and improving large multimodal models through in-context learning	21
open-compass/vlmevalkit	An evaluation toolkit for large vision-language models	1,514
declare-lab/instruct-eval	An evaluation framework for large language models trained with instruction tuning methods	535
prometheus-eval/prometheus-eval	An open-source framework that enables language model evaluation using Prometheus and GPT-4	820
esmvalgroup/esmvaltool	A community-developed tool for evaluating climate models and providing diagnostic metrics.	230
h2oai/h2o-llm-eval	An evaluation framework for large language models with Elo rating system and A/B testing capabilities	50
maluuba/nlg-eval	A toolset for evaluating and comparing natural language generation models	1,350
huggingface/lighteval	An all-in-one toolkit for evaluating Large Language Models (LLMs) across multiple backends.	879
evolvinglmms-lab/longva	An open-source project that enables the transfer of language understanding to vision capabilities through long context processing.	347
modelscope/evalscope	A framework for efficiently evaluating and benchmarking large models	308