LLM-Evals-Catalogue

A collaborative catalogue of Large Language Model (LLM) evaluation frameworks, benchmarks, and papers.

This repository stems from our paper, “Cataloguing LLM Evaluations”, and serves as a living, community-maintained catalogue of LLM evaluation frameworks, benchmarks, and papers.


14 stars · 1 watching · 2 forks · last commit about 1 year ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| mlgroupjlu/llm-eval-survey | A repository of papers and resources for evaluating large language models. | 1,433 |
| relari-ai/continuous-eval | A comprehensive framework for evaluating LLM applications and pipelines with customizable metrics. | 446 |
| h2oai/h2o-llm-eval | An evaluation framework for large language models with an Elo rating system and A/B testing capabilities. | 50 |
| huggingface/lighteval | A toolkit for evaluating Large Language Models across multiple backends. | 804 |
| evolvinglmms-lab/lmms-eval | Tools and an evaluation suite for large multimodal models. | 2,058 |
| modelscope/evalscope | A framework for efficient large model evaluation and performance benchmarking. | 248 |
| open-evals/evals | A framework for evaluating OpenAI models and an open-source registry of benchmarks. | 19 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 55 |
| aifeg/benchlmm | An open-source benchmarking framework for evaluating the cross-style visual capability of large multimodal models. | 82 |
| mlabonne/llm-autoeval | A tool to automate the evaluation of large language models in Google Colab using various benchmarks and custom parameters. | 558 |
| allenai/olmo-eval | An evaluation framework for large language models. | 311 |
| declare-lab/instruct-eval | An evaluation framework for large language models trained with instruction-tuning methods. | 528 |
| chenllliang/mmevalpro | A benchmarking framework for evaluating Large Multimodal Models with rigorous metrics and an efficient evaluation pipeline. | 22 |
| psycoy/mixeval | An evaluation suite and dynamic data release platform for large language models. | 224 |
| volcengine/verl | A flexible and efficient reinforcement learning framework designed for large language models. | 315 |