LLM-eval-survey

LLM evaluation resource

A repository of papers and resources for evaluating large language models.

The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".

GitHub stats: 1k stars, 14 watching, 90 forks, last commit 6 months ago
Linked from 1 awesome list

Topics: benchmark, evaluation, large-language-models, llm, llms, model-assessment

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| aiverify-foundation/llm-evals-catalogue | A collaborative catalogue of large language model evaluation frameworks and papers. | 14 |
| relari-ai/continuous-eval | A comprehensive framework for evaluating large language model (LLM) applications and pipelines with customizable metrics. | 446 |
| evolvinglmms-lab/lmms-eval | Tools and an evaluation suite for large multimodal models. | 2,058 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 55 |
| h2oai/h2o-llm-eval | An evaluation framework for large language models with an Elo rating system and A/B testing capabilities. | 50 |
| damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats. | 92 |
| mlabonne/llm-autoeval | A tool to automate the evaluation of large language models in Google Colab using various benchmarks and custom parameters. | 558 |
| nlpai-lab/kullm | Korea University Large Language Model (KULLM), developed by researchers at Korea University and the HIAI Research Institute. | 569 |
| eugeneyan/open-llms | A curated list of open large language models available for commercial use. | 11,222 |
| allenai/olmo-eval | An evaluation framework for large language models. | 311 |
| psycoy/mixeval | An evaluation suite and dynamic data release platform for large language models. | 224 |
| maluuba/nlg-eval | A toolset for evaluating and comparing natural language generation models. | 1,349 |
| howiehwong/trustllm | A toolkit for assessing trustworthiness in large language models. | 466 |
| km1994/llmsninestorydemontower | Explores various LLMs and their applications in natural language processing and related areas. | 1,798 |
| prometheus-eval/prometheus-eval | An open-source framework for language model evaluation using Prometheus and GPT-4. | 796 |