LLM-eval-survey

LLM evaluation resource

A repository of papers and resources for evaluating large language models.

The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".


1k stars
14 watching
92 forks
last commit: 8 months ago
Linked from 1 awesome list

Tags: benchmark, evaluation, large-language-models, llm, llms, model-assessment

Related projects:

| Repository | Description | Stars |
|---|---|---|
| aiverify-foundation/llm-evals-catalogue | A collaborative catalogue of LLM evaluation frameworks and papers | 13 |
| relari-ai/continuous-eval | A comprehensive framework for evaluating LLM applications and pipelines with customizable metrics | 455 |
| evolvinglmms-lab/lmms-eval | Tools and an evaluation framework for accelerating the development of large multimodal models by providing an efficient way to assess their performance | 2,164 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 56 |
| h2oai/h2o-llm-eval | An evaluation framework for large language models with an Elo rating system and A/B testing capabilities | 50 |
| damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 93 |
| mlabonne/llm-autoeval | A tool to automate the evaluation of large language models in Google Colab using various benchmarks and custom parameters | 566 |
| nlpai-lab/kullm | A large language model developed by researchers at Korea University and the HIAI Research Institute | 576 |
| eugeneyan/open-llms | A curated list of commercial-use large language models | 11,314 |
| allenai/olmo-eval | A framework for evaluating language models on NLP tasks | 326 |
| psycoy/mixeval | An evaluation suite and dynamic data release platform for large language models | 230 |
| maluuba/nlg-eval | A toolset for evaluating and comparing natural language generation models | 1,350 |
| howiehwong/trustllm | A toolkit for assessing trustworthiness in large language models | 491 |
| km1994/llmsninestorydemontower | Exploring various LLMs and their applications in natural language processing and related areas | 1,854 |
| prometheus-eval/prometheus-eval | An open-source framework that enables language model evaluation using Prometheus and GPT-4 | 820 |