LLM-eval-survey
LLM evaluation resources
A repository of papers and resources for evaluating large language models, and the official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
Topics: benchmark, evaluation, large-language-models, llm, llms, model-assessment
Related projects:
| Repository | Description | Stars |
|---|---|---|
| aiverify-foundation/llm-evals-catalogue | A collaborative catalogue of large language model evaluation frameworks and papers | 14 |
| relari-ai/continuous-eval | A comprehensive framework for evaluating large language model (LLM) applications and pipelines with customizable metrics | 446 |
| evolvinglmms-lab/lmms-eval | Tools and an evaluation suite for large multimodal models | 2,058 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 55 |
| h2oai/h2o-llm-eval | An evaluation framework for large language models with an Elo rating system and A/B testing capabilities | 50 |
| damo-nlp-sg/m3exam | A benchmark for evaluating large language models across multiple languages and formats | 92 |
| mlabonne/llm-autoeval | A tool that automates large language model evaluation in Google Colab using various benchmarks and custom parameters | 558 |
| nlpai-lab/kullm | A Korean large language model developed by researchers at Korea University and the HIAI Research Institute | 569 |
| eugeneyan/open-llms | A curated list of commercial-use large language models | 11,222 |
| allenai/olmo-eval | An evaluation framework for large language models | 311 |
| psycoy/mixeval | An evaluation suite and dynamic data release platform for large language models | 224 |
| maluuba/nlg-eval | A toolset for evaluating and comparing natural language generation models | 1,349 |
| howiehwong/trustllm | A toolkit for assessing trustworthiness in large language models | 466 |
| km1994/llmsninestorydemontower | Explores various LLMs and their applications in natural language processing and related areas | 1,798 |
| prometheus-eval/prometheus-eval | An open-source framework for language model evaluation using Prometheus and GPT-4 | 796 |