LLM-eval-survey
A repository of papers and resources for evaluating large language models.
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
Stars: 1k | Watchers: 14 | Forks: 92 | Last commit: 8 months ago
Topics: benchmark, evaluation, large-language-models, llm, llms, model-assessment
Related projects:
Repository | Description | Stars |
---|---|---|
aiverify-foundation/llm-evals-catalogue | A collaborative catalogue of LLM evaluation frameworks and papers | 13 |
relari-ai/continuous-eval | Provides a comprehensive framework for evaluating Large Language Model (LLM) applications and pipelines with customizable metrics | 455 |
evolvinglmms-lab/lmms-eval | An evaluation framework that accelerates development of large multimodal models by streamlining performance assessment | 2,164 |
freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 56 |
h2oai/h2o-llm-eval | An evaluation framework for large language models with Elo rating system and A/B testing capabilities | 50 |
damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 93 |
mlabonne/llm-autoeval | A tool to automate the evaluation of large language models in Google Colab using various benchmarks and custom parameters | 566 |
nlpai-lab/kullm | Korea University's large language model (KULLM), developed with the HIAI Research Institute | 576 |
eugeneyan/open-llms | A curated list of open LLMs licensed for commercial use | 11,314 |
allenai/olmo-eval | A framework for evaluating language models on NLP tasks | 326 |
psycoy/mixeval | An evaluation suite and dynamic data release platform for large language models | 230 |
maluuba/nlg-eval | A toolset for evaluating and comparing natural language generation models | 1,350 |
howiehwong/trustllm | A toolkit for assessing trustworthiness in large language models | 491 |
km1994/llmsninestorydemontower | Exploring various LLMs and their applications in natural language processing and related areas | 1,854 |
prometheus-eval/prometheus-eval | An open-source framework that enables language model evaluation using Prometheus and GPT-4 | 820 |