LLM-eval-survey
LLM evaluation resources
A repository of papers and resources for evaluating large language models, and the official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
Topics: benchmark, evaluation, large-language-models, llm, llms, model-assessment
Related projects:
| Repository | Description | Stars |
|---|---|---|
| aiverify-foundation/llm-evals-catalogue | A collaborative catalogue of large language model evaluation frameworks and papers | 14 |
| relari-ai/continuous-eval | A comprehensive framework for evaluating large language model (LLM) applications and pipelines with customizable metrics | 446 |
| evolvinglmms-lab/lmms-eval | Tools and an evaluation suite for large multimodal models | 2,058 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 55 |
| h2oai/h2o-llm-eval | An evaluation framework for large language models with an Elo rating system and A/B testing capabilities | 50 |
| damo-nlp-sg/m3exam | A benchmark for evaluating large language models across multiple languages and formats | 92 |
| mlabonne/llm-autoeval | A tool that automates large language model evaluation in Google Colab using various benchmarks and custom parameters | 558 |
| nlpai-lab/kullm | A Korean large language model developed by researchers at Korea University and the HIAI Research Institute | 569 |
| eugeneyan/open-llms | A curated list of commercial-use large language models | 11,222 |
| allenai/olmo-eval | An evaluation framework for large language models | 311 |
| psycoy/mixeval | An evaluation suite and dynamic data release platform for large language models | 224 |
| maluuba/nlg-eval | A toolset for evaluating and comparing natural language generation models | 1,349 |
| howiehwong/trustllm | A toolkit for assessing trustworthiness in large language models | 466 |
| km1994/llmsninestorydemontower | Explores various LLMs and their applications in natural language processing and related areas | 1,798 |
| prometheus-eval/prometheus-eval | An open-source framework for language model evaluation using Prometheus and GPT-4 | 796 |