alpaca_eval
Evaluator
An automatic evaluation tool for large language models
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Stars: 2k
Watching: 8
Forks: 244
Language: Jupyter Notebook
Last commit: 10 days ago
Linked from 1 awesome list
Topics: deep-learning, evaluation, foundation-models, instruction-following, large-language-models, leaderboard, nlp, rlhf
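To make the description above concrete, here is a minimal, hedged sketch of a typical evaluation run: model outputs are written to a JSON file (records with `instruction`, `output`, and `generator` fields) and passed to the `alpaca_eval` CLI, which uses an LLM annotator to compute a win rate against reference outputs. The flag names, record schema, and annotator config name below follow the upstream README as of writing and may change; treat this as an illustration, not authoritative usage.

```python
# Sketch: preparing model outputs and invoking the alpaca_eval CLI.
# Assumes `pip install alpaca-eval` and an OPENAI_API_KEY in the environment;
# verify flag names and the output schema against the upstream README.
import json
import subprocess

# Each record pairs an instruction with the model's generation.
model_outputs = [
    {
        "instruction": "Explain what a confusion matrix is in one sentence.",
        "output": "A confusion matrix tabulates predicted versus true labels ...",
        "generator": "my-model-v1",  # name shown in the results table
    },
    # ... more records, ideally covering the full AlpacaEval instruction set
]

with open("my_model_outputs.json", "w") as f:
    json.dump(model_outputs, f, indent=2)

# The CLI annotates each output against reference outputs with an LLM judge
# and reports a win rate; the annotator config name here is an assumption.
subprocess.run(
    [
        "alpaca_eval",
        "--model_outputs", "my_model_outputs.json",
        "--annotators_config", "alpaca_eval_gpt4_turbo_fn",
    ],
    check=True,
)
```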
Related projects:
| Repository | Description | Stars |
|---|---|---|
| declare-lab/instruct-eval | An evaluation framework for large language models trained with instruction tuning methods | 528 |
| allenai/olmo-eval | An evaluation framework for large language models | 310 |
| maluuba/nlg-eval | A toolset for evaluating and comparing natural language generation models | 1,347 |
| edublancas/sklearn-evaluation | A tool for evaluating and visualizing machine learning model performance | 3 |
| openai/simple-evals | A library for evaluating language models using standardized prompts and benchmarking tests | 1,939 |
| huggingface/evaluate | An evaluation framework for machine learning models and datasets, providing standardized metrics and tools for comparing model performance | 2,034 |
| h2oai/h2o-llm-eval | An evaluation framework for large language models with an Elo rating system and A/B testing capabilities | 50 |
| obss/jury | A comprehensive toolkit for evaluating NLP experiments, offering automated metrics and efficient computation | 188 |
| pkunlp-icler/pca-eval | An open-source benchmark and evaluation tool for assessing multimodal large language models' performance in embodied decision-making tasks | 100 |
| ccapndave/elm-eexl | An expression parser and evaluator for the Elm language, used to evaluate logical expressions in educational software | 2 |
| nullne/evaluator | An expression evaluator library written in Go | 41 |
| maja42/goval | A Go library for evaluating arbitrary arithmetic, string, and logic expressions with support for variables and custom functions | 159 |
| open-compass/vlmevalkit | A toolkit for evaluating large vision-language models on various benchmarks and datasets | 1,343 |
| evolvinglmms-lab/lmms-eval | Tools and an evaluation suite for large multimodal models | 2,058 |
| rlancemartin/auto-evaluator | An evaluation tool for question-answering systems using large language models and natural language processing techniques | 1,063 |