alpaca_eval
Evaluator
An automatic evaluation tool for large language models
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
2k stars
8 watching
245 forks
Language: Jupyter Notebook
last commit: 3 months ago
Linked from 1 awesome list
deep-learning, evaluation, foundation-models, instruction-following, large-language-models, leaderboard, nlp, rlhf
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | An evaluation framework for large language models trained with instruction tuning methods | 535 |
| | A framework for evaluating language models on NLP tasks | 326 |
| | A toolset for evaluating and comparing natural language generation models | 1,350 |
| | A tool for evaluating and visualizing machine learning model performance | 3 |
| | Evaluates language models using standardized benchmarks and prompting techniques. | 2,059 |
| | An evaluation framework for machine learning models and datasets, providing standardized metrics and tools for comparing model performance. | 2,063 |
| | An evaluation framework for large language models with Elo rating system and A/B testing capabilities | 50 |
| | A comprehensive toolkit for evaluating NLP experiments offering automated metrics and efficient computation. | 187 |
| | An open-source benchmark and evaluation tool for assessing multimodal large language models' performance in embodied decision-making tasks | 99 |
| | An expression parser and evaluator for Elm language, used to evaluate logical expressions in educational software. | 2 |
| | An expression evaluator library written in Go. | 41 |
| | A Go library for evaluating arbitrary arithmetic, string, and logic expressions with support for variables and custom functions. | 160 |
| | An evaluation toolkit for large vision-language models | 1,514 |
| | Tools and evaluation framework for accelerating the development of large multimodal models by providing an efficient way to assess their performance | 2,164 |
| | An evaluation tool for question-answering systems using large language models and natural language processing techniques | 1,065 |