FlagEval
Model evaluation framework
FlagEval is an evaluation toolkit and platform for assessing large AI foundation models across various domains.
307 stars
13 watching
27 forks
Language: Python
Last commit: 6 months ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| flagai-open/aquila2 | Provides pre-trained language models and tools for fine-tuning and evaluation | 439 |
| allenai/olmo-eval | A framework for evaluating language models on NLP tasks | 326 |
| huggingface/lighteval | An all-in-one toolkit for evaluating Large Language Models (LLMs) across multiple backends | 879 |
| open-evals/evals | A framework for evaluating OpenAI models and an open-source registry of benchmarks | 19 |
| modelscope/evalscope | A framework for efficiently evaluating and benchmarking large models | 308 |
| huggingface/evaluate | An evaluation framework for machine learning models and datasets, providing standardized metrics and tools for comparing model performance (see the sketch below the table) | 2,063 |
| openai/simple-evals | Evaluates language models using standardized benchmarks and prompting techniques | 2,059 |
| psycoy/mixeval | An evaluation suite and dynamic data release platform for large language models | 230 |
| stanford-crfm/helm | A framework to evaluate and compare language models by analyzing their performance on various tasks | 1,981 |
| baaivision/emu | A multimodal generative model framework | 1,672 |
| chenllliang/mmevalpro | A benchmarking framework for evaluating Large Multimodal Models with rigorous metrics and an efficient evaluation pipeline | 22 |
| declare-lab/instruct-eval | An evaluation framework for large language models trained with instruction tuning methods | 535 |
| aiverify-foundation/llm-evals-catalogue | A collaborative catalogue of LLM evaluation frameworks and papers | 13 |
| ukgovernmentbeis/inspect_ai | A framework for evaluating large language models | 669 |
| maluuba/nlg-eval | A toolset for evaluating and comparing natural language generation models | 1,350 |
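
Most of the frameworks above share the same core workflow: run a model over a benchmark dataset, then score its predictions against references using standardized metrics. As a minimal illustration of that scoring step, here is a sketch using huggingface/evaluate from the table; the predictions and references are placeholder values, not output from a real model.

```python
# Minimal sketch of the standardized-metric pattern shared by the
# evaluation toolkits listed above, using huggingface/evaluate.
# The predictions/references below are placeholder data for illustration.
import evaluate

# Load a standardized metric by name from the evaluate registry.
accuracy = evaluate.load("accuracy")

# In a real evaluation these would come from running a model on a benchmark.
predictions = [0, 1, 1, 0, 1]
references = [0, 1, 0, 0, 1]

result = accuracy.compute(predictions=predictions, references=references)
print(result)  # e.g. {'accuracy': 0.8}
```

Each toolkit wraps this basic loop differently (task registries, model backends, leaderboards), so consult the individual repositories for their specific APIs.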