FlagEval
Model evaluation framework
An evaluation toolkit and platform for assessing large models in various domains
FlagEval is an evaluation toolkit for large AI foundation models.
300 stars
13 watching
28 forks
Language: Python
Last commit: 4 months ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
flagai-open/aquila2 | Provides pre-trained language models and tools for fine-tuning and evaluation | 437 |
allenai/olmo-eval | An evaluation framework for large language models | 310 |
huggingface/lighteval | A toolkit for evaluating Large Language Models across multiple backends | 804 |
open-evals/evals | A framework for evaluating OpenAI models and an open-source registry of benchmarks | 19 |
modelscope/evalscope | A framework for efficient large model evaluation and performance benchmarking | 248 |
huggingface/evaluate | An evaluation framework for machine learning models and datasets, providing standardized metrics and tools for comparing model performance | 2,034 |
openai/simple-evals | A library for evaluating language models using standardized prompts and benchmarking tests | 1,939 |
psycoy/mixeval | An evaluation suite and dynamic data release platform for large language models | 224 |
stanford-crfm/helm | A framework to evaluate and compare language models by analyzing their performance on various tasks | 1,947 |
baaivision/emu | A multimodal generative model framework | 1,659 |
chenllliang/mmevalpro | A benchmarking framework for evaluating Large Multimodal Models by providing rigorous metrics and an efficient evaluation pipeline | 22 |
declare-lab/instruct-eval | An evaluation framework for large language models trained with instruction tuning methods | 528 |
aiverify-foundation/llm-evals-catalogue | A collaborative catalogue of Large Language Model evaluation frameworks and papers. | 14 |
ukgovernmentbeis/inspect_ai | A framework for evaluating large language models | 615 |
maluuba/nlg-eval | A toolset for evaluating and comparing natural language generation models | 1,347 |