FlagEval

Model evaluation framework

An evaluation toolkit and platform for assessing large models in various domains

FlagEval is an evaluation toolkit for large AI foundation models.
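
As a rough illustration of what a toolkit like this does, the sketch below shows the generic shape of an evaluation run: iterate over benchmark items, query the model, and aggregate a metric. It does not use FlagEval's actual API; the function names, the stub model, and the exact-match metric are all illustrative assumptions.

```python
# Minimal sketch of a model-evaluation loop (not FlagEval's real API).
from typing import Callable, List, Tuple


def evaluate(model: Callable[[str], str],
             dataset: List[Tuple[str, str]]) -> float:
    """Return exact-match accuracy of `model` over (prompt, reference) pairs."""
    correct = 0
    for prompt, reference in dataset:
        prediction = model(prompt)  # query the model under evaluation
        correct += int(prediction.strip() == reference.strip())  # exact-match scoring
    return correct / len(dataset)


if __name__ == "__main__":
    # A stub model and toy dataset stand in for a real LLM and benchmark.
    toy_model = lambda prompt: "4" if "2 + 2" in prompt else "unknown"
    toy_data = [("What is 2 + 2?", "4"), ("Capital of France?", "Paris")]
    print(f"accuracy = {evaluate(toy_model, toy_data):.2f}")  # accuracy = 0.50
```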

GitHub

300 stars
13 watching
28 forks
Language: Python
Last commit: 4 months ago
Linked from 1 awesome list

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| flagai-open/aquila2 | Provides pre-trained language models and tools for fine-tuning and evaluation | 437 |
| allenai/olmo-eval | An evaluation framework for large language models | 310 |
| huggingface/lighteval | A toolkit for evaluating Large Language Models across multiple backends | 804 |
| open-evals/evals | A framework for evaluating OpenAI models and an open-source registry of benchmarks | 19 |
| modelscope/evalscope | A framework for efficient large model evaluation and performance benchmarking | 248 |
| huggingface/evaluate | An evaluation framework for machine learning models and datasets, providing standardized metrics and tools for comparing model performance | 2,034 |
| openai/simple-evals | A library for evaluating language models using standardized prompts and benchmarking tests | 1,939 |
| psycoy/mixeval | An evaluation suite and dynamic data release platform for large language models | 224 |
| stanford-crfm/helm | A framework to evaluate and compare language models by analyzing their performance on various tasks | 1,947 |
| baaivision/emu | A multimodal generative model framework | 1,659 |
| chenllliang/mmevalpro | A benchmarking framework for evaluating Large Multimodal Models by providing rigorous metrics and an efficient evaluation pipeline | 22 |
| declare-lab/instruct-eval | An evaluation framework for large language models trained with instruction tuning methods | 528 |
| aiverify-foundation/llm-evals-catalogue | A collaborative catalogue of Large Language Model evaluation frameworks and papers | 14 |
| ukgovernmentbeis/inspect_ai | A framework for evaluating large language models | 615 |
| maluuba/nlg-eval | A toolset for evaluating and comparing natural language generation models | 1,347 |