FlagEval

Model evaluation framework

FlagEval is an evaluation toolkit and platform for assessing large AI foundation models across a range of domains and tasks.
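
FlagEval's own Python API is not documented in this listing, so the snippet below is only a generic, minimal sketch of what an evaluation toolkit of this kind automates: running a model over benchmark items and aggregating a metric. BenchmarkItem, DummyModel, and evaluate are hypothetical stand-ins for illustration, not FlagEval code.

# Minimal, generic sketch of a model-evaluation loop (NOT FlagEval's actual API).
# BenchmarkItem, DummyModel, and evaluate() are hypothetical stand-ins.
from dataclasses import dataclass


@dataclass
class BenchmarkItem:
    prompt: str
    reference: str


class DummyModel:
    """Stand-in for a real foundation model; returns a canned answer."""

    def generate(self, prompt: str) -> str:
        return "Paris" if "capital of France" in prompt else "unknown"


def evaluate(model: DummyModel, items: list[BenchmarkItem]) -> float:
    """Exact-match accuracy over a list of benchmark items."""
    correct = sum(model.generate(it.prompt).strip() == it.reference for it in items)
    return correct / len(items)


if __name__ == "__main__":
    benchmark = [
        BenchmarkItem("What is the capital of France?", "Paris"),
        BenchmarkItem("What is 2 + 2?", "4"),
    ]
    print(f"accuracy = {evaluate(DummyModel(), benchmark):.2f}")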

GitHub

307 stars
13 watching
27 forks
Language: Python
Last commit: 6 months ago
Linked from 1 awesome list

Related projects:

Repository Description Stars
flagai-open/aquila2 Provides pre-trained language models and tools for fine-tuning and evaluation 439
allenai/olmo-eval A framework for evaluating language models on NLP tasks 326
huggingface/lighteval An all-in-one toolkit for evaluating large language models (LLMs) across multiple backends 879
open-evals/evals A framework for evaluating OpenAI models and an open-source registry of benchmarks 19
modelscope/evalscope A framework for efficiently evaluating and benchmarking large models 308
huggingface/evaluate An evaluation framework for machine learning models and datasets, providing standardized metrics and tools for comparing model performance 2,063
openai/simple-evals Evaluates language models using standardized benchmarks and prompting techniques 2,059
psycoy/mixeval An evaluation suite and dynamic data release platform for large language models 230
stanford-crfm/helm A framework to evaluate and compare language models by analyzing their performance on various tasks 1,981
baaivision/emu A multimodal generative model framework 1,672
chenllliang/mmevalpro A benchmarking framework for evaluating large multimodal models by providing rigorous metrics and an efficient evaluation pipeline 22
declare-lab/instruct-eval An evaluation framework for large language models trained with instruction tuning methods 535
aiverify-foundation/llm-evals-catalogue A collaborative catalogue of LLM evaluation frameworks and papers 13
ukgovernmentbeis/inspect_ai A framework for evaluating large language models 669
maluuba/nlg-eval A toolset for evaluating and comparing natural language generation models 1,350