GAOKAO-Bench
Model testing framework
An evaluation framework using Chinese high school examination questions to assess large language model capabilities
GAOKAO-Bench is an evaluation framework that uses questions from the GAOKAO, China's national college entrance examination, as a dataset to evaluate large language models.
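To illustrate the kind of evaluation such a benchmark performs, here is a minimal, hypothetical sketch of scoring a model's multiple-choice answers against an answer key. The data format and the `score_multiple_choice` helper are illustrative assumptions, not GAOKAO-Bench's actual API.

```python
def score_multiple_choice(predictions, answer_key):
    """Return accuracy of predicted choices (e.g. 'A'-'D') against the key.

    predictions and answer_key are dicts mapping question IDs to choice letters.
    Comparison is case-insensitive; missing predictions count as wrong.
    """
    if not answer_key:
        return 0.0
    correct = sum(
        1
        for qid, gold in answer_key.items()
        if predictions.get(qid, "").strip().upper() == gold.strip().upper()
    )
    return correct / len(answer_key)

# Example: three questions, two answered correctly (case is normalized).
key = {"q1": "A", "q2": "C", "q3": "B"}
preds = {"q1": "A", "q2": "c", "q3": "D"}
print(score_multiple_choice(preds, key))
```

Real benchmarks of this kind typically also report per-subject breakdowns (math, Chinese, English, etc.) and handle fill-in-the-blank and open-ended questions with separate graders; the sketch above covers only the objective multiple-choice portion.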
565 stars
4 watching
42 forks
Language: Python
last commit: 11 months ago

Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | Measures the understanding of massive multitask Chinese datasets using large language models | 87 |
| | Provides pre-trained language models and tools for fine-tuning and evaluation | 439 |
| | Evaluates and compares the performance of multimodal large language models on various tasks | 56 |
| | A framework for training and utilizing large language models with knowledge augmentation capabilities | 1,251 |
| | A framework and benchmark for training and evaluating multi-modal large language models, enabling the development of AI agents capable of seamless interaction between humans and machines | 305 |
| | An open-source benchmarking framework for evaluating the cross-style visual capability of large multimodal models | 84 |
| | An incrementally pre-trained Chinese large language model based on the LLaMA-7B model | 234 |
| | A testing framework that enables the creation of long-running tests using a domain-specific language | 12 |
| | Evaluates the legal knowledge of large language models using a custom benchmarking framework | 273 |
| | A cross-platform desktop testing framework utilizing accessibility APIs and the JNA library to automate interactions with GUI elements | 34 |
| | A framework for configuring and running machine learning experiments on top of JAX | 461 |
| | A tool for testing and evaluating large language models with a focus on AI safety and model assessment | 506 |
| | Evaluates and benchmarks large language models' video understanding capabilities | 121 |
| | An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage | 1,849 |
| | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics | 274 |