GAOKAO-Bench

Model testing framework

An evaluation framework that uses questions from China's national college entrance examination (GAOKAO) to assess the capabilities of large language models

GAOKAO-Bench is an evaluation framework that uses GAOKAO questions as a dataset for evaluating large language models.

GitHub

551 stars
4 watching
40 forks
Language: Python
Last commit: 8 months ago

Related projects:

Repository | Description | Stars
felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of Chinese large language models using a multitask dataset. | 87
flagai-open/aquila2 | Provides pre-trained language models and tools for fine-tuning and evaluation. | 437
freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 55
zjunlp/knowlm | A framework for training and using large language models with knowledge augmentation capabilities. | 1,239
opengvlab/lamm | A framework and benchmark for training and evaluating multimodal large language models, aimed at building AI agents that interact seamlessly with humans. | 301
aifeg/benchlmm | An open-source benchmarking framework for evaluating the cross-style visual capability of large multimodal models. | 83
openlmlab/openchinesellama | An incrementally pre-trained Chinese large language model based on LLaMA-7B. | 234
matthewhammer/motoko-bigtest | A testing framework for creating long-running tests using a domain-specific language. | 12
open-compass/lawbench | Evaluates the legal knowledge of large language models using a custom benchmarking framework. | 267
letmeno1/aki | A cross-platform desktop testing framework that uses accessibility APIs and the JNA library to automate interactions with GUI elements. | 34
google/paxml | A framework for configuring and running machine learning experiments on top of JAX. | 457
johnsnowlabs/langtest | A tool for testing and evaluating large language models, with a focus on AI safety and model assessment. | 501
pku-yuangroup/video-bench | Evaluates and benchmarks the video understanding capabilities of large language models. | 117
yuliang-liu/monkey | A toolkit for building conversational AI models that can process image and text inputs. | 1,825
yuweihao/mm-vet | Evaluates the capabilities of large multimodal models using a diverse set of tasks and metrics. | 267