ceval
Evaluation suite
An evaluation suite providing multiple-choice questions for foundation models in various disciplines, with tools for assessing model performance.
Official GitHub repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]
2k stars
14 watching
79 forks
Language: Python
last commit: over 1 year ago
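To make the suite's purpose concrete, below is a minimal sketch of scoring a model on C-Eval-style multiple-choice items. This is not the repository's actual evaluation harness; the item fields (`subject`, options `A`–`D`, `answer`) and the `evaluate_multiple_choice` / `choose_answer` names are assumptions used only for illustration.

```python
from collections import Counter

def evaluate_multiple_choice(questions, choose_answer):
    """Compute per-subject accuracy on multiple-choice items.

    `questions` is an iterable of dicts holding the question text, the
    options 'A'-'D', and the gold 'answer' letter; `choose_answer` is any
    callable mapping one item to a predicted letter.
    """
    totals = Counter()
    correct = Counter()
    for item in questions:
        subject = item.get("subject", "all")
        totals[subject] += 1
        if choose_answer(item) == item["answer"]:
            correct[subject] += 1
    return {s: correct[s] / totals[s] for s in totals}

# Toy usage with a dummy "model" that always answers 'A'.
sample = [
    {"subject": "computer_network", "question": "...", "A": "...",
     "B": "...", "C": "...", "D": "...", "answer": "A"},
    {"subject": "computer_network", "question": "...", "A": "...",
     "B": "...", "C": "...", "D": "...", "answer": "C"},
]
print(evaluate_multiple_choice(sample, lambda item: "A"))
# {'computer_network': 0.5}
```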
Related projects:

| Repository | Description | Stars |
|---|---|---|
| | An evaluation suite for assessing chart understanding in multimodal large language models. | 85 |
| | Evaluates foundation models on human-centric tasks with diverse exams and question types. | 714 |
| | A framework for evaluating OpenAI models and an open-source registry of benchmarks. | 19 |
| | An evaluation suite and dynamic data release platform for large language models. | 230 |
| | Evaluates German transformer language models with syntactic agreement tests. | 7 |
| | A comprehensive toolkit for evaluating NLP experiments, offering automated metrics and efficient computation. | 187 |
| | A framework for evaluating the contribution of individual clients in federated learning systems. | 7 |
| | A toolset for evaluating and comparing natural language generation models. | 1,350 |
| | An open-source benchmark and evaluation tool for assessing multimodal large language models' performance in embodied decision-making tasks. | 99 |
| | A build-time code evaluation tool for JavaScript. | 127 |
| | Evaluates and improves large multimodal models through in-context learning. | 21 |
| | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics. | 274 |
| | An evaluation suite for assessing foundation models in the DevOps field. | 690 |
| | Evaluates language models using standardized benchmarks and prompting techniques. | 2,059 |
| | A benchmarking framework for evaluating large multimodal models, providing rigorous metrics and an efficient evaluation pipeline. | 22 |