SuperCLUElyb
Model benchmark
A benchmarking platform for evaluating Chinese general-purpose models through anonymous, random battles
SuperCLUE Langya Leaderboard (琅琊榜): an anonymous-battle evaluation benchmark for Chinese general-purpose large models
141 stars
5 watching
6 forks
last commit: 5 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
cluebenchmark/cluepretrainedmodels | Provides pre-trained models for Chinese language tasks with improved performance and smaller model sizes compared to existing models. | 804 |
cluebenchmark/cluecorpus2020 | A large-scale pre-training corpus for Chinese language models | 925 |
cluebenchmark/electra | Trains and evaluates a Chinese language model using adversarial training on a large corpus. | 140 |
cluebenchmark/pclue | A large-scale dataset for training models to perform multiple tasks and zero-shot learning in natural language processing. | 468 |
clue-ai/promptclue | A pre-trained language model for multiple natural language processing tasks with support for few-shot learning and transfer learning. | 654 |
felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset. | 87 |
clue-ai/chatyuan-7b | An updated version of a large language model designed to improve performance on multiple tasks and datasets | 13 |
qcri/llmebench | A benchmarking framework for large language models | 80 |
aifeg/benchlmm | An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models | 83 |
catboost/benchmarks | Comparative benchmarks of various machine learning algorithms | 169 |
bitshifter/mathbench-rs | A benchmarking framework comparing performance of different Rust linear algebra libraries | 198 |
ibob/picobench | A microbenchmarking library for C++ | 211 |
yuliang-liu/multimodalocr | An evaluation benchmark for OCR capabilities in large multimodal models. | 471 |
robustbench/robustbench | A standardized benchmark for measuring the robustness of machine learning models against adversarial attacks | 667 |
clue-ai/chatyuan | Large language model for dialogue support in multiple languages | 1,902 |