MMCU

Chinese language model evaluation

Evaluates the semantic understanding capabilities of large Chinese language models using a multitask dataset.

MEASURING MASSIVE MULTITASK CHINESE UNDERSTANDING

GitHub

87 stars
2 watching
12 forks
Language: Python
Last commit: 8 months ago

Related projects:

Repository | Description | Stars
ieit-yuan/yuan2.0-m32 | A high-performance language model designed to excel at natural language understanding, mathematical computation, and code generation | 180
hit-scir/chinese-mixtral-8x7b | An implementation of a large language model for Chinese text processing, built on a MoE (Mixture-of-Experts) architecture and incorporating a large vocabulary | 641
fuxiaoliu/mmc | Develops a large-scale dataset and benchmark for training multimodal chart understanding models using large language models | 84
mikegu721/xiezhibenchmark | An evaluation suite for assessing language models' performance on multiple-choice questions | 91
pku-yuangroup/video-bench | Evaluates and benchmarks large language models' video understanding capabilities | 117
cluebenchmark/cluepretrainedmodels | Provides pre-trained models for Chinese language tasks with improved performance and smaller model sizes compared to existing models | 804
tencent/tencent-hunyuan-large | Makes a large language model accessible for research and development | 1,114
qcri/llmebench | A benchmarking framework for large language models | 80
freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 55
yunwentechnology/unilm | Provides pre-trained models for natural language understanding and generation tasks using the UniLM architecture | 438
damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 92
brightmart/xlnet_zh | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks | 230
cluebenchmark/electra | Trains and evaluates a Chinese language model using adversarial training on a large corpus | 140
shawn-ieitsystems/yuan-1.0 | A large-scale language model with improved performance on NLP tasks through distributed training and efficient data processing | 591
yuliang-liu/multimodalocr | An evaluation benchmark for OCR capabilities in large multimodal models | 471