XiezhiBenchmark

Questionnaire

An evaluation suite for assessing language models' performance on multiple-choice questions

GitHub

91 stars
1 watching
4 forks
Language: Python
Last commit: 12 months ago

Related projects:

Repository | Description | Stars
felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset. | 87
xverse-ai/xverse-moe-a36b | Develops and publishes large multilingual language models with a mixture-of-experts architecture. | 36
yuweihao/mm-vet | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics. | 267
xverse-ai/xverse-13b | A large language model developed to support multiple languages and applications. | 649
ieit-yuan/yuan2.0-m32 | A high-performance language model designed to excel in tasks like natural language understanding, mathematical computation, and code generation. | 180
xverse-ai/xverse-65b | A large language model developed by XVERSE Technology Inc. using a transformer architecture and fine-tuned on diverse datasets for various applications. | 132
xverse-ai/xverse-v-13b | A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences. | 77
sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification, with visualization tools. | 336
pku-yuangroup/video-bench | Evaluates and benchmarks large language models' video understanding capabilities. | 117
michael-wzhu/promptcblue | A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain. | 323
xverse-ai/xverse-moe-a4.2b | A multilingual large language model developed by XVERSE Technology Inc. with a mixture-of-experts architecture, fine-tuned for tasks such as conversation, question answering, and natural language understanding. | 36
yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 137
xunzi-llm-of-chinese-classics/xunziallm | An open-source framework providing tools and models for analyzing and generating Chinese classical texts using large language models. | 257
zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33
gzcch/bingo | An analysis project investigating the limitations of visual language models in understanding images affected by bias and interference. | 53