XiezhiBenchmark

Questionnaire

An evaluation suite for assessing language models' performance on multiple-choice questions

93 stars

1 watching

4 forks

Language: Python

Last commit: over 2 years ago

Related projects:

Repository	Description	Stars
felixgithub2017/mmcu	Measures large language models' understanding on massive multitask Chinese datasets	87
xverse-ai/xverse-moe-a36b	Develops and publishes large multilingual language models with a mixture-of-experts architecture.	37
yuweihao/mm-vet	Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics	274
xverse-ai/xverse-13b	A large language model developed to support multiple languages and applications	648
ieit-yuan/yuan2.0-m32	A high-performance language model designed to excel in tasks like natural language understanding, mathematical computation, and code generation	182
xverse-ai/xverse-65b	A large language model developed by XVERSE Technology Inc. using a transformer architecture and fine-tuned on diverse datasets for various applications.	132
xverse-ai/xverse-v-13b	A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences.	78
sergioburdisso/pyss3	A Python package implementing an interpretable machine learning model for text classification with visualization tools	336
pku-yuangroup/video-bench	Evaluates and benchmarks large language models' video understanding capabilities	121
michael-wzhu/promptcblue	A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain	328
xverse-ai/xverse-moe-a4.2b	A multilingual large language model developed by XVERSE Technology Inc. with a mixture-of-experts architecture, fine-tuned for tasks such as conversation, question answering, and natural language understanding.	36
yfzhang114/slime	Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types.	143
xunzi-llm-of-chinese-classics/xunziallm	An open-source framework providing tools and models for analyzing and generating Chinese classics texts using large language models	263
zhourax/vega	Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs.	33
gzcch/bingo	An analysis project investigating the limitations of visual language models in understanding and processing images, with a focus on bias and interference challenges.	53