XiezhiBenchmark
Questionnaire
An evaluation suite for assessing language models' performance on multiple-choice questions
91 stars
1 watching
4 forks
Language: Python
Last commit: 12 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multitask dataset. | 87 |
xverse-ai/xverse-moe-a36b | Develops and publishes large multilingual language models with a mixture-of-experts architecture. | 36 |
yuweihao/mm-vet | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics | 267 |
xverse-ai/xverse-13b | A large language model developed to support multiple languages and applications | 649 |
ieit-yuan/yuan2.0-m32 | A high-performance language model designed to excel in tasks like natural language understanding, mathematical computation, and code generation | 180 |
xverse-ai/xverse-65b | A large language model developed by XVERSE Technology Inc. using transformer architecture and fine-tuned on diverse data sets for various applications. | 132 |
xverse-ai/xverse-v-13b | A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences. | 77 |
sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification with visualization tools | 336 |
pku-yuangroup/video-bench | Evaluates and benchmarks large language models' video understanding capabilities | 117 |
michael-wzhu/promptcblue | A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain | 323 |
xverse-ai/xverse-moe-a4.2b | A multilingual large language model from XVERSE Technology Inc. with a mixture-of-experts architecture, fine-tuned for tasks such as conversation, question answering, and natural language understanding. | 36 |
yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 137 |
xunzi-llm-of-chinese-classics/xunziallm | An open-source framework providing tools and models for analyzing and generating Chinese classics texts using large language models | 257 |
zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
gzcch/bingo | An analysis project investigating how bias and interference limit the ability of vision-language models to understand and process images. | 53 |