MME-RealWorld
Real-world dataset
A benchmark dataset designed to evaluate the performance of multimodal large language models in realistic, high-resolution real-world scenarios.
✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
78 stars
2 watching
5 forks
Language: Python
last commit: 7 days ago Related projects:
Repository | Description | Stars |
---|---|---|
yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 137 |
zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
yuliang-liu/monkey | A toolkit for building conversational AI models that can process images and text inputs. | 1,825 |
fuxiaoliu/mmc | Develops a large-scale dataset and benchmark for training multimodal chart understanding models using large language models. | 84 |
bradyfu/video-mme | An evaluation framework for large language models in video analysis, providing a comprehensive benchmark of their capabilities. | 406 |
multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 14 |
xverse-ai/xverse-v-13b | A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences. | 77 |
yuweihao/mm-vet | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics | 267 |
workday/upshot-montague | Translates natural language into formal representations using Combinatory Categorial Grammar (CCG), enabling semantic parsing. | 59 |
xverse-ai/xverse-moe-a36b | Develops and publishes large multilingual language models with advanced mixing-of-experts architecture. | 36 |
felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset. | 87 |
xverse-ai/xverse-moe-a4.2b | Developed by XVERSE Technology Inc. as a multilingual large language model with a unique mixture-of-experts architecture and fine-tuned for various tasks such as conversation, question answering, and natural language understanding. | 36 |
junyangwang0410/amber | An LLM-free benchmark suite for evaluating MLLMs' hallucination capabilities in various tasks and dimensions | 93 |
aifeg/benchlmm | An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models | 83 |
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 72 |