MME-RealWorld

Real-world challenge simulator

A multimodal large language model benchmark designed to simulate real-world challenges and measure the performance of such models in practical scenarios.

✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

GitHub

86 stars

1 watching

6 forks

Language: Python

last commit: about 1 year ago

Related projects:

Repository	Description	Stars
yfzhang114/slime	Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types.	143
zhourax/vega	Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs.	33
yuliang-liu/monkey	An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage.	1,849
fuxiaoliu/mmc	Develops a large-scale dataset and benchmark for training multimodal chart understanding models using large language models.	87
bradyfu/video-mme	Comprehensive benchmark for evaluating multi-modal large language models on video analysis tasks	422
multimodal-art-projection/omnibench	Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously.	15
xverse-ai/xverse-v-13b	A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences.	78
yuweihao/mm-vet	Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics	274
workday/upshot-montague	Translates natural language into formal representations using Combinatory Categorial Grammar (CCG), enabling semantic parsing.	59
xverse-ai/xverse-moe-a36b	Develops and publishes large multilingual language models with advanced mixing-of-experts architecture.	37
felixgithub2017/mmcu	Measures the understanding of massive multitask Chinese datasets using large language models	87
xverse-ai/xverse-moe-a4.2b	Developed by XVERSE Technology Inc. as a multilingual large language model with a unique mixture-of-experts architecture and fine-tuned for various tasks such as conversation, question answering, and natural language understanding.	36
junyangwang0410/amber	An LLM-free benchmark suite for evaluating MLLMs' hallucination capabilities in various tasks and dimensions	98
aifeg/benchlmm	An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models	84
pleisto/yuren-baichuan-7b	A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks	73