MME-RealWorld
Real-world challenge simulator
A multimodal large language model benchmark designed to simulate real-world challenges and measure the performance of such models in practical scenarios.
✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
86 stars · 1 watching · 6 forks
Language: Python
Last commit: 3 months ago

Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 143 |
| | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
| | An end-to-end image captioning system that uses large multimodal models and provides tools for training, inference, and demo usage. | 1,849 |
| | Develops a large-scale dataset and benchmark for training multimodal chart understanding models using large language models. | 87 |
| | A comprehensive benchmark for evaluating multimodal large language models on video analysis tasks. | 422 |
| | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 15 |
| | A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences. | 78 |
| | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics. | 274 |
| | Translates natural language into formal representations using Combinatory Categorial Grammar (CCG), enabling semantic parsing. | 59 |
| | Develops and publishes large multilingual language models with an advanced mixture-of-experts architecture. | 37 |
| | Measures large language models' understanding across massive multitask Chinese datasets. | 87 |
| | A multilingual large language model by XVERSE Technology Inc. with a mixture-of-experts architecture, fine-tuned for tasks such as conversation, question answering, and natural language understanding. | 36 |
| | An LLM-free benchmark suite for evaluating MLLMs' hallucinations across various tasks and dimensions. | 98 |
| | An open-source benchmarking framework for evaluating the cross-style visual capabilities of large multimodal models. | 84 |
| | A multimodal large language model that integrates natural language and visual capabilities, fine-tuned for various tasks. | 73 |