AMBER
MLLM benchmark
An LLM-free benchmark suite for evaluating hallucination in MLLMs across various tasks and dimensions
An LLM-free Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation
98 stars
1 watching
2 forks
Language: Python
last commit: about 1 year ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Evaluates and mitigates hallucinations in multimodal large language models | 82 |
| | A framework for detecting hallucinations in large language models | 17 |
| | A benchmark for evaluating large language models in multiple languages and formats | 93 |
| | Compares performance of large language models on generating coherent summaries from short documents | 1,281 |
| | Evaluates and compares the performance of multimodal large language models on various tasks | 56 |
| | An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models | 84 |
| | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 15 |
| | An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy | 259 |
| | A benchmark for evaluating large language models' ability to process multimodal input | 322 |
| | Provides a benchmarking framework and dataset for evaluating the performance of large language models in text-to-image tasks | 30 |
| | Explores various LLMs and their applications in natural language processing and related areas | 1,854 |
| | A method to correct hallucinations in multimodal large language models without requiring retraining | 617 |
| | A lightweight logger for machine learning experiments | 127 |
| | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 73 |
| | A benchmark for evaluating machine learning algorithms' performance on large datasets | 1,874 |