AMBER

MLLM benchmark

An LLM-free benchmark suite for evaluating hallucinations in MLLMs across a variety of tasks and dimensions

An LLM-free Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation

GitHub

93 stars
1 watching
2 forks
Language: Python
Last commit: 10 months ago
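
AMBER's defining feature is that scoring requires no judge LLM: model outputs are compared directly against human annotations. The sketch below illustrates that idea in a minimal form. The metric names CHAIR, Cover, and Hal follow the AMBER paper, but the toy object extraction, vocabulary, and function names here are simplified assumptions for illustration, not the repository's actual pipeline.

```python
# Illustrative sketch of an LLM-free hallucination check (not AMBER's actual
# code): score a caption against human-annotated ground-truth objects, so no
# judge LLM is needed. Names and extraction logic are hypothetical.

def extract_objects(caption: str, vocabulary: set[str]) -> set[str]:
    """Naive object extraction: intersect caption tokens with a known
    object vocabulary. AMBER itself uses richer annotations and matching."""
    tokens = {tok.strip(".,!?").lower() for tok in caption.split()}
    return tokens & vocabulary

def hallucination_scores(caption: str, truth: set[str], vocabulary: set[str]) -> dict:
    mentioned = extract_objects(caption, vocabulary)
    hallucinated = mentioned - truth  # mentioned but absent from the image
    return {
        # CHAIR-style: fraction of mentioned objects that are hallucinated
        "CHAIR": len(hallucinated) / len(mentioned) if mentioned else 0.0,
        # Cover-style: fraction of ground-truth objects the caption mentions
        "Cover": len(mentioned & truth) / len(truth) if truth else 0.0,
        # Hal-style: does the response contain any hallucinated object?
        "Hal": bool(hallucinated),
    }

# Toy usage with a made-up vocabulary and annotation.
vocab = {"dog", "cat", "frisbee", "grass", "car"}
truth = {"dog", "frisbee", "grass"}
print(hallucination_scores("A dog and a cat play with a frisbee.", truth, vocab))
# -> {'CHAIR': 0.333..., 'Cover': 0.666..., 'Hal': True}
```

Matching against fixed annotations like this is what makes the benchmark "LLM-free": it avoids the cost and the judge-model bias of LLM-based evaluation.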

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| x-plug/mplug-halowl | Evaluates and mitigates hallucinations in multimodal large language models | 79 |
| junyangwang0410/haelm | A framework for detecting hallucinations in large language models | 17 |
| damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 92 |
| vectara/hallucination-leaderboard | Compares how often large language models hallucinate when summarizing short documents | 1,236 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 55 |
| aifeg/benchlmm | An open-source benchmarking framework for evaluating the cross-style visual capability of large multimodal models | 83 |
| multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 14 |
| tianyi-lab/hallusionbench | An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy | 243 |
| ailab-cvc/seed-bench | A benchmark for evaluating large language models' ability to process multimodal input | 315 |
| uw-madison-lee-lab/cobsat | A benchmarking framework and dataset for evaluating large language models on text-to-image tasks | 28 |
| km1994/llmsninestorydemontower | Explores various LLMs and their applications in natural language processing and related areas | 1,798 |
| bradyfu/woodpecker | A method to correct hallucinations in multimodal large language models during text generation | 611 |
| oval-group/mlogger | A lightweight logger for machine learning experiments | 127 |
| pleisto/yuren-baichuan-7b | A multimodal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks | 72 |
| szilard/benchm-ml | A benchmark for evaluating machine learning algorithms' performance on large datasets | 1,869 |