HallusionBench

Benchmark

An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

GitHub

259 stars

4 watching

7 forks

Language: Python

last commit: over 1 year ago

Tags: benchmark, benchmarks, gpt-4, gpt-4v, hallucination, large-language-models, large-vision-language-models, llava, llm, lmm, vlms

Related projects:

Repository	Description	Stars
bradyfu/woodpecker	A method to correct hallucinations in multimodal large language models without requiring retraining	617
yfzhang114/llava-align	Debiasing techniques to minimize hallucinations in large visual language models	75
amazon-science/refchecker	Automates fine-grained hallucination detection in large language model outputs	325
x-plug/mplug-halowl	Evaluates and mitigates hallucinations in multimodal large language models	82
nvlabs/bongard-hoi	A benchmark and software framework for evaluating few-shot visual reasoning in computer vision models	64
yiyangzhou/lure	Analyzes and mitigates object hallucination in large vision-language models to improve their accuracy and reliability	136
1zhou-wang/memvr	An implementation of a method to mitigate hallucinations in large language models using visual re-tracing	28
fuxiaoliu/lrv-instruction	A research project focused on mitigating hallucinations in large multi-modal models through more robust instruction tuning	262
junyangwang0410/amber	An LLM-free benchmark suite for evaluating hallucination in MLLMs across various tasks and dimensions	98
lalbj/pai	Improves the performance of large language models by intervening in their internal workings to reduce hallucinations	83
bcdnlp/faithscore	Evaluates answers generated by large vision-language models to assess hallucinations	27
vectara/hallucination-leaderboard	Compares how often large language models hallucinate when summarizing short documents	1,281
qcri/llmebench	A benchmarking framework for large language models	81
junyangwang0410/haelm	A framework for detecting hallucinations in large language models	17
yuqifan1117/hallucidoctor	Provides tools and frameworks to mitigate hallucinatory toxicity in visual instruction data, allowing researchers to fine-tune MLLMs on specific datasets	41