HallusionBench

Benchmark

An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

GitHub

243 stars
4 watching
7 forks
Language: Python
Last commit: 8 days ago
Topics: benchmark, benchmarks, gpt-4, gpt-4v, hallucination, large-language-models, large-vision-language-models, llava, llm, lmm, vlms

Related projects:

Repository | Description | Stars
bradyfu/woodpecker | A method to correct hallucinations in multimodal large language models during text generation | 611
yfzhang114/llava-align | Debiasing techniques to minimize hallucinations in large visual language models | 71
amazon-science/refchecker | Automates fine-grained hallucination detection in large language model outputs | 302
x-plug/mplug-halowl | Evaluates and mitigates hallucinations in multimodal large language models | 79
nvlabs/bongard-hoi | A benchmarking tool and software framework for evaluating few-shot visual reasoning capabilities in computer vision models | 64
yiyangzhou/lure | Analyzes and mitigates object hallucination in large vision-language models to improve their accuracy and reliability | 134
1zhou-wang/memvr | An implementation of a method to mitigate hallucinations in large language models using visual re-tracing | 27
fuxiaoliu/lrv-instruction | A research project that mitigates hallucinations in large multi-modal models through more robust instruction tuning | 255
junyangwang0410/amber | An LLM-free benchmark suite for evaluating hallucination in MLLMs across various tasks and dimensions | 93
lalbj/pai | Reduces hallucinations in large language models by intervening in their internal workings | 67
bcdnlp/faithscore | Evaluates answers generated by large vision-language models to assess hallucinations | 25
vectara/hallucination-leaderboard | Evaluates and compares how often large language models hallucinate during document summarization | 1,236
qcri/llmebench | A benchmarking framework for large language models | 80
junyangwang0410/haelm | A framework for detecting hallucinations in large language models | 17
yuqifan1117/hallucidoctor | Provides tools and frameworks to mitigate hallucinatory toxicity in visual instruction data, allowing researchers to fine-tune MLLMs on specific datasets | 41