HallusionBench
Benchmark
An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
243 stars
4 watching
7 forks
Language: Python
Last commit: 8 days ago
Topics: benchmark, benchmarks, gpt-4, gpt-4v, hallucination, large-language-models, large-vision-language-models, llava, llm, lmm, vlms
Related projects:
Repository | Description | Stars |
---|---|---|
bradyfu/woodpecker | A method to correct hallucinations in multimodal large language models during text generation | 611 |
yfzhang114/llava-align | Debiasing techniques to minimize hallucinations in large vision-language models | 71 |
amazon-science/refchecker | Automates fine-grained hallucination detection in large language model outputs | 302 |
x-plug/mplug-halowl | Evaluates and mitigates hallucinations in multimodal large language models | 79 |
nvlabs/bongard-hoi | A benchmark and software framework for evaluating few-shot visual reasoning in computer vision models | 64 |
yiyangzhou/lure | Analyzes and mitigates object hallucination in large vision-language models to improve their accuracy and reliability | 134 |
1zhou-wang/memvr | Implements a visual re-tracing method for mitigating hallucinations in large language models | 27 |
fuxiaoliu/lrv-instruction | A research project that mitigates hallucinations in large multimodal models through robust instruction tuning | 255 |
junyangwang0410/amber | An LLM-free benchmark suite for evaluating hallucinations in MLLMs across various tasks and dimensions | 93 |
lalbj/pai | Reduces hallucinations in large language models by intervening in their internal workings | 67 |
bcdnlp/faithscore | Evaluates answers generated by large vision-language models to assess hallucinations | 25 |
vectara/hallucination-leaderboard | Compares how often large language models hallucinate when summarizing documents | 1,236 |
qcri/llmebench | A benchmarking framework for large language models | 80 |
junyangwang0410/haelm | A framework for detecting hallucinations in large language models | 17 |
yuqifan1117/hallucidoctor | Provides tools and frameworks for mitigating hallucinatory toxicity in visual instruction data, allowing researchers to fine-tune MLLMs on specific datasets | 41 |