hallucination-leaderboard
Model Comparison
Compares how often large language models hallucinate when summarizing short documents
Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
1k stars
37 watching
50 forks
Language: Python
last commit: 5 days ago
Linked from 1 awesome list
generative-ai hallucinations llm
Related projects:
Repository | Description | Stars |
---|---|---|
junyangwang0410/amber | An LLM-free benchmark suite for evaluating hallucinations in MLLMs across various tasks and dimensions | 98 |
tianyi-lab/hallusionbench | An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy | 259 |
bradyfu/woodpecker | A method to correct hallucinations in multimodal large language models without requiring retraining | 617 |
junyangwang0410/haelm | A framework for detecting hallucinations in large language models | 17 |
x-plug/mplug-halowl | Evaluates and mitigates hallucinations in multimodal large language models | 82 |
chiragbadhe/lensanalytics-v1 | A leaderboard application using public data from the Lens Protocol API to rank notable profiles | 5 |
fuxiaoliu/lrv-instruction | A research project that mitigates hallucinations in large multimodal models through robust instruction tuning | 262 |
bcdnlp/faithscore | Evaluates answers generated by large vision-language models to assess hallucinations | 27 |
amazon-science/refchecker | Automates fine-grained hallucination detection in large language model outputs | 325 |
yfzhang114/llava-align | Debiasing techniques to minimize hallucinations in large visual language models | 75 |
lalbj/pai | Improves the performance of large language models by intervening in their internal workings to reduce hallucinations | 83 |
m1guelpf/lens-leaderboard | A leaderboard app using public data from the Lens Protocol API to rank notable profiles | 31 |
bronyayang/halle_control | Controlling object hallucination in large multimodal models | 28 |
victordibia/llmx | An API that provides a unified interface to multiple chat fine-tuned large language models | 79 |
damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 93 |