hallucination-leaderboard

Model Comparison

Compares performance of large language models on generating coherent summaries from short documents

Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents

GitHub

1k stars
37 watching
50 forks
Language: Python
last commit: 5 days ago
Linked from 1 awesome list

generative-ai, hallucinations, llm

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| junyangwang0410/amber | An LLM-free benchmark suite for evaluating MLLMs' hallucination capabilities in various tasks and dimensions | 98 |
| tianyi-lab/hallusionbench | An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy | 259 |
| bradyfu/woodpecker | A method to correct hallucinations in multimodal large language models without requiring retraining | 617 |
| junyangwang0410/haelm | A framework for detecting hallucinations in large language models | 17 |
| x-plug/mplug-halowl | Evaluates and mitigates hallucinations in multimodal large language models | 82 |
| chiragbadhe/lensanalytics-v1 | A leaderboard application using public data from the Lens Protocol API to rank notable profiles | 5 |
| fuxiaoliu/lrv-instruction | A research project focused on mitigating hallucinations in large multi-modal models by improving instruction tuning through robust training methods | 262 |
| bcdnlp/faithscore | Evaluates answers generated by large vision-language models to assess hallucinations | 27 |
| amazon-science/refchecker | Automates fine-grained hallucination detection in large language model outputs | 325 |
| yfzhang114/llava-align | Debiasing techniques to minimize hallucinations in large visual language models | 75 |
| lalbj/pai | Improves the performance of large language models by intervening in their internal workings to reduce hallucinations | 83 |
| m1guelpf/lens-leaderboard | A leaderboard app using public data from the Lens Protocol API to rank notable profiles | 31 |
| bronyayang/halle_control | Controlling object hallucination in large multimodal models | 28 |
| victordibia/llmx | An API that provides a unified interface to multiple large language models for chat fine-tuning | 79 |
| damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 93 |