hallucination-leaderboard

Model Comparison

Compares the performance of large language models at generating coherent summaries of short documents

Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents

GitHub

1k stars

37 watching

50 forks

Language: Python

Last commit: 8 months ago

Linked from 1 awesome list

generative-ai, hallucinations, llm

vectara.com

Backlinks from these awesome lists:

filipecalegario/awesome-generative-ai

Related projects:

Repository	Description	Stars
junyangwang0410/amber	An LLM-free benchmark suite for evaluating hallucinations in multimodal large language models (MLLMs) across various tasks and dimensions	98
tianyi-lab/hallusionbench	An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy	259
bradyfu/woodpecker	A method to correct hallucinations in multimodal large language models without requiring retraining	617
junyangwang0410/haelm	A framework for detecting hallucinations in large language models	17
x-plug/mplug-halowl	Evaluates and mitigates hallucinations in multimodal large language models	82
chiragbadhe/lensanalytics-v1	A leaderboard application using public data from the Lens Protocol API to rank notable profiles	5
fuxiaoliu/lrv-instruction	A research project focused on mitigating hallucinations in large multimodal models by improving instruction tuning through robust training methods	262
bcdnlp/faithscore	Evaluates answers generated by large vision-language models to assess hallucinations	27
amazon-science/refchecker	Automates fine-grained hallucination detection in large language model outputs	325
yfzhang114/llava-align	Debiasing techniques to minimize hallucinations in large visual language models	75
lalbj/pai	Improves the performance of large language models by intervening in their internal workings to reduce hallucinations	83
m1guelpf/lens-leaderboard	A leaderboard app using public data from the Lens Protocol API to rank notable profiles	31
bronyayang/halle_control	Controlling object hallucination in large multimodal models	28
victordibia/llmx	An API that provides a unified interface to multiple chat fine-tuned large language models	79
damo-nlp-sg/m3exam	A benchmark for evaluating large language models in multiple languages and formats	93