AGLA

Image descriptor model

Reduces object hallucinations in large vision-language models, making their image descriptions more accurate, by combining global and local attention.

[arXiv 2024] AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
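The decoding step that "assembles" the two views can be sketched in Python. This is a minimal sketch under stated assumptions, not the repository's code. It assumes the model has already produced next-token logits twice: once from the original image (global view) and once from an augmented image in which prompt-irrelevant regions are masked (local view). It also assumes the two views are combined additively with a weight `alpha`, and that a VCD-style plausibility cutoff `beta` drops tokens that the global view already considers unlikely. The names `assemble_logits`, `pick_next_token`, `alpha`, and `beta` are hypothetical. Building the augmented image itself is out of scope.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def assemble_logits(global_logits: List[float], local_logits: List[float],
                    alpha: float = 1.0, beta: float = 0.1) -> List[float]:
    """Hypothetical assembly of global-view and local-view next-token logits.

    Assumption: combined score = global logit + alpha * local logit, and tokens
    whose global-view probability is below beta * (max global probability)
    are masked out (set to -inf).
    """
    if len(global_logits) != len(local_logits):
        raise ValueError("both views must score the same vocabulary")
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    combined = [g + alpha * l for g, l in zip(global_logits, local_logits)]
    # Plausibility cutoff in log space: log p >= log(beta) + log p_max
    g_logp = log_softmax(global_logits)
    cutoff = math.log(beta) + max(g_logp)
    return [c if lp >= cutoff else float("-inf")
            for c, lp in zip(combined, g_logp)]


def pick_next_token(global_logits: List[float], local_logits: List[float],
                    alpha: float = 1.0, beta: float = 0.1) -> int:
    """Greedy choice: index of the highest assembled score."""
    scores = assemble_logits(global_logits, local_logits, alpha, beta)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    g = [2.0, 1.9, -3.0]   # global view: token 2 is implausible
    l = [-1.0, 0.5, 6.0]   # local view strongly prefers token 2
    print(pick_next_token(g, l))             # cutoff masks token 2
    print(pick_next_token(g, l, beta=1e-4))  # looser cutoff keeps it
```

With `beta=0.1`, token 2 has global probability of roughly 0.004, which falls below the cutoff, so the sketch picks token 1. Lowering `beta` to `1e-4` keeps token 2, and its higher assembled score wins. The cutoff is included to show why a second view should refine, rather than override, what the global view finds plausible. Whether AGLA applies exactly this constraint is an assumption.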

GitHub: 18 stars, 2 watching, 0 forks

Language: Python

Last commit: about 2 years ago

Related projects:

Repository	Description	Stars
byungkwanlee/collavo	Develops a PyTorch implementation of an enhanced vision-language model	93
deepseek-ai/deepseek-vl	A multimodal AI model that enables real-world vision-language understanding applications	2,145
dvlab-research/lisa	A system that uses large language models to generate segmentation masks for images from complex queries and world knowledge	1,923
baaivision/eve	A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities	246
yiyangzhou/lure	Analyzes and mitigates object hallucination in large vision-language models to improve their accuracy and reliability	136
andy971022/auto-lama	Automates object removal from images using computer vision techniques	99
yfzhang114/llava-align	Debiasing techniques to minimize hallucinations in large vision-language models	75
damo-nlp-sg/vcd	An approach to reduce object hallucinations in large vision-language models by contrasting output distributions derived from original and distorted visual inputs	222
ayoolaolafenwa/pixellib	A deep learning library for image segmentation and object detection using PyTorch	1,054
umass-foundation-model/3d-llm	Develops a large language model capable of processing 3D representations as inputs	979
opengvlab/visionllm	A large language model designed to process and generate visual information	956
byungkwanlee/moai	Improves performance on vision-language tasks by integrating computer vision capabilities into large language models	314
mshukor/evalign-icl	Evaluates and improves large multimodal models through in-context learning	21
algolzw/daclip-uir	Controls vision-language models to restore images degraded in various environments and conditions	673
uclanlp/elmo-c	Efficient Contextual Representation Learning Model with Continuous Outputs	4