AGLA

Object description model

Improves large vision-language models so that they describe images accurately without hallucinating objects that are not present

Code for the paper "AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention"

GitHub

15 stars
2 watching
0 forks
Language: Python
Last commit: 4 months ago

Related projects:

Repository | Description | Stars
byungkwanlee/collavo | Develops a PyTorch implementation of an enhanced vision language model | 93
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,077
dvlab-research/lisa | A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge | 1,861
baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 230
yiyangzhou/lure | Analyzing and mitigating object hallucination in large vision-language models to improve their accuracy and reliability | 134
andy971022/auto-lama | Automates object removal from images using computer vision techniques | 98
yfzhang114/llava-align | Debiasing techniques to minimize hallucinations in large visual language models | 71
damo-nlp-sg/vcd | An approach to reduce object hallucinations in large vision-language models by contrasting output distributions derived from original and distorted visual inputs | 209
ayoolaolafenwa/pixellib | A deep learning library for image segmentation and object detection using PyTorch | 1,049
umass-foundation-model/3d-llm | Developing a Large Language Model capable of processing 3D representations as inputs | 961
opengvlab/visionllm | A large language model designed to process and generate visual information | 915
byungkwanlee/moai | Improves performance of vision language tasks by integrating computer vision capabilities into large language models | 311
mshukor/evalign-icl | Evaluating and improving large multimodal models through in-context learning | 20
algolzw/daclip-uir | This project controls vision-language models to restore degraded images in various environments and conditions | 662
uclanlp/elmo-c | Efficient Contextual Representation Learning Model with Continuous Outputs | 4