AGLA
Object description model
Improving large vision-language models to accurately describe images without generating fictional objects
Code for paper "AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention"
15 stars
2 watching
0 forks
Language: Python
last commit: 4 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
byungkwanlee/collavo | Develops a PyTorch implementation of an enhanced vision-language model | 93 |
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,077 |
dvlab-research/lisa | A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge | 1,861 |
baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 230 |
yiyangzhou/lure | Analyzing and mitigating object hallucination in large vision-language models to improve their accuracy and reliability | 134 |
andy971022/auto-lama | Automates object removal from images using computer vision techniques | 98 |
yfzhang114/llava-align | Debiasing techniques to minimize hallucinations in large visual language models | 71 |
damo-nlp-sg/vcd | An approach to reduce object hallucinations in large vision-language models by contrasting output distributions derived from original and distorted visual inputs | 209 |
ayoolaolafenwa/pixellib | A deep learning library for image segmentation and object detection using PyTorch | 1,049 |
umass-foundation-model/3d-llm | Develops a large language model that can take 3D representations as input | 961 |
opengvlab/visionllm | A large language model designed to process and generate visual information | 915 |
byungkwanlee/moai | Improves performance on vision-language tasks by integrating computer vision capabilities into large language models | 311 |
mshukor/evalign-icl | Evaluating and improving large multimodal models through in-context learning | 20 |
algolzw/daclip-uir | Controls vision-language models to restore images degraded under a variety of environments and conditions | 662 |
uclanlp/elmo-c | Efficient Contextual Representation Learning Model with Continuous Outputs | 4 |
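The damo-nlp-sg/vcd entry names a concrete decoding-time technique: contrasting the model's output distribution given the original image with the distribution given a distorted copy. What follows is a minimal, framework-free sketch of that general idea on plain lists of next-token logits. It assumes the formulation described in the VCD paper: scale the original-image logits by (1 + alpha), subtract alpha times the distorted-image logits, and mask out tokens whose original-image probability falls below beta times the maximum. The function names and the default alpha and beta values are illustrative assumptions, not the API of that repository. This is also not the method implemented by AGLA.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(
    logits_orig: List[float],
    logits_distorted: List[float],
    alpha: float = 1.0,  # assumed contrast strength (illustrative default)
    beta: float = 0.1,   # assumed plausibility cutoff (illustrative default)
) -> List[float]:
    """Contrast next-token logits for the original image against a distorted view.

    Hypothetical helper sketching visual contrastive decoding; it is not
    taken from the damo-nlp-sg/vcd code base.
    """
    if len(logits_orig) != len(logits_distorted):
        raise ValueError("logit vectors must cover the same vocabulary")

    # Plausibility constraint: keep only tokens reasonably likely under the
    # original image, relative to the most likely token.
    probs_orig = softmax(logits_orig)
    cutoff = beta * max(probs_orig)

    out: List[float] = []
    for lo, ld, p in zip(logits_orig, logits_distorted, probs_orig):
        if p >= cutoff:
            # Reward tokens the clean image supports more than the distorted
            # one; penalize tokens driven mostly by language priors.
            out.append((1 + alpha) * lo - alpha * ld)
        else:
            # Implausible under the original image: exclude from decoding.
            out.append(float("-inf"))
    return out


if __name__ == "__main__":
    original = [2.0, 2.1, -5.0]
    distorted = [0.5, 2.5, -5.0]
    contrasted = contrastive_logits(original, distorted)
    best = max(range(len(contrasted)), key=lambda i: contrasted[i])
    print(contrasted, "greedy pick:", best)
```

In this toy example, the second token wins under plain greedy decoding on the original logits. Because it scores even higher for the distorted image, which suggests the preference comes from something other than the visual evidence, the contrast demotes it, and the first token becomes the argmax. The third token is masked as implausible.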