VCoder

Perception adapter

An adapter that improves multimodal large language models on object-level perception tasks by using auxiliary perception modalities

VCoder: Versatile Vision Encoders for Multimodal Large Language Models, arXiv 2023 / CVPR 2024

GitHub

266 stars

9 watching

15 forks

Language: Python

Last commit: over 2 years ago

praeclarumjj3.github.io/vcoder/

Related projects:

Repository	Description	Stars
lhoyer/mic	An unsupervised domain adaptation method that uses contextual information to improve performance on visual recognition tasks	271
shi-labs/gfr-dsod	Improving Object Detection from Scratch via Gated Feature Reuse	65
vchitect/vbench	A benchmark suite for evaluating the performance of video generative models	643
roboflow/maestro	A tool to streamline fine-tuning of multimodal models for vision-language tasks	1,415
wasidennis/adaptsegnet	A deep learning-based approach for adapting semantic segmentation models from one domain to another	851
vision-cair/longvu	An artificial intelligence system designed to understand and describe long-form video content	329
yiyangzhou/lure	Analyzes and mitigates object hallucination in large vision-language models to improve their accuracy and reliability	136
yunxinli/lingcloud	Enhances language models by incorporating human-like eyes to improve visual comprehension and interaction with the external world	48
gordonhu608/mqt-llava	A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens.	101
byungkwanlee/moai	Improves performance of vision language tasks by integrating computer vision capabilities into large language models	314
vlf-silkie/vlfeedback	An annotated preference dataset and training framework for improving large vision language models.	88
tianyi-lab/hallusionbench	An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy	259
thecodrr/vspeech	Provides an interface to Mozilla's DeepSpeech TensorFlow-based Speech-to-Text library using V bindings.	49
cvondrick/vatic	Tools for efficiently scaling up video annotation using crowdsourced marketplaces.	609
sergioburdisso/pyss3	A Python package implementing an interpretable machine learning model for text classification with visualization tools	336