PVIT

Visual Instruction Model

A project that extends large language models by integrating an additional region-level vision encoder to improve visual instruction tuning.

Repository for the paper "Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models"

GitHub

36 stars
2 watching
2 forks
Language: Python
Last commit: about 1 year ago

Related projects:

| Repository | Description | Stars |
|---|---|---|
| baai-dcai/visual-instruction-tuning | A dataset and model designed to scale visual instruction tuning using language-only GPT-4 models. | 163 |
| aidc-ai/parrot | A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages. | 32 |
| vt-nlp/multiinstruct | A multimodal benchmark dataset designed to evaluate the performance of vision-language foundation models through instruction tuning. | 133 |
| vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs. | 269 |
| wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts. | 294 |
| salt-nlp/llavar | An open-source project that enhances visual instruction tuning for text-rich image understanding by integrating GPT-4 models with multimodal datasets. | 258 |
| whai362/pvt | An implementation of Pyramid Vision Transformers for image classification, object detection, and semantic segmentation tasks. | 1,728 |
| x2fd/lvis-instruct4v | A dataset of fine-grained visual instructions generated by prompting a large language model with images from another dataset. | 131 |
| rucaibox/comvint | Creating synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks. | 18 |
| vlf-silkie/vlfeedback | An annotated preference dataset and training framework for improving large vision language models. | 85 |
| jy0205/lavit | A unified framework for training large language models to understand and generate visual content. | 528 |
| jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs. | 506 |
| opendatalab/vigc | Autonomously generates high-quality image-text instruction fine-tuning datasets. | 90 |
| vchitect/vbench | A tool for evaluating and benchmarking video generative models in computer vision and artificial intelligence. | 576 |
| pvlib/pvlib-python | A Python library for simulating photovoltaic energy system performance and modeling solar energy systems. | 1,205 |