PVIT

Visual Instruction Model

A project that extends large language models by integrating an additional region-level vision encoder to improve visual instruction tuning.

Repository for the paper "Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models"

GitHub

37 stars

2 watching

2 forks

Language: Python

Last commit: almost 2 years ago

Related projects:

Repository	Description	Stars
baai-dcai/visual-instruction-tuning	A dataset and model designed to scale visual instruction tuning using language-only GPT-4 models.	164
aidc-ai/parrot	A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages.	34
vt-nlp/multiinstruct	A multimodal benchmark dataset designed to evaluate the performance of vision-language foundation models through instruction tuning.	134
vpgtrans/vpgtrans	Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs.	270
wisconsinaivision/vip-llava	A system designed to enable large multimodal models to understand arbitrary visual prompts.	302
salt-nlp/llavar	An open-source project that enhances visual instruction tuning for text-rich image understanding by integrating GPT-4 models with multimodal datasets.	259
whai362/pvt	An implementation of Pyramid Vision Transformers for image classification, object detection, and semantic segmentation tasks.	1,745
x2fd/lvis-instruct4v	A dataset of fine-grained visual instructions generated by prompting a large language model with images from another dataset.	131
rucaibox/comvint	Creates synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks.	18
vlf-silkie/vlfeedback	An annotated preference dataset and training framework for improving large vision language models.	88
jy0205/lavit	A unified framework for training large language models to understand and generate visual content.	544
jshilong/gpt4roi	Trains and deploys large language models on computer vision tasks using region-of-interest inputs.	517
opendatalab/vigc	Autonomously generates high-quality image-text instruction fine-tuning datasets.	91
vchitect/vbench	A benchmark suite for evaluating the performance of video generative models.	643
pvlib/pvlib-python	A Python library for simulating photovoltaic energy system performance and modeling solar energy systems.	1,228