PVIT
Visual Instruction Model
A project that extends large language models by integrating an additional region-level vision encoder to improve visual instruction tuning.
Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
36 stars
2 watching
2 forks
Language: Python
last commit: about 1 year ago Related projects:
Repository | Description | Stars |
---|---|---|
baai-dcai/visual-instruction-tuning | A dataset and model designed to scale visual instruction tuning using language-only GPT-4 models. | 163 |
aidc-ai/parrot | A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages. | 32 |
vt-nlp/multiinstruct | A multimodal benchmark dataset designed to evaluate the performance of vision-language foundation models through instruction tuning. | 133 |
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 269 |
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 294 |
salt-nlp/llavar | An open-source project that enhances visual instruction tuning for text-rich image understanding by integrating GPT-4 models with multimodal datasets. | 258 |
whai362/pvt | An implementation of Pyramid Vision Transformers for image classification, object detection, and semantic segmentation tasks | 1,728 |
x2fd/lvis-instruct4v | A dataset of fine-grained visual instructions generated by prompting a large language model with images from another dataset | 131 |
rucaibox/comvint | Creating synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks | 18 |
vlf-silkie/vlfeedback | An annotated preference dataset and training framework for improving large vision language models. | 85 |
jy0205/lavit | A unified framework for training large language models to understand and generate visual content | 528 |
jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 506 |
opendatalab/vigc | Autonomously generates high-quality image-text instruction fine-tuning datasets | 90 |
vchitect/vbench | A tool for evaluating and benchmarking video generative models in computer vision and artificial intelligence | 576 |
pvlib/pvlib-python | A Python library for simulating photovoltaic energy system performance and modeling solar energy systems. | 1,205 |