Visual-Instruction-Tuning
Visual Instruction Tuning
A dataset and model designed to scale visual instruction tuning using language-only GPT-4 models.
SVIT: Scaling up Visual Instruction Tuning
163 stars
5 watching
4 forks
Language: Python
Last commit: 5 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
pvit-official/pvit | A project that extends large language models by integrating an additional region-level vision encoder to improve visual instruction tuning. | 36 |
icoz69/stablellava | A tool for generating and evaluating multimodal large language models with visual instruction tuning capabilities. | 91 |
salt-nlp/llavar | An open-source project that enhances visual instruction tuning for text-rich image understanding by integrating GPT-4 models with multimodal datasets. | 258 |
rucaibox/comvint | Creating synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks | 18 |
openai/lm-human-preferences | Training methods and tools for fine-tuning language models using human preferences | 1,229 |
aidc-ai/parrot | A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages. | 30 |
opendatalab/vigc | Autonomously generates high-quality image-text instruction fine-tuning datasets | 90 |
jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 506 |
vt-nlp/multiinstruct | A multimodal benchmark dataset designed to evaluate the performance of vision-language foundation models through instruction tuning. | 133 |
ys-zong/vlguard | Improves safety and helpfulness of large language models by fine-tuning them using safety-critical tasks | 45 |
pevma/septun-mark-ii | A tuning guide for optimizing the performance of a network intrusion prevention system | 113 |
pevma/septun | A guide to tuning Suricata for maximum performance in network intrusion detection systems | 204 |
circleradon/osprey | An approach to fine-grained visual understanding that uses pixel-wise mask regions in language instructions. | 770 |
microsoft/archai | Automates the search for optimal neural network configurations in deep learning applications | 467 |
liaoning97/revo-lion | A comprehensive dataset and evaluation framework for Vision-Language Instruction Tuning models | 11 |