ComVint

Instruction generator

Creating synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks

The official GitHub page for "What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning".

18 stars
2 watching
0 forks
Language: Python
Last commit: about 1 year ago

Related projects:

Repository | Description | Stars
circleradon/osprey | A new approach to fine-grained visual understanding using pixel-wise mask regions in language instructions | 770
opendatalab/vigc | Autonomously generates high-quality image-text instruction fine-tuning datasets | 90
rowanz/r2c | PyTorch code and data for a deep learning model that enables visual commonsense reasoning | 466
deepcs233/visual-cot | A multi-modal language model with a comprehensive dataset and benchmark for chain-of-thought reasoning | 134
salt-nlp/llavar | Enhances visual instruction tuning for text-rich image understanding by integrating GPT-4 models with multimodal datasets | 258
rbbrdckybk/ai-art-generator | Automates large batches of AI-generated artwork locally using GPU acceleration | 634
baai-dcai/visual-instruction-tuning | A dataset and model designed to scale visual instruction tuning using language-only GPT-4 models | 163
rucaibox/pope | An evaluation framework for detecting object hallucinations in vision-language models | 179
kunpengli1994/vsrn | A PyTorch implementation of a visual semantic reasoning model for image-text matching | 294
aidc-ai/parrot | A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages | 32
bigredt/vico | Multi-sense word embeddings learned from visual co-occurrences | 25
rubocop/rubocop-rspec | Analyzes Ruby code for style and syntax errors in RSpec files | 810
lxtgh/omg-seg | An end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model | 1,300
kacky24/stylenet | A PyTorch implementation of a framework for generating styled captions for images and videos | 63
jtoy/sketchnet | Generates code in a visual programming language using images as input | 40