ComVint

Instruction generator

Creating synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks

The official GitHub page for "What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning"

18 stars

2 watching

0 forks

Language: Python

Last commit: over 2 years ago

Related projects:

Repository	Description	Stars
circleradon/osprey	This project presents a new approach to fine-grained visual understanding using pixel-wise mask regions in language instructions	781
opendatalab/vigc	Autonomously generates high-quality image-text instruction fine-tuning datasets	91
rowanz/r2c	An open-source project providing PyTorch code and data for a deep learning model that enables visual commonsense reasoning	466
deepcs233/visual-cot	A framework for training multi-modal language models with a focus on visual inputs and providing interpretable thoughts	162
salt-nlp/llavar	An open-source project that enhances visual instruction tuning for text-rich image understanding by integrating GPT-4 models with multimodal datasets	259
rbbrdckybk/ai-art-generator	Automates large batches of AI-generated artwork locally using GPU acceleration	633
baai-dcai/visual-instruction-tuning	A dataset and model designed to scale visual instruction tuning using language-only GPT-4 models	164
rucaibox/pope	An evaluation framework for detecting object hallucinations in vision-language models	187
kunpengli1994/vsrn	An open-source PyTorch implementation of a visual semantic reasoning model for image-text matching	294
aidc-ai/parrot	A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages	34
bigredt/vico	Multi-sense word embeddings learned from visual cooccurrences	25
rubocop/rubocop-rspec	Analyzes Ruby code for style and syntax errors in RSpec files	810
lxtgh/omg-seg	Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model	1,336
kacky24/stylenet	A PyTorch implementation of a framework for generating captions with styles for images and videos	63
jtoy/sketchnet	Generates code in a visual programming language using images as input	40