ComVint
Instruction generator
Creating synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks
The official GitHub page for "What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning"
18 stars
2 watching
0 forks
Language: Python
Last commit: about 1 year ago
Related projects:
Repository | Description | Stars |
---|---|---|
circleradon/osprey | This project presents a new approach to fine-grained visual understanding using pixel-wise mask regions in language instructions | 770 |
opendatalab/vigc | Autonomously generates high-quality image-text instruction fine-tuning datasets | 90 |
rowanz/r2c | An open-source project providing PyTorch code and data for a deep learning model that enables visual commonsense reasoning. | 466 |
deepcs233/visual-cot | Develops a multi-modal language model with a comprehensive dataset and benchmark for chain-of-thought reasoning | 134 |
salt-nlp/llavar | An open-source project that enhances visual instruction tuning for text-rich image understanding by integrating GPT-4 models with multimodal datasets. | 258 |
rbbrdckybk/ai-art-generator | Automates large batches of AI-generated artwork locally using GPU acceleration. | 634 |
baai-dcai/visual-instruction-tuning | A dataset and model designed to scale visual instruction tuning using language-only GPT-4 models. | 163 |
rucaibox/pope | An evaluation framework for detecting object hallucinations in vision-language models | 179 |
kunpengli1994/vsrn | An open-source PyTorch implementation of a visual semantic reasoning model for image-text matching | 294 |
aidc-ai/parrot | A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages. | 30 |
bigredt/vico | Multi-sense word embeddings learned from visual co-occurrences | 25 |
rubocop/rubocop-rspec | Analyzes Ruby code for style and syntax errors in RSpec files | 810 |
lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,300 |
kacky24/stylenet | A PyTorch implementation of a framework for generating captions with styles for images and videos. | 63 |
jtoy/sketchnet | Generates code in a visual programming language using images as input | 40 |