Osprey
Visual guidance
This project presents an approach to fine-grained visual understanding that incorporates pixel-wise mask regions into language instructions.
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
781 stars
14 watching
42 forks
Language: Python
last commit: 6 months ago
Tags: mllm, pixel-understanding, sam, visual-instruction-tuning
Related projects:
Repository | Description | Stars |
---|---|---|
rucaibox/comvint | Creating synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks | 18 |
salt-nlp/llavar | An open-source project that enhances visual instruction tuning for text-rich image understanding by integrating GPT-4 models with multimodal datasets. | 259 |
roboflow/maestro | A tool to streamline fine-tuning of multimodal models for vision-language tasks | 1,415 |
ys-zong/vlguard | Improves safety and helpfulness of large language models by fine-tuning them using safety-critical tasks | 47 |
jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 517 |
aidc-ai/parrot | A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages. | 34 |
aidc-ai/ovis | An MLLM architecture designed to align visual and textual embeddings through structural alignment | 575 |
penghao-wu/vstar | PyTorch implementation of guided visual search mechanism for multimodal LLMs | 541 |
bigredt/vico | Multi-sense word embeddings learned from visual cooccurrences | 25 |
codeplant/simple-navigation | A Ruby gem for creating hierarchical navigation structures in web applications | 886 |
baai-dcai/visual-instruction-tuning | A dataset and model designed to scale visual instruction tuning using language-only GPT-4 models. | 164 |
byungkwanlee/moai | Improves performance of vision language tasks by integrating computer vision capabilities into large language models | 314 |
kunpengli1994/vsrn | An open-source PyTorch implementation of a visual semantic reasoning model for image-text matching | 294 |
sy-xuan/pink | This project enables multi-modal language models to understand and generate text about visual content using referential comprehension. | 79 |
dannnylo/rtesseract | A Ruby library providing an interface to the Tesseract OCR system. | 838 |