R2D2
Vision-Language Framework
A framework for large-scale cross-modal benchmarks and vision-language tasks in Chinese
157 stars
2 watching
23 forks
Language: Python
last commit: about 1 year ago

Related projects:
Repository | Description | Stars |
---|---|---|
zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
hxyou/idealgpt | A deep learning framework for iteratively decomposing vision and language reasoning via large language models. | 32 |
shizhediao/davinci | An implementation of generative vision-language models that can be fine-tuned for various multimodal learning applications. | 43 |
yuliang-liu/monkey | A toolkit for building conversational AI models that can process image and text inputs. | 1,825 |
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training. | 24 |
wpiroboticsprojects/grip | A computer vision framework for robotics applications that simplifies the creation of vision systems and generates code in multiple programming languages. | 379 |
vlf-silkie/vlfeedback | An annotated preference dataset and training framework for improving large vision language models. | 85 |
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,298 |
baai-wudao/brivl | Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications. | 279 |
openuc2/uc2-git | An open-source framework for building modular optical systems with integrated electronics and software for interactive projects. | 461 |
byungkwanlee/moai | Improves performance on vision-language tasks by integrating computer vision capabilities into large language models. | 311 |
xiaoyufenfei/lednet | A lightweight deep learning framework for real-time semantic segmentation. | 513 |
yulingtianxia/core-ml-sample | A demo project that integrates Core ML and the Vision framework with Swift 4 for image classification using an Inception V3 network. | 219 |
vishaal27/sus-x | Proposes a method to train large-scale vision-language models with minimal resources and no fine-tuning. | 94 |
kohjingyu/fromage | A framework for grounding language models to images and handling multimodal inputs and outputs. | 478 |