R2D2

Vision-Language Framework

A framework for large-scale cross-modal benchmarks and vision-language tasks in Chinese

GitHub

157 stars
2 watching
23 forks
Language: Python
last commit: about 1 year ago

Related projects:

Repository Description Stars
zhourax/vega Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. 33
hxyou/idealgpt A deep learning framework for iteratively decomposing vision and language reasoning via large language models. 32
shizhediao/davinci An implementation of vision-language models for multimodal learning tasks, enabling generative vision-language models to be fine-tuned for various applications. 43
yuliang-liu/monkey A toolkit for building conversational AI models that can process images and text inputs. 1,825
yiren-jian/blitext Develops and trains models for vision-language learning with decoupled language pre-training 24
wpiroboticsprojects/grip A computer vision framework for robotics applications that simplifies the creation of vision systems and generates code in multiple programming languages. 379
vlf-silkie/vlfeedback An annotated preference dataset and training framework for improving large vision language models. 85
nvlabs/prismer A deep learning framework for training multi-modal models with vision and language capabilities. 1,298
baai-wudao/brivl Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications 279
openuc2/uc2-git An open-source framework for building modular optical systems with integrated electronics and software for interactive projects. 461
byungkwanlee/moai Improves performance of vision language tasks by integrating computer vision capabilities into large language models 311
xiaoyufenfei/lednet A lightweight deep learning framework for real-time semantic segmentation 513
yulingtianxia/core-ml-sample A demo project demonstrating the integration of Core ML and Vision Framework with Swift 4 for image classification using an Inception V3 network. 219
vishaal27/sus-x This is an open-source project that proposes a novel method to train large-scale vision-language models with minimal resources and no fine-tuning required. 94
kohjingyu/fromage A framework for grounding language models to images and handling multimodal inputs and outputs 478