GroundedTranslation
Image Description Model
Trains multilingual image description models using neural sequence models and extracts hidden features from trained models.
Multilingual image description
46 stars
8 watching
25 forks
Language: Python
last commit: almost 7 years ago Related projects:
Repository | Description | Stars |
---|---|---|
gink03/alt-i2v | An implementation of a deep learning-based image representation learning approach using a modified fully connected layer and transfer learning from VGG16 | 34 |
autodistill/autodistill | Automatically trains models from large foundation models to perform specific tasks with minimal human intervention. | 1,983 |
zsdonghao/unsup-im2im | A framework for unsupervised image-to-image translation using Generative Adversarial Networks (GANs) and deep learning. | 73 |
mingyuliutw/unit | An unsupervised deep learning framework for translating images between different modalities | 1,988 |
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
liuquande/feddg-elcfs | A framework for federated learning on medical image segmentation using continuous frequency space interpolation. | 240 |
microsoft/som | Enables visual grounding in large language models by overlaying spatial and speakable marks on images | 1,183 |
zhengpeng7/birefnet | An implementation of a deep learning-based image segmentation model for high-resolution images | 1,319 |
uclanlp/elmo-c | Efficient Contextual Representation Learning Model with Continuous Outputs | 4 |
isekai-portal/link-context-learning | An implementation of a multimodal learning approach to improve language models' ability to recognize unseen images and understand novel concepts. | 89 |
tobypde/frrn | A software framework for training and evaluating full-resolution residual networks for semantic image segmentation tasks | 280 |
abbypa/nnproject_deepmask | A deep learning implementation of an object segmentation algorithm. | 187 |
kohjingyu/fromage | A framework for grounding language models to images and handling multimodal inputs and outputs | 478 |
mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks. | 781 |
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,298 |