GroundedTranslation

Image Description Model

Trains multilingual image description models using neural sequence models and extracts hidden features from trained models.

Multilingual image description

GitHub

46 stars
8 watching
25 forks
Language: Python
last commit: almost 7 years ago

Related projects:

Repository Description Stars
gink03/alt-i2v An implementation of a deep learning-based image representation learning approach using a modified fully connected layer and transfer learning from VGG16 34
autodistill/autodistill Automatically trains models from large foundation models to perform specific tasks with minimal human intervention. 1,997
zsdonghao/unsup-im2im A framework for unsupervised image-to-image translation using Generative Adversarial Networks (GANs) and deep learning. 73
mingyuliutw/unit An unsupervised deep learning framework for translating images between different modalities 1,988
yiren-jian/blitext Develops and trains models for vision-language learning with decoupled language pre-training 24
liuquande/feddg-elcfs A framework for federated learning on medical image segmentation using continuous frequency space interpolation. 240
microsoft/som Enables visual grounding in large language models by overlaying spatial and speakable marks on images 1,183
zhengpeng7/birefnet This repository provides a software framework and implementation of a neural network model for high-resolution image segmentation tasks 1,379
uclanlp/elmo-c Efficient Contextual Representation Learning Model with Continuous Outputs 4
isekai-portal/link-context-learning An implementation of a multimodal learning approach to improve language models' ability to recognize unseen images and understand novel concepts. 89
tobypde/frrn A software framework for training and evaluating full-resolution residual networks for semantic image segmentation tasks 280
abbypa/nnproject_deepmask A deep learning implementation of an object segmentation algorithm. 187
kohjingyu/fromage A framework for grounding language models to images and handling multimodal inputs and outputs 478
mbzuai-oryx/groundinglmm An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks. 781
nvlabs/prismer A deep learning framework for training multi-modal models with vision and language capabilities. 1,298