GroundedTranslation

Image Description Model

Trains multilingual image description models using neural sequence models and extracts hidden features from trained models.

Multilingual image description


46 stars
8 watching
25 forks
Language: Python
Last commit: almost 7 years ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| gink03/alt-i2v | An implementation of a deep learning-based image representation learning approach using a modified fully connected layer and transfer learning from VGG16 | 34 |
| autodistill/autodistill | Automatically trains models from large foundation models to perform specific tasks with minimal human intervention | 2,022 |
| zsdonghao/unsup-im2im | A framework for unsupervised image-to-image translation using Generative Adversarial Networks (GANs) and deep learning | 73 |
| mingyuliutw/unit | An unsupervised deep learning framework for translating images between different modalities | 1,994 |
| yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
| liuquande/feddg-elcfs | A framework for federated domain generalization in medical image segmentation using continuous frequency space and episodic learning | 246 |
| microsoft/som | Enables visual grounding in large language models by overlaying spatial and speakable marks on images | 1,218 |
| zhengpeng7/birefnet | An open-source implementation of an image segmentation model that combines background removal and object detection capabilities | 1,484 |
| uclanlp/elmo-c | Efficient Contextual Representation Learning Model with Continuous Outputs | 4 |
| isekai-portal/link-context-learning | An implementation of a multimodal learning approach to improve language models' ability to recognize unseen images and understand novel concepts | 91 |
| tobypde/frrn | A software framework for training and evaluating full-resolution residual networks for semantic image segmentation tasks | 280 |
| abbypa/nnproject_deepmask | A deep learning implementation of an object segmentation algorithm | 187 |
| kohjingyu/fromage | A framework for grounding language models to images and handling multimodal inputs and outputs | 478 |
| mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
| nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities | 1,299 |