GroundedTranslation

Image Description Model

Trains multilingual image description models using neural sequence models and extracts hidden features from trained models.

Multilingual image description

GitHub

46 stars

8 watching

25 forks

Language: Python

last commit: over 8 years ago

Screenshot of elliottd/GroundedTranslation website

staff.fnwi.uva.nl/d.elliott/GroundedTranslation/

Related projects:

Repository	Description	Stars
gink03/alt-i2v	An implementation of a deep learning-based image representation learning approach using a modified fully connected layer and transfer learning from VGG16	34
autodistill/autodistill	Automatically trains models from large foundation models to perform specific tasks with minimal human intervention.	2,022
zsdonghao/unsup-im2im	A framework for unsupervised image-to-image translation using Generative Adversarial Networks (GANs) and deep learning.	73
mingyuliutw/unit	An unsupervised deep learning framework for translating images between different modalities	1,994
yiren-jian/blitext	Develops and trains models for vision-language learning with decoupled language pre-training	24
liuquande/feddg-elcfs	This project presents a framework for federated domain generalization in medical image segmentation using continuous frequency space and episodic learning.	246
microsoft/som	Enables visual grounding in large language models by overlaying spatial and speakable marks on images	1,218
zhengpeng7/birefnet	An open-source implementation of an image segmentation model that combines background removal and object detection capabilities.	1,484
uclanlp/elmo-c	Efficient Contextual Representation Learning Model with Continuous Outputs	4
isekai-portal/link-context-learning	An implementation of a multimodal learning approach to improve language models' ability to recognize unseen images and understand novel concepts.	91
tobypde/frrn	A software framework for training and evaluating full-resolution residual networks for semantic image segmentation tasks	280
abbypa/nnproject_deepmask	A deep learning implementation of an object segmentation algorithm.	187
kohjingyu/fromage	A framework for grounding language models to images and handling multimodal inputs and outputs	478
mbzuai-oryx/groundinglmm	An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations	797
nvlabs/prismer	A deep learning framework for training multi-modal models with vision and language capabilities.	1,299