GroundedTranslation
Image Description Model
Trains multilingual image description models using neural sequence models and extracts hidden features from trained models.
Multilingual image description
46 stars
8 watching
25 forks
Language: Python
last commit: about 7 years ago Related projects:
Repository | Description | Stars |
---|---|---|
| An implementation of a deep learning-based image representation learning approach using a modified fully connected layer and transfer learning from VGG16 | 34 |
| Automatically trains models from large foundation models to perform specific tasks with minimal human intervention. | 2,022 |
| A framework for unsupervised image-to-image translation using Generative Adversarial Networks (GANs) and deep learning. | 73 |
| An unsupervised deep learning framework for translating images between different modalities | 1,994 |
| Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
| This project presents a framework for federated domain generalization in medical image segmentation using continuous frequency space and episodic learning. | 246 |
| Enables visual grounding in large language models by overlaying spatial and speakable marks on images | 1,218 |
| An open-source implementation of an image segmentation model that combines background removal and object detection capabilities. | 1,484 |
| Efficient Contextual Representation Learning Model with Continuous Outputs | 4 |
| An implementation of a multimodal learning approach to improve language models' ability to recognize unseen images and understand novel concepts. | 91 |
| A software framework for training and evaluating full-resolution residual networks for semantic image segmentation tasks | 280 |
| A deep learning implementation of an object segmentation algorithm. | 187 |
| A framework for grounding language models to images and handling multimodal inputs and outputs | 478 |
| An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
| A deep learning framework for training multi-modal models with vision and language capabilities. | 1,299 |