virtex
Caption learning
A pretraining approach that learns visual representations from semantically dense captions and transfers them to downstream image understanding tasks.
[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
556 stars
14 watching
61 forks
Language: Python
last commit: about 1 year ago
Tags: coco-dataset, cvpr2021, image-captioning, model-zoo, pretrained-models
Related projects:
| Repository | Description | Stars |
|---|---|---|
|  | PyTorch implementation of video captioning, combining deep learning and computer vision techniques. | 402 |
|  | A deep learning framework providing a model architecture and training code for image captioning using semantic compositional networks | 70 |
|  | A framework for training multi-modal language models with a focus on visual inputs and providing interpretable thoughts. | 162 |
|  | An implementation of a dense video captioning model with attention-based fusion and context gating | 149 |
|  | A project for pre-training models to support image captioning and question answering tasks. | 416 |
|  | A deep learning model for generating image captions with semantic attention | 51 |
|  | An approach to image captioning that leverages the CLIP model and fine-tunes a language model without requiring additional supervision or object annotation. | 1,326 |
|  | A collection of tutorials and resources on implementing deep learning models using Python libraries such as Keras and Lasagne. | 426 |
|  | Trains image paragraph captioning models to generate diverse and accurate captions | 90 |
|  | Creating synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks | 18 |
|  | Improves pre-trained language models by encouraging an isotropic and discriminative distribution of token representations. | 92 |
|  | A PyTorch implementation of image captioning models via scene graph decomposition. | 96 |
|  | Reimplements a popular deep learning model for unsupervised visual representation learning using TensorFlow | 161 |
|  | This implementation allows users to generate captions from images using a neural network model with visual attention. | 790 |
|  | An image caption generation system using a neural network architecture with pre-trained models. | 64 |