virtex
Caption learning
A pretraining approach that uses semantically dense captions to learn visual representations that transfer to downstream image understanding tasks.
[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
556 stars
14 watching
61 forks
Language: Python
last commit: almost 2 years ago
Topics: coco-dataset, cvpr2021, image-captioning, model-zoo, pretrained-models
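To make the description above more concrete, here is a minimal, framework-free Python sketch of a bidirectional captioning objective. It assumes the setup we understand the VirTex paper to use: a visual backbone produces image features, and separate forward (left-to-right) and backward (right-to-left) textual heads are trained to predict each caption token from the image and the preceding words. The names `TextualHead`, `captioning_nll`, `bicaptioning_loss`, and `uniform_head` are hypothetical and do not come from the repository. An actual implementation would use neural network heads and backpropagate this loss into the visual backbone, which this sketch omits.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a textual head maps (image features, token prefix)
# to a probability distribution over the vocabulary for the next token.
TextualHead = Callable[[Sequence[float], List[str]], Dict[str, float]]


def captioning_nll(head: TextualHead, visual: Sequence[float], caption: List[str]) -> float:
    """Teacher-forced negative log-likelihood of `caption` given image features.

    At step t the head sees the ground-truth prefix caption[:t] and is scored
    on the probability it assigns to the true next token caption[t].
    """
    nll = 0.0
    for t, target in enumerate(caption):
        probs = head(visual, caption[:t])
        nll -= math.log(probs[target])
    return nll


def bicaptioning_loss(forward_head: TextualHead, backward_head: TextualHead,
                      visual: Sequence[float], caption: List[str]) -> float:
    """Sum of left-to-right and right-to-left captioning losses for one image."""
    forward = captioning_nll(forward_head, visual, caption)
    # The backward head reads the same caption in reverse order.
    backward = captioning_nll(backward_head, visual, caption[::-1])
    return forward + backward


# Toy usage: a hypothetical head that ignores its inputs and predicts uniformly.
VOCAB = ["a", "cat", "on", "mat"]


def uniform_head(visual: Sequence[float], prefix: List[str]) -> Dict[str, float]:
    return {word: 1.0 / len(VOCAB) for word in VOCAB}


caption = ["a", "cat", "on", "mat"]
image_features = [0.0]  # placeholder standing in for a visual backbone's output
loss = bicaptioning_loss(uniform_head, uniform_head, image_features, caption)
print(f"bicaptioning loss: {loss:.4f}")
```

Because the toy head ignores the image and spreads probability evenly over four words, each of the eight scored tokens (four forward, four backward) costs log 4 nats. A head that actually used the visual features to predict the right words would lower this loss, and that training signal is what shapes the learned visual representation.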
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | PyTorch implementation of video captioning, combining deep learning and computer vision techniques. | 402 |
| | A deep learning framework providing a model architecture and training code for image captioning using semantic compositional networks. | 70 |
| | A framework for training multi-modal language models with a focus on visual inputs and providing interpretable thoughts. | 162 |
| | An implementation of a dense video captioning model with attention-based fusion and context gating. | 149 |
| | A project for pre-training models to support image captioning and question answering tasks. | 416 |
| | A deep learning model for generating image captions with semantic attention. | 51 |
| | An approach to image captioning that leverages the CLIP model and fine-tunes a language model without requiring additional supervision or object annotation. | 1,326 |
| | A collection of tutorials and resources on implementing deep learning models using Python libraries such as Keras and Lasagne. | 426 |
| | Trains image paragraph captioning models to generate diverse and accurate captions. | 90 |
| | Creates synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks. | 18 |
| | Improves pre-trained language models by encouraging an isotropic and discriminative distribution of token representations. | 92 |
| | A PyTorch implementation of image captioning models via scene graph decomposition. | 96 |
| | Reimplements a popular deep learning model for unsupervised visual representation learning using TensorFlow. | 161 |
| | This implementation allows users to generate captions from images using a neural network model with visual attention. | 790 |
| | An image caption generation system using a neural network architecture with pre-trained models. | 64 |