virtex
Caption learning
A pretraining approach that learns visual representations from semantically dense captions and transfers them to downstream image understanding tasks.
[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
557 stars
14 watching
61 forks
Language: Python
last commit: 11 months ago
topics: coco-dataset, cvpr2021, image-captioning, model-zoo, pretrained-models
Related projects:
Repository | Description | Stars |
---|---|---|
xiadingz/video-caption.pytorch | PyTorch implementation of video captioning, combining deep learning and computer vision techniques. | 401 |
zhegan27/semantic_compositional_nets | A deep learning framework providing a model architecture and training code for image captioning using semantic compositional networks | 70 |
deepcs233/visual-cot | Develops a multi-modal language model with a comprehensive dataset and benchmark for chain-of-thought reasoning | 134 |
jaywongwang/densevideocaptioning | An implementation of a dense video captioning model with attention-based fusion and context gating | 148 |
luoweizhou/vlp | A project for pre-training models to support image captioning and question answering tasks. | 412 |
chapternewscu/image-captioning-with-semantic-attention | A deep learning model for generating image captions with semantic attention | 51 |
rmokady/clip_prefix_caption | An approach to image captioning that leverages the CLIP model and fine-tunes a language model without requiring additional supervision or object annotation. | 1,315 |
vict0rsch/deep_learning | A collection of tutorials and resources on implementing deep learning models using Python libraries such as Keras and Lasagne. | 426 |
lukemelas/image-paragraph-captioning | Trains image paragraph captioning models to generate diverse and accurate captions | 90 |
rucaibox/comvint | Creates synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks | 18 |
yxuansu/tacl | Improves pre-trained language models by encouraging an isotropic and discriminative distribution of token representations. | 92 |
yiwuzhong/sub-gc | A PyTorch implementation of image captioning models via scene graph decomposition. | 96 |
ppwwyyxx/moco.tensorflow | Reimplements MoCo (Momentum Contrast), a method for unsupervised visual representation learning, in TensorFlow | 161 |
deeprnn/image_captioning | This implementation allows users to generate captions from images using a neural network model with visual attention. | 786 |
apple2373/chainer-caption | An image caption generation system using a neural network architecture with pre-trained models. | 64 |