virtex

Caption learning

A pretraining approach that uses semantically dense captions to learn visual representations and improve image understanding tasks.

[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations

GitHub

557 stars
14 watching
61 forks
Language: Python
last commit: 11 months ago
coco-datasetcvpr2021image-captioningmodel-zoopretrained-models

Related projects:

Repository Description Stars
xiadingz/video-caption.pytorch PyTorch implementation of video captioning, combining deep learning and computer vision techniques. 401
zhegan27/semantic_compositional_nets A deep learning framework providing a model architecture and training code for image captioning using semantic compositional networks 70
deepcs233/visual-cot Develops a multi-modal language model with a comprehensive dataset and benchmark for chain-of-thought reasoning 134
jaywongwang/densevideocaptioning An implementation of a dense video captioning model with attention-based fusion and context gating 148
luoweizhou/vlp A project for pre-training models to support image captioning and question answering tasks. 412
chapternewscu/image-captioning-with-semantic-attention A deep learning model for generating image captions with semantic attention 51
rmokady/clip_prefix_caption An approach to image captioning that leverages the CLIP model and fine-tunes a language model without requiring additional supervision or object annotation. 1,315
vict0rsch/deep_learning A collection of tutorials and resources on implementing deep learning models using Python libraries such as Keras and Lasagne. 426
lukemelas/image-paragraph-captioning Trains image paragraph captioning models to generate diverse and accurate captions 90
rucaibox/comvint Creating synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks 18
yxuansu/tacl Improves pre-trained language models by encouraging an isotropic and discriminative distribution of token representations. 92
yiwuzhong/sub-gc A PyTorch implementation of image captioning models via scene graph decomposition. 96
ppwwyyxx/moco.tensorflow Reimplements a popular deep learning model for unsupervised visual representation learning using TensorFlow 161
deeprnn/image_captioning This implementation allows users to generate captions from images using a neural network model with visual attention. 786
apple2373/chainer-caption An image caption generation system using a neural network architecture with pre-trained models. 64