ClipBERT

Video-language model

[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

GitHub

709 stars
10 watching
86 forks
Language: Python
Last commit: over 1 year ago
Linked from 1 awesome list

cvpr2021, pytorch, video-question-answering, video-retrieval, vision-and-language, vqa

Related projects:

Repository | Description | Stars
jayleicn/tvqa | A PyTorch implementation of a video question answering system based on the TVQA dataset | 172
cadene/vqa.pytorch | A PyTorch implementation of visual question answering with multimodal representation learning | 718
zhanghang1989/pytorch-encoding | A Python framework for building deep learning models with optimized encoding layers and batch normalization | 2,044
kaiyangzhou/dassl.pytorch | A PyTorch toolbox for research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision | 1,236
zsef123/efficientnets-pytorch | A PyTorch implementation of EfficientNet for computer vision tasks | 309
davidtvs/pytorch-enet | A PyTorch implementation of ENet, a real-time semantic segmentation model | 392
kacky24/stylenet | A PyTorch implementation of a framework for generating styled captions for images and videos | 63
codeslake/pvdnet | An open-source implementation of a deep learning model for video deblurring and motion estimation | 114
baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 246
xiadingz/video-caption.pytorch | A PyTorch implementation of video captioning, combining deep learning and computer vision techniques | 402
byungkwanlee/collavo | A PyTorch implementation of an enhanced vision-language model | 93
jinsc37/difrint | A PyTorch implementation of a deep learning-based method for video stabilization via frame interpolation | 82
jwyang/graph-rcnn.pytorch | A collection of PyTorch implementations of various scene graph generation models | 732
fartashf/vsepp | A PyTorch implementation of visual-semantic embedding methods for image-caption retrieval | 492
randl/shufflenetv2-pytorch | An implementation of a lightweight convolutional neural network architecture for mobile devices | 191