ClipBERT

Video-language model

[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

GitHub

709 stars

10 watching

86 forks

Language: Python

last commit: almost 2 years ago

Linked from 1 awesome list

Topics: cvpr2021, pytorch, video-question-answering, video-retrieval, vision-and-language, vqa

arxiv.org/abs/2102.06183

Backlinks from these awesome lists:

danieljf24/awesome-video-text-retrieval

Related projects:

Repository	Description	Stars
jayleicn/tvqa	PyTorch implementation of a video question answering system based on the TVQA dataset	172
cadene/vqa.pytorch	A PyTorch implementation of visual question answering with multimodal representation learning	718
zhanghang1989/pytorch-encoding	A Python framework for building deep learning models with optimized encoding layers and batch normalization.	2,044
kaiyangzhou/dassl.pytorch	A PyTorch toolbox for supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision.	1,236
zsef123/efficientnets-pytorch	A PyTorch implementation of EfficientNet for computer vision tasks	309
davidtvs/pytorch-enet	A PyTorch implementation of a real-time semantic segmentation model using ENet architecture	392
kacky24/stylenet	A PyTorch implementation of a framework for generating captions with styles for images and videos.	63
codeslake/pvdnet	An open-source implementation of a deep learning model for video deblurring and motion estimation.	114
baaivision/eve	A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities	246
xiadingz/video-caption.pytorch	PyTorch implementation of video captioning, combining deep learning and computer vision techniques.	402
byungkwanlee/collavo	A PyTorch implementation of an enhanced vision-language model	93
jinsc37/difrint	A PyTorch implementation of a deep learning-based method for video stabilization via frame interpolation.	82
jwyang/graph-rcnn.pytorch	A collection of PyTorch implementations of various scene graph generation models	732
fartashf/vsepp	A PyTorch implementation of visual-semantic embedding methods for image-caption retrieval	492
randl/shufflenetv2-pytorch	An implementation of a lightweight convolutional neural network architecture for mobile devices	191