ClipBERT
Video-language model
An efficient framework for end-to-end learning on image-text and video-text tasks
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
709 stars
10 watching
86 forks
Language: Python
Last commit: over 1 year ago
Linked from 1 awesome list
Topics: cvpr2021, pytorch, video-question-answering, video-retrieval, vision-and-language, vqa
Related projects:
Repository | Description | Stars |
---|---|---|
jayleicn/tvqa | A PyTorch implementation of a video question answering system based on the TVQA dataset | 172 |
cadene/vqa.pytorch | A PyTorch implementation of visual question answering with multimodal representation learning | 718 |
zhanghang1989/pytorch-encoding | A Python framework for building deep learning models with optimized encoding layers and batch normalization. | 2,044 |
kaiyangzhou/dassl.pytorch | A PyTorch toolbox for supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision. | 1,236 |
zsef123/efficientnets-pytorch | A PyTorch implementation of EfficientNet for computer vision tasks | 309 |
davidtvs/pytorch-enet | A PyTorch implementation of a real-time semantic segmentation model using ENet architecture | 392 |
kacky24/stylenet | A PyTorch implementation of a framework for generating captions with styles for images and videos. | 63 |
codeslake/pvdnet | An open-source implementation of a deep learning model for video deblurring and motion estimation. | 114 |
baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 246 |
xiadingz/video-caption.pytorch | A PyTorch implementation of video captioning that combines deep learning and computer vision techniques | 402 |
byungkwanlee/collavo | A PyTorch implementation of an enhanced vision-language model | 93 |
jinsc37/difrint | A PyTorch implementation of a deep learning-based method for video stabilization via frame interpolation. | 82 |
jwyang/graph-rcnn.pytorch | A collection of PyTorch implementations of various scene graph generation models | 732 |
fartashf/vsepp | A PyTorch implementation of visual-semantic embedding methods for image-caption retrieval | 492 |
randl/shufflenetv2-pytorch | An implementation of a lightweight convolutional neural network architecture for mobile devices | 191 |