ClipBERT
Video-language model
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
709 stars
10 watching
86 forks
Language: Python
last commit: over 1 year ago
Linked from 1 awesome list
Tags: cvpr2021, pytorch, video-question-answering, video-retrieval, vision-and-language, vqa
Related projects:
| Description | Stars |
|---|---|
| A PyTorch implementation of a video question answering system based on the TVQA dataset | 172 |
| A PyTorch implementation of visual question answering with multimodal representation learning | 718 |
| A Python framework for building deep learning models with optimized encoding layers and batch normalization. | 2,044 |
| A PyTorch toolbox for supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision. | 1,236 |
| A PyTorch implementation of EfficientNet for computer vision tasks | 309 |
| A PyTorch implementation of a real-time semantic segmentation model using ENet architecture | 392 |
| A PyTorch implementation of a framework for generating captions with styles for images and videos. | 63 |
| An open-source implementation of a deep learning model for video deblurring and motion estimation. | 114 |
| A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 246 |
| PyTorch implementation of video captioning, combining deep learning and computer vision techniques. | 402 |
| A PyTorch implementation of an enhanced vision-language model | 93 |
| A PyTorch implementation of a deep learning-based method for video stabilization via frame interpolation. | 82 |
| A collection of PyTorch implementations of various scene graph generation models | 732 |
| A PyTorch implementation of visual-semantic embedding methods for image-caption retrieval | 492 |
| An implementation of a lightweight convolutional neural network architecture for mobile devices | 191 |