coot-videotext
Video transformer
An open-source implementation of a video-text representation learning framework using transformers and PyTorch.
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
288 stars
8 watching
55 forks
Language: Python
Last commit: about 2 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
swintransformer/video-swin-transformer | An implementation of the Video Swin Transformer architecture for video recognition tasks | 1,444 |
hanzhanggit/stackgan | A PyTorch implementation of a generative adversarial network for image synthesis from text descriptions | 1,860 |
chaoyuaw/pytorch-coviar | A PyTorch implementation of a compressed video action recognition system | 502 |
xiadingz/video-caption.pytorch | PyTorch implementation of video captioning, combining deep learning and computer vision techniques. | 401 |
clementpinard/sfmlearner-pytorch | PyTorch implementation of unsupervised depth and ego-motion learning from video sequences | 1,014 |
thudm/cogview | A framework for generating images from text using transformers. | 1,722 |
pixart-alpha/pixart-sigma | Develops a PyTorch model for 4K text-to-image generation using a diffusion transformer | 1,675 |
jeonsworld/vit-pytorch | A PyTorch implementation of the Vision Transformer model for image recognition tasks. | 1,940 |
microsoft/megatron-deepspeed | Research tool for training large transformer language models at scale | 1,895 |
bigscience-workshop/megatron-deepspeed | A collection of tools and scripts for training large transformer language models at scale | 1,335 |
pylons/colander | A library for serializing and deserializing data structures into strings, mappings, and lists while performing validation. | 451 |
leviswind/pytorch-transformer | Implementation of a transformer-based translation model in PyTorch | 239 |
tongjilibo/bert4torch | An implementation of transformer models in PyTorch for natural language processing tasks | 1,241 |
mchong6/soat | A PyTorch implementation of an image manipulation technique using a pretrained StyleGAN model | 380 |
locuslab/pytorch_fft | Provides an efficient wrapper around CUDA FFTs for PyTorch transformations | 314 |