coot-videotext

Video transformer

An open-source implementation of a video-text representation learning framework using transformers and PyTorch.

COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning


288 stars
8 watching
55 forks
Language: Python
Last commit: about 2 years ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
swintransformer/video-swin-transformer | An implementation of the Video Swin Transformer architecture for video recognition tasks | 1,444
hanzhanggit/stackgan | A PyTorch implementation of a generative adversarial network for image synthesis from text descriptions | 1,860
chaoyuaw/pytorch-coviar | A PyTorch implementation of a compressed video action recognition system | 502
xiadingz/video-caption.pytorch | A PyTorch implementation of video captioning, combining deep learning and computer vision techniques | 401
clementpinard/sfmlearner-pytorch | A PyTorch implementation of unsupervised depth and ego-motion learning from video sequences | 1,014
thudm/cogview | A framework for generating images from text using transformers | 1,722
pixart-alpha/pixart-sigma | A PyTorch model for 4K text-to-image generation using a diffusion transformer | 1,675
jeonsworld/vit-pytorch | A PyTorch implementation of the Vision Transformer model for image recognition tasks | 1,940
microsoft/megatron-deepspeed | A research tool for training large transformer language models at scale | 1,895
bigscience-workshop/megatron-deepspeed | A collection of tools and scripts for training large transformer language models at scale | 1,335
pylons/colander | A library for serializing and deserializing data structures into strings, mappings, and lists while performing validation | 451
leviswind/pytorch-transformer | An implementation of a transformer-based translation model in PyTorch | 239
tongjilibo/bert4torch | An implementation of transformer models in PyTorch for natural language processing tasks | 1,241
mchong6/soat | A PyTorch implementation of an image manipulation technique using a pretrained StyleGAN model | 380
locuslab/pytorch_fft | An efficient wrapper around CUDA FFTs for PyTorch transformations | 314