coot-videotext
Video transformer
An open-source implementation of a video-text representation learning framework using transformers and PyTorch.
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
288 stars
8 watching
55 forks
Language: Python
Last commit: over 2 years ago
Linked from 1 awesome list
Related projects:
| Description | Stars |
|---|---|
| An implementation of the Video Swin Transformer architecture for video recognition tasks | 1,463 |
| A PyTorch implementation of a generative adversarial network for image synthesis from text descriptions | 1,863 |
| A PyTorch implementation of a compressed video action recognition system | 502 |
| PyTorch implementation of video captioning, combining deep learning and computer vision techniques. | 402 |
| PyTorch implementation of unsupervised depth and ego-motion learning from video sequences | 1,022 |
| A framework for generating images from text using transformers. | 1,735 |
| A PyTorch model for 4K text-to-image generation using a diffusion transformer | 1,711 |
| A PyTorch implementation of the Vision Transformer model for image recognition tasks. | 1,959 |
| Research tool for training large transformer language models at scale | 1,926 |
| A collection of tools and scripts for training large transformer language models at scale | 1,342 |
| A library for serializing and deserializing data structures into strings, mappings, and lists while performing validation. | 451 |
| Implementation of a transformer-based translation model in PyTorch | 240 |
| An implementation of transformer models in PyTorch for natural language processing tasks | 1,257 |
| This repository provides a PyTorch implementation of an image manipulation technique using a pretrained StyleGAN model. | 380 |
| Provides an efficient wrapper around CUDA FFTs for PyTorch transformations | 315 |