T2T-ViT

Vision transformer trainer

A deep learning framework for training vision transformers from scratch on image data.

ICCV 2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

GitHub

1k stars
18 watching
176 forks
Language: Jupyter Notebook
last commit: about 1 year ago
Linked from 1 awesome list

Topics: t2t-transformer, vision-transformer, vit

Related projects:

| Repository | Description | Stars |
|---|---|---|
| zhoudaquan/dvit_repo | An implementation of Deep Vision Transformer models with modifications to improve performance by preventing attention collapse | 136 |
| jeonsworld/vit-pytorch | A PyTorch implementation of the Vision Transformer model for image recognition tasks. | 1,940 |
| whai362/pvt | An implementation of Pyramid Vision Transformers for image classification, object detection, and semantic segmentation tasks | 1,728 |
| yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
| bigscience-workshop/megatron-deepspeed | A collection of tools and scripts for training large transformer language models at scale | 1,335 |
| google-research/nested-transformer | An implementation of a transformer-based vision model that aggregates local transformers on image blocks to improve accuracy and efficiency. | 193 |
| keyvank/femtogpt | A Rust implementation of a minimal Generative Pretrained Transformer architecture. | 834 |
| microsoft/megatron-deepspeed | Research tool for training large transformer language models at scale | 1,895 |
| google-research/vision_transformer | Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax | 10,450 |
| microsoft/cvt | An implementation of a new neural network architecture that combines the strengths of convolutional and transformer designs to improve performance on image classification tasks. | 555 |
| zsdonghao/spatial-transformer-nets | An implementation of Spatial Transformer Networks in TensorFlow for learning to apply transformations to images via classification tasks. | 36 |
| fastnlp/cpt | A pre-trained transformer model for natural language understanding and generation tasks in Chinese | 481 |
| dirtyharrylyl/transformer-in-vision | A collection of resources and papers related to Transformer-based computer vision models and techniques. | 1,319 |
| atiyo/deep_image_prior | Reconstructs images using untrained neural networks to manipulate and transform existing images | 215 |
| gordonhu608/mqt-llava | A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens. | 97 |