T2T-ViT
Vision transformer trainer
A deep learning framework for training vision transformers from scratch on image data.
ICCV 2021 paper: "Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet"
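The "tokens-to-token" idea named in the paper title works by repeatedly re-structuring tokens back into a spatial map and then soft-splitting it into overlapping patches, so neighboring tokens share pixels and local structure is preserved. The repo itself implements this in PyTorch (via unfold operations); the following is only a minimal NumPy sketch of the soft-split step, with the kernel/stride/padding values chosen for illustration rather than taken from the repo's configuration:

```python
import numpy as np

def soft_split(x, k=3, s=2, p=1):
    """Unfold an (H, W, C) feature map into overlapping k x k patches.

    Each patch is flattened into a token of length k*k*C. Because stride s
    is smaller than kernel k, adjacent tokens overlap -- the local context
    that plain non-overlapping ViT patchification discards.
    """
    H, W, C = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))       # zero-pad spatial dims
    out_h = (H + 2 * p - k) // s + 1
    out_w = (W + 2 * p - k) // s + 1
    tokens = np.empty((out_h * out_w, k * k * C), dtype=x.dtype)
    t = 0
    for i in range(out_h):
        for j in range(out_w):
            tokens[t] = xp[i * s:i * s + k, j * s:j * s + k, :].reshape(-1)
            t += 1
    return tokens

# An 8x8 map with 4 channels soft-splits into a 4x4 grid of longer tokens.
feat = np.arange(8 * 8 * 4, dtype=np.float32).reshape(8, 8, 4)
tokens = soft_split(feat)   # shape (16, 36): 4*4 positions, 3*3*4 dims each
```

In the actual T2T module, a lightweight transformer layer runs on the tokens between successive soft splits, so the token count shrinks while each token's receptive field grows.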
1k stars
18 watching
176 forks
Language: Jupyter Notebook
Last commit: over 1 year ago
Linked from 1 awesome list
Topics: t2t, transformer, vision-transformer, vit
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | An implementation of Deep Vision Transformer models with modifications to improve performance by preventing attention collapse | 137 |
| | A PyTorch implementation of the Vision Transformer model for image recognition tasks | 1,959 |
| | An implementation of Pyramid Vision Transformers for image classification, object detection, and semantic segmentation tasks | 1,745 |
| | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
| | A collection of tools and scripts for training large transformer language models at scale | 1,342 |
| | An implementation of a transformer-based vision model that aggregates local transformers on image blocks to improve accuracy and efficiency | 195 |
| | A Rust implementation of a minimal Generative Pretrained Transformer architecture | 845 |
| | Research tool for training large transformer language models at scale | 1,926 |
| | Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax | 10,620 |
| | An implementation of a new neural network architecture that combines the strengths of convolutional and transformer designs to improve performance on image classification tasks | 559 |
| | An implementation of Spatial Transformer Networks in TensorFlow for learning to apply transformations to images via classification tasks | 36 |
| | A pre-trained transformer model for natural language understanding and generation tasks in Chinese | 482 |
| | A collection of resources and papers related to Transformer-based computer vision models and techniques | 1,324 |
| | Reconstructs images using untrained neural networks to manipulate and transform existing images | 216 |
| | A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens | 101 |