T2T-ViT
Vision transformer trainer
A deep learning framework for training vision transformers from scratch on image data.
ICCV 2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
1k stars
18 watching
176 forks
Language: Jupyter Notebook
last commit: about 1 year ago
Linked from 1 awesome list
t2t-transformer, vision-transformer, vit
Related projects:
Repository | Description | Stars |
---|---|---|
zhoudaquan/dvit_repo | An implementation of Deep Vision Transformer models with modifications to improve performance by preventing attention collapse | 136 |
jeonsworld/vit-pytorch | A PyTorch implementation of the Vision Transformer model for image recognition tasks. | 1,940 |
whai362/pvt | An implementation of Pyramid Vision Transformers for image classification, object detection, and semantic segmentation tasks | 1,728 |
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
bigscience-workshop/megatron-deepspeed | A collection of tools and scripts for training large transformer language models at scale | 1,335 |
google-research/nested-transformer | An implementation of a transformer-based vision model that aggregates local transformers on image blocks to improve accuracy and efficiency. | 193 |
keyvank/femtogpt | A Rust implementation of a minimal Generative Pretrained Transformer architecture. | 834 |
microsoft/megatron-deepspeed | Research tool for training large transformer language models at scale | 1,895 |
google-research/vision_transformer | Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax | 10,450 |
microsoft/cvt | An implementation of a new neural network architecture that combines the strengths of convolutional and transformer designs to improve performance on image classification tasks. | 555 |
zsdonghao/spatial-transformer-nets | An implementation of Spatial Transformer Networks in TensorFlow that learns to apply spatial transformations to images as part of a classification task. | 36 |
fastnlp/cpt | A pre-trained transformer model for natural language understanding and generation tasks in Chinese | 481 |
dirtyharrylyl/transformer-in-vision | A collection of resources and papers related to Transformer-based computer vision models and techniques. | 1,319 |
atiyo/deep_image_prior | Uses untrained neural networks as an image prior to reconstruct and transform existing images | 215 |
gordonhu608/mqt-llava | A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens. | 97 |