T2T-ViT
Vision transformer trainer
A deep learning framework for training vision transformers from scratch on image data.
ICCV 2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
1k stars
18 watching
176 forks
Language: Jupyter Notebook
Last commit: about 2 years ago
Linked from 1 awesome list
Tags: t2t-transformer, vision-transformer, vit
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | An implementation of Deep Vision Transformer models with modifications to improve performance by preventing attention collapse | 137 |
| | A PyTorch implementation of the Vision Transformer model for image recognition tasks. | 1,959 |
| | An implementation of Pyramid Vision Transformers for image classification, object detection, and semantic segmentation tasks | 1,745 |
| | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
| | A collection of tools and scripts for training large transformer language models at scale | 1,342 |
| | An implementation of a transformer-based vision model that aggregates local transformers on image blocks to improve accuracy and efficiency. | 195 |
| | A Rust implementation of a minimal Generative Pretrained Transformer architecture. | 845 |
| | Research tool for training large transformer language models at scale | 1,926 |
| | Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax | 10,620 |
| | An implementation of a new neural network architecture that combines the strengths of convolutional and transformer designs to improve performance on image classification tasks. | 559 |
| | An implementation of Spatial Transformer Networks in TensorFlow for learning to apply transformations to images via classification tasks. | 36 |
| | A pre-trained transformer model for natural language understanding and generation tasks in Chinese | 482 |
| | A collection of resources and papers related to Transformer-based computer vision models and techniques. | 1,324 |
| | Reconstructs images using untrained neural networks to manipulate and transform existing images | 216 |
| | A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens. | 101 |