T2T-ViT
Vision transformer trainer
A deep learning framework for training vision transformers from scratch on image data.
ICCV 2021 paper: "Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet"
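The "tokens-to-token" idea named in the paper title works by repeatedly re-structuring tokens back into a spatial map and then soft-splitting it into overlapping patches, so neighboring tokens share pixels and local structure is preserved. The repo itself implements this in PyTorch (via unfold operations); the following is only a minimal NumPy sketch of the soft-split step, with the kernel/stride/padding values chosen for illustration rather than taken from the repo's configuration:

```python
import numpy as np

def soft_split(x, k=3, s=2, p=1):
    """Unfold an (H, W, C) feature map into overlapping k x k patches.

    Each patch is flattened into a token of length k*k*C. Because stride s
    is smaller than kernel k, adjacent tokens overlap -- the local context
    that plain non-overlapping ViT patchification discards.
    """
    H, W, C = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))       # zero-pad spatial dims
    out_h = (H + 2 * p - k) // s + 1
    out_w = (W + 2 * p - k) // s + 1
    tokens = np.empty((out_h * out_w, k * k * C), dtype=x.dtype)
    t = 0
    for i in range(out_h):
        for j in range(out_w):
            tokens[t] = xp[i * s:i * s + k, j * s:j * s + k, :].reshape(-1)
            t += 1
    return tokens

# An 8x8 map with 4 channels soft-splits into a 4x4 grid of longer tokens.
feat = np.arange(8 * 8 * 4, dtype=np.float32).reshape(8, 8, 4)
tokens = soft_split(feat)   # shape (16, 36): 4*4 positions, 3*3*4 dims each
```

In the actual T2T module, a lightweight transformer layer runs on the tokens between successive soft splits, so the token count shrinks while each token's receptive field grows.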
1k stars
18 watching
176 forks
Language: Jupyter Notebook
Last commit: over 1 year ago
Linked from 1 awesome list
Topics: t2t, transformer, vision-transformer, vit
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | An implementation of Deep Vision Transformer models with modifications to improve performance by preventing attention collapse | 137 |
| | A PyTorch implementation of the Vision Transformer model for image recognition tasks | 1,959 |
| | An implementation of Pyramid Vision Transformers for image classification, object detection, and semantic segmentation tasks | 1,745 |
| | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
| | A collection of tools and scripts for training large transformer language models at scale | 1,342 |
| | An implementation of a transformer-based vision model that aggregates local transformers on image blocks to improve accuracy and efficiency | 195 |
| | A Rust implementation of a minimal Generative Pretrained Transformer architecture | 845 |
| | Research tool for training large transformer language models at scale | 1,926 |
| | Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax | 10,620 |
| | An implementation of a new neural network architecture that combines the strengths of convolutional and transformer designs to improve performance on image classification tasks | 559 |
| | An implementation of Spatial Transformer Networks in TensorFlow for learning to apply transformations to images via classification tasks | 36 |
| | A pre-trained transformer model for natural language understanding and generation tasks in Chinese | 482 |
| | A collection of resources and papers related to Transformer-based computer vision models and techniques | 1,324 |
| | Reconstructs images using untrained neural networks to manipulate and transform existing images | 216 |
| | A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens | 101 |