dvit_repo
Vision transformer improvement
An implementation of Deep Vision Transformer (DeepViT) models, with modifications that improve performance by preventing attention collapse.
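In the DeepViT paper (Zhou et al., 2021), the proposed remedy for attention collapse is Re-attention. Each head computes its attention map as usual. A learnable H×H matrix Θ then mixes those maps across heads before they are applied to the values, roughly Norm(Θᵀ · Softmax(QKᵀ/√d)) · V. The NumPy sketch below illustrates that idea under stated assumptions and is not taken from this repository's code. The function names, the (heads, tokens, head_dim) array layout, and the LayerNorm-style normalization across heads are all assumptions. The paper's normalization is a BatchNorm.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def re_attention(q, k, v, theta, eps=1e-5):
    """Hypothetical Re-attention sketch for a single image (no batch axis).

    q, k, v: arrays of shape (heads, tokens, head_dim)
    theta:   (heads, heads) head-mixing matrix (a learned parameter in a real model)
    """
    heads, tokens, head_dim = q.shape
    # Standard scaled dot-product attention maps, one per head: (heads, tokens, tokens)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(head_dim), axis=-1)
    # Mix maps across heads: new head i = sum_j theta[j, i] * attn[j]  (i.e. Theta^T applied on the head axis)
    mixed = np.einsum("ji,jnm->inm", theta, attn)
    # Assumption: normalize across the head axis at each (query, key) position,
    # a LayerNorm-style stand-in for the paper's Norm (no learnable scale/shift here)
    mean = mixed.mean(axis=0, keepdims=True)
    var = mixed.var(axis=0, keepdims=True)
    mixed = (mixed - mean) / np.sqrt(var + eps)
    # Apply the mixed maps to the values: (heads, tokens, head_dim)
    return mixed @ v
```

Because Θ is shared across all query and key positions, each output head is a weighted combination of every input head's attention pattern. This is the mechanism the paper relies on to keep attention maps in deep layers from becoming near-identical. In a trained model Θ would be learned jointly with the rest of the network rather than sampled at random.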
137 stars
5 watching
23 forks
Language: Python
last commit: about 3 years ago
Linked from 1 awesome list
Related projects:
| Description | Stars |
|---|---|
| A deep learning framework for training vision transformers from scratch on image data. | 1,162 |
| An implementation of a deep learning model for restoring images under various conditions. | 817 |
| A PyTorch implementation of the Vision Transformer model for image recognition tasks. | 1,959 |
| An implementation of Pyramid Vision Transformers for image classification, object detection, and semantic segmentation tasks. | 1,745 |
| An implementation of a transformer-based vision model that aggregates local transformers on image blocks to improve accuracy and efficiency. | 195 |
| A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens. | 101 |
| An implementation of Spatial Transformer Networks in TensorFlow for learning to apply transformations to images via classification tasks. | 36 |
| Uses untrained neural networks to reconstruct, manipulate, and transform existing images. | 216 |
| Improves image restoration performance by converting global operations to local ones during inference. | 231 |
| A pre-trained transformer model for image processing tasks such as denoising, super-resolution, and deraining. | 451 |
| Develops and trains models for vision-language learning with decoupled language pre-training. | 24 |
| Analyzes and mitigates object hallucination in large vision-language models to improve their accuracy and reliability. | 136 |
| A pre-trained transformer model for natural language understanding and generation tasks in Chinese. | 482 |
| A method for blind image deconvolution using variational deep image prior. | 13 |
| A collection of resources and papers related to Transformer-based computer vision models and techniques. | 1,324 |