dvit_repo

Vision transformer improvement

An implementation of Deep Vision Transformer models with modifications to improve performance by preventing attention collapse
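The listing does not say how attention collapse is prevented. The DeepViT paper (Zhou et al., 2021) proposes "re-attention": the per-head attention maps are mixed by a learnable H x H matrix before they are applied to the values, which regenerates more diverse maps in deeper layers. The sketch below assumes this repository follows that approach. It is a minimal pure-Python illustration, not the repository's code. All function names are hypothetical, the mixing matrix `theta` is fixed here rather than learned, and the normalization step the paper applies after mixing is omitted.

```python
import math

# Hypothetical helpers: a pure-Python sketch, not code from this repository.


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def attention_maps(qs, ks):
    """Per-head maps softmax(Q K^T / sqrt(d)); qs and ks hold one (n x d) matrix per head."""
    maps = []
    for q, k in zip(qs, ks):
        scale = math.sqrt(len(q[0]))
        scores = matmul(q, transpose(k))
        maps.append([softmax([s / scale for s in row]) for row in scores])
    return maps


def mix_heads(maps, theta):
    """Re-attention mixing: new map j = sum_h theta[h][j] * maps[h] (Theta^T along the head axis)."""
    heads, n = len(maps), len(maps[0])
    return [[[sum(theta[h][j] * maps[h][r][c] for h in range(heads)) for c in range(n)]
             for r in range(n)]
            for j in range(heads)]


def re_attention(qs, ks, vs, theta):
    """Apply each mixed map to its head's values.

    Assumption: no normalization after mixing (the paper normalizes the
    mixed maps; that step is left out of this sketch).
    """
    mixed = mix_heads(attention_maps(qs, ks), theta)
    return [matmul(m, v) for m, v in zip(mixed, vs)]
```

With `theta` set to the identity matrix, each head keeps its own map and the function reduces to ordinary multi-head attention; non-zero off-diagonal entries let each head borrow from the others' maps.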

136 stars
5 watching
23 forks
Language: Python
Last commit: almost 3 years ago
Linked from 1 awesome list

Related projects:

Repository | Description | Stars
yitu-opensource/t2t-vit | A deep learning framework for training vision transformers from scratch on image data. | 1,160
zhendongwang6/uformer | An implementation of a deep learning model for restoring images degraded under various conditions. | 813
jeonsworld/vit-pytorch | A PyTorch implementation of the Vision Transformer model for image recognition tasks. | 1,959
whai362/pvt | An implementation of Pyramid Vision Transformers for image classification, object detection, and semantic segmentation. | 1,742
google-research/nested-transformer | An implementation of a transformer-based vision model that aggregates local transformers on image blocks to improve accuracy and efficiency. | 195
gordonhu608/mqt-llava | A vision-language model that uses a query transformer to encode images as visual tokens and allows a flexible choice of the number of visual tokens. | 101
zsdonghao/spatial-transformer-nets | An implementation of Spatial Transformer Networks in TensorFlow that learns to apply transformations to images via classification tasks. | 36
atiyo/deep_image_prior | Reconstructs, manipulates, and transforms existing images using untrained neural networks. | 216
megvii-research/tlc | Improves image restoration performance by converting global operations to local ones during inference. | 231
huawei-noah/pretrained-ipt | A pre-trained transformer model for image processing tasks such as denoising, super-resolution, and deraining. | 451
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training. | 24
yiyangzhou/lure | Analyzes and mitigates object hallucination in large vision-language models to improve their accuracy and reliability. | 136
fastnlp/cpt | A pre-trained transformer model for natural language understanding and generation tasks in Chinese. | 482
dong-huo/vdip-deconvolution | A method for blind image deconvolution using a variational deep image prior. | 13
dirtyharrylyl/transformer-in-vision | A collection of resources and papers on Transformer-based computer vision models and techniques. | 1,324