dvit_repo
Vision transformer improvement
An implementation of Deep Vision Transformer models with modifications that prevent attention collapse and improve performance (see the re-attention sketch below).
136 stars
5 watching
23 forks
Language: Python
Last commit: almost 3 years ago
Linked from 1 awesome list
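
The attention-collapse fix that the DeepViT paper proposes is usually called re-attention: in deep Vision Transformers the per-head attention maps of later blocks become nearly identical, so a learnable head-mixing transformation is applied to the attention maps to regenerate diverse maps before they weight the values. The snippet below is a minimal, self-contained PyTorch sketch of that idea, not code taken from this repository; the module name `ReAttention`, the 1x1-convolution head mixing, and the BatchNorm renormalization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ReAttention(nn.Module):
    """Multi-head self-attention with a learnable head-mixing step.

    Minimal sketch of the re-attention idea: after computing the usual
    per-head attention maps, mix them across heads with a learnable
    transformation so deeper layers do not produce nearly identical
    (collapsed) maps. Shapes and normalization are illustrative
    assumptions, not this repository's exact implementation.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        # 1x1 conv over the head dimension = learnable H x H mixing
        # of the stack of attention maps.
        self.head_mix = nn.Conv2d(num_heads, num_heads, kernel_size=1)
        self.norm = nn.BatchNorm2d(num_heads)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape                      # (batch, tokens, channels)
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)   # each: (B, H, N, head_dim)

        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)            # (B, H, N, N) per-head maps

        # Re-attention: mix maps across heads, then renormalize.
        attn = self.norm(self.head_mix(attn))

        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


if __name__ == "__main__":
    x = torch.randn(2, 197, 384)               # e.g. 196 patch tokens + CLS
    print(ReAttention(dim=384, num_heads=6)(x).shape)  # torch.Size([2, 197, 384])
```

The 1x1 convolution here is just a convenient way to multiply the stacked per-head attention maps by a learnable head-to-head matrix; any equivalent linear mixing across the head dimension would illustrate the same mechanism.
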
Related projects:
| Repository | Description | Stars |
|---|---|---|
| yitu-opensource/t2t-vit | A deep learning framework for training vision transformers from scratch on image data. | 1,160 |
| zhendongwang6/uformer | An implementation of a deep learning model for restoring images degraded under various conditions. | 813 |
| jeonsworld/vit-pytorch | A PyTorch implementation of the Vision Transformer model for image recognition tasks. | 1,959 |
| whai362/pvt | An implementation of Pyramid Vision Transformers for image classification, object detection, and semantic segmentation. | 1,742 |
| google-research/nested-transformer | An implementation of a transformer-based vision model that aggregates local transformers over image blocks to improve accuracy and efficiency. | 195 |
| gordonhu608/mqt-llava | A vision-language model that uses a query transformer to encode images as visual tokens and allows a flexible choice of the number of visual tokens. | 101 |
| zsdonghao/spatial-transformer-nets | An implementation of Spatial Transformer Networks in TensorFlow that learns to apply transformations to images via classification tasks. | 36 |
| atiyo/deep_image_prior | Reconstructs images using untrained neural networks to manipulate and transform existing images. | 216 |
| megvii-research/tlc | Improves image restoration performance by converting global operations to local ones during inference. | 231 |
| huawei-noah/pretrained-ipt | A pre-trained transformer model for image processing tasks such as denoising, super-resolution, and deraining. | 451 |
| yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training. | 24 |
| yiyangzhou/lure | Analyzes and mitigates object hallucination in large vision-language models to improve their accuracy and reliability. | 136 |
| fastnlp/cpt | A pre-trained transformer model for Chinese natural language understanding and generation tasks. | 482 |
| dong-huo/vdip-deconvolution | A method for blind image deconvolution using a variational deep image prior. | 13 |
| dirtyharrylyl/transformer-in-vision | A collection of resources and papers on Transformer-based computer vision models and techniques. | 1,324 |