ViT-pytorch

Vision Transformer

A PyTorch implementation of the Vision Transformer model for image recognition tasks.

PyTorch reimplementation of the Vision Transformer ("An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale")
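The "16x16 words" in the paper title refers to cutting an image into fixed-size patches and treating each flattened patch as one token. As a rough illustration only, the sketch below computes the resulting token sequence shape. The function name `patch_sequence_shape` and the 224-pixel default input size are assumptions made for this example, not taken from the repository, and the code is dependency-free Python rather than PyTorch.

```python
def patch_sequence_shape(image_size=224, patch_size=16, channels=3, class_token=True):
    """Return (sequence_length, patch_dim) for a square image cut into square patches.

    Hypothetical helper for illustration; defaults are assumed values.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2                 # one token per patch
    patch_dim = channels * patch_size * patch_size      # length of a flattened patch
    # The Vision Transformer prepends one learnable class token to the sequence.
    seq_len = num_patches + (1 if class_token else 0)
    return seq_len, patch_dim


print(patch_sequence_shape())  # (197, 768)
```

Under these assumed defaults, a 224x224 RGB image becomes 196 patch tokens plus one class token, each patch flattening to 768 values before the linear projection.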


2k stars
13 watching
371 forks
Language: Jupyter Notebook
last commit: over 2 years ago
Linked from 1 awesome list



Related projects:

Repository | Description | Stars
kaiyangzhou/dassl.pytorch | A PyTorch toolbox for supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision. | 1,217
lucidrains/reformer-pytorch | An implementation of Reformer, an efficient Transformer model for natural language processing tasks. | 2,120
yitu-opensource/t2t-vit | A deep learning framework for training vision transformers from scratch on image data. | 1,148
google-research/nested-transformer | An implementation of a transformer-based vision model that aggregates local transformers on image blocks to improve accuracy and efficiency. | 193
felixgwu/img_classification_pk_pytorch | A PyTorch project for comparing image classification models and facilitating quick experiment setup. | 365
whai362/pvt | An implementation of Pyramid Vision Transformers for image classification, object detection, and semantic segmentation tasks. | 1,728
pixart-alpha/pixart-sigma | Develops a PyTorch model for 4K text-to-image generation using diffusion transformer. | 1,675
leviswind/pytorch-transformer | Implementation of a transformer-based translation model in PyTorch. | 239
t-vi/pytorch-tvmisc | A collection of utilities and tools for building and improving deep learning models in PyTorch. | 468
jhjacobsen/pytorch-i-revnet | Deep invertible neural network implementation using PyTorch for image recognition and reconstruction tasks. | 389
kunpengli1994/vsrn | An open-source PyTorch implementation of a visual semantic reasoning model for image-text matching. | 294
nickjiang2378/vl-interp | This project provides an official PyTorch implementation of a method to interpret and edit vision-language representations to mitigate hallucinations in image captions. | 31
potterhsu/svhnclassifier-pytorch | A PyTorch implementation of multi-digit number recognition from street view imagery using deep convolutional neural networks. | 200
mattmacy/vnet.pytorch | A PyTorch implementation of V-Net for volumetric medical image segmentation. | 694
mchong6/soat | This repository provides a PyTorch implementation of an image manipulation technique using a pretrained StyleGAN model. | 380