Focal-Transformer
Attention-based transformer
A vision transformer architecture that uses focal self-attention to capture local-global interactions in images
[NeurIPS 2021 Spotlight] Official code for "Focal Self-attention for Local-Global Interactions in Vision Transformers"
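To make the "local-global" idea concrete, here is a minimal toy sketch in plain Python. It is not the official implementation and does not reproduce the paper's method: it assumes a 1-D token sequence, a single head, no learned projections (keys double as values), and hypothetical names (`local_global_attention`, `radius`, `block`). Each query attends to its fine-grained neighbors plus mean-pooled, coarse summaries of the whole sequence.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_pool(tokens, block):
    """Average consecutive groups of `block` tokens into coarse tokens."""
    dim = len(tokens[0])
    pooled = []
    for start in range(0, len(tokens), block):
        group = tokens[start:start + block]
        pooled.append([sum(t[d] for t in group) / len(group) for d in range(dim)])
    return pooled


def local_global_attention(tokens, radius=1, block=4):
    """Toy attention: fine-grained local keys plus coarse global keys.

    Assumption: keys and values are the raw tokens (no learned projections).
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    coarse = mean_pool(tokens, block)  # coarse summaries of the full sequence
    outputs = []
    for i, query in enumerate(tokens):
        lo = max(0, i - radius)
        hi = min(len(tokens), i + radius + 1)
        keys = tokens[lo:hi] + coarse  # local fine tokens + global coarse tokens
        weights = softmax([dot(query, k) * scale for k in keys])
        outputs.append(
            [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
        )
    return outputs


tokens = [[float(i), 1.0] for i in range(8)]
out = local_global_attention(tokens, radius=1, block=4)
```

With 8 tokens, `radius=1`, and `block=4`, each query sees at most 3 fine-grained keys and 2 coarse keys instead of all 8 tokens, which is the cost-saving intuition behind mixing granularities.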
545 stars
16 watching
60 forks
Language: Python
Last commit: over 2 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
google-research/nested-transformer | An implementation of a transformer-based vision model that aggregates local transformers on image blocks to improve accuracy and efficiency. | 193 |
leviswind/pytorch-transformer | Implementation of a transformer-based translation model in PyTorch | 239 |
openai/sparse_attention | Provides primitives for sparse attention mechanisms used in transformer models to improve computational efficiency and scalability | 1,524 |
microsoft/vision-longformer | An implementation of a vision transformer architecture designed for high-resolution image encoding with multiple efficient attention mechanisms | 241 |
gamrix/cs231n_proj | This project focuses on manipulating 3D views using deep learning techniques. | 6 |
ngxbac/gain | A PyTorch implementation of an attention-guided inference network to focus on specific areas of objects in images | 48 |
chrislemke/sk-transformers | Provides a collection of reusable data transformation tools | 8 |
microsoft/megatron-deepspeed | Research tool for training large transformer language models at scale | 1,895 |
feature-engine/feature_engine | A Python library with multiple transformers to engineer and select features for use in machine learning models. | 1,926 |
gabeur/mmt | Develops a cross-modal architecture for video retrieval by combining multiple types of features from videos and text | 258 |
canjie-luo/moran_v2 | A deep learning framework for scene text recognition with rectification and attention mechanisms. | 636 |
pp00704831/banet-tip-2022 | A PyTorch implementation of an attention network for dynamic scene deblurring | 37 |
focalnet/networks-beyond-attention | A collection of modern neural network architectures for computer vision tasks that don't use self-attention mechanisms. | 77 |
pixart-alpha/pixart-sigma | Develops a PyTorch model for 4K text-to-image generation using a diffusion transformer | 1,675 |
jeonsworld/vit-pytorch | A PyTorch implementation of the Vision Transformer model for image recognition tasks. | 1,940 |