Focal-Transformer
Attention-based transformer
A vision transformer architecture that uses focal self-attention to capture both fine-grained local and coarse-grained global interactions in images
[NeurIPS 2021 Spotlight] Official code for "Focal Self-attention for Local-Global Interactions in Vision Transformers"
547 stars
16 watching
60 forks
Language: Python
last commit: almost 3 years ago
Linked from 1 awesome list
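The local-global idea behind focal self-attention can be shown with a small pure-Python sketch. This is a simplified, hypothetical two-level version and not the repository's implementation. Each query token attends to raw (fine-grained) tokens in its immediate neighborhood and to average-pooled (coarse-grained) summaries of the whole feature map. Several choices are assumptions made for brevity: the function names `avg_pool` and `focal_attention_at`, the `radius` and `block` parameters, using raw features directly as queries, keys and values with no learned projections, and computing attention per token rather than per window.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional tokens


def avg_pool(fmap: FeatureMap, block: int) -> List[Vector]:
    """Average-pool non-overlapping block x block regions into coarse tokens."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    pooled = []
    for bi in range(0, h, block):
        for bj in range(0, w, block):
            acc = [0.0] * c
            count = 0
            for i in range(bi, min(bi + block, h)):
                for j in range(bj, min(bj + block, w)):
                    for k in range(c):
                        acc[k] += fmap[i][j][k]
                    count += 1
            pooled.append([a / count for a in acc])
    return pooled


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def focal_attention_at(fmap: FeatureMap, i: int, j: int,
                       radius: int = 1, block: int = 2) -> Vector:
    """Hypothetical two-level focal attention for the query token at (i, j).

    Fine level: raw tokens in the (2*radius+1)^2 neighborhood of (i, j).
    Coarse level: block x block average-pooled summaries of the entire map
    (a simplification; the coarse tokens here cover the whole image).
    Queries, keys and values are the raw features (no learned projections).
    """
    h, w = len(fmap), len(fmap[0])
    query = fmap[i][j]
    fine = [fmap[a][b]
            for a in range(max(0, i - radius), min(h, i + radius + 1))
            for b in range(max(0, j - radius), min(w, j + radius + 1))]
    keys = fine + avg_pool(fmap, block)
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # Values equal keys in this sketch, so the output is a weighted sum of keys.
    return [sum(wt * key[d] for wt, key in zip(weights, keys))
            for d in range(len(query))]


if __name__ == "__main__":
    # 4 x 4 map whose tokens are 2-D vectors holding their own coordinates.
    demo = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
    print(len(focal_attention_at(demo, 0, 0)))  # 2
```

Because the attention weights sum to one, a constant feature map comes back unchanged, which is a handy sanity check. The actual Focal Transformer pools over a bounded surrounding region at each focal level and computes attention window by window. Pooling the entire map here only keeps the sketch short.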
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | An implementation of a transformer-based vision model that aggregates local transformers on image blocks to improve accuracy and efficiency | 195 |
| | An implementation of a transformer-based translation model in PyTorch | 240 |
| | Provides primitives for sparse attention mechanisms used in transformer models to improve computational efficiency and scalability | 1,533 |
| | An implementation of a vision transformer architecture designed for high-resolution image encoding with multiple efficient attention mechanisms | 243 |
| | Manipulates 3D views using deep learning techniques | 6 |
| | A PyTorch implementation of an attention-guided inference network that focuses on specific areas of objects in images | 48 |
| | Provides a collection of reusable data transformation tools | 10 |
| | A research tool for training large transformer language models at scale | 1,926 |
| | A Python library with multiple transformers to engineer and select features for machine learning models | 1,956 |
| | Develops a cross-modal architecture for video retrieval by combining multiple types of features from videos and text | 259 |
| | A deep learning framework for scene text recognition with rectification and attention mechanisms | 639 |
| | A PyTorch implementation of an attention network for dynamic scene deblurring | 37 |
| | A collection of modern neural network architectures for computer vision tasks that do not use self-attention mechanisms | 77 |
| | Develops a PyTorch model for 4K text-to-image generation using a diffusion transformer | 1,711 |
| | A PyTorch implementation of the Vision Transformer model for image recognition tasks | 1,959 |