flash-attention
Attention algorithms
Fast and memory-efficient implementations of exact attention for machine learning
Stars: 15k
Watchers: 122
Forks: 1k
Language: Python
Last commit: 2 months ago
Linked from 1 awesome list
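FlashAttention computes standard scaled dot-product attention exactly, but tiles the computation so the full N x N score matrix is never materialized in slow memory. As a hedged illustration only (this is not the library's API, just a NumPy sketch of the idea), the following compares naive attention with an online-softmax tiled variant in the spirit of the algorithm:

```python
import numpy as np

def attention_naive(Q, K, V):
    # Standard scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def attention_tiled(Q, K, V, block=4):
    # Online-softmax tiling (illustrative sketch): process K/V in blocks,
    # keeping a running row-max m, softmax denominator l, and unnormalized
    # output acc, so only block-sized score tiles ever exist.
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    m = np.full(n, -np.inf)         # running row maximum
    l = np.zeros(n)                 # running softmax denominator
    acc = np.zeros((n, d))          # running weighted sum of V rows
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T * scale        # scores for this tile only
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)   # rescale previous accumulators
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=-1)
        acc = acc * alpha[:, None] + P @ Vb
        m = m_new
    return acc / l[:, None]
```

Both functions return the same result up to floating-point error; the tiled version trades a small amount of rescaling arithmetic for O(block) rather than O(N) memory per query row, which is the core trade the real CUDA kernels exploit.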
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving | 4,573 |
| | An optimizer that combines the benefits of the Adam and SGD algorithms | 2,908 |
| | A state-of-the-art video understanding codebase with efficient training methods and pre-trained models for various tasks | 6,680 |
| | A Python library providing a flexible and fast image augmentation tool for machine learning and computer vision tasks | 14,386 |
| | A high-level abstraction of data on parallel architectures for efficient tensor computing and machine learning applications | 4,587 |
| | Automates machine learning workflows and optimizes model performance using large language models and efficient algorithms | 3,968 |
| | A GPU-accelerated data manipulation library built on C++/CUDA and Apache Arrow | 8,534 |
| | An open-source memory debugger for multiple operating systems and platforms | 2,468 |
| | A high-performance distributed deep learning framework supporting multiple frameworks and networks | 3,635 |
| | A suite of libraries implementing machine learning algorithms and mathematical primitives on NVIDIA GPUs | 4,292 |
| | A tool to simplify training and deployment of PyTorch models on various devices and configurations | 8,056 |
| | A deep learning optimization library that simplifies distributed training and inference on modern computing hardware | 35,863 |
| | A C++ machine learning library with autograd support and high-performance defaults for efficient computation | 5,300 |
| | A framework for high-speed packet processing on the Linux kernel | 2,718 |
| | A framework that automatically compresses and accelerates deep learning models for mobile devices with limited computational resources | 2,787 |