flash-attention

Attention library

An open-source implementation of efficient attention mechanisms for neural networks

Fast and memory-efficient exact attention
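
As a rough illustration of how attention can be both exact and memory-efficient, the sketch below computes softmax attention by visiting keys and values in small blocks and keeping a running maximum and running sum. This avoids holding a full row of scores at once. It is a pure-Python toy under stated assumptions, not this project's code or API. The function names `naive_attention` and `blockwise_attention` and the `block_size` parameter are hypothetical and chosen only for this example.

```python
import math


def naive_attention(q, k, v):
    """Reference: softmax(q k^T / sqrt(d)) v, building every full score row."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vj[c] for w, vj in zip(weights, v)) / total
            for c in range(len(v[0]))
        ])
    return out


def blockwise_attention(q, k, v, block_size=2):
    """Same result, visiting keys/values block by block with a running softmax.

    Hypothetical helper for illustration only.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        running_max = -math.inf   # largest score seen so far
        running_sum = 0.0         # softmax denominator relative to running_max
        acc = [0.0] * dv          # unnormalized weighted sum of value rows
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale everything accumulated so far to the new maximum.
            correction = math.exp(running_max - new_max)
            running_sum *= correction
            acc = [a * correction for a in acc]
            for s, vj in zip(scores, v_blk):
                w = math.exp(s - new_max)
                running_sum += w
                for c in range(dv):
                    acc[c] += w * vj[c]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out
```

Both functions should agree up to floating-point rounding for any block size. The blockwise version only ever holds `block_size` scores per query, which is the sense in which the result stays exact while the working memory shrinks.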

GitHub

14k stars
119 watching
1k forks
Language: Python
Last commit: 5 days ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
|---|---|---|
| facebookincubator/aitemplate | A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving | 4,561 |
| luolc/adabound | An optimizer that combines the benefits of the Adam and SGD algorithms | 2,907 |
| facebookresearch/slowfast | A state-of-the-art video understanding codebase with efficient training methods and pre-trained models for various tasks | 6,623 |
| albumentations-team/albumentations | A Python library for applying image transformations to data used in deep learning and computer vision tasks | 14,254 |
| arrayfire/arrayfire | A high-level abstraction of data on parallel architectures for efficient tensor computing and machine learning applications | 4,564 |
| microsoft/flaml | Automates machine learning workflows and optimizes model performance using large language models and efficient algorithms | 3,919 |
| rapidsai/cudf | A GPU-accelerated data manipulation library built on top of Arrow and libcudf | 8,448 |
| dynamorio/drmemory | An open-source memory debugger for multiple operating systems and platforms | 2,443 |
| bytedance/byteps | A high-performance distributed deep learning framework supporting multiple frameworks and networks | 3,630 |
| rapidsai/cuml | A suite of libraries implementing machine learning algorithms and mathematical primitives on NVIDIA GPUs | 4,238 |
| huggingface/accelerate | A tool to simplify training and deployment of PyTorch models on various devices and configurations | 7,947 |
| microsoft/deepspeed | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective | 35,463 |
| flashlight/flashlight | A C++ machine learning library with autograd support and high-performance defaults for efficient computation | 5,285 |
| ntop/pf_ring | A framework for high-speed packet processing on Linux kernels | 2,698 |
| tencent/pocketflow | A framework that automatically compresses and accelerates deep learning models to make them suitable for mobile devices with limited computational resources | 2,788 |