flash-attention
Attention algorithms
Implementations of efficient exact attention mechanisms for machine learning
Fast and memory-efficient exact attention
15k stars
122 watching
1k forks
Language: Python
Last commit: about 1 month ago
Linked from 1 awesome list
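
A minimal usage sketch of the library's `flash_attn_func` entry point (a sketch only, assuming a CUDA GPU, half-precision inputs, and the `flash-attn` pip package; the argument names and the `(batch, seqlen, nheads, headdim)` shape convention follow the repository README and may differ between versions):

```python
# Minimal sketch: exact attention via flash-attn (assumes a CUDA GPU and
# that the flash-attn package is installed, e.g. `pip install flash-attn`).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
v = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")

# Exact (non-approximate) attention computed without materializing the full
# seqlen x seqlen score matrix in GPU memory; causal=True applies a causal mask.
out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
print(out.shape)  # (batch, seqlen, nheads, headdim)
```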
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| facebookincubator/aitemplate | A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving. | 4,573 |
| luolc/adabound | An optimizer that combines the benefits of the Adam and SGD algorithms. | 2,908 |
| facebookresearch/slowfast | A state-of-the-art video understanding codebase with efficient training methods and pre-trained models for various tasks. | 6,680 |
| albumentations-team/albumentations | A Python library providing a flexible and fast image augmentation tool for machine learning and computer vision tasks. | 14,386 |
| arrayfire/arrayfire | A high-level abstraction of data on parallel architectures for efficient tensor computing and machine learning applications. | 4,587 |
| microsoft/flaml | Automates machine learning workflows and optimizes model performance using large language models and efficient algorithms. | 3,968 |
| rapidsai/cudf | A GPU-accelerated data manipulation library built on top of C++/CUDA and Apache Arrow. | 8,534 |
| dynamorio/drmemory | An open-source memory debugger for multiple operating systems and platforms. | 2,468 |
| bytedance/byteps | A high-performance distributed deep learning framework supporting multiple frameworks and networks. | 3,635 |
| rapidsai/cuml | A suite of libraries implementing machine learning algorithms and mathematical primitives on NVIDIA GPUs. | 4,292 |
| huggingface/accelerate | A tool to simplify training and deployment of PyTorch models on various devices and configurations. | 8,056 |
| microsoft/deepspeed | A deep learning optimization library that simplifies distributed training and inference on modern computing hardware. | 35,863 |
| flashlight/flashlight | A C++ machine learning library with autograd support and high-performance defaults for efficient computation. | 5,300 |
| ntop/pf_ring | A framework for high-speed packet processing on Linux kernels. | 2,718 |
| tencent/pocketflow | A framework that automatically compresses and accelerates deep learning models to make them suitable for mobile devices with limited computational resources. | 2,787 |