AITemplate

Neural network optimizer

A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving.

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
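
To make "rendering" a network into C++ concrete, here is a minimal toy sketch of template-based code generation. It is not AITemplate's API: the names `KERNEL_TEMPLATE`, `OPS`, and `render_kernel` are hypothetical, and the sketch only assumes that a chain of elementwise ops can be fused into one expression and substituted into a CUDA kernel template as text.

```python
from string import Template

# Hypothetical CUDA C++ template for a fused elementwise FP16 kernel.
# Illustrative only; real generated code would be far more elaborate.
KERNEL_TEMPLATE = Template("""\
#include <cuda_fp16.h>

// Auto-generated kernel: $name
__global__ void $name(const half* x, half* y, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    y[i] = $expr;
  }
}
""")

# Toy op table (an assumption): maps an op name to a C++ expression pattern.
OPS = {
    "relu": "__hmax({arg}, __float2half(0.0f))",
    "neg": "__hneg({arg})",
}


def render_kernel(name, ops):
    """Fuse a chain of elementwise ops into one expression and render C++ source."""
    expr = "x[i]"
    for op in ops:
        # Wrap the expression built so far in the next op's pattern.
        expr = OPS[op].format(arg=expr)
    return KERNEL_TEMPLATE.substitute(name=name, expr=expr)


source = render_kernel("fused_neg_relu", ["neg", "relu"])
print(source)
```

The printed string is C++ source text that a separate step would hand to a compiler such as `nvcc`. The sketch only shows the Python-to-source rendering step, not compilation or execution on a GPU.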

GitHub

5k stars

82 watching

372 forks

Language: Python

Last commit: 8 months ago

Related projects:

Repository	Description	Stars
nvlabs/tiny-cuda-nn	A C++/CUDA framework for training and querying neural networks using GPUs	3,791
nvlabs/instant-ngp	A software toolkit for training and rendering neural graphics primitives	16,115
anishathalye/neural-style	An implementation of neural style transfer in TensorFlow using the Adam optimizer	5,542
sony/nnabla	A deep learning framework that provides a flexible and expressive Python API for building and training neural networks on various platforms.	2,729
pytorch/pytorch	A Python library providing tensors and dynamic neural networks with strong GPU acceleration	84,978
plasma-umass/scalene	A high-performance Python profiler that analyzes CPU, GPU, and memory usage, providing detailed information and AI-powered optimization suggestions.	12,274
dao-ailab/flash-attention	Implementations of efficient exact attention mechanisms for machine learning	14,650
nvidia/minkowskiengine	An auto-differentiation library for sparse tensors used in computer vision and deep learning applications.	2,513
facebookresearch/metaseq	A codebase for working with Open Pre-trained Transformers, enabling deployment and fine-tuning of transformer models on various platforms.	6,519
facebookresearch/pytorch3d	A deep learning library for 3D data processing and computer vision research using PyTorch	8,889
tflearn/tflearn	A high-level API for deep learning that builds upon TensorFlow	9,621
dmitryulyanov/deep-image-prior	A project demonstrating image restoration using neural networks without learning	7,920
luolc/adabound	An optimizer that combines the benefits of Adam and SGD algorithms	2,908
nvlabs/neuralangelo	An implementation of high-fidelity neural surface reconstruction from video frames using deep learning	4,418
nvidia/fastertransformer	A high-performance transformer-based NLP component optimized for GPU acceleration and integration into various frameworks.	5,937