AITemplate

Neural network optimizer

A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving.

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

GitHub

5k stars
82 watching
370 forks
Language: Python
Last commit: 29 days ago

Related projects:

Repository | Description | Stars
nvlabs/tiny-cuda-nn | A C++/CUDA framework for training and querying neural networks using GPUs | 3,763
nvlabs/instant-ngp | A software toolkit for training and rendering neural graphics primitives | 16,033
anishathalye/neural-style | An implementation of neural style transfer in TensorFlow using the Adam optimizer | 5,541
sony/nnabla | A deep learning framework that provides a flexible and expressive Python API for building and training neural networks on various platforms | 2,728
pytorch/pytorch | A Python library providing tensors and dynamic neural networks with strong GPU acceleration | 83,959
plasma-umass/scalene | A high-performance Python profiler that analyzes CPU, GPU, and memory usage, providing detailed information and AI-powered optimization suggestions | 12,186
dao-ailab/flash-attention | An open-source implementation of efficient attention mechanisms for neural networks | 14,248
nvidia/minkowskiengine | An auto-differentiation library for sparse tensors used in computer vision and deep learning applications | 2,485
facebookresearch/metaseq | A codebase for working with Open Pre-trained Transformers, enabling deployment and fine-tuning of transformer models on various platforms | 6,515
facebookresearch/pytorch3d | A deep learning library for 3D data processing and computer vision research using PyTorch | 8,806
tflearn/tflearn | A high-level API for deep learning that builds upon TensorFlow | 9,619
dmitryulyanov/deep-image-prior | A project demonstrating image restoration using neural networks without learning | 7,886
luolc/adabound | An optimizer that combines the benefits of Adam and SGD algorithms | 2,907
nvlabs/neuralangelo | An implementation of high-fidelity neural surface reconstruction from video frames using deep learning | 4,396
nvidia/fastertransformer | A high-performance transformer-based NLP component optimized for GPU acceleration and integration into various frameworks | 5,886