AITemplate

Neural network optimizer

A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving.

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 inference on TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) hardware.
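To make the idea of "rendering" a network into C++ source more concrete, here is a toy sketch of template-based code generation using only the Python standard library. It does not use AITemplate's API and is not how AITemplate is implemented; the template text, the `launch_gemm` helper named in the generated code, and the `render_layer` / `render_model` functions are all hypothetical names chosen for illustration.

```python
from string import Template

# Hypothetical kernel template; NOT taken from AITemplate's code base.
# `launch_gemm` is an assumed helper that exists only in the generated text.
GEMM_TEMPLATE = Template("""\
// ${name}: Y[M, ${n}] = X[M, ${k}] * W[${k}, ${n}]
void ${name}(const half* x, const half* w, half* y, int m) {
  launch_gemm<${k}, ${n}, ${epilogue}>(x, w, y, m);
}
""")


def render_layer(name, in_features, out_features, activation=None):
    """Render one dense layer description into a C++ source string."""
    # Fold the activation into the kernel as a compile-time epilogue tag.
    epilogue = "Relu" if activation == "relu" else "Identity"
    return GEMM_TEMPLATE.substitute(
        name=name, k=in_features, n=out_features, epilogue=epilogue
    )


def render_model(layers):
    """Render every layer and join the results into one source string."""
    return "\n".join(render_layer(**layer) for layer in layers)


# A two-layer network described as plain Python data.
layers = [
    {"name": "fc1", "in_features": 64, "out_features": 128, "activation": "relu"},
    {"name": "fc2", "in_features": 128, "out_features": 10},
]

source = render_model(layers)
print(source)
```

The point of the sketch is only that layer shapes and fused activations become compile-time constants in the emitted source, which is one reason generated code can be specialized per model. A real framework would additionally select and tune kernels, then compile the result.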

GitHub

5k stars
82 watching
372 forks
Language: Python
Last commit: about 1 month ago

Related projects:

Repository | Description | Stars
nvlabs/tiny-cuda-nn | A C++/CUDA framework for training and querying neural networks using GPUs | 3,791
nvlabs/instant-ngp | A software toolkit for training and rendering neural graphics primitives | 16,115
anishathalye/neural-style | An implementation of neural style transfer in TensorFlow using the Adam optimizer | 5,542
sony/nnabla | A deep learning framework that provides a flexible and expressive Python API for building and training neural networks on various platforms | 2,729
pytorch/pytorch | A Python library providing tensors and dynamic neural networks with strong GPU acceleration | 84,978
plasma-umass/scalene | A high-performance Python profiler that analyzes CPU, GPU, and memory usage, providing detailed information and AI-powered optimization suggestions | 12,274
dao-ailab/flash-attention | Implementations of efficient exact attention mechanisms for machine learning | 14,650
nvidia/minkowskiengine | An auto-differentiation library for sparse tensors used in computer vision and deep learning applications | 2,513
facebookresearch/metaseq | A codebase for working with Open Pre-trained Transformers, enabling deployment and fine-tuning of transformer models on various platforms | 6,519
facebookresearch/pytorch3d | A deep learning library for 3D data processing and computer vision research using PyTorch | 8,889
tflearn/tflearn | A high-level API for deep learning that builds upon TensorFlow | 9,621
dmitryulyanov/deep-image-prior | A project demonstrating image restoration using neural networks without learning | 7,920
luolc/adabound | An optimizer that combines the benefits of Adam and SGD algorithms | 2,908
nvlabs/neuralangelo | An implementation of high-fidelity neural surface reconstruction from video frames using deep learning | 4,418
nvidia/fastertransformer | A high-performance transformer-based NLP component optimized for GPU acceleration and integration into various frameworks | 5,937