AITemplate
Neural network optimizer
A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving.
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
5k stars
82 watching
370 forks
Language: Python
last commit: 29 days ago
Related projects:
Repository | Description | Stars |
---|---|---|
nvlabs/tiny-cuda-nn | A C++/CUDA framework for training and querying neural networks using GPUs | 3,763 |
nvlabs/instant-ngp | A software toolkit for training and rendering neural graphics primitives | 16,033 |
anishathalye/neural-style | This is an implementation of neural style transfer in TensorFlow using the Adam optimizer. | 5,541 |
sony/nnabla | A deep learning framework that provides a flexible and expressive Python API for building and training neural networks on various platforms. | 2,728 |
pytorch/pytorch | A Python library providing tensors and dynamic neural networks with strong GPU acceleration | 83,959 |
plasma-umass/scalene | A high-performance Python profiler that analyzes CPU, GPU, and memory usage, providing detailed information and AI-powered optimization suggestions. | 12,186 |
dao-ailab/flash-attention | An open-source implementation of efficient attention mechanisms for neural networks | 14,248 |
nvidia/minkowskiengine | An auto-differentiation library for sparse tensors used in computer vision and deep learning applications. | 2,485 |
facebookresearch/metaseq | A codebase for working with Open Pre-trained Transformers, enabling deployment and fine-tuning of transformer models on various platforms. | 6,515 |
facebookresearch/pytorch3d | A deep learning library for 3D data processing and computer vision research using PyTorch | 8,806 |
tflearn/tflearn | A high-level API for deep learning that builds upon TensorFlow | 9,619 |
dmitryulyanov/deep-image-prior | A project demonstrating image restoration using neural networks without learning | 7,886 |
luolc/adabound | An optimizer that combines the benefits of Adam and SGD algorithms | 2,907 |
nvlabs/neuralangelo | An implementation of high-fidelity neural surface reconstruction from video frames using deep learning | 4,396 |
nvidia/fastertransformer | A high-performance transformer-based NLP component optimized for GPU acceleration and integration into various frameworks. | 5,886 |