AITemplate
Neural network optimizer
A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving.
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
5k stars
82 watching
372 forks
Language: Python
Last commit: 3 months ago

Related projects:
| Description | Stars |
|---|---|
| A C++/CUDA framework for training and querying neural networks using GPUs | 3,791 |
| A software toolkit for training and rendering neural graphics primitives | 16,115 |
| This is an implementation of neural style transfer in TensorFlow using the Adam optimizer. | 5,542 |
| A deep learning framework that provides a flexible and expressive Python API for building and training neural networks on various platforms. | 2,729 |
| A Python library providing tensors and dynamic neural networks with strong GPU acceleration | 84,978 |
| A high-performance Python profiler that analyzes CPU, GPU, and memory usage, providing detailed information and AI-powered optimization suggestions. | 12,274 |
| Implementations of efficient exact attention mechanisms for machine learning | 14,650 |
| An auto-differentiation library for sparse tensors used in computer vision and deep learning applications. | 2,513 |
| A codebase for working with Open Pre-trained Transformers, enabling deployment and fine-tuning of transformer models on various platforms. | 6,519 |
| A deep learning library for 3D data processing and computer vision research using PyTorch | 8,889 |
| A high-level API for deep learning that builds upon TensorFlow | 9,621 |
| A project demonstrating image restoration using neural networks without learning | 7,920 |
| An optimizer that combines the benefits of Adam and SGD algorithms | 2,908 |
| An implementation of high-fidelity neural surface reconstruction from video frames using deep learning | 4,418 |
| A high-performance transformer-based NLP component optimized for GPU acceleration and integration into various frameworks. | 5,937 |