Inference server
Open-source software for deploying AI models from multiple deep learning and machine learning frameworks on a variety of devices
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
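As a quick illustration of what inference against a running Triton server looks like, here is a minimal client-side sketch using the tritonclient Python package (installable via pip as tritonclient[http]). The model name my_model and the tensor names INPUT0/OUTPUT0 are hypothetical placeholders; the real names, shapes, and datatypes depend on the deployed model's configuration.

```python
# Minimal sketch of an HTTP inference request to a Triton server on localhost.
# Assumes a hypothetical model "my_model" with one FP32 input "INPUT0" and one
# output "OUTPUT0"; adjust names/shapes to match your model's config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request payload: a single 1x4 FP32 tensor.
data = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
infer_input = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)

# Run inference and read the named output back as a NumPy array.
response = client.infer(model_name="my_model", inputs=[infer_input])
print(response.as_numpy("OUTPUT0"))
```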
8k stars
145 watching
1k forks
Language: Python
Last commit: about 1 month ago
Linked from 3 awesome lists
Tags: cloud, datacenter, deep-learning, edge, gpu, inference, machine-learning
Related projects:
Repository | Description | Stars |
---|---|---|
triton-inference-server/client | Client libraries and examples for communicating with Triton using various programming languages | 579 |
tensorflow/serving | A high-performance serving system for machine learning models in production environments | 6,195 |
openvinotoolkit/openvino | A toolkit for optimizing and deploying artificial intelligence models in various applications | 7,439 |
triton-lang/triton | An intermediate language and compiler for efficient custom deep learning primitives | 13,712 |
jonathansalwan/triton | A dynamic binary analysis library providing tools and components for program analysis, reverse engineering, and software verification | 3,565 |
nvidia/tensorrt | A set of tools and libraries for optimizing deep learning inference on NVIDIA GPUs | 10,926 |
huggingface/text-generation-inference | A toolkit for deploying and serving Large Language Models (LLMs) for high-performance text generation | 9,456 |
google-research/t5x | A modular framework for training and deploying sequence models at scale | 2,706 |
seldonio/mlserver | An inference server for machine learning models with support for multiple frameworks and scalable deployment options | 737 |
bentoml/bentoml | An open-source Python framework for building model inference APIs and serving AI models in production environments | 7,222 |
eleutherai/gpt-neox | A framework for training large-scale language models on GPUs with advanced features and optimizations | 6,997 |
microsoft/onnxruntime | A cross-platform, high-performance machine learning accelerator | 14,990 |
facebookincubator/aitemplate | A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving | 4,573 |
sjtu-ipads/powerinfer | An efficient Large Language Model inference engine leveraging consumer-grade GPUs on PCs | 8,011 |
fauxpilot/fauxpilot | An open-source alternative to GitHub Copilot, built on NVIDIA's Triton Inference Server with the FasterTransformer backend | 14,629 |