triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution for AI models.
8k stars
144 watching
1k forks
Language: Python
Last commit: 3 days ago
Linked from 3 awesome lists
Topics: cloud, datacenter, deep-learning, edge, gpu, inference, machine-learning
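As a quick illustration of how applications talk to a running Triton server, here is a minimal sketch using the official `tritonclient` Python package (installable as `tritonclient[http]`). It assumes a server is already up on the default HTTP port 8000, e.g. launched from the NVIDIA Triton Docker image, and serving a hypothetical model named `my_model` with one FP32 input `INPUT0` and one output `OUTPUT0`; the model and tensor names are illustrative, not part of the listing above.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to the server's HTTP endpoint (assumes Triton is already
# running locally on its default port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request: a single FP32 input tensor of shape [1, 4].
# "my_model", "INPUT0", and "OUTPUT0" are hypothetical names.
data = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

# Ask for the output tensor by name.
out = httpclient.InferRequestedOutput("OUTPUT0")

# Run inference and read the result back as a NumPy array.
result = client.infer(model_name="my_model", inputs=[inp], outputs=[out])
print(result.as_numpy("OUTPUT0"))
```

The client libraries for this workflow (HTTP and gRPC, in several languages) live in the triton-inference-server/client repository listed below.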
Related projects:
Repository | Description | Stars |
---|---|---|
triton-inference-server/client | Client libraries and examples for communicating with Triton using various programming languages | 567 |
tensorflow/serving | A high-performance serving system for machine learning models in production environments | 6,185 |
openvinotoolkit/openvino | A toolkit for optimizing and deploying artificial intelligence models in various applications | 7,279 |
triton-lang/triton | A compiler and language for writing efficient custom deep-learning primitives | 13,431 |
jonathansalwan/triton | A dynamic binary analysis library providing tools and components for program analysis, reverse engineering, and software verification | 3,539 |
nvidia/tensorrt | A high-performance deep learning inference platform for NVIDIA GPUs | 10,807 |
huggingface/text-generation-inference | A toolkit for deploying and serving large language models | 9,106 |
google-research/t5x | A modular framework for training and deploying sequence models at scale | 2,682 |
seldonio/mlserver | An inference server for machine learning models with support for multiple frameworks and scalable deployment options | 720 |
bentoml/bentoml | An open-source Python framework for building model inference APIs and serving AI models in production environments | 7,153 |
eleutherai/gpt-neox | A framework for training large-scale language models on GPUs with advanced features and optimizations | 6,941 |
microsoft/onnxruntime | An open-source framework for high-performance machine learning inference and training acceleration | 14,697 |
facebookincubator/aitemplate | A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving | 4,561 |
sjtu-ipads/powerinfer | An efficient large language model inference engine leveraging consumer-grade GPUs on PCs | 7,964 |
fauxpilot/fauxpilot | An open-source alternative to GitHub Copilot, built on NVIDIA's Triton Inference Server with the FasterTransformer backend | 14,605 |