triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution for AI models.
8k stars
144 watching
1k forks
Language: Python
Last commit: 3 days ago
Linked from 3 awesome lists
Topics: cloud, datacenter, deep-learning, edge, gpu, inference, machine-learning
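As a quick illustration of how applications talk to a running Triton server, here is a minimal sketch using the official `tritonclient` Python package (installable as `tritonclient[http]`). It assumes a server is already up on the default HTTP port 8000, e.g. launched from the NVIDIA Triton Docker image, and serving a hypothetical model named `my_model` with one FP32 input `INPUT0` and one output `OUTPUT0`; the model and tensor names are illustrative, not part of the listing above.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to the server's HTTP endpoint (assumes Triton is already
# running locally on its default port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request: a single FP32 input tensor of shape [1, 4].
# "my_model", "INPUT0", and "OUTPUT0" are hypothetical names.
data = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

# Ask for the output tensor by name.
out = httpclient.InferRequestedOutput("OUTPUT0")

# Run inference and read the result back as a NumPy array.
result = client.infer(model_name="my_model", inputs=[inp], outputs=[out])
print(result.as_numpy("OUTPUT0"))
```

The client libraries for this workflow (HTTP and gRPC, in several languages) live in the triton-inference-server/client repository listed below.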
Related projects:
Repository | Description | Stars |
---|---|---|
triton-inference-server/client | Client libraries and examples for communicating with Triton using various programming languages | 567 |
tensorflow/serving | A high-performance serving system for machine learning models in production environments | 6,185 |
openvinotoolkit/openvino | A toolkit for optimizing and deploying artificial intelligence models in various applications | 7,279 |
triton-lang/triton | A compiler and language for writing efficient custom deep-learning primitives | 13,431 |
jonathansalwan/triton | A dynamic binary analysis library providing tools and components for program analysis, reverse engineering, and software verification | 3,539 |
nvidia/tensorrt | A high-performance deep learning inference platform for NVIDIA GPUs | 10,807 |
huggingface/text-generation-inference | A toolkit for deploying and serving large language models | 9,106 |
google-research/t5x | A modular framework for training and deploying sequence models at scale | 2,682 |
seldonio/mlserver | An inference server for machine learning models with support for multiple frameworks and scalable deployment options | 720 |
bentoml/bentoml | An open-source Python framework for building model inference APIs and serving AI models in production environments | 7,153 |
eleutherai/gpt-neox | A framework for training large-scale language models on GPUs with advanced features and optimizations | 6,941 |
microsoft/onnxruntime | An open-source framework for high-performance machine learning inference and training acceleration | 14,697 |
facebookincubator/aitemplate | A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving | 4,561 |
sjtu-ipads/powerinfer | An efficient large language model inference engine leveraging consumer-grade GPUs on PCs | 7,964 |
fauxpilot/fauxpilot | An open-source alternative to GitHub Copilot, built on NVIDIA's Triton Inference Server with the FasterTransformer backend | 14,605 |