triton-inference-server/server

Inference server

Open-source software that enables deployment of AI models from multiple deep learning and machine learning frameworks on a variety of devices.

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
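Deployment is driven by a model repository that the server scans at startup: each model gets a directory holding numbered version subdirectories and a config.pbtxt describing its backend and tensors. A minimal sketch, assuming an ONNX image classifier whose model and tensor names (densenet_onnx, data_0, fc6_1) are illustrative rather than taken from this page:

    model_repository/
    └── densenet_onnx/
        ├── config.pbtxt
        └── 1/
            └── model.onnx

    # config.pbtxt (protobuf text format)
    name: "densenet_onnx"
    platform: "onnxruntime_onnx"   # ONNX Runtime backend; TensorFlow, PyTorch, etc. work similarly
    max_batch_size: 8              # dims below exclude the batch dimension
    input [
      { name: "data_0", data_type: TYPE_FP32, dims: [ 3, 224, 224 ] }
    ]
    output [
      { name: "fc6_1", data_type: TYPE_FP32, dims: [ 1000 ] }
    ]

The server is then started against that directory, e.g. tritonserver --model-repository=/path/to/model_repository, and exposes HTTP and gRPC inference endpoints.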

GitHub: github.com/triton-inference-server/server

8k stars
145 watching
1k forks
Language: Python
Last commit: about 1 month ago
Linked from 3 awesome lists

Tags: cloud, datacenter, deep-learning, edge, gpu, inference, machine-learning

Related projects:

triton-inference-server/client (579 stars): Client libraries and examples for communicating with Triton in various programming languages (see the Python sketch after this list).
tensorflow/serving (6,195 stars): A high-performance serving system for machine learning models in production environments.
openvinotoolkit/openvino (7,439 stars): A toolkit for optimizing and deploying artificial intelligence models in various applications.
triton-lang/triton (13,712 stars): An intermediate language and compiler for efficient custom deep learning primitives.
jonathansalwan/triton (3,565 stars): A dynamic binary analysis library providing tools and components for program analysis, reverse engineering, and software verification.
nvidia/tensorrt (10,926 stars): A set of tools and libraries for optimizing deep learning inference on NVIDIA GPUs.
huggingface/text-generation-inference (9,456 stars): A toolkit for deploying and serving large language models (LLMs) for high-performance text generation.
google-research/t5x (2,706 stars): A modular framework for training and deploying sequence models at scale.
seldonio/mlserver (737 stars): An inference server for machine learning models with support for multiple frameworks and scalable deployment options.
bentoml/bentoml (7,222 stars): An open-source Python framework for building model inference APIs and serving AI models in production environments.
eleutherai/gpt-neox (6,997 stars): A framework for training large-scale language models on GPUs with advanced features and optimizations.
microsoft/onnxruntime (14,990 stars): A cross-platform, high-performance machine learning accelerator.
facebookincubator/aitemplate (4,573 stars): A framework that transforms deep neural networks into high-performance, GPU-optimized C++ code for efficient inference serving.
sjtu-ipads/powerinfer (8,011 stars): An efficient large language model inference engine that leverages consumer-grade GPUs on PCs.
fauxpilot/fauxpilot (14,629 stars): An open-source alternative to GitHub Copilot built on NVIDIA's Triton Inference Server with the FasterTransformer backend.
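For the triton-inference-server/client entry above, a minimal Python sketch of a request over Triton's HTTP API using the tritonclient package, assuming the illustrative densenet_onnx model from the earlier config sketch (model and tensor names are assumptions, not from this page):

    import numpy as np
    import tritonclient.http as httpclient

    # Connect to a Triton server on its default HTTP port.
    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Declare the input tensor (batch of 1) and fill it with random data.
    inputs = [httpclient.InferInput("data_0", [1, 3, 224, 224], "FP32")]
    inputs[0].set_data_from_numpy(np.random.rand(1, 3, 224, 224).astype(np.float32))

    # Request the output tensor by name and run inference.
    outputs = [httpclient.InferRequestedOutput("fc6_1")]
    response = client.infer("densenet_onnx", inputs, outputs=outputs)
    print(response.as_numpy("fc6_1").shape)  # e.g. (1, 1000)

The same request can be issued over gRPC by swapping tritonclient.http for tritonclient.grpc and pointing at the server's gRPC port (8001 by default).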