Triton Inference Server

The Triton Inference Server provides an optimized cloud and edge inferencing solution for AI models.
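Triton serves models over an HTTP/REST endpoint that follows the KServe v2 inference protocol (`POST /v2/models/<model>/infer`). As a minimal sketch, the request body for that endpoint can be built with the standard library alone; the input name `INPUT0`, the `FP32` datatype, and the shape here are illustrative assumptions, not taken from this page:

```python
import json

def build_infer_request(input_name, data):
    """Build the JSON body for Triton's KServe v2 infer endpoint.

    input_name and the FP32 datatype are placeholders; they must
    match the target model's configuration on the server.
    """
    return {
        "inputs": [
            {
                "name": input_name,       # model input tensor name
                "shape": [1, len(data)],  # batch of one
                "datatype": "FP32",       # Triton datatype string
                "data": data,             # flat list of values
            }
        ]
    }

body = build_infer_request("INPUT0", [1.0, 2.0, 3.0, 4.0])
print(json.dumps(body))
```

The same payload can then be sent with any HTTP client to a running Triton instance; the server replies with an `outputs` array in the same JSON structure.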

GitHub

8k stars
144 watching
1k forks
Language: Python
Last commit: 3 days ago
Linked from 3 awesome lists

Tags: cloud, datacenter, deep-learning, edge, gpu, inference, machine-learning

Related projects:

Repository | Description | Stars
triton-inference-server/client | Client libraries and examples for communicating with Triton using various programming languages | 567
tensorflow/serving | A high-performance serving system for machine learning models in production environments | 6,185
openvinotoolkit/openvino | A toolkit for optimizing and deploying artificial intelligence models in various applications | 7,279
triton-lang/triton | A compiler and language for writing efficient custom deep-learning primitives | 13,431
jonathansalwan/triton | A dynamic binary analysis library providing tools and components for program analysis, reverse engineering, and software verification | 3,539
nvidia/tensorrt | A high-performance deep learning inference platform for NVIDIA GPUs | 10,807
huggingface/text-generation-inference | A toolkit for deploying and serving large language models | 9,106
google-research/t5x | A modular framework for training and deploying sequence models at scale | 2,682
seldonio/mlserver | An inference server for machine learning models with support for multiple frameworks and scalable deployment options | 720
bentoml/bentoml | An open-source Python framework for building model inference APIs and serving AI models in production environments | 7,153
eleutherai/gpt-neox | A framework for training large-scale language models on GPUs with advanced features and optimizations | 6,941
microsoft/onnxruntime | An open-source software framework for high-performance machine learning inference and training acceleration | 14,697
facebookincubator/aitemplate | A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving | 4,561
sjtu-ipads/powerinfer | An efficient large language model inference engine leveraging consumer-grade GPUs on PCs | 7,964
fauxpilot/fauxpilot | An open-source alternative to GitHub Copilot built on NVIDIA's Triton Inference Server with the FasterTransformer backend | 14,605