TensorRT-LLM

Inference optimizer

A software framework providing an easy-to-use Python API to optimize Large Language Models on NVIDIA GPUs for efficient inference.

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
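To illustrate that workflow, here is a minimal sketch using the high-level `LLM` API found in recent TensorRT-LLM releases. This assumes the `tensorrt_llm` package is installed and an NVIDIA GPU is available; the model name is illustrative, not prescribed by this page.

```python
# Minimal sketch of the high-level TensorRT-LLM Python API.
# Assumes tensorrt_llm is installed and an NVIDIA GPU is present;
# the model name below is an illustrative placeholder.
from tensorrt_llm import LLM, SamplingParams

# Constructing the LLM compiles the model into an optimized
# TensorRT engine on first use.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Decoding settings for generation.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Run inference on the compiled engine and print the generated text.
for output in llm.generate(["What is TensorRT-LLM?"], params):
    print(output.outputs[0].text)
```

The engine build is the expensive step; once built, the same engine can be reused by the Python or C++ runtimes mentioned above.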

GitHub

9k stars
95 watching
1k forks
Language: C++
Last commit: about 1 month ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
|---|---|---|
| tensorzero/tensorzero | A tool for optimizing large language models by collecting feedback and metrics to improve their performance over time. | 1,245 |
| nvidia/tensorrt | Provides a set of tools and libraries for optimizing deep learning inference on NVIDIA GPUs. | 10,926 |
| mlc-ai/mlc-llm | A machine learning compiler and deployment engine for large language models. | 19,396 |
| nvidia/fastertransformer | A high-performance transformer-based NLP component optimized for GPU acceleration and integration into various frameworks. | 5,937 |
| microsoft/deepspeed | A deep learning optimization library that simplifies distributed training and inference on modern computing hardware. | 35,863 |
| linkedin/liger-kernel | A collection of optimized kernels and post-training loss functions for large language models. | 3,840 |
| internlm/lmdeploy | A toolkit for optimizing and serving large language models. | 4,854 |
| sjtu-ipads/powerinfer | An efficient Large Language Model inference engine leveraging consumer-grade GPUs on PCs. | 8,011 |
| lifanghe/neurips18_surf | A toolbox implementing a sparse and low-rank tensor regression algorithm with boosting. | 12 |
| langfuse/langfuse | An integrated development platform for large language models (LLMs) that provides observability, analytics, and management tools. | 7,123 |
| sgl-project/sglang | A fast serving framework for large language models and vision language models. | 6,551 |
| tensorlayer/tensorlayer | A deep learning and reinforcement learning library that provides an extensive collection of customizable neural layers to build advanced AI models quickly. | 7,337 |
| intel/neural-compressor | Tools and techniques for optimizing large language models on various frameworks and hardware platforms. | 2,257 |
| modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability. | 2,691 |
| li2109/langtorch | Builds composable LLM applications with Java. | 295 |