TensorRT-LLM
Inference optimizer
A software framework providing an easy-to-use Python API to optimize Large Language Models on NVIDIA GPUs for efficient inference.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
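As an illustration of the workflow described above, the sketch below shows how a model might be loaded and queried through the high-level Python `LLM` entry point shipped in recent TensorRT-LLM releases. The model name is a placeholder, and the exact import paths, class names, and parameters are assumptions that may differ between versions; treat this as a sketch rather than canonical usage.

```python
# Minimal sketch of the TensorRT-LLM Python workflow (assumes the high-level
# LLM API available in recent releases; names and arguments may vary by version).
from tensorrt_llm import LLM, SamplingParams

# Placeholder Hugging Face model ID; TensorRT engine building is handled
# internally when the model is loaded.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Sampling configuration for generation.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = ["Hello, my name is", "The capital of France is"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    # Each result carries the original prompt and one or more completions.
    print(f"{output.prompt!r} -> {output.outputs[0].text!r}")
```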
9k stars
95 watching
1k forks
Language: C++
Last commit: about 1 month ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| tensorzero/tensorzero | A tool for optimizing large language models by collecting feedback and metrics to improve their performance over time | 1,245 |
| nvidia/tensorrt | Provides a set of tools and libraries for optimizing deep learning inference on NVIDIA GPUs | 10,926 |
| mlc-ai/mlc-llm | A machine learning compiler and deployment engine for large language models | 19,396 |
| nvidia/fastertransformer | A highly optimized transformer inference library for GPU acceleration, designed for integration into various frameworks | 5,937 |
| microsoft/deepspeed | A deep learning optimization library that simplifies distributed training and inference on modern hardware | 35,863 |
| linkedin/liger-kernel | A collection of optimized kernels and post-training loss functions for large language models | 3,840 |
| internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,854 |
| sjtu-ipads/powerinfer | An efficient large language model inference engine that leverages consumer-grade GPUs on PCs | 8,011 |
| lifanghe/neurips18_surf | A toolbox implementing a sparse and low-rank tensor regression algorithm with boosting | 12 |
| langfuse/langfuse | An integrated development platform for large language models that provides observability, analytics, and management tools | 7,123 |
| sgl-project/sglang | A fast serving framework for large language models and vision-language models | 6,551 |
| tensorlayer/tensorlayer | A deep learning and reinforcement learning library providing an extensive collection of customizable neural layers for building advanced AI models quickly | 7,337 |
| intel/neural-compressor | Tools and techniques for optimizing large language models across various frameworks and hardware platforms | 2,257 |
| modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability | 2,691 |
| li2109/langtorch | A Java framework for building composable LLM applications | 295 |