TensorRT-LLM

Language Model Optimizer

Provides an easy-to-use API to define and optimize Large Language Models (LLMs) for efficient inference on NVIDIA GPUs

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
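As an illustration of that workflow, here is a minimal sketch using the high-level Python LLM API shipped with recent TensorRT-LLM releases; the Hugging Face model name, prompt, and sampling settings are illustrative placeholders, and exact import paths may vary between versions.

```python
# Minimal sketch of the TensorRT-LLM high-level Python (LLM) API.
# Assumes a recent tensorrt_llm release; the model checkpoint and
# sampling settings below are placeholders, not a fixed recommendation.
from tensorrt_llm import LLM, SamplingParams

# Loading a Hugging Face checkpoint builds (or reuses) a TensorRT engine
# with the library's optimizations applied.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Execute the engine through the Python runtime and print the completions.
for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```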

GitHub

9k stars
93 watching
990 forks
Language: C++
Last commit: 9 days ago
Linked from 1 awesome list

Related projects:

Repository | Description | Stars
tensorzero/tensorzero | A tool that creates a feedback loop to optimize large language models by integrating model gateways and providing data analytics and machine learning capabilities | 569
nvidia/tensorrt | A high-performance deep learning inference platform for NVIDIA GPUs | 10,807
mlc-ai/mlc-llm | Enables the development, optimization, and deployment of large language models on various platforms using a unified high-performance inference engine | 19,197
nvidia/fastertransformer | A highly optimized transformer library for GPU-accelerated inference, with integrations into various frameworks | 5,886
microsoft/deepspeed | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective | 35,463
linkedin/liger-kernel | A collection of optimized kernels for efficient large language model training on distributed computing frameworks | 3,431
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,653
sjtu-ipads/powerinfer | An efficient large language model inference engine that leverages consumer-grade GPUs on PCs | 7,964
lifanghe/neurips18_surf | A toolbox implementing a sparse and low-rank tensor regression algorithm with boosting | 12
langfuse/langfuse | An integrated development platform for large language models (LLMs) that provides observability, analytics, and management tools | 6,537
sgl-project/sglang | A framework for serving large language models and vision models with an efficient runtime and a flexible interface | 6,082
tensorlayer/tensorlayer | A deep learning and reinforcement learning library that provides an extensive collection of customizable neural layers for building advanced AI models quickly | 7,334
intel/neural-compressor | Tools and techniques for optimizing large language models across various frameworks and hardware platforms | 2,226
modeltc/lightllm | An LLM inference and serving framework with a lightweight design, scalability, and high-speed performance | 2,609
li2109/langtorch | A Java framework for building composable LLM applications | 294