TensorRT-LLM
Language Model Optimizer
Provides an easy-to-use Python API to define and optimize Large Language Models (LLMs) for efficient inference on NVIDIA GPUs.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
9k stars
93 watching
990 forks
Language: C++
Last commit: 9 days ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
tensorzero/tensorzero | A tool that creates a feedback loop to optimize large language models by integrating model gateways and providing data analytics and machine learning capabilities. | 569 |
nvidia/tensorrt | A high-performance deep learning inference platform for NVIDIA GPUs. | 10,807 |
mlc-ai/mlc-llm | Enables the development, optimization, and deployment of large language models on various platforms using a unified high-performance inference engine. | 19,197 |
nvidia/fastertransformer | A high-performance transformer-based NLP component optimized for GPU acceleration and integration into various frameworks. | 5,886 |
microsoft/deepspeed | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective. | 35,463 |
linkedin/liger-kernel | A collection of optimized kernels for efficient Large Language Model training on distributed computing frameworks. | 3,431 |
internlm/lmdeploy | A toolkit for optimizing and serving large language models. | 4,653 |
sjtu-ipads/powerinfer | An efficient Large Language Model inference engine leveraging consumer-grade GPUs on PCs. | 7,964 |
lifanghe/neurips18_surf | A toolbox implementing a sparse and low-rank tensor regression algorithm with boosting. | 12 |
langfuse/langfuse | An integrated development platform for large language models (LLMs) that provides observability, analytics, and management tools. | 6,537 |
sgl-project/sglang | A framework for serving large language models and vision models with efficient runtime and flexible interface. | 6,082 |
tensorlayer/tensorlayer | A deep learning and reinforcement learning library that provides an extensive collection of customizable neural layers to build advanced AI models quickly. | 7,334 |
intel/neural-compressor | Tools and techniques for optimizing large language models on various frameworks and hardware platforms. | 2,226 |
modeltc/lightllm | An LLM inference and serving framework providing a lightweight design, scalability, and high-speed performance for large language models. | 2,609 |
li2109/langtorch | A library for building composable LLM applications in Java. | 294 |