PowerInfer
LLM inference engine
An efficient Large Language Model inference engine leveraging consumer-grade GPUs on PCs
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
8k stars
78 watching
412 forks
Language: C++
Last commit: 3 months ago
Linked from 2 awesome lists
bamboo-7b, falcon, large-language-models, llama, llm, llm-inference, local-inference
Related projects:
Repository | Description | Stars |
---|---|---|
fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs | 9,192 |
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,653 |
microsoft/deepspeed | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective. | 35,545 |
lyogavin/airllm | A Python library that optimizes inference memory usage for large language models on limited GPU resources. | 5,259 |
thudm/glm-130b | An open-source implementation of a large bilingual language model pre-trained on vast amounts of text data. | 7,659 |
sgl-project/sglang | A framework for serving large language models and vision models with efficient runtime and flexible interface. | 6,082 |
vllm-project/vllm | A high-performance inference and serving engine for large language models. | 30,303 |
hpcaitech/colossalai | A toolkit for training and deploying large AI models in parallel on distributed computing infrastructure | 38,828 |
xiaomi/mace | A framework for deep learning inference on mobile devices | 4,934 |
rapidsai/cuml | A suite of libraries implementing machine learning algorithms and mathematical primitives on NVIDIA GPUs | 4,251 |
mit-han-lab/llm-awq | A tool for efficient and accurate weight quantization in large language models | 2,517 |
huggingface/text-generation-inference | A toolkit for deploying and serving Large Language Models. | 9,106 |
autumnai/leaf | An open machine learning framework for building classical, deep, or hybrid models on various hardware platforms. | 5,557 |
tencent/hunyuandit | A PyTorch-based diffusion transformer model for generating images with fine-grained Chinese understanding and text-to-image synthesis | 3,456 |
higgsfield-ai/higgsfield | A framework for efficient and fault-tolerant distributed training of large neural networks on multiple GPUs. | 3,293 |