PowerInfer
LLM inference engine
An efficient Large Language Model inference engine leveraging consumer-grade GPUs on PCs
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
8k stars
78 watching
415 forks
Language: C++
Last commit: 4 months ago
Linked from 2 awesome lists
bamboo-7b, falcon, large-language-models, llama, llm, llm-inference, local-inference
Related projects:
Repository | Description | Stars |
---|---|---|
fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs | 9,236 |
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,854 |
microsoft/deepspeed | A deep learning optimization library that simplifies distributed training and inference on modern computing hardware. | 35,863 |
lyogavin/airllm | Optimizes large language model inference on limited GPU resources | 5,446 |
thudm/glm-130b | An open-source implementation of a large bilingual language model pre-trained on vast amounts of text data. | 7,672 |
sgl-project/sglang | A fast serving framework for large language models and vision language models. | 6,551 |
vllm-project/vllm | An inference and serving engine for large language models | 31,982 |
hpcaitech/colossalai | A toolkit for training and deploying large AI models in parallel on distributed computing infrastructure | 38,907 |
xiaomi/mace | A framework for deep learning inference on mobile devices | 4,949 |
rapidsai/cuml | A suite of libraries implementing machine learning algorithms and mathematical primitives on NVIDIA GPUs | 4,292 |
mit-han-lab/llm-awq | An open-source software project that enables efficient and accurate low-bit weight quantization for large language models. | 2,593 |
huggingface/text-generation-inference | A toolkit for deploying and serving Large Language Models (LLMs) for high-performance text generation | 9,456 |
autumnai/leaf | An open machine learning framework for building classical, deep, or hybrid models on various hardware platforms. | 5,555 |
tencent/hunyuandit | A PyTorch model definition and inference/sampling code repository for a powerful diffusion transformer with fine-grained Chinese understanding | 3,678 |
higgsfield-ai/higgsfield | A framework for efficient and fault-tolerant distributed training of large neural networks on multiple GPUs. | 3,299 |