PowerInfer

LLM inference engine

An efficient Large Language Model inference engine leveraging consumer-grade GPUs on PCs

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs


8k stars
78 watching
415 forks
Language: C++
Last commit: 4 months ago
Linked from 2 awesome lists

Topics: bamboo-7b, falcon, large-language-models, llama, llm, llm-inference, local-inference

Related projects:

Repository | Description | Stars
fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs | 9,236
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,854
microsoft/deepspeed | A deep learning optimization library that simplifies distributed training and inference on modern computing hardware | 35,863
lyogavin/airllm | Optimizes large language model inference on limited GPU resources | 5,446
thudm/glm-130b | An open-source implementation of a large bilingual language model pre-trained on vast amounts of text data | 7,672
sgl-project/sglang | A fast serving framework for large language models and vision language models | 6,551
vllm-project/vllm | An inference and serving engine for large language models | 31,982
hpcaitech/colossalai | A toolkit for training and deploying large AI models in parallel on distributed computing infrastructure | 38,907
xiaomi/mace | A framework for deep learning inference on mobile devices | 4,949
rapidsai/cuml | A suite of libraries implementing machine learning algorithms and mathematical primitives on NVIDIA GPUs | 4,292
mit-han-lab/llm-awq | An open-source software project that enables efficient and accurate low-bit weight quantization for large language models | 2,593
huggingface/text-generation-inference | A toolkit for deploying and serving Large Language Models (LLMs) for high-performance text generation | 9,456
autumnai/leaf | An open machine learning framework for building classical, deep, or hybrid models on various hardware platforms | 5,555
tencent/hunyuandit | A PyTorch model definition and inference/sampling code repository for a powerful diffusion transformer with fine-grained Chinese understanding | 3,678
higgsfield-ai/higgsfield | A framework for efficient and fault-tolerant distributed training of large neural networks on multiple GPUs | 3,299