PowerInfer

LLM inference engine

An efficient Large Language Model inference engine leveraging consumer-grade GPUs on PCs

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

GitHub

8k stars
78 watching
412 forks
Language: C++
last commit: 3 months ago
Linked from 2 awesome lists

Topics: bamboo-7b, falcon, large-language-models, llama, llm, llm-inference, local-inference


Related projects:

Repository | Description | Stars
fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs | 9,192
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,653
microsoft/deepspeed | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective | 35,545
lyogavin/airllm | A Python library that optimizes inference memory usage for large language models on limited GPU resources | 5,259
thudm/glm-130b | An open-source implementation of a large bilingual language model pre-trained on vast amounts of text data | 7,659
sgl-project/sglang | A framework for serving large language models and vision models with an efficient runtime and a flexible interface | 6,082
vllm-project/vllm | A high-performance inference and serving engine for large language models | 30,303
hpcaitech/colossalai | A toolkit for training and deploying large AI models in parallel on distributed computing infrastructure | 38,828
xiaomi/mace | A framework for deep learning inference on mobile devices | 4,934
rapidsai/cuml | A suite of libraries implementing machine learning algorithms and mathematical primitives on NVIDIA GPUs | 4,251
mit-han-lab/llm-awq | A tool for efficient and accurate weight quantization in large language models | 2,517
huggingface/text-generation-inference | A toolkit for deploying and serving large language models | 9,106
autumnai/leaf | An open machine learning framework for building classical, deep, or hybrid models on various hardware platforms | 5,557
tencent/hunyuandit | A PyTorch-based diffusion transformer model for text-to-image generation with fine-grained Chinese understanding | 3,456
higgsfield-ai/higgsfield | A framework for efficient and fault-tolerant distributed training of large neural networks on multiple GPUs | 3,293