vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
27k stars
225 watching
4k forks
Language: Python
Last commit: 10 days ago
Linked from 2 awesome lists
Topics: amd, cuda, gpt, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, rocm, tpu, trainium, transformer, xpu