vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

GitHub: github.com/vllm-project/vllm
27k stars
225 watching
4k forks
Language: Python
Last commit: 10 days ago
Linked from 2 awesome lists

Topics: amd, cuda, gpt, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, rocm, tpu, trainium, transformer, xpu
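A minimal usage sketch of the engine's offline Python API. The LLM class, SamplingParams, and generate() are part of vLLM's documented interface; the prompts and model name below are just examples:

    from vllm import LLM, SamplingParams

    # Example prompts to batch through the engine in a single call.
    prompts = [
        "The capital of France is",
        "The future of AI is",
    ]

    # Sampling configuration: temperature, nucleus sampling, output length.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # Load a model (downloaded from the Hugging Face Hub on first use).
    llm = LLM(model="facebook/opt-125m")

    # Generate completions for all prompts in one batched request.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(output.prompt, "->", output.outputs[0].text)

For online serving, vLLM also ships an OpenAI-compatible HTTP server alongside this offline API.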
