vLLM

LLM server

An inference and serving engine for large language models

A high-throughput and memory-efficient inference and serving engine for LLMs

GitHub repository

32k stars, 258 watching, 5k forks

Language: Python

Last commit: 7 months ago

Linked from 3 awesome lists

Topics: amd, cuda, gpt, hpu, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, rocm, tpu, trainium, transformer, xpu

docs.vllm.ai
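As a serving engine, vLLM can expose an OpenAI-compatible HTTP API. The sketch below is a minimal stdlib-only client, assuming a server is already running locally (for example started with `vllm serve <model>`) on port 8000, its usual default, and that the `model` field matches the model the server was launched with. The helper names `build_completion_request` and `complete` are hypothetical, chosen for illustration; see docs.vllm.ai for the server's actual options.

```python
import json
import urllib.request

# Assumption: a vLLM OpenAI-compatible server is listening here.
BASE_URL = "http://localhost:8000/v1"


def build_completion_request(model: str, prompt: str,
                             max_tokens: int = 64,
                             temperature: float = 0.0) -> dict:
    """Hypothetical helper: JSON body for an OpenAI-style /v1/completions call."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(model: str, prompt: str, base_url: str = BASE_URL,
             timeout: float = 30.0) -> str:
    """Hypothetical helper: POST the request and return the first choice's text."""
    body = json.dumps(build_completion_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # OpenAI-style completion responses carry generated text in choices[i].text.
    return payload["choices"][0]["text"]
```

With a server running, `complete("facebook/opt-125m", "San Francisco is a")` would return the generated continuation; the model name there is only an example and must match the served model. Keeping request construction separate from the network call makes the payload easy to inspect or test without a live server.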


Related projects:

Repository	Description	Stars
alpha-vllm/llama2-accessory	An open-source toolkit for pretraining and fine-tuning large language models	2,732
internlm/lmdeploy	A toolkit for optimizing and serving large language models	4,854
opengvlab/llama-adapter	An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy	5,775
modeltc/lightllm	A Python-based framework for serving large language models with low latency and high scalability	2,691
mit-han-lab/llm-awq	An open-source software project that enables efficient and accurate low-bit weight quantization for large language models	2,593
lyogavin/airllm	Optimizes large language model inference on limited GPU resources	5,446
nomic-ai/gpt4all	An open-source Python client for running large language models (LLMs) locally on any device	71,176
optimalscale/lmflow	A toolkit for fine-tuning large machine learning models and running inference with them	8,312
lm-sys/fastchat	An open platform for training, serving, and evaluating chatbots based on large language models	37,269
mintplex-labs/anything-llm	An all-in-one desktop and Docker AI application with built-in RAG and support for multiple LLMs and vector databases	28,746
ggerganov/llama.cpp	Enables LLM inference with minimal setup and high performance on various hardware platforms	69,185
scisharp/llamasharp	An efficient C#/.NET library for running large language models (LLMs) on local devices	2,750
sgl-project/sglang	A fast serving framework for large language models and vision language models	6,551
sjtu-ipads/powerinfer	An efficient Large Language Model inference engine leveraging consumer-grade GPUs on PCs	8,011
eleutherai/lm-evaluation-harness	Provides a unified framework to test generative language models on various evaluation tasks.	7,200