vLLM

LLM server

A high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
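
As a quick illustration of what the engine does, the sketch below uses vLLM's offline inference API to load a small model and generate completions for a batch of prompts; the model name, prompts, and sampling parameters are illustrative placeholders, so adjust them to your own setup.

```python
# Minimal offline-inference sketch with vLLM's Python API.
# Model name, prompts, and sampling settings below are placeholders.
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Downloads the model from the Hugging Face Hub on first use.
llm = LLM(model="facebook/opt-125m")

# Generate completions for all prompts in one batch.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Completion: {output.outputs[0].text!r}")
```

For online serving, the project also ships an OpenAI-compatible HTTP server (started with `vllm serve <model>` in recent releases, or `python -m vllm.entrypoints.openai.api_server`), so existing OpenAI client code can be pointed at a self-hosted endpoint.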

GitHub

30k stars
247 watching
5k forks
Language: Python
Last commit: 4 days ago
Linked from 3 awesome lists

Topics: amd, cuda, gpt, hpu, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, rocm, tpu, trainium, transformer, xpu

Related projects:

Repository | Description | Stars
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,653
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,754
modeltc/lightllm | An LLM inference and serving framework with a lightweight design, scalability, and high-speed performance | 2,609
mit-han-lab/llm-awq | A tool for efficient and accurate weight quantization in large language models | 2,517
lyogavin/airllm | A Python library that optimizes inference memory usage for large language models on limited GPU resources | 5,259
nomic-ai/gpt4all | An open-source Python client for running large language models (LLMs) locally on any device | 70,694
optimalscale/lmflow | A toolkit for fine-tuning large language models and providing efficient inference capabilities | 8,273
lm-sys/fastchat | An open platform for training, serving, and evaluating large language models used in chatbots | 36,975
mintplex-labs/anything-llm | A full-stack application that lets users turn any document into context for chatting with various large language models and vector databases | 27,283
ggerganov/llama.cpp | Enables efficient inference of large language models using optimized C/C++ implementations and various backend frameworks | 67,866
scisharp/llamasharp | A C#/.NET library to efficiently run large language models on local devices | 2,673
sgl-project/sglang | A framework for serving large language and vision models with an efficient runtime and a flexible interface | 6,082
sjtu-ipads/powerinfer | An efficient large language model inference engine leveraging consumer-grade GPUs on PCs | 7,964
eleutherai/lm-evaluation-harness | Provides a unified framework for testing generative language models on various evaluation tasks | 6,970