vllm

LLM server

A high-throughput and memory-efficient inference and serving engine for LLMs

GitHub

32k stars
258 watching
5k forks
Language: Python
last commit: about 1 month ago
Linked from 3 awesome lists

Topics: amd, cuda, gpt, hpu, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, rocm, tpu, trainium, transformer, xpu

Related projects:

Repository | Description | Stars
--- | --- | ---
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,854
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775
modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability | 2,691
mit-han-lab/llm-awq | An open-source software project that enables efficient and accurate low-bit weight quantization for large language models | 2,593
lyogavin/airllm | Optimizes large language model inference on limited GPU resources | 5,446
nomic-ai/gpt4all | An open-source Python client for running Large Language Models (LLMs) locally on any device | 71,176
optimalscale/lmflow | A toolkit for fine-tuning and inferring large machine learning models | 8,312
lm-sys/fastchat | An open platform for training, serving, and evaluating large language models used in chatbots | 37,269
mintplex-labs/anything-llm | An all-in-one Desktop & Docker AI application with built-in RAG and support for multiple LLMs and vector databases | 28,746
ggerganov/llama.cpp | Enables LLM inference with minimal setup and high performance on various hardware platforms | 69,185
scisharp/llamasharp | An efficient C#/.NET library for running Large Language Models (LLMs) on local devices | 2,750
sgl-project/sglang | A fast serving framework for large language models and vision language models | 6,551
sjtu-ipads/powerinfer | An efficient Large Language Model inference engine leveraging consumer-grade GPUs on PCs | 8,011
eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks | 7,200