vllm
LLM server
A high-performance inference and serving engine for large language models.
A high-throughput and memory-efficient inference and serving engine for LLMs
30k stars
247 watching
5k forks
Language: Python
last commit: 4 days ago
Linked from 3 awesome lists
Topics: amd, cuda, gpt, hpu, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, rocm, tpu, trainium, transformer, xpu
Related projects:
Repository | Description | Stars |
---|---|---|
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720 |
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,653 |
opengvlab/llama-adapter | An implementation of LLaMA-Adapter, a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,754 |
modeltc/lightllm | An LLM inference and serving framework with a lightweight design, good scalability, and high-speed performance | 2,609 |
mit-han-lab/llm-awq | A tool for efficient and accurate weight quantization in large language models | 2,517 |
lyogavin/airllm | A Python library that reduces inference memory usage for large language models on limited GPU resources | 5,259 |
nomic-ai/gpt4all | An open-source Python client for running large language models (LLMs) locally on any device | 70,694 |
optimalscale/lmflow | A toolkit for fine-tuning large language models, with efficient inference capabilities | 8,273 |
lm-sys/fastchat | An open platform for training, serving, and evaluating large language models used in chatbots | 36,975 |
mintplex-labs/anything-llm | A full-stack application that turns any document into context for chatting with various large language models (LLMs), with support for multiple vector databases | 27,283 |
ggerganov/llama.cpp | Efficient inference of large language models through optimized C/C++ implementations and various backend frameworks | 67,866 |
scisharp/llamasharp | A C#/.NET library to efficiently run large language models (LLMs) on local devices | 2,673 |
sgl-project/sglang | A framework for serving large language models and vision models with an efficient runtime and a flexible interface | 6,082 |
sjtu-ipads/powerinfer | An efficient large language model inference engine that runs on consumer-grade GPUs in PCs | 7,964 |
eleutherai/lm-evaluation-harness | A unified framework for testing generative language models on a wide range of evaluation tasks | 6,970 |