vllm
LLM server
A high-throughput and memory-efficient inference and serving engine for LLMs
32k stars
258 watching
5k forks
Language: Python
Last commit: about 1 month ago
Linked from 3 awesome lists
Topics: amd, cuda, gpt, hpu, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, rocm, tpu, trainium, transformer, xpu
Related projects:
| Repository | Description | Stars |
|---|---|---|
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
| internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,854 |
| opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775 |
| modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability | 2,691 |
| mit-han-lab/llm-awq | An open-source project enabling efficient and accurate low-bit weight quantization for large language models | 2,593 |
| lyogavin/airllm | Optimizes large language model inference on limited GPU resources | 5,446 |
| nomic-ai/gpt4all | An open-source Python client for running large language models (LLMs) locally on any device | 71,176 |
| optimalscale/lmflow | A toolkit for fine-tuning and inference of large machine learning models | 8,312 |
| lm-sys/fastchat | An open platform for training, serving, and evaluating large language models used in chatbots | 37,269 |
| mintplex-labs/anything-llm | An all-in-one desktop and Docker AI application with built-in RAG and support for multiple LLMs and vector databases | 28,746 |
| ggerganov/llama.cpp | Enables LLM inference with minimal setup and high performance on a wide range of hardware | 69,185 |
| scisharp/llamasharp | An efficient C#/.NET library for running large language models (LLMs) on local devices | 2,750 |
| sgl-project/sglang | A fast serving framework for large language models and vision-language models | 6,551 |
| sjtu-ipads/powerinfer | An efficient large language model inference engine leveraging consumer-grade GPUs on PCs | 8,011 |
| eleutherai/lm-evaluation-harness | Provides a unified framework for testing generative language models on a variety of evaluation tasks | 7,200 |