lightllm
LLM server
A Python-based framework for serving large language models with low latency and high scalability.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
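As a sketch of how such a serving framework is typically used, the snippet below builds a text-generation request and POSTs it to a running server over HTTP. The `/generate` endpoint path and the `inputs`/`parameters` payload shape are assumptions for illustration; consult the LightLLM documentation for the exact API.

```python
# Minimal client sketch for a LightLLM-style HTTP server.
# NOTE: the /generate endpoint and the payload layout below are assumed,
# not taken from the LightLLM docs -- verify against the project README.
import json
import urllib.request


def build_generate_payload(prompt, max_new_tokens=64, do_sample=False):
    """Build the JSON body for a text-generation request (assumed shape)."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "do_sample": do_sample,
        },
    }


def generate(server_url, prompt, **params):
    """POST the prompt to the server and return the decoded JSON response."""
    body = json.dumps(build_generate_payload(prompt, **params)).encode("utf-8")
    req = urllib.request.Request(
        server_url.rstrip("/") + "/generate",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A caller would then invoke something like `generate("http://localhost:8000", "Hello")` against a locally launched server.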
3k stars
22 watching
216 forks
Language: Python
last commit: about 1 month ago
Linked from 1 awesome list
Tags: deep-learning, gpt, llama, llm, model-serving, nlp, openai-triton
Related projects:
| Repository | Description | Stars |
|---|---|---|
| lyogavin/airllm | Optimizes large language model inference on limited GPU resources | 5,446 |
| mlabonne/llm-course | A comprehensive course and resource collection on building and deploying large language models (LLMs) | 40,053 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
| young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,428 |
| meta-llama/codellama | Provides inference code and fine-tuning tools for large language models specialized in code generation | 16,097 |
| nlpxucan/wizardlm | Large pre-trained language models trained to follow complex instructions via an evolutionary instruction framework | 9,295 |
| optimalscale/lmflow | A toolkit for fine-tuning and inference of large machine learning models | 8,312 |
| sgl-project/sglang | A fast serving framework for large language models and vision-language models | 6,551 |
| ericlbuehler/mistral.rs | A high-performance LLM inference framework written in Rust | 4,677 |
| ggerganov/llama.cpp | Enables LLM inference with minimal setup and high performance on a wide range of hardware | 69,185 |
| vllm-project/vllm | An inference and serving engine for large language models | 31,982 |
| zilliztech/gptcache | A semantic cache that reduces the cost and latency of LLM API calls by storing responses | 7,293 |
| mlc-ai/mlc-llm | A machine learning compiler and deployment engine for large language models | 19,396 |
| fminference/flexllmgen | High-throughput large language model generation on a single GPU | 9,236 |
| mooler0410/llmspracticalguide | A curated list of resources for navigating the landscape of large language models and their NLP applications | 9,551 |