lightllm

LLM server

A Python-based framework for serving large language models with low latency and high scalability.

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
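Once a LightLLM api_server process is running, the model is queried over plain HTTP. The sketch below is a minimal Python client under stated assumptions: the POST /generate endpoint, the inputs/parameters payload shape, and the port come from the launch example in the project's README and may differ between versions, so check them against the installed release.

# Minimal sketch of querying a running LightLLM server over HTTP.
# Assumptions (verify against your installed version's README):
#   - the server was started with something like:
#       python -m lightllm.server.api_server --model_dir /path/to/model --host 0.0.0.0 --port 8080
#   - it exposes POST /generate accepting {"inputs": ..., "parameters": {...}}
import requests

def generate(prompt, max_new_tokens=64, url="http://127.0.0.1:8080/generate"):
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    resp = requests.post(url, json=payload, timeout=60)
    resp.raise_for_status()
    # The response body is JSON; the exact field holding the generated
    # text can vary by version, so the raw JSON is returned here.
    return resp.json()

if __name__ == "__main__":
    print(generate("What is the capital of France?"))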

GitHub

3k stars
22 watching
216 forks
Language: Python
Last commit: about 1 month ago
Linked from 1 awesome list

deep-learning, gpt, llama, llm, model-serving, nlp, openai-triton

Related projects:

Repository Description Stars
lyogavin/airllm Optimizes large language model inference on limited GPU resources 5,446
mlabonne/llm-course A comprehensive course and resource collection on building and deploying large language models (LLMs) 40,053
alpha-vllm/llama2-accessory An open-source toolkit for pretraining and fine-tuning large language models 2,732
young-geng/easylm A framework for training and serving large language models using JAX/Flax 2,428
meta-llama/codellama Provides inference code and tools for fine-tuning large language models specialized in code generation 16,097
nlpxucan/wizardlm Large pre-trained language models trained to follow complex instructions via an evolutionary instruction framework 9,295
optimalscale/lmflow A toolkit for fine-tuning and inference of large machine learning models 8,312
sgl-project/sglang A fast serving framework for large language models and vision-language models 6,551
ericlbuehler/mistral.rs A high-performance LLM inference framework written in Rust 4,677
ggerganov/llama.cpp Enables LLM inference with minimal setup and high performance on a wide range of hardware 69,185
vllm-project/vllm An inference and serving engine for large language models 31,982
zilliztech/gptcache A semantic cache that reduces the cost and latency of LLM API calls by storing responses 7,293
mlc-ai/mlc-llm A machine learning compiler and deployment engine for large language models 19,396
fminference/flexllmgen A high-throughput generation engine for running large language models on a single GPU 9,236
mooler0410/llmspracticalguide A curated list of resources for navigating the landscape of large language models and their NLP applications 9,551