lightllm
LLM server
A Python-based framework for serving large language models with low latency and high scalability.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
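As a sketch of how such a serving framework is typically used, the snippet below builds a text-generation request and POSTs it to a running server over HTTP. The `/generate` endpoint path and the `inputs`/`parameters` payload shape are assumptions for illustration; consult the LightLLM documentation for the exact API.

```python
# Minimal client sketch for a LightLLM-style HTTP server.
# NOTE: the /generate endpoint and the payload layout below are assumed,
# not taken from the LightLLM docs -- verify against the project README.
import json
import urllib.request


def build_generate_payload(prompt, max_new_tokens=64, do_sample=False):
    """Build the JSON body for a text-generation request (assumed shape)."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "do_sample": do_sample,
        },
    }


def generate(server_url, prompt, **params):
    """POST the prompt to the server and return the decoded JSON response."""
    body = json.dumps(build_generate_payload(prompt, **params)).encode("utf-8")
    req = urllib.request.Request(
        server_url.rstrip("/") + "/generate",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A caller would then invoke something like `generate("http://localhost:8000", "Hello")` against a locally launched server.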
3k stars
22 watching
216 forks
Language: Python
last commit: about 1 month ago
Linked from 1 awesome list
Tags: deep-learning, gpt, llama, llm, model-serving, nlp, openai-triton
Related projects:
| Repository | Description | Stars |
|---|---|---|
| lyogavin/airllm | Optimizes large language model inference on limited GPU resources | 5,446 |
| mlabonne/llm-course | A comprehensive course and resource collection on building and deploying large language models (LLMs) | 40,053 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
| young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,428 |
| meta-llama/codellama | Provides inference code and fine-tuning tools for large language models specialized in code generation | 16,097 |
| nlpxucan/wizardlm | Large pre-trained language models trained to follow complex instructions via an evolutionary instruction framework | 9,295 |
| optimalscale/lmflow | A toolkit for fine-tuning and inference of large machine learning models | 8,312 |
| sgl-project/sglang | A fast serving framework for large language models and vision-language models | 6,551 |
| ericlbuehler/mistral.rs | A high-performance LLM inference framework written in Rust | 4,677 |
| ggerganov/llama.cpp | Enables LLM inference with minimal setup and high performance on a wide range of hardware | 69,185 |
| vllm-project/vllm | An inference and serving engine for large language models | 31,982 |
| zilliztech/gptcache | A semantic cache that reduces the cost and latency of LLM API calls by storing responses | 7,293 |
| mlc-ai/mlc-llm | A machine learning compiler and deployment engine for large language models | 19,396 |
| fminference/flexllmgen | High-throughput large language model generation on a single GPU | 9,236 |
| mooler0410/llmspracticalguide | A curated list of resources for navigating the landscape of large language models and their NLP applications | 9,551 |