lightllm
LLM framework
LightLLM is a Python-based inference and serving framework for large language models (LLMs), notable for its lightweight design, easy scalability, and high-speed performance.
3k stars
23 watching
205 forks
Language: Python
Last commit: 8 days ago
Linked from 1 awesome list
Tags: deep-learning, gpt, llama, llm, model-serving, nlp, openai-triton
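
As a serving framework, LightLLM exposes an HTTP API server that is launched as a Python module and queried with plain JSON. The sketch below is adapted from the project's documented usage; the model path is a placeholder, and the exact launch flags and sampling parameters may vary between releases.

```python
import json

import requests

# Assumes a LightLLM API server is already running, launched along the
# lines of the project's README (model path is a placeholder, and flags
# may differ between releases):
#
#   python -m lightllm.server.api_server --model_dir /path/to/llama-7b \
#       --host 0.0.0.0 --port 8000 --tp 1

url = "http://localhost:8000/generate"
payload = {
    "inputs": "What is AI?",
    "parameters": {
        "do_sample": False,    # greedy decoding
        "max_new_tokens": 64,  # cap on the number of generated tokens
    },
}

# The server speaks JSON over HTTP, so any HTTP client works.
resp = requests.post(
    url,
    headers={"Content-Type": "application/json"},
    data=json.dumps(payload),
)
resp.raise_for_status()
print(resp.json())  # response body contains the generated text
```
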
Related projects:
| Repository | Description | Stars |
|---|---|---|
| lyogavin/airllm | A Python library that optimizes inference memory usage for large language models on limited GPU resources. | 5,259 |
| mlabonne/llm-course | A comprehensive course and resource collection on building and deploying large language models (LLMs). | 39,120 |
| Alpha-VLLM/LLaMA2-Accessory | An open-source toolkit for pretraining and fine-tuning large language models. | 2,722 |
| young-geng/EasyLM | A framework for training and serving large language models using JAX/Flax. | 2,409 |
| meta-llama/codellama | Provides inference code and fine-tuning tools for large language models designed specifically for code generation tasks. | 16,039 |
| nlpxucan/WizardLM | Large pre-trained language models trained to follow complex instructions using an evolutionary instruction framework. | 9,268 |
| OptimalScale/LMFlow | A toolkit for fine-tuning large language models and providing efficient inference capabilities. | 8,273 |
| sgl-project/sglang | A framework for serving large language and vision models with an efficient runtime and a flexible interface. | 6,082 |
| EricLBuehler/mistral.rs | A fast and flexible LLM inference platform supporting various models and devices. | 4,466 |
| ggerganov/llama.cpp | Enables efficient inference of large language models using optimized C/C++ implementations and various backends. | 68,190 |
| vllm-project/vllm | A high-performance inference and serving engine for large language models. | 30,303 |
| zilliztech/GPTCache | A semantic cache that reduces the cost and improves the speed of LLM API calls by storing and reusing responses. | 7,232 |
| mlc-ai/mlc-llm | Enables development, optimization, and deployment of large language models across platforms via a unified high-performance inference engine. | 19,197 |
| FMInference/FlexLLMGen | Generates large language model outputs in high-throughput mode on a single GPU. | 9,192 |
| Mooler0410/LLMsPracticalGuide | A curated list of resources for navigating the landscape of large language models and their applications in NLP. | 9,489 |