lightllm

LLM framework

An inference and serving framework for large language models, with a lightweight design, scalability, and high-speed performance.

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

GitHub

3k stars
23 watching
205 forks
Language: Python
Last commit: 8 days ago
Linked from 1 awesome list

Topics: deep-learning, gpt, llama, llm, model-serving, nlp, openai-triton

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| lyogavin/airllm | A Python library that optimizes inference memory usage for large language models on limited GPU resources | 5,259 |
| mlabonne/llm-course | A comprehensive course and resource package on building and deploying Large Language Models (LLMs) | 39,120 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,722 |
| young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,409 |
| meta-llama/codellama | Provides inference code and tools for fine-tuning large language models, specifically designed for code generation tasks | 16,039 |
| nlpxucan/wizardlm | Large pre-trained language models trained to follow complex instructions using an evolutionary instruction framework | 9,268 |
| optimalscale/lmflow | A toolkit for fine-tuning large language models and providing efficient inference capabilities | 8,273 |
| sgl-project/sglang | A framework for serving large language models and vision models with an efficient runtime and a flexible interface | 6,082 |
| ericlbuehler/mistral.rs | A fast and flexible LLM inference platform supporting various models and devices | 4,466 |
| ggerganov/llama.cpp | Enables efficient inference of large language models using optimized C/C++ implementations and various backend frameworks | 68,190 |
| vllm-project/vllm | A high-performance inference and serving engine for large language models | 30,303 |
| zilliztech/gptcache | A semantic cache designed to reduce the cost and improve the speed of LLM API calls by storing responses | 7,232 |
| mlc-ai/mlc-llm | Enables the development, optimization, and deployment of large language models on various platforms using a unified high-performance inference engine | 19,197 |
| fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs | 9,192 |
| mooler0410/llmspracticalguide | A curated list of resources to help developers navigate the landscape of large language models and their applications in NLP | 9,489 |