lmdeploy

LLM toolkit

A toolkit for optimizing and serving large language models

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
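In practice, the project's quick start centers on a Python pipeline API for offline batch inference plus a CLI for OpenAI-compatible serving. The snippet below is a minimal sketch of the offline path; the model identifier is illustrative, and exact arguments may differ across LMDeploy versions.

```python
# Minimal offline-inference sketch with LMDeploy's pipeline API.
# Assumes `pip install lmdeploy`; the model ID is an illustrative
# Hugging Face repo and is downloaded on first use.
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2_5-7b-chat")
responses = pipe([
    "Summarize what an LLM inference engine does.",
    "Name two ways to compress a large language model.",
])
print(responses)
```

Serving a model over HTTP is typically done with `lmdeploy serve api_server <model>`; consult the repository's documentation for backend (TurboMind vs. PyTorch) and quantization options.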

GitHub

5k stars
39 watching
439 forks
Language: Python
last commit: about 1 month ago
Linked from 1 awesome list

Topics: codellama, cuda-kernels, deepspeed, fastertransformer, internlm, llama, llama2, llama3, llm, llm-inference, turbomind

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| vllm-project/vllm | An inference and serving engine for large language models | 31,982 |
| internlm/internlm | A collection of large language models designed to improve reasoning and tool-use capabilities in chatbots | 6,572 |
| sjtu-ipads/powerinfer | An efficient large language model inference engine leveraging consumer-grade GPUs on PCs | 8,011 |
| lyogavin/airllm | Optimizes large language model inference on limited GPU resources | 5,446 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
| optimalscale/lmflow | A toolkit for fine-tuning and inference of large machine learning models | 8,312 |
| mit-han-lab/llm-awq | An open-source project enabling efficient, accurate low-bit weight quantization for large language models | 2,593 |
| opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775 |
| nomic-ai/gpt4all | An open-source Python client for running large language models (LLMs) locally on any device | 71,176 |
| modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability | 2,691 |
| opengvlab/internvl | Develops large language models capable of processing multiple data types and modalities | 6,394 |
| young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,428 |
| eleutherai/lm-evaluation-harness | Provides a unified framework for testing generative language models on various evaluation tasks | 7,200 |
| microsoft/deepspeed | A deep learning optimization library that simplifies distributed training and inference on modern hardware | 35,863 |
| hiyouga/llama-factory | A tool for efficiently fine-tuning large language models across multiple architectures and methods | 36,219 |