airllm

Model optimizer

A Python library that optimizes inference memory usage for large language models on limited GPU resources.

AirLLM: 70B inference with a single 4GB GPU
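A rough back-of-the-envelope check can make the 70B-on-4GB claim concrete. The sketch below is not AirLLM code. It assumes fp16 weights (2 bytes per parameter) and an 80-layer model, both illustrative figures that do not come from this page, and it assumes that weights are held on the GPU one layer at a time rather than all at once.

```python
# Back-of-the-envelope memory estimate; all constants are illustrative assumptions.

def weights_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the storage needed for the weights, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

NUM_PARAMS = 70e9   # 70B parameters, as in the tagline
BYTES_FP16 = 2      # assumption: weights stored in 16-bit floats
NUM_LAYERS = 80     # assumption: layer count of a typical 70B decoder
GPU_GB = 4          # GPU memory from the tagline

full_model_gb = weights_gb(NUM_PARAMS, BYTES_FP16)
# Ignores embeddings, activations, and the KV cache for simplicity.
per_layer_gb = full_model_gb / NUM_LAYERS

print(f"full model weights: {full_model_gb:.0f} GB")
print(f"one layer at a time: {per_layer_gb:.2f} GB")
print(f"whole model fits in {GPU_GB} GB: {full_model_gb <= GPU_GB}")
print(f"single layer fits in {GPU_GB} GB: {per_layer_gb <= GPU_GB}")
```

Under these assumptions the full set of weights is far larger than the GPU, while a single layer fits with room to spare, which is why keeping only part of the model resident at a time is a plausible way to reach this kind of footprint.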

GitHub

5k stars
126 watching
422 forks
Language: Jupyter Notebook
Last commit: about 2 months ago
Linked from 1 awesome list

chinese-llm, chinese-nlp, finetune, generative-ai, instruct-gpt, instruction-set, llama, llm, lora, open-models, open-source, open-source-models, qlora

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| modeltc/lightllm | An LLM inference and serving framework providing a lightweight design, scalability, and high-speed performance for large language models | 2,609 |
| ggerganov/llama.cpp | Enables efficient inference of large language models using optimized C/C++ implementations and various backend frameworks | 67,866 |
| mit-han-lab/llm-awq | A tool for efficient and accurate weight quantization in large language models | 2,517 |
| internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,653 |
| opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,754 |
| vllm-project/vllm | A high-performance inference and serving engine for large language models | 30,303 |
| optimalscale/lmflow | A toolkit for finetuning large language models and providing efficient inference capabilities | 8,273 |
| fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs | 9,192 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720 |
| young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,409 |
| nomic-ai/gpt4all | An open-source Python client for running large language models (LLMs) locally on any device | 70,694 |
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,232 |
| sjtu-ipads/powerinfer | An efficient large language model inference engine leveraging consumer-grade GPUs on PCs | 7,964 |
| thudm/glm-130b | An open-source implementation of a large bilingual language model pre-trained on vast amounts of text data | 7,659 |
| tloen/alpaca-lora | Tuning a large language model on consumer hardware using low-rank adaptation | 18,651 |