airllm

Inference optimizer

Optimizes large language model inference on limited GPU resources

AirLLM: 70B model inference on a single 4GB GPU
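
Below is a minimal sketch of how AirLLM's layer-by-layer inference is typically invoked, following the pattern in the project's README. The HuggingFace model ID, max length, and generation parameters are illustrative placeholders, not fixed requirements.

```python
# Minimal AirLLM usage sketch (model ID and parameters are illustrative).
# AirLLM loads and runs the model one layer at a time, which is how a 70B
# model can be decoded on a single ~4GB GPU, trading speed for memory.
from airllm import AutoModel

MAX_LENGTH = 128
model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct")

input_text = ["What is the capital of the United States?"]
input_tokens = model.tokenizer(
    input_text,
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=False,
)

generation_output = model.generate(
    input_tokens["input_ids"].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True,
)
print(model.tokenizer.decode(generation_output.sequences[0]))
```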

GitHub

5k stars
128 watching
437 forks
Language: Jupyter Notebook
Last commit: about 2 months ago
Linked from 1 awesome list

Topics: chinese-llm, chinese-nlp, finetune, generative-ai, instruct-gpt, instruction-set, llama, llm, lora, open-models, open-source, open-source-models, qlora

Related projects:

| Repository | Description | Stars |
|---|---|---|
| modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability | 2,691 |
| ggerganov/llama.cpp | Enables LLM inference with minimal setup and high performance on various hardware platforms | 69,185 |
| mit-han-lab/llm-awq | An open-source project that enables efficient and accurate low-bit weight quantization for large language models | 2,593 |
| internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,854 |
| opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775 |
| vllm-project/vllm | An inference and serving engine for large language models | 31,982 |
| optimalscale/lmflow | A toolkit for fine-tuning and inference of large machine learning models | 8,312 |
| fminference/flexllmgen | Generates large language model outputs in high-throughput mode on a single GPU | 9,236 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
| young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,428 |
| nomic-ai/gpt4all | An open-source Python client for running large language models (LLMs) locally on any device | 71,176 |
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,683 |
| sjtu-ipads/powerinfer | An efficient large language model inference engine leveraging consumer-grade GPUs on PCs | 8,011 |
| thudm/glm-130b | An open-source implementation of a large bilingual language model pre-trained on vast amounts of text data | 7,672 |
| tloen/alpaca-lora | Fine-tunes a large language model on consumer hardware using low-rank adaptation | 18,710 |