airllm

Inference optimizer

Optimizes large language model inference on limited GPU resources

AirLLM: 70B model inference on a single 4GB GPU
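
Below is a minimal sketch of how AirLLM's layer-by-layer inference is typically invoked, following the pattern in the project's README. The HuggingFace model ID, max length, and generation parameters are illustrative placeholders, not fixed requirements.

```python
# Minimal AirLLM usage sketch (model ID and parameters are illustrative).
# AirLLM loads and runs the model one layer at a time, which is how a 70B
# model can be decoded on a single ~4GB GPU, trading speed for memory.
from airllm import AutoModel

MAX_LENGTH = 128
model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct")

input_text = ["What is the capital of the United States?"]
input_tokens = model.tokenizer(
    input_text,
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=False,
)

generation_output = model.generate(
    input_tokens["input_ids"].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True,
)
print(model.tokenizer.decode(generation_output.sequences[0]))
```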

GitHub

5k stars
128 watching
437 forks
Language: Jupyter Notebook
Last commit: about 2 months ago
Linked from 1 awesome list

Topics: chinese-llm, chinese-nlp, finetune, generative-ai, instruct-gpt, instruction-set, llama, llm, lora, open-models, open-source, open-source-models, qlora

Related projects:

| Repository | Description | Stars |
|---|---|---|
| modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability | 2,691 |
| ggerganov/llama.cpp | Enables LLM inference with minimal setup and high performance on various hardware platforms | 69,185 |
| mit-han-lab/llm-awq | An open-source project that enables efficient and accurate low-bit weight quantization for large language models | 2,593 |
| internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,854 |
| opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775 |
| vllm-project/vllm | An inference and serving engine for large language models | 31,982 |
| optimalscale/lmflow | A toolkit for fine-tuning and inference of large machine learning models | 8,312 |
| fminference/flexllmgen | Generates large language model outputs in high-throughput mode on a single GPU | 9,236 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
| young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,428 |
| nomic-ai/gpt4all | An open-source Python client for running large language models (LLMs) locally on any device | 71,176 |
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,683 |
| sjtu-ipads/powerinfer | An efficient large language model inference engine leveraging consumer-grade GPUs on PCs | 8,011 |
| thudm/glm-130b | An open-source implementation of a large bilingual language model pre-trained on vast amounts of text data | 7,672 |
| tloen/alpaca-lora | Fine-tunes a large language model on consumer hardware using low-rank adaptation | 18,710 |