airllm
Model optimizer
A Python library that optimizes inference memory usage, enabling large language models to run on limited GPU resources.
AirLLM: 70B model inference on a single 4GB GPU
5k stars
126 watching
422 forks
Language: Jupyter Notebook
last commit: about 2 months ago
Linked from 1 awesome list
Tags: chinese-llm, chinese-nlp, finetune, generative-ai, instruct-gpt, instruction-set, llama, llm, lora, open-models, open-source, open-source-models, qlora
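AirLLM's memory savings come from running a model layer by layer: weights are sharded to disk, and only one layer's parameters are resident at a time. The sketch below illustrates that idea with a toy NumPy MLP; it is not AirLLM's actual API, and all names here are illustrative.

```python
import os
import tempfile
import numpy as np

# Illustrative sketch of layered inference (the technique AirLLM uses),
# NOT AirLLM's real API: shard per-layer weights to disk, then at
# inference time load and free one layer at a time.

def shard_model(num_layers, dim, out_dir, seed=0):
    """Save each layer's weight matrix to its own file on disk."""
    rng = np.random.default_rng(seed)
    paths = []
    for i in range(num_layers):
        w = rng.standard_normal((dim, dim)) * 0.1
        path = os.path.join(out_dir, f"layer_{i}.npy")
        np.save(path, w)
        paths.append(path)
    return paths

def layered_forward(x, paths):
    """Apply layers one at a time, loading and freeing each in turn."""
    for path in paths:
        w = np.load(path)          # bring one layer's weights into memory
        x = np.maximum(x @ w, 0)   # forward through this layer (ReLU MLP)
        del w                      # release before loading the next layer
    return x

with tempfile.TemporaryDirectory() as d:
    paths = shard_model(num_layers=4, dim=8, out_dir=d)
    out = layered_forward(np.ones(8), paths)
    print(out.shape)  # (8,)
```

Peak weight memory here is one layer's matrix rather than the whole model, which is why the real library can fit a 70B-parameter model into a 4GB GPU at the cost of repeated disk loads.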
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| modeltc/lightllm | An LLM inference and serving framework providing a lightweight design, scalability, and high-speed performance for large language models | 2,609 |
| ggerganov/llama.cpp | Enables efficient inference of large language models using optimized C/C++ implementations and various backend frameworks | 67,866 |
| mit-han-lab/llm-awq | A tool for efficient and accurate weight quantization in large language models | 2,517 |
| internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,653 |
| opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,754 |
| vllm-project/vllm | A high-performance inference and serving engine for large language models | 30,303 |
| optimalscale/lmflow | A toolkit for fine-tuning large language models and providing efficient inference capabilities | 8,273 |
| fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs | 9,192 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720 |
| young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,409 |
| nomic-ai/gpt4all | An open-source Python client for running large language models (LLMs) locally on any device | 70,694 |
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,232 |
| sjtu-ipads/powerinfer | An efficient large language model inference engine leveraging consumer-grade GPUs on PCs | 7,964 |
| thudm/glm-130b | An open-source implementation of a large bilingual language model pre-trained on vast amounts of text data | 7,659 |
| tloen/alpaca-lora | Tuning a large language model on consumer hardware using low-rank adaptation | 18,651 |