llm-awq
LLM Quantizer
A tool for efficient and accurate weight quantization in large language models
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
3k stars
24 watching
200 forks
Language: Python
Last commit: about 1 month ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
|---|---|---|
| lyogavin/airllm | A Python library that optimizes inference memory usage for large language models on limited GPU resources. | 5,259 |
| vllm-project/vllm | A high-performance inference and serving engine for large language models. | 30,303 |
| optimalscale/lmflow | A toolkit for fine-tuning large language models and providing efficient inference capabilities. | 8,273 |
| autogptq/autogptq | A package for efficient inference and training of large language models using quantization techniques. | 4,476 |
| opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy. | 5,754 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models. | 2,720 |
| intel/neural-compressor | Tools and techniques for optimizing large language models on various frameworks and hardware platforms. | 2,226 |
| internlm/lmdeploy | A toolkit for optimizing and serving large language models. | 4,653 |
| linkedin/liger-kernel | A collection of optimized kernels for efficient large language model training on distributed computing frameworks. | 3,431 |
| modeltc/lightllm | An LLM inference and serving framework providing a lightweight design, scalability, and high-speed performance for large language models. | 2,609 |
| microsoft/lmops | A research initiative focused on developing fundamental technology to improve the performance and efficiency of large language models. | 3,695 |
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions. | 20,232 |
| ggerganov/llama.cpp | Enables efficient inference of large language models using optimized C/C++ implementations and various backend frameworks. | 67,866 |
| mlc-ai/mlc-llm | Enables the development, optimization, and deployment of large language models on various platforms using a unified high-performance inference engine. | 19,197 |
| nomic-ai/gpt4all | An open-source Python client for running large language models (LLMs) locally on any device. | 70,694 |