llm-awq

LLM Quantizer

A tool for efficient and accurate weight quantization in large language models

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
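In brief, AWQ observes that a small fraction of weight channels dominate model quality, and that these salient channels are best identified by activation magnitudes rather than weight magnitudes; scaling them up before group-wise low-bit quantization (and folding the inverse scale into the preceding operation) reduces quantization error without mixed-precision storage. The snippet below is a minimal NumPy sketch of that idea, not the llm-awq implementation; the scale rule, `alpha`, and `group_size` defaults are illustrative assumptions.

```python
import numpy as np

def awq_style_quantize(W, X, n_bits=4, group_size=128, alpha=0.5):
    """Toy activation-aware group-wise quantization (illustrative only).

    W: weight matrix, shape (out_features, in_features)
    X: calibration activations, shape (n_samples, in_features)
    Returns dequantized weights plus the per-input-channel scales that
    would be folded into the operation producing X at inference time.
    """
    # Activation-aware per-input-channel scales: columns whose inputs have
    # larger average magnitude are scaled up before quantization.
    act_mag = np.abs(X).mean(axis=0) + 1e-8           # (in_features,)
    s = act_mag ** alpha                              # simplified scale rule
    s /= np.sqrt(s.max() * s.min())                   # keep scales centered around 1

    W_scaled = W * s                                  # scale weight columns

    # Plain group-wise asymmetric INT4 quantization along the input dimension.
    qmax = 2 ** n_bits - 1
    out_f, in_f = W_scaled.shape
    W_q = np.empty_like(W_scaled)
    for g in range(0, in_f, group_size):
        block = W_scaled[:, g:g + group_size]
        lo = block.min(axis=1, keepdims=True)
        hi = block.max(axis=1, keepdims=True)
        step = np.maximum(hi - lo, 1e-8) / qmax
        q = np.clip(np.round((block - lo) / step), 0, qmax)
        W_q[:, g:g + group_size] = q * step + lo      # dequantize for this demo

    # Undo the column scaling on the weights; in practice the inverse scale
    # is fused into the previous layer instead, so inference is unchanged.
    return W_q / s, s

# Example (hypothetical shapes): one linear layer and a small calibration batch.
# W_dq, s = awq_style_quantize(np.random.randn(256, 512), np.random.randn(64, 512))
```

In the actual repository, the scaling exponent is searched per layer on a small calibration set to minimize output reconstruction error, and the inverse scales are fused into the preceding linear or normalization layer, so inference can use standard low-bit kernels.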

GitHub stats:
3k stars
24 watching
200 forks
Language: Python
Last commit: about 1 month ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
--- | --- | ---
lyogavin/airllm | A Python library that optimizes inference memory usage for large language models on limited GPU resources. | 5,259
vllm-project/vllm | A high-performance inference and serving engine for large language models. | 30,303
optimalscale/lmflow | A toolkit for finetuning large language models and providing efficient inference capabilities. | 8,273
autogptq/autogptq | A package for efficient inference and training of large language models using quantization techniques. | 4,476
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy. | 5,754
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models. | 2,720
intel/neural-compressor | Tools and techniques for optimizing large language models on various frameworks and hardware platforms. | 2,226
internlm/lmdeploy | A toolkit for optimizing and serving large language models. | 4,653
linkedin/liger-kernel | A collection of optimized kernels for efficient large language model training on distributed computing frameworks. | 3,431
modeltc/lightllm | A lightweight, scalable, and high-speed inference and serving framework for large language models. | 2,609
microsoft/lmops | A research initiative focused on developing fundamental technology to improve the performance and efficiency of large language models. | 3,695
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions. | 20,232
ggerganov/llama.cpp | Enables efficient inference of large language models using optimized C/C++ implementations and various backend frameworks. | 67,866
mlc-ai/mlc-llm | Enables the development, optimization, and deployment of large language models on various platforms using a unified high-performance inference engine. | 19,197
nomic-ai/gpt4all | An open-source Python client for running large language models (LLMs) locally on any device. | 70,694