llm-awq
LLM Quantizer
An open-source software project that enables efficient and accurate low-bit weight quantization for large language models.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
3k stars
25 watching
212 forks
Language: Python
last commit: 4 months ago
Linked from 1 awesome list
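The core idea behind AWQ is that a small fraction of weight channels matters disproportionately for model quality, and that those salient channels can be identified from activation magnitudes rather than from the weights alone. The sketch below illustrates that idea in plain PyTorch; it is a minimal, hypothetical fake-quantization routine, not the repository's actual API, and the function name, the `alpha` knob, and the scale-normalization heuristic are assumptions made for illustration.

```python
# Illustrative sketch of activation-aware weight quantization (AWQ-style).
# NOT the llm-awq API: the function name, alpha knob, and normalization
# heuristic below are hypothetical stand-ins for the paper's scale search.
import torch

def awq_style_quantize(w: torch.Tensor, act_scale: torch.Tensor,
                       n_bits: int = 4, group_size: int = 128,
                       alpha: float = 0.5) -> torch.Tensor:
    """Fake-quantize a weight matrix w [out, in] with per-group rounding.

    act_scale: per-input-channel mean |activation| from calibration data [in].
    alpha: strength of the activation-aware scaling (assumed fixed here).
    """
    out_dim, in_dim = w.shape
    assert in_dim % group_size == 0, "in_dim must be divisible by group_size"

    # Activation-aware per-channel scales: boost salient input channels so
    # they lose less precision during low-bit rounding.
    s = act_scale.clamp(min=1e-5).pow(alpha)
    s = s / (s.max() * s.min()).sqrt()           # normalize scales around 1
    w_scaled = w * s                              # scale before quantizing

    wg = w_scaled.reshape(out_dim, in_dim // group_size, group_size)

    # Per-group asymmetric min/max quantization to n_bits.
    q_max = 2 ** n_bits - 1
    w_min = wg.amin(dim=-1, keepdim=True)
    w_max = wg.amax(dim=-1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-5) / q_max
    zero = (-w_min / scale).round()
    q = ((wg / scale).round() + zero).clamp(0, q_max)
    wq = (q - zero) * scale                       # dequantize

    return wq.reshape(out_dim, in_dim) / s        # undo activation scaling

# Example usage (random tensors standing in for real weights/statistics):
w = torch.randn(4096, 4096)
act = torch.rand(4096)  # e.g., mean |activation| over a calibration set
w_q = awq_style_quantize(w, act)
```

In the paper, the per-channel scales are not fixed: `alpha` is chosen by a grid search that minimizes output error on calibration data, which the constant value above merely stands in for.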
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | Optimizes large language model inference on limited GPU resources | 5,446 |
| | An inference and serving engine for large language models | 31,982 |
| | A toolkit for fine-tuning and running inference on large machine learning models | 8,312 |
| | A package for optimizing large language models for efficient inference on GPUs and other hardware platforms | 4,560 |
| | An implementation of a method for fine-tuning language models to follow instructions efficiently and accurately | 5,775 |
| | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
| | Tools and techniques for optimizing large language models on various frameworks and hardware platforms | 2,257 |
| | A toolkit for optimizing and serving large language models | 4,854 |
| | A collection of optimized kernels and post-training loss functions for large language models | 3,840 |
| | A Python-based framework for serving large language models with low latency and high scalability | 2,691 |
| | A research initiative focused on developing fundamental technology to improve the performance and efficiency of large language models | 3,747 |
| | A system that uses large language and vision models to generate and process visual instructions | 20,683 |
| | Enables LLM inference with minimal setup and high performance on various hardware platforms | 69,185 |
| | A machine learning compiler and deployment engine for large language models | 19,396 |
| | An open-source Python client for running large language models (LLMs) locally on any device | 71,176 |