neural-compressor
Model optimizer
Tools and techniques for optimizing large language models on various frameworks and hardware platforms.
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
2k stars
33 watching
257 forks
Language: Python
Last commit: 2 months ago
Linked from 2 awesome lists
auto-tuning, awq, fp4, gptq, int4, int8, knowledge-distillation, large-language-models, low-precision, mxformat, post-training-quantization, pruning, quantization, quantization-aware-training, smoothquant, sparsegpt, sparsity
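For readers unfamiliar with the low-bit quantization named in the description, the following is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is illustrative only and is not the neural-compressor API: the function names `quantize_int8` and `dequantize_int8` and the sample `weights` list are assumptions made up for this example.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (hypothetical helper, not a library call).

    Maps floats to integer codes in [-127, 127] using a single scale factor.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero input, where any scale would work.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Made-up sample weights for illustration.
weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
```

Each restored value differs from its original by at most half the scale, which is the rounding error that techniques such as those listed in the tags above try to keep from degrading model accuracy.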
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Enables the creation of smaller neural network models through efficient pruning and quantization techniques | 2,083 |
| | Automates the search for optimal neural network configurations in deep learning applications | 468 |
| | Automates model building and deployment process by optimizing hyperparameters and compressing models for edge computing. | 200 |
| | An implementation of a method to compress large language models using additive quantization and fine-tuning. | 1,184 |
| | A large language model with improved efficiency and performance compared to similar models | 1,024 |
| | This project presents an optimization technique for large-scale image models to reduce computational requirements while maintaining performance. | 106 |
| | A tool for optimizing large language models by collecting feedback and metrics to improve their performance over time | 1,245 |
| | An optimization technique for large language models allowing them to run on limited hardware resources without significant performance loss. | 9 |
| | An implementation of Google's QANet for machine reading comprehension using TensorFlow. | 983 |
| | A framework that enables efficient training of text-to-image models by compressing the computationally expensive stage into a latent space | 531 |
| | Automated machine learning tool for tabular data pipelines | 343 |
| | A collection of notebooks demonstrating various techniques for optimizing and quantizing neural networks using TensorFlow Lite | 172 |
| | Deep learning models for semantic segmentation of images | 101 |
| | A PyTorch quantization backend for models. | 847 |
| | Library providing an implementation of the K-FAC optimizer and curvature estimator for second-order optimization in neural networks. | 252 |