neural-compressor
Model optimizer
Tools and techniques for optimizing large language models on various frameworks and hardware platforms.
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
2k stars
34 watching
256 forks
Language: Python
Last commit: 6 days ago
Linked from 2 awesome lists
Topics: auto-tuning, awq, fp4, gptq, int4, int8, knowledge-distillation, large-language-models, low-precision, mxformat, post-training-quantization, pruning, quantization, quantization-aware-training, smoothquant, sparsegpt, sparsity
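The post-training-quantization topic above corresponds to the library's most common workflow. Below is a minimal sketch, assuming the Neural Compressor 2.x Python API (`PostTrainingQuantConfig` and `quantization.fit`) with the PyTorch backend; the toy model and calibration dataloader are placeholders standing in for a real workload.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig, quantization

# Toy FP32 model and calibration data standing in for a real model/dataset.
float_model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 4),
)
calib_data = TensorDataset(torch.randn(64, 16), torch.zeros(64, dtype=torch.long))
calib_loader = DataLoader(calib_data, batch_size=8)

# Static post-training quantization: calibrate activation ranges on the
# dataloader, then convert weights/activations to INT8 (2.x API assumption).
conf = PostTrainingQuantConfig(approach="static")
q_model = quantization.fit(model=float_model, conf=conf, calib_dataloader=calib_loader)
q_model.save("./quantized_model")  # writes the quantized model to disk
```

The same `fit` entry point drives the other listed techniques (e.g. SmoothQuant, weight-only INT4) by swapping the config object; consult the project's documentation for the exact options in your installed version.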
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| neuralmagic/sparseml | Enables the creation of smaller neural network models through efficient pruning and quantization techniques | 2,071 |
| microsoft/archai | Automates the search for optimal neural network configurations in deep learning applications | 467 |
| lge-arc-advancedai/auptimizer | Automates the model building and deployment process by optimizing hyperparameters and compressing models for edge computing | 200 |
| vahe1994/aqlm | An implementation of a method to compress large language models using additive quantization and fine-tuning | 1,169 |
| deepseek-ai/deepseek-moe | A mixture-of-experts large language model with improved efficiency and performance compared to similar-sized models | 1,006 |
| alibaba/conv-llava | An optimization technique for large-scale image models that reduces computational requirements while maintaining performance | 104 |
| tensorzero/tensorzero | A tool that creates a feedback loop for optimizing large language models, combining a model gateway with data analytics and machine learning capabilities | 569 |
| lyogavin/anima | An optimization technique that lets large language models run on limited hardware resources without significant performance loss | 6 |
| localminimum/qanet | An implementation of Google's QANet for machine reading comprehension using TensorFlow | 983 |
| dome272/wuerstchen | A framework that enables efficient training of text-to-image models by compressing the computationally expensive stage into a latent space | 528 |
| datacanvasio/hypergbm | An AutoML toolkit designed to automate the entire machine learning pipeline for tabular data | 337 |
| sayakpaul/adventures-in-tensorflow-lite | A collection of notebooks demonstrating various techniques for optimizing and quantizing neural networks using TensorFlow Lite | 171 |
| preritj/segmentation | Deep learning models for semantic segmentation of images | 100 |
| huggingface/optimum-quanto | A PyTorch quantization backend for Hugging Face Optimum | 822 |
| google-deepmind/kfac-jax | A library providing an implementation of the K-FAC optimizer and curvature estimator for second-order optimization of neural networks in JAX | 248 |