neural-compressor
Model optimizer
Tools and techniques for optimizing large language models on various frameworks and hardware platforms.
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
2k stars
33 watching
257 forks
Language: Python
Last commit: about 1 month ago
Linked from 2 awesome lists
Topics: auto-tuning, awq, fp4, gptq, int4, int8, knowledge-distillation, large-language-models, low-precision, mxformat, post-training-quantization, pruning, quantization, quantization-aware-training, smoothquant, sparsegpt, sparsity
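The features listed above (post-training quantization, quantization-aware training, pruning) are driven through a small `fit`-style API. Below is a minimal sketch of INT8 post-training static quantization of a toy PyTorch model; the names and signatures follow the 2.x-era `PostTrainingQuantConfig` / `quantization.fit` interface as an assumption, and the model and calibration dataloader are placeholders.

```python
# Sketch: INT8 post-training static quantization of a toy PyTorch model with
# Intel Neural Compressor (assumed 2.x API: PostTrainingQuantConfig + quantization.fit).
import torch
from neural_compressor import PostTrainingQuantConfig, quantization

# Placeholder FP32 model and calibration data -- replace with your own.
fp32_model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)
calib_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(
        torch.randn(64, 128),               # dummy calibration inputs
        torch.zeros(64, dtype=torch.long),  # dummy labels (unused during calibration)
    ),
    batch_size=8,
)

# The default config requests INT8 post-training static quantization.
conf = PostTrainingQuantConfig()
q_model = quantization.fit(
    model=fp32_model,
    conf=conf,
    calib_dataloader=calib_loader,
)
q_model.save("./int8_model")  # persist the quantized model (assumed helper on the returned object)
```

Per the project description, the same entry point also covers TensorFlow and ONNX Runtime models, which is how the library spans the three frameworks listed above.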
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| neuralmagic/sparseml | Enables the creation of smaller neural network models through efficient pruning and quantization techniques | 2,083 |
| microsoft/archai | Automates the search for optimal neural network architectures in deep learning applications | 468 |
| lge-arc-advancedai/auptimizer | Automates the model building and deployment process by optimizing hyperparameters and compressing models for edge computing | 200 |
| vahe1994/aqlm | An implementation of a method that compresses large language models using additive quantization and fine-tuning | 1,184 |
| deepseek-ai/deepseek-moe | A mixture-of-experts large language model designed for better efficiency and performance than comparable models | 1,024 |
| alibaba/conv-llava | An optimization technique that reduces the computational requirements of large-scale image models while maintaining performance | 106 |
| tensorzero/tensorzero | A tool for optimizing large language models by collecting feedback and metrics to improve their performance over time | 1,245 |
| lyogavin/anima | An optimization technique that lets large language models run on limited hardware without significant performance loss | 9 |
| localminimum/qanet | An implementation of Google's QANet for machine reading comprehension using TensorFlow | 983 |
| dome272/wuerstchen | A framework that enables efficient training of text-to-image models by compressing the computationally expensive stage into a latent space | 531 |
| datacanvasio/hypergbm | An automated machine learning tool for tabular data pipelines | 343 |
| sayakpaul/adventures-in-tensorflow-lite | A collection of notebooks demonstrating techniques for optimizing and quantizing neural networks with TensorFlow Lite | 172 |
| preritj/segmentation | Deep learning models for semantic segmentation of images | 101 |
| huggingface/optimum-quanto | A PyTorch quantization backend for Hugging Face Optimum | 847 |
| google-deepmind/kfac-jax | A library implementing the K-FAC optimizer and curvature estimation for second-order optimization of neural networks | 252 |