neural-compressor

Model optimizer

Tools and techniques for optimizing large language models on various frameworks and hardware platforms.

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
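
As a rough illustration of what "low-bit quantization" in the tagline refers to, the sketch below applies symmetric per-tensor INT8 quantization to a short list of weights using only the Python standard library. The function names (quantize_int8, dequantize_int8) and the sample weights are hypothetical and chosen for this example; this is not the neural-compressor API, which provides far more elaborate schemes (per-channel scales, calibration, AWQ, GPTQ, and so on).

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (illustrative sketch).

    Returns a list of integer codes in [-127, 127] and the scale that
    maps a code back to an approximate float value.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; use 1.0 for an all-zero input
    # so that we never divide by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from INT8 codes."""
    return [c * scale for c in codes]


# Hypothetical weights, used only to exercise the two helpers.
weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Rounding to the nearest code bounds the per-value error by about scale / 2.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored as one signed byte plus a single shared float scale, which is where the memory saving comes from; the cost is a rounding error of at most about half the scale per value.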

GitHub

2k stars
33 watching
257 forks
Language: Python
last commit: about 1 month ago
Linked from 2 awesome lists

auto-tuning, awq, fp4, gptq, int4, int8, knowledge-distillation, large-language-models, low-precision, mxformat, post-training-quantization, pruning, quantization, quantization-aware-training, smoothquant, sparsegpt, sparsity

Related projects:

Repository | Description | Stars
neuralmagic/sparseml | Enables the creation of smaller neural network models through efficient pruning and quantization techniques | 2,083
microsoft/archai | Automates the search for optimal neural network configurations in deep learning applications | 468
lge-arc-advancedai/auptimizer | Automates model building and deployment process by optimizing hyperparameters and compressing models for edge computing | 200
vahe1994/aqlm | An implementation of a method to compress large language models using additive quantization and fine-tuning | 1,184
deepseek-ai/deepseek-moe | A large language model with improved efficiency and performance compared to similar models | 1,024
alibaba/conv-llava | This project presents an optimization technique for large-scale image models to reduce computational requirements while maintaining performance | 106
tensorzero/tensorzero | A tool for optimizing large language models by collecting feedback and metrics to improve their performance over time | 1,245
lyogavin/anima | An optimization technique for large language models allowing them to run on limited hardware resources without significant performance loss | 9
localminimum/qanet | An implementation of Google's QANet for machine reading comprehension using TensorFlow | 983
dome272/wuerstchen | A framework that enables efficient training of text-to-image models by compressing the computationally expensive stage into a latent space | 531
datacanvasio/hypergbm | Automated machine learning tool for tabular data pipelines | 343
sayakpaul/adventures-in-tensorflow-lite | A collection of notebooks demonstrating various techniques for optimizing and quantizing neural networks using TensorFlow Lite | 172
preritj/segmentation | Deep learning models for semantic segmentation of images | 101
huggingface/optimum-quanto | A PyTorch quantization backend for models | 847
google-deepmind/kfac-jax | Library providing an implementation of the K-FAC optimizer and curvature estimator for second-order optimization in neural networks | 252