neural-compressor

Model optimizer

Tools and techniques for optimizing large language models on various frameworks and hardware platforms.

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
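As a rough illustration of the post-training quantization workflow described above, here is a minimal sketch using neural-compressor's 2.x-style Python API (PostTrainingQuantConfig plus quantization.fit). The toy model, calibration data, and output path are placeholders, and exact module paths and config fields may differ between releases.

```python
# Minimal sketch: static post-training INT8 quantization of a PyTorch model
# with neural-compressor's 2.x-style API. Model, calibration data, and output
# path are placeholders; API details may vary by release.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

from neural_compressor import PostTrainingQuantConfig, quantization

# Toy FP32 model and random calibration data stand in for real ones.
fp32_model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
calib_set = TensorDataset(torch.randn(256, 128), torch.randint(0, 10, (256,)))
calib_loader = DataLoader(calib_set, batch_size=32)

# Calibrate activation ranges on the loader, then return a quantized model.
conf = PostTrainingQuantConfig(approach="static")
q_model = quantization.fit(fp32_model, conf=conf, calib_dataloader=calib_loader)
q_model.save("./int8_model")
```

For accuracy-aware tuning, an evaluation function can also be passed to quantization.fit so the tuner can fall back on configurations that keep accuracy within a target threshold.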

GitHub

2k stars
34 watching
256 forks
Language: Python
Last commit: 6 days ago
Linked from 2 awesome lists

Topics: auto-tuning, awq, fp4, gptq, int4, int8, knowledge-distillation, large-language-models, low-precision, mxformat, post-training-quantization, pruning, quantization, quantization-aware-training, smoothquant, sparsegpt, sparsity

Related projects:

Repository | Description | Stars
neuralmagic/sparseml | Enables the creation of smaller neural network models through efficient pruning and quantization techniques | 2,071
microsoft/archai | Automates the search for optimal neural network configurations in deep learning applications | 467
lge-arc-advancedai/auptimizer | Automates the model building and deployment process by optimizing hyperparameters and compressing models for edge computing | 200
vahe1994/aqlm | An implementation of a method for compressing large language models using additive quantization and fine-tuning | 1,169
deepseek-ai/deepseek-moe | A large language model with improved efficiency and performance compared to similar models | 1,006
alibaba/conv-llava | An optimization technique for large-scale image models that reduces computational requirements while maintaining performance | 104
tensorzero/tensorzero | A tool that creates a feedback loop for optimizing large language models by integrating model gateways and providing data analytics and machine learning capabilities | 569
lyogavin/anima | An optimization technique that allows large language models to run on limited hardware resources without significant performance loss | 6
localminimum/qanet | An implementation of Google's QANet for machine reading comprehension using TensorFlow | 983
dome272/wuerstchen | A framework that enables efficient training of text-to-image models by compressing the computationally expensive stage into a latent space | 528
datacanvasio/hypergbm | An AutoML toolkit designed to automate the full machine learning pipeline for tabular data | 337
sayakpaul/adventures-in-tensorflow-lite | A collection of notebooks demonstrating techniques for optimizing and quantizing neural networks with TensorFlow Lite | 171
preritj/segmentation | Deep learning models for semantic segmentation of images | 100
huggingface/optimum-quanto | A PyTorch quantization backend for models | 822
google-deepmind/kfac-jax | A library providing an implementation of the K-FAC optimizer and curvature estimator for second-order optimization of neural networks | 248