neural-compressor

Model optimizer

Tools and techniques for optimizing large language models on various frameworks and hardware platforms.

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
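For readers unfamiliar with the term, low-bit quantization stores floating-point weights as small integers plus a scale factor. The sketch below shows symmetric per-tensor INT8 quantization in plain Python. It is an illustration of the general idea only, not neural-compressor's API or algorithm; the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical.

```python
# Illustrative only: symmetric per-tensor INT8 quantization in plain Python.
# This is NOT neural-compressor's API; all names here are hypothetical.

def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the INT8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.27, 0.031, 1.0]  # made-up sample values
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# The rounding error per value is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Real toolkits typically refine this with per-channel scales, calibration data, and lower bit widths such as INT4, but the store-integers-plus-scale idea is the same.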

GitHub

2k stars

33 watching

257 forks

Language: Python

Last commit: 8 months ago

Linked from 2 awesome lists

Topics: auto-tuning, awq, fp4, gptq, int4, int8, knowledge-distillation, large-language-models, low-precision, mxformat, post-training-quantization, pruning, quantization, quantization-aware-training, smoothquant, sparsegpt, sparsity

intel.github.io/neural-compressor/

Related projects:

Repository	Description	Stars
neuralmagic/sparseml	Enables the creation of smaller neural network models through efficient pruning and quantization techniques	2,083
microsoft/archai	Automates the search for optimal neural network configurations in deep learning applications	468
lge-arc-advancedai/auptimizer	Automates the model building and deployment process by optimizing hyperparameters and compressing models for edge computing	200
vahe1994/aqlm	An implementation of a method to compress large language models using additive quantization and fine-tuning	1,184
deepseek-ai/deepseek-moe	A large language model with improved efficiency and performance compared to similar models	1,024
alibaba/conv-llava	An optimization technique for large-scale image models that reduces computational requirements while maintaining performance	106
tensorzero/tensorzero	A tool for optimizing large language models by collecting feedback and metrics to improve their performance over time	1,245
lyogavin/anima	An optimization technique for large language models that allows them to run on limited hardware resources without significant performance loss	9
localminimum/qanet	An implementation of Google's QANet for machine reading comprehension using TensorFlow	983
dome272/wuerstchen	A framework that enables efficient training of text-to-image models by compressing the computationally expensive stage into a latent space	531
datacanvasio/hypergbm	Automated machine learning tool for tabular data pipelines	343
sayakpaul/adventures-in-tensorflow-lite	A collection of notebooks demonstrating various techniques for optimizing and quantizing neural networks using TensorFlow Lite	172
preritj/segmentation	Deep learning models for semantic segmentation of images	101
huggingface/optimum-quanto	A PyTorch quantization backend for models	847
google-deepmind/kfac-jax	Library providing an implementation of the K-FAC optimizer and curvature estimator for second-order optimization in neural networks	252