gptq

Quantizer

An implementation of a post-training quantization algorithm for transformer models that reduces memory usage and improves inference speed.

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

GitHub

2k stars
29 watching
154 forks
Language: Python
Last commit: 8 months ago
Linked from 1 awesome list

Related projects:

Repository | Description | Stars
microsoft/megatron-deepspeed | Research tool for training large transformer language models at scale | 1,895
bigscience-workshop/megatron-deepspeed | A collection of tools and scripts for training large transformer language models at scale | 1,335
keyvank/femtogpt | A Rust implementation of a minimal Generative Pretrained Transformer architecture. | 834
vahe1994/aqlm | An implementation of a method to compress large language models using additive quantization and fine-tuning. | 1,169
opengvlab/omniquant | A software framework for accurately quantizing large language models using a novel technique | 730
neukg/techgpt | A generative transformer model designed to process and generate text in various vertical domains, including computer science, finance, and more. | 212
pasqal-io/pyqtorch | A PyTorch-based simulator for quantum machine learning | 45
shi-labs/gfr-dsod | Improving Object Detection from Scratch via Gated Feature Reuse | 65
intel/neural-compressor | Tools and techniques for optimizing large language models on various frameworks and hardware platforms. | 2,226
matlab-deep-learning/transformer-models | An implementation of deep learning transformer models in MATLAB | 206
openai/finetune-transformer-lm | This project provides code and model for improving language understanding through generative pre-training using a transformer-based architecture. | 2,160
jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 506
ahmedfgad/torchga | Trains PyTorch models using the Genetic Algorithm | 95
alex-berard/seq2seq | An attention-based sequence-to-sequence learning framework | 388
google/qkeras | A deep learning library that provides an easy-to-use interface for quantizing neural networks and accelerating their inference on various hardware platforms. | 540