gptq
Quantizer
An implementation of a post-training quantization algorithm for transformer models that reduces memory usage and improves inference speed.
Code for the ICLR 2023 paper "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers".
2k stars
29 watching
154 forks
Language: Python
Last commit: 8 months ago
Linked from 1 awesome list
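For context, GPTQ quantizes each layer's weight matrix column by column, using second-order (Hessian) information from a small calibration set to compensate the rounding error introduced at each step. Below is a minimal NumPy sketch of that idea; the function names, the symmetric per-row scaling, and the dampening constant are illustrative assumptions, and the repository's actual implementation is a blocked PyTorch/GPU version.

```python
# Minimal NumPy sketch of the GPTQ idea (illustrative only; not the
# repository's implementation, which runs blocked on GPU in PyTorch).
import numpy as np

def rtn(w, scale, bits=4):
    """Round-to-nearest onto a symmetric fixed-point grid."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def gptq_layer(W, X, bits=4, percdamp=0.01):
    """Quantize weights W (rows x cols) against calibration inputs X
    (cols x samples), compensating each column's quantization error."""
    H = 2.0 * X @ X.T                                          # layer-wise Hessian proxy
    H += percdamp * np.mean(np.diag(H)) * np.eye(H.shape[0])   # dampening
    U = np.linalg.cholesky(np.linalg.inv(H)).T                 # upper Cholesky of H^-1
    W = W.astype(np.float64)
    Q = np.zeros_like(W)
    # symmetric per-row scale (an assumption; grouping/zero-points vary)
    scale = np.abs(W).max(axis=1, keepdims=True) / (2 ** (bits - 1) - 1)
    scale = np.maximum(scale, 1e-12)
    for j in range(W.shape[1]):
        Q[:, j] = rtn(W[:, [j]], scale, bits).ravel()
        err = (W[:, j] - Q[:, j]) / U[j, j]
        # propagate the error onto the not-yet-quantized columns
        W[:, j + 1:] -= np.outer(err, U[j, j + 1:])
    return Q
```

This error-compensation step is what distinguishes GPTQ from plain round-to-nearest quantization and is what allows it to remain accurate at 3-4 bits per weight.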
Related projects:
| Repository | Description | Stars |
|---|---|---|
| microsoft/megatron-deepspeed | Research tool for training large transformer language models at scale | 1,895 |
| bigscience-workshop/megatron-deepspeed | Tools and scripts for training large transformer language models at scale | 1,335 |
| keyvank/femtogpt | A Rust implementation of a minimal generative pretrained transformer | 834 |
| vahe1994/aqlm | Compresses large language models via additive quantization and fine-tuning | 1,169 |
| opengvlab/omniquant | A framework for accurate post-training weight and activation quantization of large language models | 730 |
| neukg/techgpt | A generative transformer model for processing and generating text in vertical domains such as computer science and finance | 212 |
| pasqal-io/pyqtorch | A PyTorch-based simulator for quantum machine learning | 45 |
| shi-labs/gfr-dsod | Code for "Improving Object Detection from Scratch via Gated Feature Reuse" | 65 |
| intel/neural-compressor | Tools and techniques for optimizing large language models across frameworks and hardware platforms | 2,226 |
| matlab-deep-learning/transformer-models | An implementation of deep learning transformer models in MATLAB | 206 |
| openai/finetune-transformer-lm | Code and model for improving language understanding through generative pre-training of a transformer | 2,160 |
| jshilong/gpt4roi | Training and deploying large language models for computer vision tasks using region-of-interest inputs | 506 |
| ahmedfgad/torchga | Trains PyTorch models with the genetic algorithm | 95 |
| alex-berard/seq2seq | An attention-based sequence-to-sequence learning framework | 388 |
| google/qkeras | A Keras extension providing an easy-to-use interface for quantizing neural networks and accelerating inference on various hardware platforms | 540 |