exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
3k stars
37 watching
214 forks
Language: Python
last commit: 12 months ago
Linked from 1 awesome list