exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

GitHub

3k stars
37 watching
214 forks
Language: Python
Last commit: 12 months ago
Linked from 1 awesome list
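
For context on how the library is typically driven, below is a minimal generation sketch loosely patterned after exllama's bundled example scripts. The module layout, class and method names (ExLlamaConfig, ExLlama, ExLlamaCache, ExLlamaTokenizer, ExLlamaGenerator, generate_simple) and the model path are assumptions recalled from those examples, not verified against the current repo.

```python
# Minimal generation sketch, loosely based on exllama's bundled examples.
# Class/method names and paths are assumptions; check the repo's example
# scripts for the authoritative usage.
import glob
import os

from model import ExLlama, ExLlamaCache, ExLlamaConfig  # assumed module layout
from tokenizer import ExLlamaTokenizer                   # assumed
from generator import ExLlamaGenerator                   # assumed

# Directory holding a 4-bit GPTQ-quantized Llama model (hypothetical path).
model_dir = "/models/llama-13b-4bit-128g"
config = ExLlamaConfig(os.path.join(model_dir, "config.json"))
config.model_path = glob.glob(os.path.join(model_dir, "*.safetensors"))[0]

model = ExLlama(config)                                  # load the quantized weights
tokenizer = ExLlamaTokenizer(os.path.join(model_dir, "tokenizer.model"))
cache = ExLlamaCache(model)                              # key/value cache reused across calls
generator = ExLlamaGenerator(model, tokenizer, cache)

print(generator.generate_simple("Once upon a time,", max_new_tokens=128))
```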
