exllama
GPU-based inference for quantized Llama models
A memory-efficient re-implementation of the HF Transformers Llama code, designed for fast inference with quantized weights on modern GPUs.
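To see why quantized weights matter on GPUs, a back-of-the-envelope estimate helps. The sketch below (an illustration, not exllama code) compares the VRAM needed just to hold a 7B-parameter model's weights at 16-bit versus 4-bit precision; real quantized checkpoints carry some extra overhead for scales and group metadata, which this ignores.

```python
def weight_vram_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate VRAM for the weights alone (ignores activations and KV cache)."""
    return n_params * bits_per_weight / 8 / 1024**3

fp16_gb = weight_vram_gb(7e9, 16)  # ~13 GB: too large for many consumer GPUs
q4_gb = weight_vram_gb(7e9, 4)     # ~3.3 GB: fits comfortably on an 8 GB card

print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {q4_gb:.1f} GB")
```

The 4x reduction is what makes running a 7B (or even 33B) Llama model on a single consumer GPU practical, which is the niche this project targets.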
3k stars
37 watching
220 forks
Language: Python
Last commit: over 1 year ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | A native PyTorch library for training large language models using distributed parallelism and optimization techniques. | 2,765 |
| | Enables LLM inference with minimal setup and high performance on various hardware platforms. | 69,185 |
| | An open-source Python client for running Large Language Models (LLMs) locally on any device. | 71,176 |
| | A system that uses large language and vision models to generate and process visual instructions. | 20,683 |
| | A package for optimizing large language models for efficient inference on GPUs and other hardware platforms. | 4,560 |
| | An open-source toolkit for pretraining and fine-tuning large language models. | 2,732 |
| | Provides a framework for training large-scale language models on GPUs with advanced features and optimizations. | 6,997 |
| | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy. | 5,775 |
| | An implementation of a large language model using the nanoGPT architecture. | 6,013 |
| | A high-performance Python profiler that analyzes CPU, GPU, and memory usage, providing detailed information and AI-powered optimization suggestions. | 12,274 |
| | A Python library providing tensors and dynamic neural networks with strong GPU acceleration. | 84,978 |
| | A framework for training large language models using scalable and optimized GPU techniques. | 10,804 |
| | An open-source software project that enables efficient and accurate low-bit weight quantization for large language models. | 2,593 |
| | A tool for efficiently fine-tuning large language models across multiple architectures and methods. | 36,219 |
| | A PyTorch library for easily authoring and experimenting with large language models. | 4,479 |