exllama
GPU-based chat model
A re-implementation of Llama for efficient use with quantized weights on modern GPUs.
It is a more memory-efficient rewrite of the Hugging Face transformers implementation of Llama, designed for use with quantized weights.
3k stars
37 watching
220 forks
Language: Python
Last commit: about 1 year ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
pytorch/torchtitan | A native PyTorch library for training large language models using distributed parallelism and optimization techniques. | 2,765 |
ggerganov/llama.cpp | Enables LLM inference with minimal setup and high performance on various hardware platforms | 69,185 |
nomic-ai/gpt4all | An open-source Python client for running Large Language Models (LLMs) locally on any device. | 71,176 |
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,683 |
autogptq/autogptq | A package for optimizing large language models for efficient inference on GPUs and other hardware platforms. | 4,560 |
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
eleutherai/gpt-neox | Provides a framework for training large-scale language models on GPUs with advanced features and optimizations. | 6,997 |
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775 |
lightning-ai/lit-llama | An implementation of the LLaMA language model built on nanoGPT | 6,013 |
plasma-umass/scalene | A high-performance Python profiler that analyzes CPU, GPU, and memory usage, providing detailed information and AI-powered optimization suggestions. | 12,274 |
pytorch/pytorch | A Python library providing tensors and dynamic neural networks with strong GPU acceleration | 84,978 |
nvidia/megatron-lm | A framework for training large language models using scalable and optimized GPU techniques | 10,804 |
mit-han-lab/llm-awq | An open-source software project that enables efficient and accurate low-bit weight quantization for large language models. | 2,593 |
hiyouga/llama-factory | A tool for efficiently fine-tuning large language models across multiple architectures and methods. | 36,219 |
pytorch/torchtune | A PyTorch library for easily authoring and experimenting with large language models | 4,479 |