exllama

GPU-based chat model

A re-implementation of Llama for efficient use with quantized weights on modern GPUs.

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
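The phrase "use with quantized weights" can be made concrete with a small example. The sketch below is an illustration under stated assumptions and is not exllama's code, since exllama does this work in custom CUDA kernels. It applies round-to-nearest group-wise 4-bit quantization in the general style of GPTQ-format checkpoints: each group of weights shares one scale and zero point, and eight 4-bit codes are packed into one 32-bit word. GPTQ itself chooses codes with an error-correcting procedure that this sketch does not implement. The function names, the group size of 8, and the nibble order are hypothetical choices made for readability. The memory arithmetic is simple: eight fp16 weights take 16 bytes, while eight 4-bit codes take 4 bytes, plus the per-group scale and zero point. Real formats spread that overhead across much larger groups, commonly 128 weights.

```python
# Minimal sketch, NOT exllama's implementation: round-to-nearest group-wise
# 4-bit quantization with a shared scale and zero point per group.
# All function names here are hypothetical.

def quantize_group(weights, bits=4):
    """Map floats to integer codes in [0, 2**bits - 1] using one scale and zero point."""
    qmax = (1 << bits) - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = round(-lo / scale)  # integer code that dequantizes to 0.0
    codes = [min(qmax, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def pack_int4(codes):
    """Pack 4-bit codes into 32-bit words, eight per word, lowest nibble first."""
    if len(codes) % 8:
        raise ValueError("number of codes must be a multiple of 8")
    words = []
    for i in range(0, len(codes), 8):
        word = 0
        for j, code in enumerate(codes[i:i + 8]):
            word |= (code & 0xF) << (4 * j)
        words.append(word)
    return words


def unpack_int4(words):
    """Inverse of pack_int4: extract eight 4-bit codes from each 32-bit word."""
    return [(word >> (4 * j)) & 0xF for word in words for j in range(8)]


def dequantize(codes, scale, zero):
    """Approximate reconstruction of the original floats."""
    return [(code - zero) * scale for code in codes]


if __name__ == "__main__":
    group = [0.12, -0.40, 0.33, 0.05, -0.21, 0.48, -0.07, 0.19]
    codes, scale, zero = quantize_group(group)
    packed = pack_int4(codes)  # one 32-bit word instead of eight 16-bit floats
    restored = dequantize(unpack_int4(packed), scale, zero)
    print("codes:", codes)
    print("max abs error:", max(abs(a - b) for a, b in zip(group, restored)))
```

Optimized GPU kernels typically avoid this per-element Python loop. They unpack and dequantize inside the matrix multiply, so most of the model's weights can stay in their packed 4-bit form in GPU memory.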


3k stars
37 watching
220 forks
Language: Python
Last commit: about 1 year ago
Linked from 1 awesome list

Related projects:

pytorch/torchtitan: A native PyTorch library for training large language models using distributed parallelism and optimization techniques. (2,765 stars)
ggerganov/llama.cpp: Enables LLM inference with minimal setup and high performance on various hardware platforms. (69,185 stars)
nomic-ai/gpt4all: An open-source Python client for running large language models (LLMs) locally on any device. (71,176 stars)
haotian-liu/llava: A system that uses large language and vision models to generate and process visual instructions. (20,683 stars)
autogptq/autogptq: A package for quantizing large language models with the GPTQ algorithm for efficient inference on GPUs and other hardware platforms. (4,560 stars)
alpha-vllm/llama2-accessory: An open-source toolkit for pretraining and fine-tuning large language models. (2,732 stars)
eleutherai/gpt-neox: A framework for training large-scale language models on GPUs with advanced features and optimizations. (6,997 stars)
opengvlab/llama-adapter: An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy. (5,775 stars)
lightning-ai/lit-llama: An implementation of the LLaMA language model based on nanoGPT. (6,013 stars)
plasma-umass/scalene: A high-performance Python profiler that analyzes CPU, GPU, and memory usage, providing detailed information and AI-powered optimization suggestions. (12,274 stars)
pytorch/pytorch: A Python library providing tensors and dynamic neural networks with strong GPU acceleration. (84,978 stars)
nvidia/megatron-lm: A framework for training large language models using scalable and optimized GPU techniques. (10,804 stars)
mit-han-lab/llm-awq: An implementation of activation-aware weight quantization (AWQ), enabling efficient and accurate low-bit weight quantization for large language models. (2,593 stars)
hiyouga/llama-factory: A tool for efficiently fine-tuning large language models across multiple architectures and methods. (36,219 stars)
pytorch/torchtune: A PyTorch library for easily authoring and experimenting with large language models. (4,479 stars)