exllama

GPU-based LLM inference library

A re-implementation of Llama for efficient use with quantized weights on modern GPUs.

A more memory-efficient rewrite of the Hugging Face Transformers implementation of Llama for use with quantized weights.

3k stars

37 watching

220 forks

Language: Python

last commit: almost 2 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

hannibal046/awesome-llm

Related projects:

Repository	Description	Stars
pytorch/torchtitan	A native PyTorch library for training large language models using distributed parallelism and optimization techniques.	2,765
ggerganov/llama.cpp	Enables LLM inference with minimal setup and high performance on various hardware platforms.	69,185
nomic-ai/gpt4all	An open-source Python client for running Large Language Models (LLMs) locally on any device.	71,176
haotian-liu/llava	A system that uses large language and vision models to generate and process visual instructions.	20,683
autogptq/autogptq	A package for optimizing large language models for efficient inference on GPUs and other hardware platforms.	4,560
alpha-vllm/llama2-accessory	An open-source toolkit for pretraining and fine-tuning large language models	2,732
eleutherai/gpt-neox	Provides a framework for training large-scale language models on GPUs with advanced features and optimizations.	6,997
opengvlab/llama-adapter	An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy	5,775
lightning-ai/lit-llama	An implementation of the LLaMA language model based on nanoGPT.	6,013
plasma-umass/scalene	A high-performance Python profiler that analyzes CPU, GPU, and memory usage, providing detailed information and AI-powered optimization suggestions.	12,274
pytorch/pytorch	A Python library providing tensors and dynamic neural networks with strong GPU acceleration.	84,978
nvidia/megatron-lm	A framework for training large language models using scalable and optimized GPU techniques.	10,804
mit-han-lab/llm-awq	An open-source software project that enables efficient and accurate low-bit weight quantization for large language models.	2,593
hiyouga/llama-factory	A tool for efficiently fine-tuning large language models across multiple architectures and methods.	36,219
pytorch/torchtune	A PyTorch library for easily authoring and experimenting with large language models.	4,479