ipex-llm

LLM accelerator

An LLM acceleration library for Intel GPUs and other XPU devices, enabling fast inference and fine-tuning of large language models.

Accelerates local LLM inference and fine-tuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., a local PC with an iGPU and NPU, or a discrete GPU such as Arc, Flex, or Max); integrates seamlessly with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
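The HuggingFace integration works as a drop-in replacement: per the project's documentation, you import `AutoModelForCausalLM` from `ipex_llm.transformers` instead of `transformers`, and weights are quantized to INT4 at load time before the model is moved to the Intel GPU (`"xpu"` device). A minimal sketch, assuming an Intel XPU machine with `ipex-llm` installed; the model name is only an example, and the import is guarded so the snippet degrades gracefully elsewhere:

```python
def load_quantized(model_id: str = "meta-llama/Llama-2-7b-chat-hf"):
    """Load a causal LM with ipex-llm's INT4 weight-only quantization."""
    try:
        # ipex-llm's HuggingFace-compatible wrapper (requires ipex-llm[xpu])
        from ipex_llm.transformers import AutoModelForCausalLM
    except ImportError:
        # ipex-llm not installed (e.g., a non-Intel machine): nothing to load
        return None
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        load_in_4bit=True,       # quantize weights to INT4 while loading
        trust_remote_code=True,
    )
    return model.to("xpu")       # move the quantized model to the Intel GPU

if __name__ == "__main__":
    model = load_quantized()
    print("loaded" if model is not None else "ipex-llm unavailable here")
```

Because the wrapper mirrors the `transformers` API, the rest of a generation pipeline (tokenizer, `model.generate`, etc.) stays unchanged; only the import and the `.to("xpu")` call differ from a stock HuggingFace script.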

GitHub

7k stars
253 watching
1k forks
Language: Python
last commit: about 1 month ago
Linked from 3 awesome lists

Topics: gpu, llm, pytorch, transformers

Related projects:

| Repository | Description | Stars |
|---|---|---|
| lyogavin/airllm | Optimizes large language model inference on limited GPU resources | 5,446 |
| sjtu-ipads/powerinfer | An efficient large language model inference engine leveraging consumer-grade GPUs on PCs | 8,011 |
| vllm-project/vllm | An inference and serving engine for large language models | 31,982 |
| internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,854 |
| fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs | 9,236 |
| nomic-ai/gpt4all | An open-source Python client for running large language models (LLMs) locally on any device | 71,176 |
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,683 |
| autogptq/autogptq | A package for optimizing large language models for efficient inference on GPUs and other hardware platforms | 4,560 |
| modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability | 2,691 |
| ggerganov/llama.cpp | Enables LLM inference with minimal setup and high performance on various hardware platforms | 69,185 |
| microsoft/deepspeed | A deep learning optimization library that simplifies distributed training and inference on modern computing hardware | 35,863 |
| mit-han-lab/llm-awq | An open-source software project that enables efficient and accurate low-bit weight quantization for large language models | 2,593 |
| linkedin/liger-kernel | A collection of optimized kernels and post-training loss functions for large language models | 3,840 |
| facebookincubator/aitemplate | A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving | 4,573 |
| sgl-project/sglang | A fast serving framework for large language models and vision language models | 6,551 |