ipex-llm
LLM accelerator
An LLM acceleration library for Intel GPUs and other XPU devices, enabling fast inference and finetuning of large language models.
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., a local PC with iGPU and NPU, or a discrete GPU such as Arc, Flex, and Max); seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
7k stars
253 watching
1k forks
Language: Python
last commit: about 1 month ago
Linked from 3 awesome lists
Topics: gpu, llm, pytorch, transformers
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| lyogavin/airllm | Optimizes large language model inference on limited GPU resources | 5,446 |
| sjtu-ipads/powerinfer | An efficient Large Language Model inference engine leveraging consumer-grade GPUs on PCs | 8,011 |
| vllm-project/vllm | An inference and serving engine for large language models | 31,982 |
| internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,854 |
| fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs | 9,236 |
| nomic-ai/gpt4all | An open-source Python client for running Large Language Models (LLMs) locally on any device | 71,176 |
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,683 |
| autogptq/autogptq | A package for optimizing large language models for efficient inference on GPUs and other hardware platforms | 4,560 |
| modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability | 2,691 |
| ggerganov/llama.cpp | Enables LLM inference with minimal setup and high performance on various hardware platforms | 69,185 |
| microsoft/deepspeed | A deep learning optimization library that simplifies distributed training and inference on modern computing hardware | 35,863 |
| mit-han-lab/llm-awq | An open-source software project that enables efficient and accurate low-bit weight quantization for large language models | 2,593 |
| linkedin/liger-kernel | A collection of optimized kernels and post-training loss functions for large language models | 3,840 |
| facebookincubator/aitemplate | A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving | 4,573 |
| sgl-project/sglang | A fast serving framework for large language models and vision language models | 6,551 |