ipex-llm

LLM accelerator

An LLM acceleration library for Intel hardware, providing seamless integration with various frameworks and models.

Accelerates local LLM inference and fine-tuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPUs (e.g., a local PC with an iGPU or NPU, or a discrete GPU such as Arc, Flex, or Max); integrates seamlessly with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
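Much of the speedup on this class of hardware comes from low-bit weight quantization (e.g., INT4). The sketch below is not ipex-llm's actual kernel code; it is a minimal pure-Python illustration of symmetric per-row INT4 quantization, the general technique such libraries build on:

```python
def quantize_int4(row):
    """Symmetric INT4 quantization of one weight row.

    Maps floats to integers in [-8, 7]; the largest magnitude
    in the row lands on +/-7, so no value is clipped.
    """
    scale = max(abs(x) for x in row) / 7.0 or 1.0  # avoid div-by-zero on all-zero rows
    q = [max(-8, min(7, round(x / scale))) for x in row]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from INT4 codes and the row scale."""
    return [v * scale for v in q]

row = [0.12, -0.7, 0.33, 1.4, -1.05, 0.0]
q, s = quantize_int4(row)
approx = dequantize(q, s)
# Rounding error is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(row, approx))
```

Storing 4-bit codes plus one scale per row cuts weight memory roughly 4x versus FP16, which is what makes 7B+ models fit on integrated and consumer GPUs.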

GitHub

7k stars
251 watching
1k forks
Language: Python
last commit: 5 days ago
Linked from 3 awesome lists

Tags: gpu, llm, pytorch, transformers

Related projects:

| Repository | Description | Stars |
|---|---|---|
| lyogavin/airllm | Optimizes large language model inference on limited GPU resources | 5,345 |
| sjtu-ipads/powerinfer | An efficient large language model inference engine leveraging consumer-grade GPUs on PCs | 7,973 |
| vllm-project/vllm | A high-performance inference and serving engine for large language models | 30,794 |
| internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,715 |
| fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs | 9,207 |
| nomic-ai/gpt4all | An open-source Python client for running large language models (LLMs) locally on any device | 70,826 |
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,359 |
| autogptq/autogptq | A package for efficient inference and training of large language models using quantization techniques | 4,501 |
| modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability | 2,629 |
| ggerganov/llama.cpp | Enables efficient inference of large language models using optimized C/C++ implementations and various backend frameworks | 68,190 |
| microsoft/deepspeed | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective | 35,545 |
| mit-han-lab/llm-awq | A tool for efficient and accurate weight quantization in large language models | 2,544 |
| linkedin/liger-kernel | Efficient kernels for large language models | 3,514 |
| facebookincubator/aitemplate | A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving | 4,565 |
| sgl-project/sglang | A framework for serving large language and vision models with optimized performance and control | 6,224 |