ipex-llm

LLM accelerator

An LLM acceleration library for Intel hardware, providing seamless integration with various frameworks and models.

Accelerate local LLM inference and fine-tuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., a local PC with an iGPU and NPU, or a discrete GPU such as Arc, Flex, or Max); integrates seamlessly with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
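The HuggingFace integration mentioned above works as a drop-in replacement for the `transformers` model classes. A minimal sketch, based on ipex-llm's documented `ipex_llm.transformers` API; the model id and prompt are placeholders, and actually running it requires the `ipex-llm[xpu]` package plus an Intel GPU with XPU drivers:

```python
# Minimal sketch: low-bit LLM inference via ipex-llm's HuggingFace-compatible API.
# Assumes `pip install ipex-llm[xpu]` and a working Intel XPU environment.
import torch
from ipex_llm.transformers import AutoModelForCausalLM  # drop-in for transformers
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id

# load_in_4bit=True applies INT4 quantization while loading the weights
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
model = model.to("xpu")  # move the quantized model to the Intel GPU

tokenizer = AutoTokenizer.from_pretrained(model_path)
inputs = tokenizer("What does ipex-llm accelerate?", return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The same pattern covers the other integrations listed: ipex-llm plugs in underneath an existing framework (llama.cpp, Ollama, vLLM, etc.) rather than replacing it.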

GitHub

7k stars
251 watching
1k forks
Language: Python
last commit: 6 days ago
Linked from 3 awesome lists

Tags: gpu, llm, pytorch, transformers

Related projects:

Repository | Description | Stars
lyogavin/airllm | A Python library that optimizes inference memory usage for large language models on limited GPU resources | 5,259
sjtu-ipads/powerinfer | An efficient large language model inference engine leveraging consumer-grade GPUs on PCs | 7,964
vllm-project/vllm | A high-performance inference and serving engine for large language models | 30,303
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,653
fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs | 9,192
nomic-ai/gpt4all | An open-source Python client for running large language models (LLMs) locally on any device | 70,694
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,232
autogptq/autogptq | A package for efficient inference and training of large language models using quantization techniques | 4,476
modeltc/lightllm | An LLM inference and serving framework providing a lightweight design, scalability, and high-speed performance | 2,609
ggerganov/llama.cpp | Enables efficient inference of large language models using optimized C/C++ implementations and various backend frameworks | 67,866
microsoft/deepspeed | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective | 35,463
mit-han-lab/llm-awq | A tool for efficient and accurate weight quantization in large language models | 2,517
linkedin/liger-kernel | A collection of optimized kernels for efficient large language model training on distributed computing frameworks | 3,431
facebookincubator/aitemplate | A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving | 4,561
sgl-project/sglang | A framework for serving large language and vision models with an efficient runtime and flexible interface | 6,082