ipex-llm
LLM accelerator
An LLM acceleration library for Intel GPUs and other XPU devices, enabling fast inference and finetuning of large language models.
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., a local PC with iGPU and NPU, or a discrete GPU such as Arc, Flex, and Max); seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
7k stars
253 watching
1k forks
Language: Python
last commit: about 1 month ago
Linked from 3 awesome lists
Topics: gpu, llm, pytorch, transformers
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| lyogavin/airllm | Optimizes large language model inference on limited GPU resources | 5,446 |
| sjtu-ipads/powerinfer | An efficient Large Language Model inference engine leveraging consumer-grade GPUs on PCs | 8,011 |
| vllm-project/vllm | An inference and serving engine for large language models | 31,982 |
| internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,854 |
| fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs | 9,236 |
| nomic-ai/gpt4all | An open-source Python client for running Large Language Models (LLMs) locally on any device | 71,176 |
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,683 |
| autogptq/autogptq | A package for optimizing large language models for efficient inference on GPUs and other hardware platforms | 4,560 |
| modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability | 2,691 |
| ggerganov/llama.cpp | Enables LLM inference with minimal setup and high performance on various hardware platforms | 69,185 |
| microsoft/deepspeed | A deep learning optimization library that simplifies distributed training and inference on modern computing hardware | 35,863 |
| mit-han-lab/llm-awq | An open-source software project that enables efficient and accurate low-bit weight quantization for large language models | 2,593 |
| linkedin/liger-kernel | A collection of optimized kernels and post-training loss functions for large language models | 3,840 |
| facebookincubator/aitemplate | A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving | 4,573 |
| sgl-project/sglang | A fast serving framework for large language models and vision language models | 6,551 |