ipex-llm
LLM accelerator
An LLM acceleration library for Intel hardware, providing seamless integration with various frameworks and models.
Accelerates local LLM inference and fine-tuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., a local PC with an iGPU and NPU, or a discrete GPU such as Arc, Flex, or Max). Integrates seamlessly with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, and more.
7k stars
251 watching
1k forks
Language: Python
last commit: 5 days ago
Linked from 3 awesome lists
Tags: gpu, llm, pytorch, transformers
Related projects:
| Repository | Description | Stars |
|---|---|---|
| lyogavin/airllm | Optimizes large language model inference on limited GPU resources | 5,345 |
| sjtu-ipads/powerinfer | An efficient large language model inference engine leveraging consumer-grade GPUs on PCs | 7,973 |
| vllm-project/vllm | A high-performance inference and serving engine for large language models | 30,794 |
| internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,715 |
| fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs | 9,207 |
| nomic-ai/gpt4all | An open-source Python client for running large language models (LLMs) locally on any device | 70,826 |
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,359 |
| autogptq/autogptq | A package for efficient inference and training of large language models using quantization techniques | 4,501 |
| modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability | 2,629 |
| ggerganov/llama.cpp | Enables efficient inference of large language models using optimized C/C++ implementations and various backend frameworks | 68,190 |
| microsoft/deepspeed | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective | 35,545 |
| mit-han-lab/llm-awq | A tool for efficient and accurate weight quantization in large language models | 2,544 |
| linkedin/liger-kernel | Efficient kernels for large language models | 3,514 |
| facebookincubator/aitemplate | A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving | 4,565 |
| sgl-project/sglang | A framework for serving large language and vision models with optimized performance and control | 6,224 |