ipex-llm
LLM accelerator
An LLM acceleration library for Intel hardware, providing seamless integration with various frameworks and models.
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., a local PC with iGPU and NPU, or a discrete GPU such as Arc, Flex, and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc. (see the usage sketch below).
7k stars
251 watching
1k forks
Language: Python
last commit: 6 days ago
Linked from 3 awesome lists
Tags: gpu, llm, pytorch, transformers
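As a rough illustration of the HuggingFace integration mentioned in the description, the sketch below loads a model through ipex-llm's drop-in `transformers`-style API with low-bit quantization and runs generation on an Intel XPU. This is a minimal sketch based on ipex-llm's documented usage pattern; it assumes an XPU-enabled install (e.g., `pip install --pre ipex-llm[xpu]`), and the model ID is only an example.

```python
# Minimal sketch of ipex-llm's drop-in replacement for HuggingFace transformers.
# Assumes ipex-llm is installed with XPU support; the model ID is an example.
import torch
from ipex_llm.transformers import AutoModelForCausalLM  # ipex-llm's drop-in class
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # example; any supported model works

# load_in_4bit=True applies ipex-llm's low-bit (INT4) quantization at load time
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model = model.to("xpu")  # move the quantized model to the Intel GPU

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
inputs = tokenizer("What is an Intel XPU?", return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(inputs.input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The same idea carries over to the other integrations listed (llama.cpp, Ollama, vLLM, etc.), each of which ipex-llm accelerates through its own backend.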
Related projects:
Repository | Description | Stars |
---|---|---|
lyogavin/airllm | A Python library that optimizes inference memory usage for large language models on limited GPU resources | 5,259 |
sjtu-ipads/powerinfer | An efficient large language model inference engine that leverages consumer-grade GPUs on PCs | 7,964 |
vllm-project/vllm | A high-throughput inference and serving engine for large language models | 30,303 |
internlm/lmdeploy | A toolkit for compressing, optimizing, and serving large language models | 4,653 |
fminference/flexllmgen | A high-throughput generation engine for running large language models on a single GPU | 9,192 |
nomic-ai/gpt4all | An open-source Python client for running large language models (LLMs) locally on any device | 70,694 |
haotian-liu/llava | A large language-and-vision assistant trained by visual instruction tuning | 20,232 |
autogptq/autogptq | A package for quantizing large language models (GPTQ) for efficient inference and training | 4,476 |
modeltc/lightllm | A lightweight, scalable, and high-speed inference and serving framework for large language models | 2,609 |
ggerganov/llama.cpp | Efficient large language model inference in plain C/C++ with a range of optimized hardware backends | 67,866 |
microsoft/deepspeed | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective | 35,463 |
mit-han-lab/llm-awq | A tool for efficient and accurate activation-aware weight quantization (AWQ) of large language models | 2,517 |
linkedin/liger-kernel | A collection of optimized Triton kernels for efficient large language model training | 3,431 |
facebookincubator/aitemplate | A framework that transforms deep neural networks into high-performance GPU-optimized C++ code for efficient inference serving | 4,561 |
sgl-project/sglang | A fast serving framework for large language models and vision-language models, with an efficient runtime and a flexible interface | 6,082 |