deepsparse

Inference runtime

A sparsity-aware deep learning inference runtime that exploits pruning and quantization to accelerate neural network inference on CPU hardware.

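As a quick orientation, here is a minimal sketch of running an ONNX model through the engine from Python using DeepSparse's `compile_model` helper; the model path, batch size, and input shape are placeholder assumptions, not values from this listing:

```python
# Minimal sketch: compile an ONNX model with DeepSparse and run one batch.
# "model.onnx" and the (1, 3, 224, 224) input shape are placeholders;
# substitute your own model and its expected inputs.
import numpy as np
from deepsparse import compile_model

engine = compile_model("model.onnx", batch_size=1)

# The engine takes a list of numpy arrays, one per model input.
inputs = [np.random.rand(1, 3, 224, 224).astype(np.float32)]
outputs = engine.run(inputs)  # returns a list of numpy output arrays
print([out.shape for out in outputs])
```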

GitHub

3k stars
57 watching
173 forks
Language: Python
Last commit: 4 months ago
Linked from 1 awesome list

Tags: computer-vision, cpus, deepsparse, inference, llm-inference, machinelearning, nlp, object-detection, onnx, performance, pretrained-models, pruning, quantization, sparsification

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| neuralmagic/sparseml | Enables the creation of smaller neural network models through efficient pruning and quantization techniques | 2,071 |
| intel/neural-compressor | Tools and techniques for optimizing large language models across frameworks and hardware platforms | 2,226 |
| microsoft/deepspeed | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective | 35,463 |
| mit-han-lab/llm-awq | A tool for efficient and accurate weight quantization in large language models | 2,517 |
| labmlai/annotated_deep_learning_paper_implementations | Implementations of deep learning algorithms and techniques with accompanying documentation | 56,215 |
| confident-ai/deepeval | A framework for evaluating large language models | 3,669 |
| ludwig-ai/ludwig | A low-code framework for building custom deep learning models and neural networks | 11,189 |
| oxford-cs-deepnlp-2017/lectures | Lecture slides and course materials for an advanced natural language processing course | 15,683 |
| deepseek-ai/deepseek-v2 | A mixture-of-experts language model offering strong performance and efficient inference | 3,590 |
| fminference/flexllmgen | Generates large language model outputs in high-throughput mode on a single GPU | 9,192 |
| google-deepmind/deepmind-research | Implementations and illustrative code accompanying DeepMind research publications | 13,250 |
| microsoft/lightgbm | A high-performance gradient boosting framework for machine learning tasks | 16,694 |
| modeltc/lightllm | A lightweight, scalable, high-speed LLM inference and serving framework | 2,609 |
| optimalscale/lmflow | A toolkit for finetuning large language models with efficient inference capabilities | 8,273 |
| lyogavin/airllm | A Python library that optimizes inference memory usage for large language models on limited GPU resources | 5,259 |