deepsparse

Inference runtime

A sparsity-aware deep learning inference runtime that exploits pruned and quantized models to accelerate neural network inference on CPUs.

GitHub

3k stars
57 watching
175 forks
Language: Python
last commit: 6 months ago
Linked from 1 awesome list

Topics: computer-vision, cpus, deepsparse, inference, llm-inference, machinelearning, nlp, object-detection, onnx, performance, pretrained-models, pruning, quantization, sparsification

Related projects:

Repository | Description | Stars
neuralmagic/sparseml | Enables the creation of smaller neural network models through efficient pruning and quantization techniques | 2,083
intel/neural-compressor | Tools and techniques for optimizing large language models on various frameworks and hardware platforms | 2,257
microsoft/deepspeed | A deep learning optimization library that simplifies distributed training and inference on modern computing hardware | 35,863
mit-han-lab/llm-awq | An open-source software project that enables efficient and accurate low-bit weight quantization for large language models | 2,593
labmlai/annotated_deep_learning_paper_implementations | Implementations of various deep learning algorithms and techniques with accompanying documentation | 57,177
confident-ai/deepeval | A framework for evaluating large language models | 4,003
ludwig-ai/ludwig | A low-code framework for building custom deep learning models and neural networks | 11,236
oxford-cs-deepnlp-2017/lectures | An open-source repository containing lecture slides and course materials for an advanced natural language processing course | 15,702
deepseek-ai/deepseek-v2 | A high-performance mixture-of-experts language model with strong performance and efficient inference capabilities | 3,758
fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs | 9,236
google-deepmind/deepmind-research | Provides implementations and illustrative code to accompany DeepMind research publications | 13,329
microsoft/lightgbm | A high-performance gradient boosting framework for machine learning tasks | 16,769
modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability | 2,691
optimalscale/lmflow | A toolkit for fine-tuning and inferring large machine learning models | 8,312
lyogavin/airllm | Optimizes large language model inference on limited GPU resources | 5,446