deepsparse
Inference runtime
A sparsity-aware deep learning inference runtime that optimizes neural network performance on CPU hardware.
Stars: 3k
Watching: 57
Forks: 175
Language: Python
Last commit: over 1 year ago
Linked from 1 awesome list
Topics: computer-vision, cpus, deepsparse, inference, llm-inference, machinelearning, nlp, object-detection, onnx, performance, pretrained-models, pruning, quantization, sparsification
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Enables the creation of smaller neural network models through efficient pruning and quantization techniques | 2,083 |
| | Tools and techniques for optimizing large language models on various frameworks and hardware platforms | 2,257 |
| | A deep learning optimization library that simplifies distributed training and inference on modern computing hardware | 35,863 |
| | An open-source software project that enables efficient and accurate low-bit weight quantization for large language models | 2,593 |
| | Implementations of various deep learning algorithms and techniques with accompanying documentation | 57,177 |
| | A framework for evaluating large language models | 4,003 |
| | A low-code framework for building custom deep learning models and neural networks | 11,236 |
| | An open-source repository containing lecture slides and course materials for an advanced natural language processing course | 15,702 |
| | A mixture-of-experts language model with strong performance and efficient inference capabilities | 3,758 |
| | Generates large language model outputs in high-throughput mode on single GPUs | 9,236 |
| | Provides implementations and illustrative code to accompany DeepMind research publications | 13,329 |
| | A high-performance gradient boosting framework for machine learning tasks | 16,769 |
| | A Python-based framework for serving large language models with low latency and high scalability | 2,691 |
| | A toolkit for fine-tuning and running inference with large machine learning models | 8,312 |
| | Optimizes large language model inference on limited GPU resources | 5,446 |