deepsparse
Inference runtime
A sparsity-aware deep learning inference runtime that exploits pruned and quantized models to accelerate neural network performance on CPUs.
3k stars
57 watching
175 forks
Language: Python
Last commit: 6 months ago
Linked from 1 awesome list
Tags: computer-vision, cpus, deepsparse, inference, llm-inference, machinelearning, nlp, object-detection, onnx, performance, pretrained-models, pruning, quantization, sparsification
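As a quick illustration of what the runtime does, the sketch below compiles an ONNX model and runs it on CPU through DeepSparse's `compile_model` API (available after `pip install deepsparse`). The model path `model.onnx` and the input shape are placeholder assumptions; substitute your own ONNX file and its expected input tensors.

```python
# Minimal sketch of CPU inference with DeepSparse.
# "model.onnx" and the (1, 3, 224, 224) shape are placeholders.
import numpy as np
from deepsparse import compile_model

# Compile the ONNX model into an engine optimized for this CPU.
engine = compile_model("model.onnx", batch_size=1)

# One random image-shaped input; real models define their own shapes.
inputs = [np.random.rand(1, 3, 224, 224).astype(np.float32)]

# Run inference; returns a list of numpy arrays, one per model output.
outputs = engine.run(inputs)
print([o.shape for o in outputs])
```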
Related projects:
| Repository | Description | Stars |
|---|---|---|
| neuralmagic/sparseml | Enables the creation of smaller neural network models through efficient pruning and quantization techniques. | 2,083 |
| intel/neural-compressor | Tools and techniques for optimizing large language models across various frameworks and hardware platforms. | 2,257 |
| microsoft/deepspeed | A deep learning optimization library that simplifies distributed training and inference on modern computing hardware. | 35,863 |
| mit-han-lab/llm-awq | Efficient and accurate low-bit weight quantization for large language models. | 2,593 |
| labmlai/annotated_deep_learning_paper_implementations | Implementations of various deep learning algorithms and techniques with accompanying documentation. | 57,177 |
| confident-ai/deepeval | A framework for evaluating large language models. | 4,003 |
| ludwig-ai/ludwig | A low-code framework for building custom deep learning models and neural networks. | 11,236 |
| oxford-cs-deepnlp-2017/lectures | Lecture slides and course materials for an advanced natural language processing course. | 15,702 |
| deepseek-ai/deepseek-v2 | A mixture-of-experts language model offering strong performance and efficient inference. | 3,758 |
| fminference/flexllmgen | Generates large language model outputs in high-throughput mode on a single GPU. | 9,236 |
| google-deepmind/deepmind-research | Implementations and illustrative code accompanying DeepMind research publications. | 13,329 |
| microsoft/lightgbm | A high-performance gradient boosting framework for machine learning tasks. | 16,769 |
| modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability. | 2,691 |
| optimalscale/lmflow | A toolkit for fine-tuning and running inference on large machine learning models. | 8,312 |
| lyogavin/airllm | Optimizes large language model inference on limited GPU resources. | 5,446 |