deepsparse
Inference runtime
A sparsity-aware deep learning inference runtime that optimizes neural network performance on CPU hardware.
Stars: 3k
Watching: 57
Forks: 175
Language: Python
Last commit: over 1 year ago
Linked from 1 awesome list
Topics: computer-vision, cpus, deepsparse, inference, llm-inference, machinelearning, nlp, object-detection, onnx, performance, pretrained-models, pruning, quantization, sparsification
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Enables the creation of smaller neural network models through efficient pruning and quantization techniques | 2,083 |
| | Tools and techniques for optimizing large language models on various frameworks and hardware platforms | 2,257 |
| | A deep learning optimization library that simplifies distributed training and inference on modern computing hardware | 35,863 |
| | An open-source software project that enables efficient and accurate low-bit weight quantization for large language models | 2,593 |
| | Implementations of various deep learning algorithms and techniques with accompanying documentation | 57,177 |
| | A framework for evaluating large language models | 4,003 |
| | A low-code framework for building custom deep learning models and neural networks | 11,236 |
| | An open-source repository containing lecture slides and course materials for an advanced natural language processing course | 15,702 |
| | A mixture-of-experts language model with strong performance and efficient inference capabilities | 3,758 |
| | Generates large language model outputs in high-throughput mode on single GPUs | 9,236 |
| | Provides implementations and illustrative code to accompany DeepMind research publications | 13,329 |
| | A high-performance gradient boosting framework for machine learning tasks | 16,769 |
| | A Python-based framework for serving large language models with low latency and high scalability | 2,691 |
| | A toolkit for fine-tuning and running inference with large machine learning models | 8,312 |
| | Optimizes large language model inference on limited GPU resources | 5,446 |