deepsparse
Inference runtime
A sparsity-aware deep learning inference runtime that accelerates neural network inference on CPUs.
3k stars
57 watching
175 forks
Language: Python
Last commit: 8 months ago
Linked from 1 awesome list
Tags: computer-vision, cpus, deepsparse, inference, llm-inference, machinelearning, nlp, object-detection, onnx, performance, pretrained-models, pruning, quantization, sparsification
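As a quick illustration of what the runtime does, here is a minimal usage sketch assuming the `deepsparse` Python package is installed. It follows the project's documented `Pipeline` API; the SparseZoo model stub is illustrative, and any local ONNX model path can be substituted.

```python
# Minimal sketch: create a DeepSparse pipeline around a pruned,
# quantized ONNX model and run CPU inference.
# The SparseZoo stub below is an assumption for illustration;
# a local ONNX model path also works for model_path.
from deepsparse import Pipeline

pipeline = Pipeline.create(
    task="sentiment-analysis",
    model_path="zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none",
)

# The pipeline handles tokenization, engine execution, and decoding.
print(pipeline("DeepSparse runs pruned, quantized models quickly on CPUs."))
```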
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | Enables the creation of smaller neural network models through efficient pruning and quantization techniques | 2,083 |
| | Tools and techniques for optimizing large language models across frameworks and hardware platforms | 2,257 |
| | A deep learning optimization library that simplifies distributed training and inference on modern hardware | 35,863 |
| | Enables efficient and accurate low-bit weight quantization for large language models | 2,593 |
| | Implementations of various deep learning algorithms and techniques with accompanying documentation | 57,177 |
| | A framework for evaluating large language models | 4,003 |
| | A low-code framework for building custom deep learning models and neural networks | 11,236 |
| | Lecture slides and course materials for an advanced natural language processing course | 15,702 |
| | A mixture-of-experts language model with strong performance and efficient inference | 3,758 |
| | High-throughput large language model generation on a single GPU | 9,236 |
| | Implementations and illustrative code accompanying DeepMind research publications | 13,329 |
| | A high-performance gradient boosting framework for machine learning tasks | 16,769 |
| | A Python-based framework for serving large language models with low latency and high scalability | 2,691 |
| | A toolkit for fine-tuning and running inference with large machine learning models | 8,312 |
| | Optimizes large language model inference on limited GPU resources | 5,446 |