DeepSpeed-MII
Inference accelerator
A Python library, powered by DeepSpeed, that makes low-latency, high-throughput model inference possible.
Stars: 2k
Watchers: 41
Forks: 175
Language: Python
Last commit: 13 days ago
Linked from 2 awesome lists
Tags: deep-learning, inference, pytorch
Related projects:
| Repository | Description | Stars |
|---|---|---|
| intel/scikit-learn-intelex | An acceleration toolkit for scikit-learn machine learning algorithms on Intel hardware. | 1,227 |
| utensor/utensor | A lightweight machine learning inference framework built on TensorFlow, optimized for Arm targets. | 1,729 |
| xboot/libonnx | A lightweight ONNX inference engine for embedded devices with hardware acceleration support. | 583 |
| megvii-research/tlc | Improves image restoration performance by converting global operations to local ones during inference. | 231 |
| xilinx/finn | A fast and scalable neural network inference framework for FPGAs. | 747 |
| microsoft/archai | Automates the search for optimal neural network configurations in deep learning applications. | 467 |
| microsoft/megatron-deepspeed | A research tool for training large transformer language models at scale. | 1,895 |
| mims-harvard/ohmnet | An algorithm for learning feature representations in multi-layer networks. | 79 |
| torchpipe/torchpipe | A C++ library providing a multi-instance, pipeline-parallel framework for accelerating deep learning inference on various hardware accelerators. | 144 |
| jgreenemi/parris | Automates the setup and training of machine learning algorithms on remote servers. | 316 |
| lge-arc-advancedai/auptimizer | Automates model building and deployment by optimizing hyperparameters and compressing models for edge computing. | 200 |
| denosaurs/netsaur | A machine learning library with GPU, CPU, and WASM backends for building neural networks. | 232 |
| arthurpaulino/miraiml | An asynchronous engine for continuous and autonomous machine learning. | 26 |
| intel/neural-compressor | Tools and techniques for optimizing large language models across frameworks and hardware platforms. | 2,226 |
| mlcommons/inference | Measures the performance of deep learning models in various deployment scenarios. | 1,236 |