DeepSpeed-MII
Inference accelerator
A Python library that makes low-latency, high-throughput model inference possible, powered by DeepSpeed.
2k stars
42 watching
175 forks
Language: Python
Last commit: about 2 months ago
Linked from 2 awesome lists
Topics: deep-learning, inference, pytorch
Related projects:
| Repository | Description | Stars |
|---|---|---|
| utensor/utensor | A lightweight machine learning inference framework built on TensorFlow, optimized for Arm targets | 1,742 |
| xboot/libonnx | An ONNX inference engine for embedded devices with hardware acceleration support | 589 |
| megvii-research/tlc | Improves image restoration performance by converting global operations to local ones during inference | 231 |
| xilinx/finn | A fast and scalable neural network inference framework for FPGAs | 770 |
| microsoft/archai | Automates the search for optimal neural network configurations in deep learning applications | 468 |
| microsoft/megatron-deepspeed | A research tool for training large transformer language models at scale | 1,926 |
| mims-harvard/ohmnet | An algorithm for learning feature representations in multi-layer networks | 81 |
| torchpipe/torchpipe | An open-source framework for deploying and serving PyTorch models across various acceleration frameworks | 147 |
| jgreenemi/parris | Automates the setup and training of machine learning algorithms on remote servers | 316 |
| lge-arc-advancedai/auptimizer | Automates model building and deployment by optimizing hyperparameters and compressing models for edge computing | 200 |
| denosaurs/netsaur | A machine learning library with GPU, CPU, and WASM backends for building neural networks | 233 |
| arthurpaulino/miraiml | An asynchronous engine for continuous and autonomous machine learning | 26 |
| intel/neural-compressor | Tools and techniques for optimizing large language models across various frameworks and hardware platforms | 2,257 |
| mlcommons/inference | Measures the performance of deep learning models in various deployment scenarios | 1,256 |