DeepSpeed
Deep Learning Optimizer
A deep learning optimization library that simplifies distributed training and inference on modern computing hardware.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
36k stars
346 watching
4k forks
Language: Python
last commit: 2 months ago
Linked from 6 awesome lists
billion-parameterscompressiondata-parallelismdeep-learninggpuinferencemachine-learningmixture-of-expertsmodel-parallelismpipeline-parallelismpytorchtrillion-parameterszero
Related projects:
Repository | Description | Stars |
---|---|---|
| An efficient Large Language Model inference engine leveraging consumer-grade GPUs on PCs | 8,011 |
| A Python library designed to accelerate model inference with high-throughput and low latency capabilities | 1,924 |
| A high-performance mixture-of-experts language model with strong performance and efficient inference capabilities. | 3,758 |
| A sparsity-aware deep learning inference runtime for CPUs that optimizes neural network performance on CPU hardware. | 3,052 |
| Provides a framework for training large-scale language models on GPUs with advanced features and optimizations. | 6,997 |
| A collection of notes and summaries on various deep learning research papers, including their topics, techniques, and applications. | 4,416 |
| A machine learning API and server written in C++ that supports multiple deep learning libraries and provides a flexible interface for building and deploying machine learning models. | 2,520 |
| A high-performance deep learning framework designed for industrial-scale training and deployment of neural networks. | 22,340 |
| A deep learning toolkit for generating human-like speech from text | 36,118 |
| A toolkit for easy and high-performance deployment of deep learning models on various hardware platforms | 3,034 |
| Research tool for training large transformer language models at scale | 1,926 |
| A toolkit for training and deploying large AI models in parallel on distributed computing infrastructure | 38,907 |
| A toolkit for optimizing and serving large language models | 4,854 |
| A text-to-image synthesis model with a modular design, utilizing a frozen text encoder and cascaded pixel diffusion modules to generate photorealistic images. | 7,699 |
| A guide to building production-ready deep learning systems for real-world applications | 4,371 |