DeepSpeed

Deep Learning Optimizer

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective on modern computing hardware.

GitHub

36k stars
346 watching
4k forks
Language: Python
last commit: about 1 month ago
Linked from 6 awesome lists

Topics: billion-parameters, compression, data-parallelism, deep-learning, gpu, inference, machine-learning, mixture-of-experts, model-parallelism, pipeline-parallelism, pytorch, trillion-parameters, zero

Related projects:

Repository | Description | Stars
sjtu-ipads/powerinfer | An efficient large language model inference engine that runs on consumer-grade GPUs in PCs | 8,011
microsoft/deepspeed-mii | A Python library that accelerates model inference with high throughput and low latency | 1,924
deepseek-ai/deepseek-v2 | A mixture-of-experts language model with strong performance and efficient inference | 3,758
neuralmagic/deepsparse | A sparsity-aware deep learning inference runtime that optimizes neural network performance on CPUs | 3,052
eleutherai/gpt-neox | A framework for training large-scale language models on GPUs with advanced features and optimizations | 6,997
dennybritz/deeplearning-papernotes | A collection of notes and summaries on deep learning research papers, covering their topics, techniques, and applications | 4,416
jolibrain/deepdetect | A machine learning API and server written in C++ that supports multiple deep learning libraries and provides a flexible interface for building and deploying models | 2,520
paddlepaddle/paddle | A high-performance deep learning framework designed for industrial-scale training and deployment of neural networks | 22,340
coqui-ai/tts | A deep learning toolkit for generating human-like speech from text | 36,118
paddlepaddle/fastdeploy | A toolkit for easy, high-performance deployment of deep learning models on various hardware platforms | 3,034
microsoft/megatron-deepspeed | A research tool for training large transformer language models at scale | 1,926
hpcaitech/colossalai | A toolkit for training and deploying large AI models in parallel on distributed computing infrastructure | 38,907
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,854
deep-floyd/if | A modular text-to-image synthesis model that uses a frozen text encoder and cascaded pixel diffusion modules to generate photorealistic images | 7,699
alirezadir/production-level-deep-learning | A guide to building production-ready deep learning systems for real-world applications | 4,371