Megatron-DeepSpeed

Transformer trainer

Research tool for training large transformer language models at scale

Ongoing research on training transformer language models at scale, including BERT and GPT-2

2k stars
25 watching
343 forks
Language: Python
last commit: about 1 month ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| bigscience-workshop/megatron-deepspeed | A collection of tools and scripts for training large transformer language models at scale | 1,342 |
| german-nlp-group/german-transformer-training | Trains German transformer models to improve language understanding | 23 |
| matlab-deep-learning/transformer-models | An implementation of deep learning transformer models in MATLAB | 209 |
| openai/finetune-transformer-lm | This project provides code and model for improving language understanding through generative pre-training using a transformer-based architecture | 2,167 |
| fastnlp/cpt | A pre-trained transformer model for natural language understanding and generation tasks in Chinese | 482 |
| jsksxs360/how-to-use-transformers | A comprehensive guide to using the Transformers library for natural language processing tasks | 1,220 |
| rdspring1/pytorch_gbw_lm | Trains a large-scale PyTorch language model on the 1-Billion Word dataset | 123 |
| huggingface/nanotron | A pretraining framework for large language models using 3D parallelism and scalable training techniques | 1,332 |
| jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 517 |
| maxpumperla/elephas | Enables distributed deep learning with Keras and Spark for scalable model training | 1,574 |
| nlpodyssey/cybertron | A Go package providing an easy interface to use pre-trained NLP models from the HuggingFace repository for tasks like text classification and machine translation | 293 |
| marella/ctransformers | Provides a unified interface to various transformer models implemented in C/C++ using GGML library | 1,823 |
| tongjilibo/bert4torch | An implementation of transformer models in PyTorch for natural language processing tasks | 1,257 |
| gram-ai/radio-transformer-networks | An implementation of a machine learning-based communications system using deep learning techniques | 127 |
| pixart-alpha/pixart-sigma | Develops a PyTorch model for 4K text-to-image generation using diffusion transformer | 1,711 |