Megatron-DeepSpeed
Transformer trainer
A collection of tools and scripts from ongoing research on training large transformer language models at scale, including BERT and GPT-2.
1k stars · 24 watching · 220 forks
Language: Python
Last commit: 10 months ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| microsoft/megatron-deepspeed | Research tool for training large transformer language models at scale | 1,926 |
| german-nlp-group/german-transformer-training | Trains German transformer models to improve language understanding | 23 |
| openai/finetune-transformer-lm | Code and model for improving language understanding through generative pre-training with a transformer architecture | 2,167 |
| openbmb/bmtrain | A toolkit for training large models in a distributed manner while keeping code simple and efficient | 570 |
| fastnlp/cpt | A pre-trained transformer model for Chinese natural language understanding and generation tasks | 482 |
| matlab-deep-learning/transformer-models | An implementation of deep learning transformer models in MATLAB | 209 |
| jsksxs360/how-to-use-transformers | A comprehensive guide to using the Transformers library for natural language processing tasks | 1,220 |
| huggingface/nanotron | A pretraining framework for large language models using 3D parallelism and scalable training techniques | 1,332 |
| pytorchbearer/torchbearer | A PyTorch model-fitting library that simplifies training deep learning models | 636 |
| ist-daslab/gptq | An implementation of the GPTQ post-training quantization algorithm for transformer models, reducing memory usage and improving inference speed | 1,964 |
| tongjilibo/bert4torch | An implementation of transformer models in PyTorch for natural language processing tasks | 1,257 |
| marella/ctransformers | A unified interface to transformer models implemented in C/C++ using the GGML library | 1,823 |
| chrislemke/sk-transformers | A collection of reusable data transformation tools | 10 |
| maxpumperla/elephas | Distributed deep learning with Keras and Spark for scalable model training | 1,574 |
| ibrahimsobh/transformers | An implementation of deep neural network architectures, including Transformers, in Python | 214 |