gpt-neox

Language model trainer

Provides a framework for training large-scale language models on GPUs, with support for model, pipeline, and data parallelism, mixed-precision training, and DeepSpeed-based optimizations.

An implementation of model-parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries.

GitHub stats: 7k stars, 125 watching, 1k forks
Language: Python
Last commit: about 1 month ago
Linked from 1 awesome list

Tags: deepspeed-library, gpt-3, language-model, transformers

Related projects:

Repository | Description | Stars
NVIDIA/Megatron-LM | A framework for training large language models using scalable and optimized GPU techniques | 10,804
karpathy/minGPT | A minimal PyTorch implementation of a transformer-based language model | 20,474
microsoft/Megatron-DeepSpeed | Research tool for training large transformer language models at scale | 1,926
facebookresearch/metaseq | A codebase for working with Open Pre-trained Transformers, enabling deployment and fine-tuning of transformer models on various platforms | 6,519
OpenNMT/CTranslate2 | A high-performance inference engine for transformer models | 3,467
openai/gpt-2 | A repository providing code and models for research into language modeling and multitask learning | 22,644
EleutherAI/pythia | Analyzes how knowledge develops and evolves in large language models during training | 2,309
AutoGPTQ/AutoGPTQ | A package for optimizing large language models for efficient inference on GPUs and other hardware platforms | 4,560
bigscience-workshop/Megatron-DeepSpeed | A collection of tools and scripts for training large transformer language models at scale | 1,342
labmlai/annotated_deep_learning_paper_implementations | Implementations of various deep learning algorithms and techniques with accompanying documentation | 57,177
google-deepmind/mctx | An open-source library providing efficient implementations of search algorithms for reinforcement learning | 2,377
TheR1D/shell_gpt | A command-line tool using AI-powered language models to generate shell commands and code snippets | 9,933
google-research/vision_transformer | Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax | 10,620
CarperAI/trlx | A framework for distributed reinforcement learning of large language models with human feedback | 4,537
huggingface/peft | An efficient method for fine-tuning large pre-trained models by adapting only a small fraction of their parameters | 16,699