gpt-neox

Language model trainer

Provides a framework for training large-scale language models on GPUs, with support for distributed training techniques such as tensor, pipeline, and data parallelism and DeepSpeed's ZeRO memory optimizations.

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries.
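
The "model parallel" part of that description refers to tensor parallelism: each weight matrix is split across GPUs so that layers too large for a single device can still be trained. The following minimal PyTorch sketch (illustrative only, not gpt-neox's actual code; all names are made up) simulates Megatron-style column-parallel splitting of one linear layer inside a single process and checks that gathering the shard outputs reproduces the unsharded result.

    import torch

    torch.manual_seed(0)
    d_model, d_ff, n_shards = 8, 16, 2

    x = torch.randn(4, d_model)     # a batch of 4 token embeddings
    w = torch.randn(d_model, d_ff)  # full feed-forward weight matrix

    # Column parallelism: each shard owns a slice of the output dimension,
    # so each simulated "device" computes an independent partial result.
    shards = torch.chunk(w, n_shards, dim=1)
    partials = [x @ w_shard for w_shard in shards]

    # Gathering the partial outputs reproduces the unsharded computation.
    y_parallel = torch.cat(partials, dim=1)
    y_dense = x @ w
    print("column-parallel matches dense:", torch.allclose(y_parallel, y_dense))

In a real model-parallel run the shards live on separate GPUs and the gather is a collective communication operation rather than a local torch.cat.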

GitHub

7k stars
124 watching
1k forks
Language: Python
Last commit: 8 days ago
Linked from 1 awesome list

Topics: deepspeed-library, gpt-3, language-model, transformers

Related projects:

nvidia/megatron-lm (10,623 stars): A framework for training large language models using scalable and optimized GPU techniques
karpathy/mingpt (20,175 stars): A minimal PyTorch implementation of a transformer-based language model
microsoft/megatron-deepspeed (1,895 stars): Research tool for training large transformer language models at scale
facebookresearch/metaseq (6,517 stars): A codebase for working with Open Pre-trained Transformers, enabling deployment and fine-tuning of transformer models on various platforms
opennmt/ctranslate2 (3,404 stars): A high-performance library for efficient inference with Transformer models on CPUs and GPUs
openai/gpt-2 (22,559 stars): A repository providing code and models for research into language modeling and multitask learning
eleutherai/pythia (2,280 stars): Analyzing knowledge development and evolution in large language models during training
autogptq/autogptq (4,501 stars): A package for efficient inference and training of large language models using quantization techniques
bigscience-workshop/megatron-deepspeed (1,335 stars): A collection of tools and scripts for training large transformer language models at scale
labmlai/annotated_deep_learning_paper_implementations (56,215 stars): Implementations of various deep learning algorithms and techniques with accompanying documentation
google-deepmind/mctx (2,356 stars): An open-source library providing efficient implementations of search algorithms for reinforcement learning
ther1d/shell_gpt (9,672 stars): A command-line tool using AI-powered language models to generate shell commands and code snippets
google-research/vision_transformer (10,502 stars): Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax
carperai/trlx (4,502 stars): A framework for distributed reinforcement learning of large language models with human feedback
huggingface/peft (16,505 stars): An efficient method for fine-tuning large pre-trained models by adapting only a small fraction of their parameters