gpt-neox
Language model trainer
Provides a framework for training large-scale language models on GPUs, with optimizations such as tensor and pipeline parallelism, DeepSpeed ZeRO, and mixed-precision training.
An implementation of model-parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries.
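For orientation, the sketch below shows the general pattern this description points at: a small autoregressive (decoder-only) transformer wrapped in a DeepSpeed engine for distributed training. It is a minimal illustration, not gpt-neox's actual entry point; the `TinyGPT` module, the config values, and the random-token batch are all hypothetical.

```python
# Minimal sketch of DeepSpeed-wrapped autoregressive transformer training.
# Toy model and config for illustration; run under the `deepspeed` launcher,
# e.g. `deepspeed train_sketch.py`, so distributed state is initialized.
import torch
import torch.nn as nn
import deepspeed

class TinyGPT(nn.Module):
    """Toy decoder-only transformer, for illustration only."""
    def __init__(self, vocab=50257, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, n_layers)
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, tokens):
        x = self.embed(tokens)
        # Causal mask: each position attends only to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1)).to(x.device)
        return self.lm_head(self.blocks(x, mask=mask))

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 1},  # ZeRO optimizer-state partitioning
}

model = TinyGPT()
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# One training step on a random batch: next-token prediction loss.
tokens = torch.randint(0, 50257, (4, 128), device=engine.device)
logits = engine(tokens)
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
)
engine.backward(loss)  # DeepSpeed handles gradient scaling/partitioning
engine.step()
```

In the repository itself, runs are driven by YAML configuration files and the provided launcher scripts rather than a hand-written loop; the sketch only illustrates the engine pattern that the Megatron/DeepSpeed lineage implies.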
7k stars
124 watching
1k forks
Language: Python
Last commit: 8 days ago
Linked from 1 awesome list
Tags: deepspeed-library, gpt-3, language-model, transformers
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| nvidia/megatron-lm | A framework for training large language models using scalable and optimized GPU techniques | 10,623 |
| karpathy/mingpt | A minimal PyTorch implementation of a transformer-based language model | 20,175 |
| microsoft/megatron-deepspeed | Research tool for training large transformer language models at scale | 1,895 |
| facebookresearch/metaseq | A codebase for working with Open Pre-trained Transformers, enabling deployment and fine-tuning of transformer models on various platforms | 6,517 |
| opennmt/ctranslate2 | A high-performance library for efficient inference with Transformer models on CPUs and GPUs | 3,404 |
| openai/gpt-2 | A repository providing code and models for research into language modeling and multitask learning | 22,559 |
| eleutherai/pythia | Analyzing knowledge development and evolution in large language models during training (see the loading sketch after this table) | 2,280 |
| autogptq/autogptq | A package for efficient inference and training of large language models using quantization techniques | 4,501 |
| bigscience-workshop/megatron-deepspeed | A collection of tools and scripts for training large transformer language models at scale | 1,335 |
| labmlai/annotated_deep_learning_paper_implementations | Implementations of various deep learning algorithms and techniques with accompanying documentation | 56,215 |
| google-deepmind/mctx | An open-source library providing efficient implementations of search algorithms for reinforcement learning | 2,356 |
| ther1d/shell_gpt | A command-line tool using AI-powered language models to generate shell commands and code snippets | 9,672 |
| google-research/vision_transformer | Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax | 10,502 |
| carperai/trlx | A framework for distributed reinforcement learning of large language models with human feedback | 4,502 |
| huggingface/peft | An efficient method for fine-tuning large pre-trained models by adapting only a small fraction of their parameters | 16,505 |
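A usage note on consuming such models downstream: checkpoints in the gpt-neox architecture, including the Pythia suite listed above, are published on the Hugging Face Hub and load through the `GPTNeoXForCausalLM` class in `transformers`. A minimal sketch; `EleutherAI/pythia-70m` is a real Hub checkpoint name, while the prompt and decoding settings are arbitrary.

```python
# Sketch: loading a gpt-neox-architecture checkpoint (Pythia-70M) with
# Hugging Face transformers and generating a short continuation.
from transformers import AutoTokenizer, GPTNeoXForCausalLM

model_name = "EleutherAI/pythia-70m"  # small checkpoint from the Pythia suite
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = GPTNeoXForCausalLM.from_pretrained(model_name)

inputs = tokenizer("GPT-NeoX is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)  # greedy decoding by default
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```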