Skywork-MoE
MoE model
A high-performance mixture-of-experts model with innovative training techniques for language processing tasks
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
126 stars
7 watching
7 forks
last commit: 8 months ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A pre-trained language model trained on 3.2 TB of high-quality multilingual and code data for applications including chatbots, text generation, and math calculations. | 1,228 |
| | An empirical study aiming to develop a large language model capable of effectively integrating multiple input modalities. | 23 |
| | A large vision-language model using a mixture-of-experts architecture to improve performance on multi-modal learning tasks. | 2,023 |
| | A high-performance language model designed to excel in tasks such as natural language understanding, mathematical computation, and code generation. | 182 |
| | A multilingual large language model developed by XVERSE Technology Inc., built on a mixture-of-experts architecture and fine-tuned for tasks such as conversation, question answering, and natural language understanding. | 36 |
| | Large multilingual language models built with an advanced mixture-of-experts architecture. | 37 |
| | Large language models designed to perform well in multiple languages and to address performance issues with current multilingual models. | 476 |
| | A repository of pre-trained language models for various tasks and domains. | 121 |
| | A collection of lightweight state-of-the-art language models designed to support multilinguality, coding, and reasoning tasks on constrained resources. | 232 |
| | A toolkit for training neural network language models. | 14 |
| | A large language model with improved efficiency and performance compared to similar models. | 1,024 |
| | An implementation of a neural network model for character-level language modeling. | 50 |
| | A series of large language models trained from scratch to excel at multiple NLP tasks. | 7,743 |
| | Improves pre-trained Chinese language models by adding a correction task that alleviates inconsistency with downstream tasks. | 646 |
| | Code and a model for improving language understanding through generative pre-training with a transformer-based architecture. | 2,167 |
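Several of the projects above, including Skywork-MoE itself, are built on mixture-of-experts (MoE) architectures. For context, below is a minimal sketch of a sparsely gated MoE feed-forward layer that routes each token to a small subset of experts; the dimensions, expert count, and top-2 routing are illustrative assumptions and do not reflect Skywork-MoE's actual design or training techniques.

```python
# Minimal sketch of a top-2 gated mixture-of-experts (MoE) feed-forward layer.
# Illustrative only: dimensions, expert count, and routing are assumptions,
# not Skywork-MoE's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: a linear layer scoring each token against every expert.
        self.router = nn.Linear(d_model, num_experts)
        # Experts: independent feed-forward networks.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (batch, seq, d_model)
        scores = self.router(x)                # (batch, seq, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Each token's output is a weighted sum of its top-k experts' outputs.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)      # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route a batch of token embeddings through the sparse MoE layer.
layer = MoELayer()
tokens = torch.randn(2, 16, 512)
print(layer(tokens).shape)  # torch.Size([2, 16, 512])
```

Because only the top-k experts are evaluated per token, a model of this shape can scale total parameter count with the number of experts while keeping per-token compute close to that of a dense feed-forward layer, which is the basic trade-off these MoE projects exploit.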