Skywork-MoE

MoE model

A high-performance mixture-of-experts model with innovative training techniques for language processing tasks

Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models

GitHub · 126 stars · 7 watching · 6 forks · last commit 6 months ago
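Since the project centers on mixture-of-experts training, a brief illustration of the core idea may help: a learned gate routes each token to a small subset of expert feed-forward networks (here the top 2), so only a fraction of the model's parameters is active per token. The sketch below is a minimal, self-contained PyTorch example of top-2 gating; the class name `Top2MoELayer` and all dimensions are hypothetical and are not taken from the Skywork-MoE codebase.

```python
# Minimal sketch of top-2 mixture-of-experts routing (illustrative only;
# names and sizes are hypothetical, not from the Skywork-MoE repository).
import torch
import torch.nn as nn


class Top2MoELayer(nn.Module):
    """Toy MoE layer: a linear gate sends each token to its top-2 experts."""

    def __init__(self, d_model=64, d_hidden=128, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (num_tokens, d_model)
        probs = self.gate(x).softmax(dim=-1)               # gate score per expert
        weights, idx = probs.topk(2, dim=-1)               # top-2 experts per token
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize the pair
        out = torch.zeros_like(x)
        for slot in range(2):                              # 1st and 2nd choice
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    tokens = torch.randn(8, 64)
    print(Top2MoELayer()(tokens).shape)                    # torch.Size([8, 64])
```

In this toy configuration only 2 of the 4 expert MLPs run for any given token, which is the source of MoE's compute savings at scale; production implementations replace the Python loop with batched expert dispatch and add a load-balancing auxiliary loss.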

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| skyworkai/skywork | A pre-trained language model trained on 3.2 TB of high-quality multilingual and code data, for applications including chatbots, text generation, and math. | 1,223 |
| will-singularity/skywork-mm | An empirical study toward a large language model that effectively integrates multiple input modalities. | 23 |
| pku-yuangroup/moe-llava | A neural network architecture for multi-modal learning with large vision-language models. | 1,980 |
| ieit-yuan/yuan2.0-m32 | A high-performance language model for natural language understanding, mathematical computation, and code generation. | 180 |
| xverse-ai/xverse-moe-a4.2b | A multilingual large language model from XVERSE Technology Inc. with a mixture-of-experts architecture, fine-tuned for conversation, question answering, and natural language understanding. | 36 |
| xverse-ai/xverse-moe-a36b | Large multilingual language models built on a mixture-of-experts architecture. | 36 |
| eleutherai/polyglot | Large language models designed to perform well in multiple languages and address the shortcomings of current multilingual models. | 475 |
| baai-wudao/model | A repository of pre-trained language models for various tasks and domains. | 121 |
| ibm-granite/granite-3.0-language-models | Lightweight state-of-the-art language models supporting multilinguality, coding, and reasoning on constrained resources. | 214 |
| moses-smt/nplm | A toolkit for training neural network language models. | 14 |
| deepseek-ai/deepseek-moe | A large language model with improved efficiency and performance compared to similar models. | 1,006 |
| elanmart/psmm | An implementation of a neural network model for character-level language modeling. | 50 |
| 01-ai/yi | A series of large language models trained from scratch to excel across multiple NLP tasks. | 7,719 |
| ymcui/macbert | Improves pre-trained Chinese language models with a text-correction pre-training task that reduces the mismatch with downstream tasks. | 645 |
| openai/finetune-transformer-lm | Code and model for improving language understanding via generative pre-training with a transformer architecture. | 2,160 |