Skywork-MoE
MoE model
A high-performance mixture-of-experts model with innovative training techniques for language processing tasks
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
126 stars
7 watching
7 forks
last commit: 8 months ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A pre-trained language model trained on 3.2 TB of high-quality multilingual and code data for applications including chatbots, text generation, and math calculations. | 1,228 |
| | An empirical study aiming to develop a large language model capable of effectively integrating multiple input modalities. | 23 |
| | A large vision-language model using a mixture-of-experts architecture to improve performance on multi-modal learning tasks. | 2,023 |
| | A high-performance language model designed to excel in tasks such as natural language understanding, mathematical computation, and code generation. | 182 |
| | A multilingual large language model developed by XVERSE Technology Inc., built on a mixture-of-experts architecture and fine-tuned for tasks such as conversation, question answering, and natural language understanding. | 36 |
| | Large multilingual language models built with an advanced mixture-of-experts architecture. | 37 |
| | Large language models designed to perform well in multiple languages and to address performance issues with current multilingual models. | 476 |
| | A repository of pre-trained language models for various tasks and domains. | 121 |
| | A collection of lightweight state-of-the-art language models designed to support multilinguality, coding, and reasoning tasks on constrained resources. | 232 |
| | A toolkit for training neural network language models. | 14 |
| | A large language model with improved efficiency and performance compared to similar models. | 1,024 |
| | An implementation of a neural network model for character-level language modeling. | 50 |
| | A series of large language models trained from scratch to excel at multiple NLP tasks. | 7,743 |
| | Improves pre-trained Chinese language models by adding a correction task that alleviates inconsistency with downstream tasks. | 646 |
| | Code and a model for improving language understanding through generative pre-training with a transformer-based architecture. | 2,167 |
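Several of the projects above, including Skywork-MoE itself, are built on mixture-of-experts (MoE) architectures. For context, below is a minimal sketch of a sparsely gated MoE feed-forward layer that routes each token to a small subset of experts; the dimensions, expert count, and top-2 routing are illustrative assumptions and do not reflect Skywork-MoE's actual design or training techniques.

```python
# Minimal sketch of a top-2 gated mixture-of-experts (MoE) feed-forward layer.
# Illustrative only: dimensions, expert count, and routing are assumptions,
# not Skywork-MoE's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: a linear layer scoring each token against every expert.
        self.router = nn.Linear(d_model, num_experts)
        # Experts: independent feed-forward networks.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (batch, seq, d_model)
        scores = self.router(x)                # (batch, seq, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Each token's output is a weighted sum of its top-k experts' outputs.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)      # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route a batch of token embeddings through the sparse MoE layer.
layer = MoELayer()
tokens = torch.randn(2, 16, 512)
print(layer(tokens).shape)  # torch.Size([2, 16, 512])
```

Because only the top-k experts are evaluated per token, a model of this shape can scale total parameter count with the number of experts while keeping per-token compute close to that of a dense feed-forward layer, which is the basic trade-off these MoE projects exploit.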