Skywork-MoE
MoE model
A high-performance mixture-of-experts language model, accompanied by a technical report on its training techniques
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
126 stars
7 watching
6 forks
last commit: 6 months ago
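The report's subject is how sparse mixture-of-experts layers are trained. For readers new to the pattern, below is a minimal sketch of generic top-2 expert routing in PyTorch. It illustrates the basic building block only; the class and parameter names (`TopKMoE`, `n_experts`, `k`) are illustrative and not taken from the Skywork-MoE codebase.

```python
# Minimal sketch of generic top-k mixture-of-experts routing
# (illustrative only; not the Skywork-MoE implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Route each token to its top-k experts and
        # combine their outputs, weighted by renormalized gate scores.
        scores = F.softmax(self.gate(x), dim=-1)    # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # (tokens, k)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)  # 16 tokens, d_model=64
print(TopKMoE(d_model=64, d_ff=256)(tokens).shape)  # torch.Size([16, 64])
```

Production MoE training typically layers load-balancing auxiliary losses and expert-capacity limits on top of this routing; training-time concerns of that kind are what the Skywork-MoE report examines.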
Related projects:

Repository | Description | Stars |
---|---|---|
skyworkai/skywork | A language model pre-trained on 3.2TB of high-quality multilingual and code data, for applications including chatbots, text generation, and math calculation. | 1,223 |
will-singularity/skywork-mm | An empirical study aiming to develop a large language model capable of effectively integrating multiple input modalities | 23 |
pku-yuangroup/moe-llava | A mixture-of-experts architecture for multi-modal learning with large vision-language models | 1,980 |
ieit-yuan/yuan2.0-m32 | A high-performance language model designed to excel in tasks like natural language understanding, mathematical computation, and code generation | 180 |
xverse-ai/xverse-moe-a4.2b | A multilingual large language model from XVERSE Technology Inc., built on a mixture-of-experts architecture and fine-tuned for tasks such as conversation, question answering, and natural language understanding. | 36 |
xverse-ai/xverse-moe-a36b | Large multilingual language models built on a mixture-of-experts architecture. | 36 |
eleutherai/polyglot | Large language models designed to perform well in multiple languages and address performance issues with current multilingual models. | 475 |
baai-wudao/model | A repository of pre-trained language models for various tasks and domains. | 121 |
ibm-granite/granite-3.0-language-models | A collection of lightweight state-of-the-art language models designed to support multilinguality, coding, and reasoning tasks on constrained resources. | 214 |
moses-smt/nplm | A toolkit for training neural network language models | 14 |
deepseek-ai/deepseek-moe | A mixture-of-experts language model designed for improved efficiency and performance relative to comparable models | 1,006 |
elanmart/psmm | An implementation of a neural network model for character-level language modeling. | 50 |
01-ai/yi | A series of large language models trained from scratch to excel in multiple NLP tasks | 7,719 |
ymcui/macbert | Improves pre-trained Chinese language models with a corrective masked-language-modeling task that narrows the gap between pre-training and downstream fine-tuning | 645 |
openai/finetune-transformer-lm | Code and model for improving language understanding by generative pre-training with a transformer architecture. | 2,160 |