Skywork
Multilingual model
A pre-trained language model built on 3.2TB of high-quality multilingual and code data, for applications including chatbots, text generation, and mathematical reasoning.
Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model weights, training data, evaluation data, and evaluation methods.
1k stars
24 watching
111 forks
Language: Python
last commit: 8 months ago
Related projects:
| Repository | Description | Stars |
|---|---|---|
| skyworkai/skywork-moe | A high-performance mixture-of-experts model with innovative training techniques for language processing tasks | 126 |
| will-singularity/skywork-mm | An empirical study aiming to develop a large language model capable of effectively integrating multiple input modalities | 23 |
| orionstarai/orion | A family of large language models designed to handle multilingual text, with strong performance in tasks such as chat, long context, and retrieval-augmented generation | 785 |
| skyworkaigc/skytext-chinese-gpt3 | An AI-powered text generation model trained on Chinese data for tasks such as conversation, translation, and content creation | 419 |
| eleutherai/polyglot | Large language models designed to perform well in multiple languages and address performance issues with current multilingual models | 475 |
| pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks | 72 |
| openai/finetune-transformer-lm | Code and a model for improving language understanding through generative pre-training with a transformer-based architecture | 2,160 |
| csuhan/onellm | A framework for training and fine-tuning multimodal language models on various data types | 588 |
| mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks | 781 |
| yunwentechnology/unilm | Pre-trained models for natural language understanding and generation tasks using the UniLM architecture | 438 |
| bilibili/index-1.9b | A lightweight, multilingual language model with a long context length | 904 |
| 01-ai/yi | A series of large language models trained from scratch to excel in multiple NLP tasks | 7,719 |
| deeplangai/lingowhale-8b | An open bilingual (Chinese and English) LLM based on the LingoWhale model, trained on a large corpus of high-quality text and fine-tuned for tasks such as conversation generation | 134 |
| lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,550 |
| langboat/mengzi3 | 8B and 13B language models based on the Llama architecture with multilingual capabilities | 2,032 |