Skywork
Multilingual model
A pre-trained language model developed on 3.2TB of high-quality multilingual and code data for various applications including chatbots, text generation, and math calculations.
The Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model parameters, training data, evaluation data, and evaluation methods.
1k stars
24 watching
110 forks
Language: Python
Last commit: 10 months ago
Tags: llm
Related projects:
Repository | Description | Stars |
---|---|---|
skyworkai/skywork-moe | A high-performance mixture-of-experts model with innovative training techniques for language processing tasks | 126 |
will-singularity/skywork-mm | An empirical study aiming to develop a large language model capable of effectively integrating multiple input modalities | 23 |
orionstarai/orion | A family of large language models designed to handle multilingual text and provide strong performance in various tasks such as chat, long context, and retrieval augmented generation. | 789 |
skyworkaigc/skytext-chinese-gpt3 | An AI-powered text generation model trained on Chinese data to perform various tasks such as conversation, translation, and content creation. | 418 |
eleutherai/polyglot | Large language models designed to perform well in multiple languages and address performance issues with current multilingual models. | 476 |
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 73 |
openai/finetune-transformer-lm | Code and a model for improving language understanding through generative pre-training with a transformer-based architecture. | 2,167 |
csuhan/onellm | A framework for training and fine-tuning multimodal language models on various data types | 601 |
mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
yunwentechnology/unilm | This project provides pre-trained models and tools for natural language understanding (NLU) and generation (NLG) tasks in Chinese. | 439 |
bilibili/index-1.9b | A lightweight, multilingual language model with a long context length | 920 |
01-ai/yi | A series of large language models trained from scratch to excel in multiple NLP tasks | 7,743 |
deeplangai/lingowhale-8b | An open bilingual LingoWhale LLM, trained on a large dataset of high-quality Chinese and English text and fine-tuned for specific tasks such as conversation generation. | 134 |
lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,568 |
langboat/mengzi3 | 8B and 13B language models based on the Llama architecture with multilingual capabilities. | 2,031 |