Skywork
Multilingual model
A language model pre-trained on 3.2TB of high-quality multilingual and code data for applications including chatbots, text generation, and math.
Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. The model weights, training data, evaluation data, and evaluation methods have all been open-sourced.
1k stars
24 watching
110 forks
Language: Python
Last commit: 12 months ago
Topics: llm
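Since the pre-trained checkpoints are open-sourced, they can typically be loaded through the Hugging Face `transformers` library. The sketch below is illustrative only: the model ID `Skywork/Skywork-13B-base` is an assumption, so check the repository's release notes for the actual published checkpoint names and recommended usage.

```python
# Minimal sketch: loading an open-sourced Skywork-style causal LM with Hugging Face transformers.
# "Skywork/Skywork-13B-base" is an assumed, illustrative checkpoint ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Skywork/Skywork-13B-base"  # assumption: replace with the real released checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the dtype stored in the checkpoint
    device_map="auto",       # requires the `accelerate` package; places weights on available devices
    trust_remote_code=True,  # in case the checkpoint ships custom modeling code
)

prompt = "The highest mountain in the world is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```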
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A high-performance mixture-of-experts model with innovative training techniques for language processing tasks | 126 |
| | An empirical study aiming to develop a large language model that effectively integrates multiple input modalities | 23 |
| | A family of large language models designed to handle multilingual text, with strong performance in tasks such as chat, long-context processing, and retrieval-augmented generation | 789 |
| | An AI-powered text generation model trained on Chinese data for tasks such as conversation, translation, and content creation | 418 |
| | Large language models designed to perform well across multiple languages and address the shortcomings of current multilingual models | 476 |
| | A multi-modal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks | 73 |
| | Code and a model for improving language understanding through generative pre-training with a transformer-based architecture | 2,167 |
| | A framework for training and fine-tuning multimodal language models on various data types | 601 |
| | An end-to-end trained model that generates natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
| | Pre-trained models and tools for natural language understanding (NLU) and generation (NLG) tasks in Chinese | 439 |
| | A lightweight, multilingual language model with a long context length | 920 |
| | A series of large language models trained from scratch to excel at multiple NLP tasks | 7,743 |
| | An open bilingual LLM built on the LingoWhale model, trained on a large corpus of high-quality Chinese and English text and fine-tuned for tasks such as conversation generation | 134 |
| | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,568 |
| | Language models in 8B and 13B sizes based on the Llama architecture, with multilingual capabilities | 2,031 |