 alpaca-chinese-dataset
 Chinese prompt dataset
 A dataset for training and fine-tuning large language models on Chinese text prompts.
Alpaca Chinese instruction fine-tuning dataset
392 stars
 7 watching
 25 forks
 
last commit: over 2 years ago
tags: alpaca, chatglm, llm
 Related projects:
| Repository | Description | Stars | 
|---|---|---|
|  | Develops and maintains a Chinese language model fine-tuned from LLaMA, used for text generation and summarization tasks. | 711 | 
|  | Develops a multimodal Chinese language model with visual capabilities | 429 | 
|  | Provides a resource library for training Chinese conversation models with pre-processed datasets and a framework for fine-tuning the models | 1,162 | 
|  | A dataset of multi-turn conversations between users and AI models. | 164 | 
|  | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks | 230 | 
|  | A cleaned and curated version of an Alpaca dataset used to train a large language model | 1,525 | 
|  | Recreated weights from the Stanford Alpaca model, fine-tuned for a specific task | 406 | 
|  | A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering. | 529 | 
|  | A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research. | 49 | 
|  | A research project that develops a Traditional-Chinese instruction-following language model using Alpaca as a basis. | 134 | 
|  | A large dataset of human matting images and corresponding results for training person segmentation models. | 615 | 
|  | Exploring various LLMs and their applications in natural language processing and related areas | 1,854 | 
|  | Trains and evaluates a Chinese language model using adversarial training on a large corpus. | 140 | 
|  | An implementation of a large language model for Chinese text processing, focusing on a MoE (Mixture of Experts) architecture and incorporating a vast vocabulary. | 645 | 
|  | A large-scale Chinese corpus for pre-training language models. | 927 |