alpaca-chinese-dataset

Chinese prompt dataset

A dataset for training and fine-tuning large language models on Chinese text prompts.

Alpaca Chinese instruction fine-tuning dataset

GitHub

390 stars
7 watching
25 forks
last commit: over 1 year ago
alpaca, chatglm, llm

Related projects:

Repository | Description | Stars
lc1332/chinese-alpaca-lora | Develops and maintains a Chinese language model fine-tuned from LLaMA, used for text generation and summarization tasks. | 711
airaria/visual-chinese-llama-alpaca | Develops a multimodal Chinese language model with visual capabilities. | 424
hikariming/chat-dataset-baseline | Provides pre-processed datasets and a fine-tuning framework for training Chinese conversation models. | 1,157
icip-cas/chatalpaca | A dataset of multi-turn conversations between users and AI models. | 164
brightmart/xlnet_zh | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks. | 230
gururise/alpacadatacleaned | A cleaned and curated version of the Alpaca dataset used to train large language models. | 1,516
pointnetwork/point-alpaca | Recreated weights of the Stanford Alpaca model, fine-tuned for a specific task. | 406
crownpku/small-chinese-corpus | A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering. | 531
matbahasa/talpco | A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research. | 49
ntunlplab/traditional-chinese-alpaca | A research project that develops a Traditional Chinese instruction-following language model based on Alpaca. | 134
aisegmentcn/matting_human_datasets | A large dataset of human matting images and corresponding results for training person segmentation models. | 610
km1994/llmsninestorydemontower | Explores various LLMs and their applications in natural language processing and related areas. | 1,798
cluebenchmark/electra | Trains and evaluates a Chinese language model on a large corpus using ELECTRA's replaced-token-detection pre-training. | 140
hit-scir/chinese-mixtral-8x7b | A large language model for Chinese text processing, built on a Mixture-of-Experts (MoE) architecture with an expanded vocabulary. | 641
cluebenchmark/cluecorpus2020 | A large-scale pre-training corpus for Chinese language models. | 925