alpaca-chinese-dataset

Chinese prompt dataset

A dataset for training and fine-tuning large language models on Chinese text prompts.

Alpaca Chinese instruction fine-tuning dataset

392 stars

7 watching

25 forks

last commit: over 3 years ago

alpaca, chatglm, llm

Related projects:

Repository	Description	Stars
lc1332/chinese-alpaca-lora	Develops and maintains a Chinese language model fine-tuned from LLaMA, used for text generation and summarization tasks.	711
airaria/visual-chinese-llama-alpaca	Develops a multimodal Chinese language model with visual capabilities.	429
hikariming/chat-dataset-baseline	Provides a resource library for training Chinese conversation models, with pre-processed datasets and a framework for fine-tuning the models.	1,162
icip-cas/chatalpaca	A dataset of multi-turn conversations between users and AI models.	164
brightmart/xlnet_zh	Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks.	230
gururise/alpacadatacleaned	A cleaned and curated version of the Alpaca dataset used to train large language models.	1,525
pointnetwork/point-alpaca	Recreated weights of the Stanford Alpaca model, which is fine-tuned from LLaMA for instruction following.	406
crownpku/small-chinese-corpus	A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering.	529
matbahasa/talpco	A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research.	49
ntunlplab/traditional-chinese-alpaca	A research project that develops a Traditional-Chinese instruction-following language model using Alpaca as a basis.	134
aisegmentcn/matting_human_datasets	A large dataset of human matting images and corresponding results for training person segmentation models.	615
km1994/llmsninestorydemontower	Explores various LLMs and their applications in natural language processing and related areas.	1,854
cluebenchmark/electra	Trains and evaluates a Chinese ELECTRA language model, pre-trained with replaced-token detection on a large corpus.	140
hit-scir/chinese-mixtral-8x7b	An implementation of a large language model for Chinese text processing, built on a Mixture-of-Experts (MoE) architecture and incorporating an expanded vocabulary.	645
cluebenchmark/cluecorpus2020	A large-scale Chinese corpus for pre-training language models.	927