alpaca-chinese-dataset
Chinese prompt dataset
A dataset for training and fine-tuning large language models on Chinese text prompts.
Alpaca Chinese instruction fine-tuning dataset
390 stars
7 watching
25 forks
last commit: over 1 year ago
tags: alpaca, chatglm, llm
Related projects:
Repository | Description | Stars |
---|---|---|
lc1332/chinese-alpaca-lora | Develops and maintains a Chinese language model finetuned on LLaMA, used for text generation and summarization tasks. | 711 |
airaria/visual-chinese-llama-alpaca | Develops a multimodal Chinese language model with visual capabilities | 424 |
hikariming/chat-dataset-baseline | Provides a resource library for training Chinese conversation models with pre-processed datasets and a framework for fine-tuning the models | 1,157 |
icip-cas/chatalpaca | A dataset of multi-turn conversations between users and AI models. | 164 |
brightmart/xlnet_zh | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks | 230 |
gururise/alpacadatacleaned | A cleaned and curated version of the Alpaca dataset used to train large language models | 1,516 |
pointnetwork/point-alpaca | Recreated weights of the Stanford Alpaca model, fine-tuned for a specific task | 406 |
crownpku/small-chinese-corpus | A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering. | 531 |
matbahasa/talpco | A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research. | 49 |
ntunlplab/traditional-chinese-alpaca | A research project that develops a Traditional-Chinese instruction-following language model using Alpaca as a basis. | 134 |
aisegmentcn/matting_human_datasets | A large dataset of human matting images and corresponding results for training person segmentation models. | 610 |
km1994/llmsninestorydemontower | Exploring various LLMs and their applications in natural language processing and related areas | 1,798 |
cluebenchmark/electra | Trains and evaluates a Chinese language model using adversarial training on a large corpus. | 140 |
hit-scir/chinese-mixtral-8x7b | An implementation of a large language model for Chinese text processing, built on a Mixture-of-Experts (MoE) architecture and incorporating a vast vocabulary. | 641 |
cluebenchmark/cluecorpus2020 | A large-scale pre-training corpus for Chinese language models | 925 |