Gossiping-Chinese-Corpus
Forum dataset
A collection of question-answer pairs extracted from online Chinese forums.
Chinese question-answer corpus from the PTT Gossiping board (八卦版)
238 stars
13 watching
35 forks
Language: Jupyter Notebook
Last commit: about 1 month ago
Linked from 1 awesome list
Tags: chatbot, chatbot-corpus, chinese-chatbot, chinese-corpus, chinese-dataset, chinese-nlp, corpus, dataset, dialog, ptt, question-answering
Related projects:
Repository | Description | Stars |
---|---|---|
candlewill/dialog_corpus | A collection of datasets used to train and improve chatbot systems in both English and Chinese. | 2,033 |
chatopera/insuranceqa-corpus-zh | An insurance industry conversation corpus with pre-processed data for natural language processing and question answering tasks. | 1,020 |
hikariming/chat-dataset-baseline | Provides a resource library for training Chinese conversation models, with pre-processed datasets and a framework for fine-tuning the models. | 1,157 |
abbey4799/cutegpt | A conversational language model developed to improve understanding of complex instructions and Chinese vocabulary. | 62 |
crownpku/small-chinese-corpus | A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering. | 531 |
thu-coai/cdial-gpt | A large-scale Chinese conversation dataset and pre-trained dialog models for text generation | 1,782 |
suprityoung/zhongjing | Develops a large language model capable of handling complex medical conversations with high accuracy and professionalism. | 316 |
clue-ai/chatyuan | Large language model for dialogue support in multiple languages | 1,902 |
cluebenchmark/cluecorpus2020 | A large-scale pre-training corpus for Chinese language models | 925 |
songys/chatbot_data | Data collection and model development for a conversational AI chatbot focused on emotional wellness support in Korean. | 355 |
aceimnorstuvwxz/dgk_lost_conv | A collection of preprocessed Chinese conversation corpora for use in natural language processing tasks. | 1,088 |
thu-coai/eva | Pre-trained chatbot models for Chinese open-domain dialogue systems | 305 |
clue-ai/chatyuan-7b | An updated version of a large language model designed to improve performance on multiple tasks and datasets | 13 |
wangrongsheng/ivygpt | Develops large language models to support medical diagnoses and provide helpful suggestions. | 59 |
cluebenchmark/cluepretrainedmodels | Provides pre-trained models for Chinese language tasks with improved performance and smaller model sizes compared to existing models. | 804 |