Gossiping-Chinese-Corpus

Forum dataset

A collection of question-answer pairs extracted from online Chinese forums.

Chinese question-answer corpus from the PTT Gossiping board

GitHub

236 stars

13 watching

36 forks

Language: Jupyter Notebook

last commit: almost 2 years ago

Linked from 1 awesome list

chatbot, chatbot-corpus, chinese-chatbot, chinese-corpus, chinese-dataset, chinese-nlp, corpus, dataset, dialog, ptt, question-answering

www.kaggle.com/zake7749/pttgossipingcorpus

Backlinks from these awesome lists:

endymecy/awesome-deeplearning-resources

Related projects:

Repository	Description	Stars
candlewill/dialog_corpus	A collection of datasets used to train and improve chatbot systems in both English and Chinese.	2,033
chatopera/insuranceqa-corpus-zh	An insurance industry conversation corpus with pre-processed data for natural language processing and question answering tasks.	1,019
hikariming/chat-dataset-baseline	A resource library for training Chinese conversation models, with pre-processed datasets and a framework for fine-tuning the models.	1,162
abbey4799/cutegpt	A conversational language model developed to improve understanding of complex instructions and Chinese vocabulary.	62
crownpku/small-chinese-corpus	A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering.	529
thu-coai/cdial-gpt	A large-scale Chinese conversation dataset and pre-trained dialog models for text generation.	1,799
suprityoung/zhongjing	Develops a large language model capable of handling complex medical conversations with high accuracy and professionalism.	324
clue-ai/chatyuan	A large language model for dialogue support in multiple languages.	1,903
cluebenchmark/cluecorpus2020	A large-scale Chinese corpus for pre-training language models.	927
songys/chatbot_data	Data collection and model development for a conversational AI chatbot focused on emotional wellness support in Korean.	357
aceimnorstuvwxz/dgk_lost_conv	A collection of preprocessed Chinese conversation corpora for use in natural language processing tasks.	1,089
thu-coai/eva	Pre-trained chatbot models for Chinese open-domain dialogue systems.	306
clue-ai/chatyuan-7b	An updated version of a large language model designed to improve performance on multiple tasks and datasets.	13
wangrongsheng/ivygpt	Develops large language models to support medical diagnoses and provide helpful suggestions.	59
cluebenchmark/cluepretrainedmodels	Provides pre-trained models for Chinese language tasks with improved performance and smaller model sizes compared to existing models.	806