cmrc2018

Reading dataset

A collection of data for evaluating Chinese machine reading comprehension systems

A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018)

GitHub

419 stars

12 watching

87 forks

Language: Python

Last commit: about 3 years ago

bert, natural-language-processing, question-answering, reading-comprehension

ymcui.github.io/cmrc2018/
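The description above calls this a span-extraction dataset: the answer to each question is a contiguous substring of the passage. A minimal sketch of that idea follows, assuming a SQuAD-style record with `text` and `answer_start` fields. The field names, the helper functions, and the sample passage are illustrative assumptions, not taken from this page or from the dataset files.

```python
# Illustrative only: the record layout and the sample text are assumptions,
# not an excerpt from CMRC 2018.

def make_answer(context: str, answer_text: str) -> dict:
    """Build an answer record whose offset points into the context."""
    start = context.find(answer_text)  # character offset of the first match
    if start < 0:
        raise ValueError("answer is not a contiguous span of the context")
    return {"text": answer_text, "answer_start": start}


def span_from_answer(context: str, answer: dict) -> str:
    """Recover the answer string from the context using the stored offset."""
    start = answer["answer_start"]
    return context[start:start + len(answer["text"])]


def exact_match(prediction: str, answers: list) -> bool:
    """True if the prediction equals any reference answer exactly."""
    return any(prediction == a["text"] for a in answers)


# Hypothetical passage: "The Great Wall is in northern China; it was built
# to resist invasions by northern nomadic peoples."
context = "长城位于中国北方，修建的目的是抵御北方游牧民族的入侵。"
# Hypothetical question: "What was the purpose of building the Great Wall?"
question = "长城修建的目的是什么？"

answer = make_answer(context, "抵御北方游牧民族的入侵")
record = {"context": context, "question": question, "answers": [answer]}

predicted = span_from_answer(record["context"], record["answers"][0])
print(predicted, exact_match(predicted, record["answers"]))
```

Offsets here are counted in characters rather than bytes or word tokens, which suits Chinese text where words are not separated by spaces.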

Related projects:

Repository	Description	Stars
ymcui/macbert	Improves pre-trained Chinese language models by using an MLM-as-correction (Mac) pre-training task to reduce the discrepancy between pre-training and fine-tuning	646
crownpku/small-chinese-corpus	A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering.	529
ymcui/chinese-mixtral	Develops and releases Mixtral-based models for natural language processing tasks with a focus on Chinese text generation and understanding	589
michael-wzhu/promptcblue	A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain	328
mengtingwan/goodreads	Provides code samples and notebooks to download, read, and analyze Goodreads datasets for research purposes.	252
ymcui/chinese-electra	Provides pre-trained Chinese language models based on the ELECTRA framework for natural language processing tasks	1,405
ymcui/chinese-mobilebert	An implementation of MobileBERT, a pre-trained language model, in Python for NLP tasks.	81
ydli-ai/csl	A large-scale dataset for natural language processing tasks focused on Chinese scientific literature, providing tools and benchmarks for NLP research.	582
hit-scir/semeval-2016	A benchmarking dataset and evaluation framework for semantic dependency parsing in Chinese language texts.	135
ymcui/chinese-xlnet	Provides pre-trained models for Chinese natural language processing tasks using the XLNet architecture	1,652
techascent/tech.ml.dataset	A Clojure library for efficient tabular data processing and analysis	687
felixgithub2017/mmcu	A benchmark that measures how well large language models understand Chinese across a massive range of tasks	87
ymcui/pert	Develops a pre-trained language model to learn semantic knowledge from permuted text without mask labels	356
ymcui/lert	A pre-trained language model designed to leverage linguistic features and outperform comparable baselines on Chinese natural language understanding tasks.	202
pratyushmaini/llm_dataset_inference	Detects whether a given text sequence is part of the training data used to train a large language model.	23