cmrc2018

Reading dataset

A collection of data for evaluating Chinese machine reading comprehension systems

A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018)

415 stars
12 watching
87 forks
Language: Python
Last commit: over 2 years ago
Topics: bert, natural-language-processing, question-answering, reading-comprehension

Related projects:

Repository | Description | Stars
ymcui/macbert | Improves pre-trained Chinese language models by incorporating a correction task to alleviate inconsistency issues with downstream tasks | 645
crownpku/small-chinese-corpus | A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering | 531
ymcui/chinese-mixtral | Develops and releases Mixtral-based models for natural language processing tasks with a focus on Chinese text generation and understanding | 584
michael-wzhu/promptcblue | A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain | 323
mengtingwan/goodreads | Provides code samples and notebooks to download, read, and analyze Goodreads datasets for research purposes | 251
ymcui/chinese-electra | Provides pre-trained Chinese language models based on the ELECTRA framework for natural language processing tasks | 1,403
ymcui/chinese-mobilebert | An implementation of MobileBERT, a pre-trained language model, in Python for NLP tasks | 80
ydli-ai/csl | A large-scale dataset for natural language processing tasks focused on Chinese scientific literature, providing tools and benchmarks for NLP research | 568
hit-scir/semeval-2016 | A benchmarking dataset and evaluation framework for semantic dependency parsing in Chinese language texts | 135
ymcui/chinese-xlnet | Provides pre-trained models for Chinese natural language processing tasks using the XLNet architecture | 1,653
techascent/tech.ml.dataset | A Clojure library for efficient tabular data processing and analysis | 681
felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset | 87
ymcui/pert | Develops a pre-trained language model to learn semantic knowledge from permuted text without mask labels | 354
ymcui/lert | A pre-trained language model designed to leverage linguistic features and outperform comparable baselines on Chinese natural language understanding tasks | 202
pratyushmaini/llm_dataset_inference | Detects whether a given text sequence is part of the training data used to train a large language model | 23