CSL

Chinese Scientific Dataset

A large-scale dataset for natural language processing tasks focused on Chinese scientific literature, providing tools and benchmarks for NLP research.

[COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset

582 stars
15 watching
57 forks
Language: Python
Last commit: over 1 year ago
Topics: chinese-nlp, dataset, machine-learning, scientific-publications

Related projects:

Repository | Description | Stars
ys-zong/vl-icl | A benchmarking suite for multimodal in-context learning models | 31
01-ai/yi | A series of large language models trained from scratch to excel in multiple NLP tasks | 7,743
vlang/vsl | A comprehensive V library for high-performance scientific computations and artificial intelligence | 358
yunwentechnology/unilm | Pre-trained models and tools for natural language understanding (NLU) and generation (NLG) tasks in Chinese | 439
ymcui/cmrc2018 | A collection of data for evaluating Chinese machine reading comprehension systems | 419
crownpku/small-chinese-corpus | A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering | 529
scicloj/tablecloth | A dataset manipulation library built on top of tech.ml.dataset, providing a simplified API for data processing and analysis | 308
nyu-mll/jiant | A toolkit for natural language processing research enabling multitask learning and transfer learning | 1,650
techascent/tech.ml.dataset | A Clojure library for efficient tabular data processing and analysis | 687
ymcui/chinese-xlnet | Pre-trained models for Chinese natural language processing tasks using the XLNet architecture | 1,652
cisnlp/glotlid | A language identification model that supports over 2000 languages and can be used for various NLP tasks | 106
scicloj/scicloj.ml.clj-djl | Pre-trained machine learning models for natural language processing tasks using Clojure and the clj-djl framework | 0
mirfan899/urdu | A collection of Urdu language datasets for various NLP tasks and applications | 71
scicloj/scicloj.ml | A Clojure machine learning library providing idiomatic and harmonized support for various classification, regression, clustering, and unsupervised models | 220
cstjean/scikitlearn.jl | A Julia implementation of popular machine learning algorithms and interfaces | 547