CSL
Chinese Scientific Literature Dataset
A large-scale dataset for natural language processing tasks focused on Chinese scientific literature, providing tools and benchmarks for NLP research.
[COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset
582 stars
15 watching
57 forks
Language: Python
Last commit: over 1 year ago
Topics: chinese-nlp, dataset, machine-learning, scientific-publications
Related projects:
Repository | Description | Stars |
---|---|---|
ys-zong/vl-icl | A benchmarking suite for multimodal in-context learning models | 31 |
01-ai/yi | A series of large language models trained from scratch to excel in multiple NLP tasks | 7,743 |
vlang/vsl | A comprehensive V library for high-performance scientific computations and artificial intelligence. | 358 |
yunwentechnology/unilm | This project provides pre-trained models and tools for natural language understanding (NLU) and generation (NLG) tasks in Chinese. | 439 |
ymcui/cmrc2018 | A collection of data for evaluating Chinese machine reading comprehension systems | 419 |
crownpku/small-chinese-corpus | A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering. | 529 |
scicloj/tablecloth | A dataset manipulation library built on top of tech.ml.dataset, providing a simplified API for data processing and analysis. | 308 |
nyu-mll/jiant | A toolkit for natural language processing research enabling multitask learning and transfer learning. | 1,650 |
techascent/tech.ml.dataset | A Clojure library for efficient tabular data processing and analysis | 687 |
ymcui/chinese-xlnet | Provides pre-trained models for Chinese natural language processing tasks using the XLNet architecture | 1,652 |
cisnlp/glotlid | A language identification model that supports over 2000 languages and can be used for various NLP tasks. | 106 |
scicloj/scicloj.ml.clj-djl | Provides pre-trained machine learning models for natural language processing tasks using Clojure and the clj-djl framework. | 0 |
mirfan899/urdu | A collection of Urdu language datasets for various NLP tasks and applications | 71 |
scicloj/scicloj.ml | A Clojure machine learning library providing idiomatic and harmonized support for various classification, regression, clustering, and unsupervised models. | 220 |
cstjean/scikitlearn.jl | A Julia implementation of popular machine learning algorithms and interfaces. | 547 |