CSL
Chinese Scientific Dataset
A large-scale dataset for natural language processing tasks focused on Chinese scientific literature, providing tools and benchmarks for NLP research.
[COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset
568 stars
15 watching
58 forks
Language: Python
Last commit: over 1 year ago
Topics: chinese-nlp, dataset, machine-learning, scientific-publications
Related projects:
Repository | Description | Stars |
---|---|---|
ys-zong/vl-icl | A benchmarking suite for multimodal in-context learning models | 28 |
01-ai/yi | A series of large language models trained from scratch to excel in multiple NLP tasks | 7,699 |
vlang/vsl | A comprehensive V library for high-performance scientific computations and artificial intelligence. | 355 |
yunwentechnology/unilm | Pre-trained models for natural language understanding and generation tasks using the UniLM architecture | 438 |
ymcui/cmrc2018 | A collection of data for evaluating Chinese machine reading comprehension systems | 415 |
crownpku/small-chinese-corpus | A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering. | 531 |
scicloj/tablecloth | A dataset manipulation library built on top of tech.ml.dataset, providing a simplified API for data processing and analysis. | 303 |
nyu-mll/jiant | A toolkit for natural language processing research enabling multitask learning and transfer learning. | 1,644 |
techascent/tech.ml.dataset | A Clojure library for efficient tabular data processing and analysis | 681 |
ymcui/chinese-xlnet | Provides pre-trained models for Chinese natural language processing tasks using the XLNet architecture | 1,653 |
cisnlp/glotlid | A language identification model that supports over 2000 languages and can be used for various NLP tasks. | 90 |
scicloj/scicloj.ml.clj-djl | Provides pre-trained machine learning models for natural language processing tasks using Clojure and the clj-djl framework. | 0 |
mirfan899/urdu | A collection of Urdu language datasets for various NLP tasks and applications | 71 |
scicloj/scicloj.ml | A machine learning library built on top of Clojure with a focus on data preprocessing and model creation | 216 |
cstjean/scikitlearn.jl | A Julia implementation of popular machine learning algorithms and interfaces. | 544 |