wongnai-corpus

Thai NLP Datasets

A collection of datasets for natural language processing research in Thai, including word segmentation and review rating prediction.

Collection of Wongnai's datasets

GitHub

76 stars
6 watching
23 forks
last commit: about 5 years ago
Linked from 1 awesome list

datasets, nlp, nlp-machine-learning, tokenization

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| louisowen6/nlp_bahasa_resources | A curated collection of NLP datasets and resources for Bahasa Indonesia | 489 |
| karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 |
| krakenai/synthai | A deep learning-based project for segmenting Thai text into words and annotating parts of speech with high accuracy. | 41 |
| pythainlp/lexicon-thai | A Thai language corpus and lexicon repository for natural language processing | 141 |
| mirfan899/urdu | A collection of Urdu language datasets for various NLP tasks and applications | 71 |
| pythainlp/pythainlp | A Python package for text processing and linguistic analysis focused on the Thai language. | 987 |
| tmu-nlp/thaitoxicitytweetcorpus | Corpus of annotated Thai tweets to analyze toxicity and sentiment | 10 |
| vinairesearch/phobert | Pre-trained language models for Vietnamese NLP tasks | 663 |
| crownpku/small-chinese-corpus | A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering. | 531 |
| wannaphong/thai-ner | A Named Entity Recognition tool for the Thai language. | 53 |
| pythainlp/prachathai-67k | An article classification dataset created from news articles scraped from Prachathai.com with multiple benchmark models for multi-label classification | 16 |
| rkcosmos/deepcut | A Thai word tokenization library using Deep Neural Network | 420 |
| ymcui/chinese-xlnet | Provides pre-trained models for Chinese natural language processing tasks using the XLNet architecture | 1,653 |
| matbahasa/talpco | A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research. | 49 |
| zhuiyitechnology/pretrained-models | A collection of pre-trained language models for natural language processing tasks | 987 |