 wongnai-corpus
 Thai NLP Datasets
 A collection of datasets for natural language processing research in Thai, including word segmentation and review rating prediction.
Collection of Wongnai's datasets
76 stars
6 watching
23 forks
 
last commit: about 6 years ago 
Linked from 1 awesome list
Topics: datasets, nlp, nlp-machine-learning, tokenization
 Related projects:
| Repository | Description | Stars | 
|---|---|---|
|  | A curated collection of NLP datasets and resources for Bahasa Indonesia | 496 | 
|  | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 | 
|  | A deep learning-based project for segmenting Thai text into words and annotating parts of speech with high accuracy. | 41 | 
|  | A Thai language corpus and lexicon repository for natural language processing | 142 | 
|  | A collection of Urdu language datasets for various NLP tasks and applications | 71 | 
|  | A Python package for text processing and linguistic analysis focused on Thai language | 993 | 
|  | Corpus of annotated Thai tweets to analyze toxicity and sentiment | 10 | 
|  | Pre-trained language models for Vietnamese NLP tasks | 671 | 
|  | A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering. | 529 | 
|  | Named entity recognition for Thai text using PyThaiNLP and a custom implementation. | 53 | 
|  | An article classification dataset created from news articles scraped from Prachathai.com with multiple benchmark models for multi-label classification | 16 | 
|  | A Thai word tokenization library using deep neural networks | 421 | 
|  | Provides pre-trained models for Chinese natural language processing tasks using the XLNet architecture | 1,652 | 
|  | A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research. | 49 | 
|  | A collection of pre-trained language models for natural language processing tasks | 989 |