wongnai-corpus
Thai NLP Datasets
A collection of datasets for natural language processing research in Thai, including word segmentation and review rating prediction.
Collection of Wongnai's datasets
76 stars
6 watching
23 forks
last commit: over 5 years ago
Linked from 1 awesome list
Topics: datasets, nlp, nlp-machine-learning, tokenization
Related projects:
| Description | Stars |
|---|---|
| A curated collection of NLP datasets and resources for Bahasa Indonesia | 496 |
| A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 |
| A deep learning-based project for segmenting Thai text into words and annotating parts of speech with high accuracy. | 41 |
| A Thai language corpus and lexicon repository for natural language processing | 142 |
| A collection of Urdu language datasets for various NLP tasks and applications | 71 |
| A Python package for text processing and linguistic analysis focused on Thai language | 993 |
| Corpus of annotated Thai tweets to analyze toxicity and sentiment | 10 |
| Pre-trained language models for Vietnamese NLP tasks | 671 |
| A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering. | 529 |
| Named entity recognition for Thai text using PyThaiNLP and a custom implementation. | 53 |
| An article classification dataset created from news articles scraped from Prachathai.com with multiple benchmark models for multi-label classification | 16 |
| A Thai word tokenization library using a deep neural network | 421 |
| Pre-trained models for Chinese natural language processing tasks based on the XLNet architecture | 1,652 |
| A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research. | 49 |
| A collection of pre-trained language models for natural language processing tasks | 989 |