ThaiToxicityTweetCorpus

Toxicity dataset

A corpus of annotated Thai tweets for analyzing toxicity and sentiment.

GitHub

10 stars
4 watching
3 forks
Language: Jupyter Notebook
Last commit: almost 4 years ago

Related projects:

Repository | Description | Stars
--- | --- | ---
wannaphong/thai-ner | A Named Entity Recognition tool for the Thai language. | 53
pythainlp/lexicon-thai | A Thai language corpus and lexicon repository for natural language processing | 141
jagerv3/sentiment_analysis_thai | Analyzes sentiment in Thai text using machine learning algorithms and natural language processing techniques. | 12
pythainlp/pythainlp | A Python package for text processing and linguistic analysis focused on the Thai language. | 987
rkcosmos/deepcut | A Thai word tokenization library using Deep Neural Network | 420
wongnai/wongnai-corpus | A collection of datasets for natural language processing research in Thai, including word segmentation and review rating prediction. | 76
wittawatj/jtcc | A Java library to tokenize Thai text into groups of characters | 18
kobkrit/tf-nlp-thai-word-embedding | An implementation of a word embedding technique using TensorFlow for Thai language processing | 11
kateryna-bobrovnyk/ukr-twi-corpus | A collection of Ukrainian Twitter texts for linguistic analysis and research | 15
dmulyalin/ttp | A template-based text parsing library | 349
pythainlp/prachathai-67k | An article classification dataset created from news articles scraped from Prachathai.com with multiple benchmark models for multi-label classification | 16
krakenai/synthai | A deep learning-based project for segmenting Thai text into words and annotating parts of speech with high accuracy. | 41
digitalmethodsinitiative/dmi-tcat | A toolset for collecting and analyzing tweets from Twitter | 367
vchahun/teny | Tools and techniques for improving machine translation in resource-constrained environments. | 3
tchayintr/thbert | A pre-trained BERT model designed to facilitate NLP research and development with limited Thai language resources | 6