prachathai-67k

News dataset

An article classification dataset built from news articles scraped from Prachathai.com, with multiple benchmark models for multi-label classification.

News Article Corpus from Prachathai.com

GitHub

16 stars
5 watching
10 forks
Language: Jupyter Notebook
last commit: over 3 years ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
pythainlp/wisesight-sentiment | A large Thai social media text sentiment dataset with annotated labels | 77
pythainlp/lexicon-thai | A Thai language corpus and lexicon repository for natural language processing | 141
karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919
pythainlp/pythainlp | A Python package for text processing and linguistic analysis focused on the Thai language. | 987
pratyushmaini/llm_dataset_inference | Detects whether a given text sequence is part of the training data used to train a large language model. | 23
sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification with visualization tools | 336
embodiedgpt/embodiedgpt_pytorch | A PyTorch-based toolkit for creating customized multimedia datasets and handling heterogeneous data for training AI models. | 340
xiayandi/pytorch_text_classification | An implementation of convolutional neural networks for text classification using PyTorch | 66
jd-aig/nlp_baai | A collection of natural language processing models and tools for collaboration on a joint project between BAAI and JDAI. | 252
hanzhenlei767/nlp_learn | A comprehensive collection of NLP-related code snippets and notes on various models and techniques, including pre-trained language models and Chinese text processing methods. | 25
felixgwu/img_classification_pk_pytorch | A PyTorch project for comparing image classification models and facilitating quick experiment setup | 365
sandeep42/anuvada | This is an open source PyTorch library providing tools and models to explain the predictions of deep neural networks for natural language processing tasks. | 19
mirfan899/urdu | A collection of Urdu language datasets for various NLP tasks and applications | 71
pytorch/data | A PyTorch project providing data loading utilities and scalable dataloading solutions | 1,133
kaiyangzhou/dassl.pytorch | A PyTorch toolbox for supporting research and development of domain adaptation, generalization, and semi-supervised learning methods in computer vision. | 1,217