idn-tagged-corpus

Indonesian Corpus

A manually tagged Indonesian-language corpus in tab-separated values (TSV) format.

Indonesian Manually Tagged Corpus
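The description above only says the corpus is a tagged, tab-separated file. The following is a minimal sketch of how such a file could be read, assuming one `token<TAB>tag` pair per line and sentence boundaries marked by blank lines or by lines without a tab (for example, markup lines). That layout is an assumption, not a documented property of this repository, so check the actual file before relying on it. The function names `read_tagged_sentences` and `tag_counts`, and the sample tokens and tags, are hypothetical and for illustration only.

```python
import io
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# ASSUMPTION: each data line is "token<TAB>tag"; any line without a tab
# (blank or markup) is treated as a sentence boundary.
Sentence = List[Tuple[str, str]]


def read_tagged_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences as lists of (token, tag) pairs (hypothetical helper)."""
    sentence: Sentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if "\t" not in line:
            # Blank or markup line: close the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        token, tag = line.split("\t", 1)
        sentence.append((token, tag.strip()))
    if sentence:
        # The file may not end with a boundary line.
        yield sentence


def tag_counts(sentences: Iterable[Sentence]) -> Counter:
    """Count how often each tag occurs across all sentences."""
    return Counter(tag for sent in sentences for _, tag in sent)


if __name__ == "__main__":
    # Illustrative sample only; not taken from the corpus.
    sample = "Kera\tNN\nuntuk\tSC\namankan\tVB\n.\tZ\n\nIa\tPRP\ntidur\tVB\n"
    sents = list(read_tagged_sentences(io.StringIO(sample)))
    print(len(sents))              # 2
    print(tag_counts(sents)["VB"])  # 2
```

To read the real file, pass an open text handle instead of the `StringIO` sample, e.g. `open(path, encoding="utf-8")`, where `path` is wherever the corpus's .tsv file was saved. Treating every tab-less line as a boundary keeps the reader working whether sentences are separated by blank lines or wrapped in markup.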

88 stars
7 watching
26 forks
last commit: over 2 years ago
Linked from 1 awesome list

Related projects:

famrashel/idn-treebank: A manually tagged Indonesian corpus consisting of sentence parse trees. (36 stars)
louisowen6/nlp_bahasa_resources: A curated collection of NLP datasets and resources for Bahasa Indonesia. (489 stars)
poltextlab/hunempoli_corpus: A manually annotated corpus for training and testing machine learning models for Aspect-Based Sentiment Analysis (ABSA) in Hungarian. (0 stars)
lantip/baku-tidak-baku: A repository of linguistic data for Indonesian words, categorized as either standard or non-standard. (29 stars)
j-min/korean-parallel-corpora: A collection of parallel Korean texts for language processing and machine learning research. (12 stars)
kmkurn/id-nlp-resource: A collection of annotated NLP resources for the Indonesian language. (279 stars)
ukrainian-to-english-corpora/folktale_corpus: A collection of Ukrainian folktales translated into English for linguistic and literary research. (0 stars)
bertez/corpora: A collection of Galician language data in JSON format. (2 stars)
elte-dh/regenykorpusz: A large corpus of Hungarian novels with annotated texts and metadata, developed by the Department of Digital Humanities at Eötvös Loránd University. (4 stars)
galuhsahid/indonesian-word-embedding: Demonstrates Indonesian word embeddings using pre-trained Word2vec models. (20 stars)
ans-4175/peta-indonesia-geojson: Creates a map of Indonesia with province codes. (73 stars)
wisn/jargon-pemrograman-fungsional: A glossary of functional programming terms, with simple and accessible explanations of the concepts. (70 stars)
igobronidze/hrs_training_data: Training data for a handwriting recognition system. (20 stars)
nytud/panmorph: A harmonized tagset and annotation scheme for Hungarian morphological analysers. (4 stars)
atik-05/bangla_datasets_absa: A collection of pre-processed Bangla datasets for natural language processing tasks. (0 stars)