idn-tagged-corpus
Indonesian Corpus
A manually tagged Indonesian-language corpus in tab-separated values (TSV) format
Indonesian Manually Tagged Corpus
88 stars
7 watching
26 forks
last commit: over 2 years ago
Linked from 1 awesome list
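Since the corpus is described as a tab-separated file, a reader along the following lines may be a useful starting point. This is a minimal sketch, not the project's own tooling: it assumes one token and its tag per line and treats any row without two fields (such as a blank line) as a sentence boundary. The function name `read_tagged_tsv`, the sample sentences, and the tag labels are all hypothetical, so check the actual file layout and tagset before relying on it.

```python
import csv
import io
from typing import Iterable, Iterator, List, Tuple

Sentence = List[Tuple[str, str]]


def read_tagged_tsv(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences as lists of (token, tag) pairs.

    Assumed layout (not confirmed by the source): "token<TAB>tag" per line,
    with any row that lacks two fields acting as a sentence boundary.
    """
    sentence: Sentence = []
    # QUOTE_NONE keeps quote characters in tokens as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2 or not row[0].strip():
            # Boundary row: emit the sentence collected so far, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append((row[0], row[1]))
    if sentence:
        yield sentence


# Illustrative data only; the tags here are placeholders, not the corpus tagset.
sample = "Saya\tPRP\nmakan\tVB\nnasi\tNN\n.\tZ\n\nDia\tPRP\ntidur\tVB\n.\tZ\n"
sentences = list(read_tagged_tsv(io.StringIO(sample)))
```

To read a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open(path, encoding="utf-8", newline="")`, where `path` points at the downloaded TSV.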
Related projects:
Repository | Description | Stars |
---|---|---|
famrashel/idn-treebank | A manually tagged Indonesian corpus consisting of parse-trees from sentences. | 36 |
louisowen6/nlp_bahasa_resources | A curated collection of NLP datasets and resources for Bahasa Indonesia | 489 |
poltextlab/hunempoli_corpus | A manually annotated corpus for training and testing machine learning models for Aspect-Based Sentiment Analysis (ABSA) in Hungarian. | 0 |
lantip/baku-tidak-baku | A repository of linguistic data for Indonesian words categorized as either standard or non-standard | 29 |
j-min/korean-parallel-corpora | A collection of parallel Korean texts used for language processing and machine learning research | 12 |
kmkurn/id-nlp-resource | A collection of annotated NLP resources for the Indonesian language | 279 |
ukrainian-to-english-corpora/folktale_corpus | A collection of Ukrainian folktales translated into English for linguistic and literary research purposes. | 0 |
bertez/corpora | A collection of Galician language data in JSON format. | 2 |
elte-dh/regenykorpusz | A large corpus of Hungarian novels with annotated texts and metadata, developed by the Department of Digital Humanities at Eötvös Loránd University. | 4 |
galuhsahid/indonesian-word-embedding | Demonstrates word embeddings for Indonesian using pre-trained Word2vec models | 20 |
ans-4175/peta-indonesia-geojson | Creates an Indonesia map with province codes | 73 |
wisn/jargon-pemrograman-fungsional | Provides a glossary of terms and explanations for functional programming concepts in a simple and accessible way. | 70 |
igobronidze/hrs_training_data | Training data for a handwriting recognition system | 20 |
nytud/panmorph | Harmonized tagset and annotation scheme for Hungarian morphological analysers | 4 |
atik-05/bangla_datasets_absa | A collection of pre-processed Bangla-language datasets for natural language processing tasks | 0 |