idn-treebank

Indonesian Sentence Corpus

A manually tagged Indonesian corpus consisting of parse trees of sentences.

Indonesian Treebank

GitHub

36 stars
2 watching
18 forks
last commit: over 2 years ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
famrashel/idn-tagged-corpus | A manually tagged Indonesian language corpus in tab-separated file format | 88
universaldependencies/ud_galician-treegal | A treebank for the Galician language with annotated syntactic and morphological features. | 6
kmkurn/id-nlp-resource | A collection of annotated NLP resources for the Indonesian language | 279
louisowen6/nlp_bahasa_resources | A curated collection of NLP datasets and resources for Bahasa Indonesia | 489
elte-dh/regenykorpusz | A large corpus of Hungarian novels with annotated texts and metadata, developed by the Department of Digital Humanities at Eötvös Loránd University. | 4
universaldependencies/ud_vietnamese-vtb | An annotated corpus of Vietnamese language structure | 36
jbaiter/archiscribe-corpus | A repository of transcribed 19th century German texts from various sources. | 8
kata-ai/indosum | Provides a benchmark dataset and tools for training text summarization models in the Indonesian language. | 76
galuhsahid/indonesian-word-embedding | Demonstrates word embedding in Indonesian language using pre-trained Word2vec models | 20
j-min/korean-parallel-corpora | A collection of parallel Korean texts used for language processing and machine learning research | 12
2ndquadrant/postgres | Development trees for collaborative work on PostgreSQL patches and features | 6
matbahasa/talpco | A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research. | 49
lantip/baku-tidak-baku | A repository of linguistic data for Indonesian words categorized as either standard or non-standard | 29
poltextlab/hunempoli_corpus | A manually annotated corpus for training and testing machine learning models of Aspect Based Sentiment Analysis (ABSA) in Hungarian language. | 0
valuesimplex/finbert | An open-source BERT-based language model pre-trained on financial text data | 677