idn-treebank
Indonesian Sentence Corpus
A manually tagged Indonesian corpus consisting of sentence parse trees.
Indonesian Treebank
36 stars
2 watching
18 forks
last commit: over 2 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
famrashel/idn-tagged-corpus | A manually tagged Indonesian language corpus in tab-separated file format | 88 |
universaldependencies/ud_galician-treegal | A treebank for the Galician language with annotated syntactic and morphological features. | 6 |
kmkurn/id-nlp-resource | A collection of annotated NLP resources for the Indonesian language | 279 |
louisowen6/nlp_bahasa_resources | A curated collection of NLP datasets and resources for Bahasa Indonesia | 489 |
elte-dh/regenykorpusz | A large corpus of Hungarian novels with annotated texts and metadata, developed by the Department of Digital Humanities at Eötvös Loránd University. | 4 |
universaldependencies/ud_vietnamese-vtb | An annotated corpus of Vietnamese language structure | 36 |
jbaiter/archiscribe-corpus | A repository of transcribed 19th century German texts from various sources. | 8 |
kata-ai/indosum | Provides a benchmark dataset and tools for training text summarization models in the Indonesian language. | 76 |
galuhsahid/indonesian-word-embedding | Demonstrates word embeddings for the Indonesian language using pre-trained Word2vec models | 20 |
j-min/korean-parallel-corpora | A collection of parallel Korean texts used for language processing and machine learning research | 12 |
2ndquadrant/postgres | Development trees for collaborative work on PostgreSQL patches and features | 6 |
matbahasa/talpco | A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research. | 49 |
lantip/baku-tidak-baku | A repository of linguistic data for Indonesian words categorized as either standard or non-standard | 29 |
poltextlab/hunempoli_corpus | A manually annotated corpus for training and testing machine learning models for Aspect-Based Sentiment Analysis (ABSA) in the Hungarian language. | 0 |
valuesimplex/finbert | An open-source BERT-based language model pre-trained on financial text data | 677 |