idn-treebank

Indonesian Sentence Corpus

A manually tagged Indonesian corpus of sentence parse trees.

Indonesian Treebank

36 stars

2 watching

18 forks

last commit: about 4 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

keon/awesome-nlp

Related projects:

Repository	Description	Stars
famrashel/idn-tagged-corpus	A manually tagged Indonesian language corpus in tab-separated file format	88
universaldependencies/ud_galician-treegal	A treebank for the Galician language with annotated syntactic and morphological features.	6
kmkurn/id-nlp-resource	A collection of annotated NLP resources for the Indonesian language	279
louisowen6/nlp_bahasa_resources	A curated collection of NLP datasets and resources for Bahasa Indonesia	496
elte-dh/regenykorpusz	A large corpus of Hungarian novels with annotated texts and metadata, developed by the Department of Digital Humanities at Eötvös Loránd University.	4
universaldependencies/ud_vietnamese-vtb	An annotated corpus of Vietnamese language structure	36
jbaiter/archiscribe-corpus	A repository of transcribed 19th century German texts from various sources.	8
kata-ai/indosum	Provides a benchmark dataset and tools for training text summarization models in the Indonesian language.	77
galuhsahid/indonesian-word-embedding	Demonstrates word embeddings for the Indonesian language using pre-trained Word2vec models	20
j-min/korean-parallel-corpora	A collection of parallel Korean texts used for language processing and machine learning research	12
2ndquadrant/postgres	Development trees for collaborative work on PostgreSQL patches and features	6
matbahasa/talpco	A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research.	49
lantip/baku-tidak-baku	A repository of linguistic data for Indonesian words categorized as either standard or non-standard	29
poltextlab/hunempoli_corpus	A manually annotated corpus for training and testing machine learning models for Aspect Based Sentiment Analysis (ABSA) in the Hungarian language.	0
valuesimplex/finbert	An open-source BERT-based language model pre-trained on financial text data	685