idn-treebank
Indonesian Sentence Corpus
A manually tagged Indonesian corpus consisting of sentence parse trees.
Indonesian Treebank
36 stars
2 watching
18 forks
last commit: over 2 years ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A manually tagged Indonesian language corpus in tab-separated file format | 88 |
| | A treebank for the Galician language with annotated syntactic and morphological features. | 6 |
| | A collection of annotated NLP resources for the Indonesian language | 279 |
| | A curated collection of NLP datasets and resources for Bahasa Indonesia | 496 |
| | A large corpus of Hungarian novels with annotated texts and metadata, developed by the Department of Digital Humanities at Eötvös Loránd University. | 4 |
| | An annotated corpus of Vietnamese language structure | 36 |
| | A repository of transcribed 19th century German texts from various sources. | 8 |
| | Provides a benchmark dataset and tools for training text summarization models in the Indonesian language. | 77 |
| | Demonstrates word embedding in the Indonesian language using pre-trained Word2vec models | 20 |
| | A collection of parallel Korean texts used for language processing and machine learning research | 12 |
| | Development trees for collaborative work on PostgreSQL patches and features | 6 |
| | A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research. | 49 |
| | A repository of linguistic data for Indonesian words categorized as either standard or non-standard | 29 |
| | A manually annotated corpus for training and testing machine learning models of Aspect Based Sentiment Analysis (ABSA) in the Hungarian language. | 0 |
| | An open-source BERT-based language model pre-trained on financial text data | 685 |