corpus
Language corpus
A project to build a balanced, 1-million-word corpus of the modern Ukrainian language, modeled on the Brown Corpus.
Brown Corpus of the Ukrainian Language (Браунський корпус української мови)
110 stars
23 watching
14 forks
Language: Groovy
last commit: 2 months ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
brown-uk/nlp_uk | A tool for natural language processing of Ukrainian text using the LanguageTool API | 72 |
brown-uk/dict_uk | Generates a comprehensive POS tag dictionary for the Ukrainian language using various linguistic resources | 561 |
kateryna-bobrovnyk/ukr-twi-corpus | A collection of Ukrainian Twitter texts for linguistic analysis and research | 15 |
ukrainian-to-english-corpora/folktale_corpus | A collection of Ukrainian folktales translated into English for linguistic and literary research purposes. | 0 |
universaldependencies/ud_ukrainian-iu | A dataset of annotated text in Ukrainian with standardized formatting and annotation guidelines. | 28 |
lang-uk/ukrainian-abbreviations-dictionary | A dictionary of Ukrainian abbreviations with definitions and comments | 3 |
amakukha/stemmers_ukrainian | A novel stemmer for the Ukrainian language trained with AI | 28 |
nytud/hucola | A dataset of Hungarian sentences annotated for their grammatical acceptability. | 1 |
lang-uk/tone-dict-uk | A dictionary of Ukrainian words with tone annotations generated from expert ratings and machine learning models | 47 |
lang-uk/ukrainian-word-stress-dictionary | A dictionary of word stresses in the Ukrainian language | 19 |
lang-uk/ner-uk | A Ukrainian NER corpus and annotation dataset for training and evaluating named entity recognition models. | 90 |
grammarly/ua-gec | A collection of annotated data and tools for improving the grammar and fluency of Ukrainian texts. | 255 |
vadno/korkor_pilot | A large annotated corpus of Hungarian text with various linguistic annotations, split into development and test datasets for natural language processing tasks. | 2 |
lang-uk/ukrainian-heteronyms-dictionary | A dictionary of words with different pronunciation and/or meanings in Ukrainian | 3 |
christos-c/bible-corpus | A multilingual parallel corpus created from translations of the Bible. | 176 |