corpus

Language corpus

A project to build a balanced, one-million-word corpus of modern Ukrainian, modeled on the Brown Corpus.

Brown Corpus of the Ukrainian Language (Браунський корпус української мови)

110 stars
23 watching
14 forks
Language: Groovy
last commit: 2 months ago
Linked from 1 awesome list

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| brown-uk/nlp_uk | A tool for natural language processing of Ukrainian text using the LanguageTool API | 72 |
| brown-uk/dict_uk | Generates a comprehensive POS tag dictionary for Ukrainian language using various linguistic resources | 561 |
| kateryna-bobrovnyk/ukr-twi-corpus | A collection of Ukrainian Twitter texts for linguistic analysis and research | 15 |
| ukrainian-to-english-corpora/folktale_corpus | A collection of Ukrainian folktales translated into English for linguistic and literary research purposes | 0 |
| universaldependencies/ud_ukrainian-iu | A dataset of annotated text in Ukrainian with standardized formatting and annotation guidelines | 28 |
| lang-uk/ukrainian-abbreviations-dictionary | A dictionary of Ukrainian abbreviations with definitions and comments | 3 |
| amakukha/stemmers_ukrainian | A novel stemmer for the Ukrainian language trained with AI | 28 |
| nytud/hucola | A dataset of Hungarian sentences annotated for their grammatical acceptability | 1 |
| lang-uk/tone-dict-uk | A dictionary of Ukrainian words with tone annotations generated from expert ratings and machine learning models | 47 |
| lang-uk/ukrainian-word-stress-dictionary | A dictionary of word stresses in the Ukrainian language | 19 |
| lang-uk/ner-uk | A Ukrainian NER corpus and annotation dataset for training and evaluating named entity recognition models | 90 |
| grammarly/ua-gec | A collection of annotated data and tools for improving the grammar and fluency of Ukrainian texts | 255 |
| vadno/korkor_pilot | A large annotated corpus of Hungarian text with various linguistic annotations, split into development and test datasets for natural language processing tasks | 2 |
| lang-uk/ukrainian-heteronyms-dictionary | A dictionary of words with different pronunciation and/or meanings in Ukrainian | 3 |
| christos-c/bible-corpus | A multilingual parallel corpus created from translations of the Bible | 176 |