ua-gec

Ukrainian text correction dataset

A collection of annotated data and tools for improving the grammar and fluency of Ukrainian texts.

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

255 stars
13 watching
22 forks
Language: Macaulay2
last commit: 10 months ago
Linked from 1 awesome list

corpus, corpus-data, corpus-tools, dataset, gec, grammatical-error-correction, natural-language-processing, nlp-datasets, ukrainian-language

Related projects:

Repository | Description | Stars
fido-ai/ua-datasets | Provides a collection of datasets for natural language processing in Ukrainian. | 57
proger/uk4b | Develops pretraining and finetuning techniques for language models using metadata-conditioned text generation. | 18
universaldependencies/ud_ukrainian-iu | A dataset of annotated text in Ukrainian with standardized formatting and annotation guidelines. | 27
brown-uk/corpus | Creating a balanced corpus of modern Ukrainian language with 1 million words, based on the Brown Corpus model. | 110
khrystyna-skopyk/ukr_spell_check | Spelling correction system for the Ukrainian language using a noisy channel model. | 3
pkuchmiichuk/ua-coref | A dataset and tools for coreference resolution in the Ukrainian language using OntoNotes 5.0 data and machine translation models. | 7
thu-coai/cdial-gpt | A large-scale Chinese conversation dataset and pre-trained dialog models for text generation. | 1,799
brown-uk/nlp_uk | Demonstrates NLP API from LanguageTool for the Ukrainian language using Groovy. | 72
universaldependencies/ud_galician-ctg | This is a collection of annotated text data for the Galician language. | 1
lang-uk/ner-uk | A Ukrainian NER corpus and annotation dataset for training and evaluating named entity recognition models. | 90
amakukha/stemmers_ukrainian | A novel stemmer for the Ukrainian language trained with AI. | 28
kateryna-bobrovnyk/ukr-twi-corpus | A collection of Ukrainian Twitter texts for linguistic analysis and research. | 15
nytud/hucola | A collection of 9,076 annotated sentences in Hungarian to evaluate linguistic acceptability and grammaticality. | 1
irakli97/frequency_dictionary_ge_363_202 | A merged dataset of Georgian words with frequency information. | 2