ua-gec

Ukrainian text correction dataset

A collection of annotated data and tools for improving the grammar and fluency of Ukrainian texts.

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

GitHub

255 stars

13 watching

22 forks

Language: Macaulay2

Last commit: over 2 years ago

Linked from 1 awesome list

corpus, corpus-data, corpus-tools, dataset, gec, grammatical-error-correction, natural-language-processing, nlp-datasets, ukrainian-language

ua-gec-dataset.grammarly.ai/

Backlinks from these awesome lists:

osyvokon/awesome-ukrainian-nlp

Related projects:

Repository	Description	Stars
fido-ai/ua-datasets	Provides a collection of datasets for natural language processing in Ukrainian.	57
proger/uk4b	Develops pretraining and finetuning techniques for language models using metadata-conditioned text generation.	18
universaldependencies/ud_ukrainian-iu	A dataset of annotated text in Ukrainian with standardized formatting and annotation guidelines.	27
brown-uk/corpus	A balanced corpus of the modern Ukrainian language with 1 million words, based on the Brown Corpus model.	110
khrystyna-skopyk/ukr_spell_check	A spelling correction system for the Ukrainian language using a noisy channel model.	3
pkuchmiichuk/ua-coref	A dataset and tools for coreference resolution in the Ukrainian language using OntoNotes 5.0 data and machine translation models.	7
thu-coai/cdial-gpt	A large-scale Chinese conversation dataset and pre-trained dialog models for text generation.	1,799
brown-uk/nlp_uk	Demonstrates the LanguageTool NLP API for the Ukrainian language using Groovy.	72
universaldependencies/ud_galician-ctg	This is a collection of annotated text data for the Galician language.	1
lang-uk/ner-uk	A Ukrainian NER corpus and annotation dataset for training and evaluating named entity recognition models.	90
amakukha/stemmers_ukrainian	A novel stemmer for the Ukrainian language trained with AI.	28
kateryna-bobrovnyk/ukr-twi-corpus	A collection of Ukrainian Twitter texts for linguistic analysis and research.	15
nytud/hucola	A collection of 9,076 annotated sentences in Hungarian to evaluate linguistic acceptability and grammaticality.	1
irakli97/frequency_dictionary_ge_363_202	A merged dataset of Georgian words with frequency information.	2