ua-gec
Ukrainian text correction dataset
A collection of annotated data and tools for improving the grammar and fluency of Ukrainian texts.
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
255 stars
13 watching
22 forks
Language: Macaulay2
last commit: about 1 year ago
Linked from 1 awesome list
corpus, corpus-data, corpus-tools, dataset, gec, grammatical-error-correction, natural-language-processing, nlp-datasets, ukrainian-language
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Provides a collection of datasets for natural language processing in Ukrainian. | 57 |
| | Develops pretraining and finetuning techniques for language models using metadata-conditioned text generation | 18 |
| | A dataset of annotated text in Ukrainian with standardized formatting and annotation guidelines. | 27 |
| | Creating a balanced corpus of modern Ukrainian language with 1 million words, based on the Brown Corpus model. | 110 |
| | Spelling correction system for the Ukrainian language using noisy channel model | 3 |
| | A dataset and tools for coreference resolution in Ukrainian language using OntoNotes 5.0 data and machine translation models. | 7 |
| | A large-scale Chinese conversation dataset and pre-trained dialog models for text generation | 1,799 |
| | Demonstrates NLP API from LanguageTool for Ukrainian language using Groovy | 72 |
| | This is a collection of annotated text data for the Galician language. | 1 |
| | A Ukrainian NER corpus and annotation dataset for training and evaluating named entity recognition models. | 90 |
| | A novel stemmer for the Ukrainian language trained with AI | 28 |
| | A collection of Ukrainian Twitter texts for linguistic analysis and research | 15 |
| | A collection of 9,076 annotated sentences in Hungarian to evaluate linguistic acceptability and grammaticality | 1 |
| | A merged dataset of Georgian words with frequency information | 2 |