ukr-twi-corpus
Twitter corpus
A collection of Ukrainian Twitter texts for linguistic analysis and research
A corpus of Ukrainian Twitter texts, with instructions for downloading and filtering them.
15 stars
5 watching
3 forks
Language: Jupyter Notebook
last commit: over 5 years ago
Linked from 1 awesome list
corpus, corpus-generator, corpus-linguistics, nlp, python, python-script, python3, scraper, ukrainian, ukrainian-language
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A collection of Ukrainian folktales translated into English for linguistic and literary research purposes. | 0 |
| | A collection of Ukrainian obscene words and phrases. | 17 |
| | Creating a balanced corpus of modern Ukrainian language with 1 million words, based on the Brown Corpus model. | 110 |
| | A novel stemmer for the Ukrainian language trained with AI | 28 |
| | A large annotated corpus of Hungarian text with various linguistic annotations, split into development and test datasets for natural language processing tasks. | 2 |
| | A dataset of annotated text in Ukrainian with standardized formatting and annotation guidelines. | 27 |
| | Develops pretraining and finetuning techniques for language models using metadata-conditioned text generation | 18 |
| | Spelling correction system for the Ukrainian language using noisy channel model | 3 |
| | A Python client for accessing the Twitter API | 14 |
| | Collects and processes tweets from the Twitter API using Academic access | 20 |
| | A collection of Ukrainian language tools and resources for machine translation, natural language processing, and text translation. | 30 |
| | A dictionary of Ukrainian abbreviations with definitions and comments | 3 |
| | An API client for accessing Twitter's v2 API endpoints to retrieve real-time tweets and other data | 38 |
| | A collection of 9,076 annotated sentences in Hungarian to evaluate linguistic acceptability and grammaticality | 1 |
| | A collection of Ukrainian language models and datasets for natural language processing tasks. | 86 |