ukr-twi-corpus
Twitter corpus
A collection of Ukrainian Twitter texts for linguistic analysis and research
A corpus of Ukrainian Twitter texts, plus instructions for downloading and filtering them.
15 stars
5 watching
3 forks
Language: Jupyter Notebook
Last commit: over 5 years ago
Linked from 1 awesome list
Topics: corpus, corpus-generator, corpus-linguistics, nlp, python, python-script, python3, scraper, ukrainian, ukrainian-language
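The repository description mentions instructions for downloading and filtering texts, but the filtering criteria are not stated on this page. The sketch below is a hypothetical illustration, not the repository's own code. It assumes tweets have already been downloaded as plain strings. It removes the retweet prefix, URLs, and @mentions, drops exact duplicates, and keeps a tweet only if it contains at least one letter specific to Ukrainian (і, ї, є, ґ) and none specific to Russian (ы, э, ё, ъ). The names `clean_tweet`, `looks_ukrainian`, and `filter_tweets` are invented for this example.

```python
import re

# Hypothetical heuristic: letters used in Ukrainian but not in Russian
# (і ї є ґ and their capitals), written as escapes to avoid Latin look-alikes.
UKRAINIAN_ONLY = set("\u0456\u0457\u0454\u0491\u0406\u0407\u0404\u0490")
# Letters used in Russian but not in Ukrainian (ы э ё ъ and their capitals).
RUSSIAN_ONLY = set("\u044b\u044d\u0451\u044a\u042b\u042d\u0401\u042a")

RETWEET_RE = re.compile(r"^RT\s+@\w+:\s*")  # e.g. "RT @user: "
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def clean_tweet(text: str) -> str:
    """Strip the retweet prefix, URLs and @mentions, then collapse whitespace."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return " ".join(text.split())


def looks_ukrainian(text: str) -> bool:
    """True if the text has a Ukrainian-only letter and no Russian-only letter."""
    chars = set(text)
    return bool(chars & UKRAINIAN_ONLY) and not (chars & RUSSIAN_ONLY)


def filter_tweets(tweets):
    """Clean tweets, drop empty results and exact duplicates, keep likely Ukrainian."""
    seen = set()
    kept = []
    for raw in tweets:
        text = clean_tweet(raw)
        if text and text not in seen and looks_ukrainian(text):
            seen.add(text)
            kept.append(text)
    return kept


if __name__ == "__main__":
    sample = [
        "RT @news: Доброго ранку, Київ! https://t.co/abc",
        "Доброго ранку, Київ!",          # duplicate after cleaning
        "Привет, как дела?",             # no Ukrainian-only letters
        "Сьогодні гарний день @friend",
    ]
    print(filter_tweets(sample))
    # ['Доброго ранку, Київ!', 'Сьогодні гарний день']
```

The letter heuristic is cheap but crude: short Ukrainian tweets that happen to contain none of і, ї, є, ґ are rejected, and Belarusian text containing і can pass. A trained language identifier would be more reliable for building a real corpus.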
Related projects:
Repository | Description | Stars |
---|---|---|
ukrainian-to-english-corpora/folktale_corpus | A collection of Ukrainian folktales translated into English for linguistic and literary research purposes. | 0 |
kateryna-bobrovnyk/obscene-ukr | A collection of Ukrainian obscene words and phrases. | 17 |
brown-uk/corpus | A balanced 1-million-word corpus of modern Ukrainian, modeled on the Brown Corpus. | 110 |
amakukha/stemmers_ukrainian | A novel stemmer for the Ukrainian language trained with AI | 28 |
vadno/korkor_pilot | A large annotated corpus of Hungarian text with various linguistic annotations, split into development and test datasets for natural language processing tasks. | 2 |
universaldependencies/ud_ukrainian-iu | A dataset of annotated text in Ukrainian with standardized formatting and annotation guidelines. | 28 |
proger/uk4b | Develops pretraining and finetuning techniques for language models using metadata-conditioned text generation | 18 |
khrystyna-skopyk/ukr_spell_check | Spelling correction system for the Ukrainian language using noisy channel model | 3 |
ysenarath/tweetkit | A Python client for accessing the Twitter API | 14 |
simonlindgren/2wttr | Collects and processes tweets from the Twitter API using Academic access | 20 |
helsinki-nlp/ukrainianlt | A collection of Ukrainian language tools and resources for machine translation and other natural language processing tasks. | 30 |
lang-uk/ukrainian-abbreviations-dictionary | A dictionary of Ukrainian abbreviations with definitions and comments | 3 |
twitivity/twitter-stream.py | An API client for accessing Twitter's v2 API endpoints to retrieve real-time tweets and other data | 38 |
nytud/hucola | A dataset of Hungarian sentences annotated for their grammatical acceptability. | 1 |
robinhad/kruk | A collection of Ukrainian language models and datasets for natural language processing tasks. | 84 |