ukr-twi-corpus

Twitter corpus

A collection of Ukrainian Twitter texts for linguistic analysis and research

A corpus of Ukrainian Twitter texts, with instructions for downloading and filtering them.

15 stars
5 watching
3 forks
Language: Jupyter Notebook
Last commit: over 5 years ago
Linked from 1 awesome list

corpus, corpus-generator, corpus-linguistics, nlp, python, python-script, python3, scraper, ukrainian, ukrainian-language

Related projects:

Repository | Description | Stars
ukrainian-to-english-corpora/folktale_corpus | A collection of Ukrainian folktales translated into English for linguistic and literary research purposes. | 0
kateryna-bobrovnyk/obscene-ukr | A collection of Ukrainian obscene words and phrases. | 17
brown-uk/corpus | Creating a balanced corpus of modern Ukrainian language with 1 million words, based on the Brown Corpus model. | 110
amakukha/stemmers_ukrainian | A novel stemmer for the Ukrainian language trained with AI | 28
vadno/korkor_pilot | A large annotated corpus of Hungarian text with various linguistic annotations, split into development and test datasets for natural language processing tasks. | 2
universaldependencies/ud_ukrainian-iu | A dataset of annotated text in Ukrainian with standardized formatting and annotation guidelines. | 28
proger/uk4b | Develops pretraining and finetuning techniques for language models using metadata-conditioned text generation | 18
khrystyna-skopyk/ukr_spell_check | Spelling correction system for the Ukrainian language using noisy channel model | 3
ysenarath/tweetkit | A Python client for accessing the Twitter API | 14
simonlindgren/2wttr | Collects and processes tweets from the Twitter API using Academic access | 20
helsinki-nlp/ukrainianlt | A collection of Ukrainian language tools and resources for machine translation, natural language processing, and text translation. | 30
lang-uk/ukrainian-abbreviations-dictionary | A dictionary of Ukrainian abbreviations with definitions and comments | 3
twitivity/twitter-stream.py | An API client for accessing Twitter's v2 API endpoints to retrieve real-time tweets and other data | 38
nytud/hucola | A dataset of Hungarian sentences annotated for their grammatical acceptability. | 1
robinhad/kruk | A collection of Ukrainian language models and datasets for natural language processing tasks. | 84