regenykorpusz

Novel Corpus

A large corpus of Hungarian novels with annotated texts and metadata, developed by the Department of Digital Humanities at Eötvös Loránd University.


4 stars
4 watching
1 fork
last commit: 5 days ago
Linked from 1 awesome list

Related projects:

Repository | Description | Stars
elte-dh/drama-corpus | A comprehensive annotated corpus of Hungarian drama texts, including structural annotations and grammatical features. | 1
elte-dh/poetry-corpus | A comprehensive poetry corpus with annotated text data in TEI XML format. | 7
nytud/hucola | A dataset of Hungarian sentences annotated for grammatical acceptability. | 1
poltextlab/hunempoli_corpus | A manually annotated corpus for training and testing machine learning models for Aspect-Based Sentiment Analysis (ABSA) in Hungarian. | 0
vadno/korkor_pilot | A large Hungarian text corpus with various linguistic annotations, split into development and test sets for natural language processing tasks. | 2
ukrainian-to-english-corpora/folktale_corpus | A collection of Ukrainian folktales translated into English for linguistic and literary research. | 0
bertez/corpora | A collection of Galician language data in JSON format. | 2
eleutherai/polyglot | Large language models designed to perform well across multiple languages and to address performance issues of current multilingual models. | 475
famrashel/idn-tagged-corpus | A manually tagged Indonesian corpus in tab-separated format. | 88
nytud/nytk-nerkor | A Hungarian named-entity-annotated corpus of 1 million tokens, with morphological annotation layers and various source files. | 14
famrashel/idn-treebank | A manually tagged Indonesian corpus of sentence parse trees. | 36
qhungngo/evbcorpus | A large-scale English-Vietnamese bilingual corpus collection of translations and bitexts for language technology and NLP tasks. | 42
nytud/hucopa | A dataset of Hungarian translations of English cause-and-effect questions, each with plausible alternative answers. | 1
huspacy/huspacy | An industrial-strength natural language processing library for Hungarian text analysis. | 155
jbaiter/archiscribe-corpus | A repository of transcribed 19th-century German texts from various sources. | 8