regenykorpusz

Novel Corpus

A large corpus of Hungarian novels with annotated texts and metadata, developed by the Department of Digital Humanities at Eötvös Loránd University.

GitHub

4 stars
4 watching
1 fork
last commit: 3 days ago
Linked from 1 awesome list



Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| elte-dh/drama-corpus | A comprehensive annotated corpus of Hungarian drama texts, including structural annotations and grammatical features. | 1 |
| elte-dh/poetry-corpus | A large corpus of annotated Hungarian poems in XML format, with various annotations including grammatical features and sound patterns. | 7 |
| nytud/hucola | A collection of 9,076 annotated sentences in Hungarian to evaluate linguistic acceptability and grammaticality | 1 |
| poltextlab/hunempoli_corpus | A manually annotated corpus for training and testing machine learning models of Aspect Based Sentiment Analysis (ABSA) in Hungarian language. | 0 |
| vadno/korkor_pilot | A large annotated corpus of Hungarian text with various linguistic annotations, split into development and test datasets for natural language processing tasks. | 2 |
| ukrainian-to-english-corpora/folktale_corpus | A collection of Ukrainian folktales translated into English for linguistic and literary research purposes. | 0 |
| bertez/corpora | A collection of Galician language data in JSON format. | 2 |
| eleutherai/polyglot | Large language models designed to perform well in multiple languages and address performance issues with current multilingual models. | 476 |
| famrashel/idn-tagged-corpus | A manually tagged Indonesian language corpus in tab-separated file format | 88 |
| nytud/nytk-nerkor | A Hungarian language named entity annotated corpus containing 1 million tokens with morphological annotation layers and various source files. | 15 |
| famrashel/idn-treebank | A manually tagged Indonesian corpus consisting of parse-trees from sentences. | 36 |
| qhungngo/evbcorpus | A large-scale bilingual corpus collection for language technology and NLP tasks, containing English-Vietnamese translations and bitexts. | 42 |
| nytud/hucopa | A dataset and annotation scheme for Hungarian causal reasoning tasks. | 1 |
| huspacy/huspacy | An industrial-strength natural language processing library for Hungarian language text analysis | 158 |
| jbaiter/archiscribe-corpus | A repository of transcribed 19th century German texts from various sources. | 8 |