korkor_pilot

Hungarian Corpus

A large annotated corpus of Hungarian text with various linguistic annotations, split into development and test datasets for natural language processing tasks.

GitHub

2 stars
2 watching
1 fork
Language: Python
Last commit: almost 2 years ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
nytud/nytk-nerkor | A Hungarian named-entity-annotated corpus of 1 million tokens, with morphological annotation layers and various source files. | 14
nytud/hucola | A dataset of Hungarian sentences annotated for grammatical acceptability. | 1
poltextlab/hunempoli_corpus | A manually annotated corpus for training and testing machine learning models for Aspect Based Sentiment Analysis (ABSA) in Hungarian. | 0
novakat/nytk-nerkor-cars-ontonotespp | A large annotated dataset of Hungarian text with over 30 entity types, derived from various sources and formats. | 1
eyurtsev/kor | Extracts structured data from unstructured text using large language models. | 1,629
kateryna-bobrovnyk/ukr-twi-corpus | A collection of Ukrainian Twitter texts for linguistic analysis and research. | 15
elte-dh/regenykorpusz | A large corpus of Hungarian novels with annotated texts and metadata, developed by the Department of Digital Humanities at Eötvös Loránd University. | 4
nytud/hadifogoly-adatbazis | An attempt to transcribe Cyrillic text into Hungarian script for a large dataset of WWII prisoner-of-war records. | 23
nytud/emtsv | A text processing system for Hungarian language processing tasks, using Python and TSV-based data exchange. | 27
irwin1985/hungaro | A programming language based on Hungarian notation, aiming to improve source code readability and avoid ambiguities. | 8
nytud/panmorph | A harmonized tagset and annotation scheme for Hungarian morphological analysers. | 4
kefirski/bytenet | A PyTorch implementation of a neural network model for machine translation. | 47
brown-uk/corpus | A balanced corpus of modern Ukrainian with 1 million words, based on the Brown Corpus model. | 110
huspacy/huspacy | An industrial-strength natural language processing library for Hungarian text analysis. | 155
sedthh/lara-hungarian-nlp | A lightweight Python library for natural language processing in Hungarian. | 29