korkor_pilot

Hungarian Corpus

A large annotated corpus of Hungarian text with various linguistic annotations, split into development and test datasets for natural language processing tasks.

GitHub

2 stars

2 watching

1 fork

Language: Python

last commit: over 3 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

oroszgy/awesome-hungarian-nlp

Related projects:

Repository	Description	Stars
nytud/nytk-nerkor	A Hungarian language named entity annotated corpus containing 1 million tokens with morphological annotation layers and various source files.	15
nytud/hucola	A collection of 9,076 annotated sentences in Hungarian to evaluate linguistic acceptability and grammaticality	1
poltextlab/hunempoli_corpus	A manually annotated corpus for training and testing machine learning models of Aspect Based Sentiment Analysis (ABSA) in Hungarian language.	0
novakat/nytk-nerkor-cars-ontonotespp	A large annotated dataset of Hungarian text with over 30 entity types derived from various sources and formats.	1
eyurtsev/kor	An open-source wrapper around LLMs to extract structured data from text	1,638
kateryna-bobrovnyk/ukr-twi-corpus	A collection of Ukrainian Twitter texts for linguistic analysis and research	15
elte-dh/regenykorpusz	A large corpus of Hungarian novels with annotated texts and metadata, developed by the Department of Digital Humanities at Eötvös Loránd University.	4
nytud/hadifogoly-adatbazis	An attempt to transcribe Cyrillic text into Hungarian script for a large dataset of WWII prisoner-of-war records	23
nytud/emtsv	A text processing system designed to handle various tasks in Hungarian language processing using Python and TSV-based data exchange.	28
irwin1985/hungaro	A programming language based on Hungarian notation with the aim of improving source code readability and avoiding ambiguities.	8
nytud/panmorph	Harmonized tagset and annotation scheme for Hungarian morphological analysers	4
kefirski/bytenet	A PyTorch implementation of a neural network model for machine translation	47
brown-uk/corpus	Creating a balanced corpus of modern Ukrainian language with 1 million words, based on the Brown Corpus model.	110
huspacy/huspacy	An industrial-strength natural language processing library for Hungarian language text analysis	158
sedthh/lara-hungarian-nlp	A lightweight Python library for natural language processing in Hungarian	29