korkor_pilot
Hungarian Corpus
A large annotated corpus of Hungarian text with various linguistic annotations, split into development and test datasets for natural language processing tasks.
2 stars
2 watching
1 fork
Language: Python
last commit: almost 2 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
nytud/nytk-nerkor | A Hungarian language named entity annotated corpus containing 1 million tokens with morphological annotation layers and various source files. | 14 |
nytud/hucola | A dataset of Hungarian sentences annotated for their grammatical acceptability. | 1 |
poltextlab/hunempoli_corpus | A manually annotated corpus for training and testing machine learning models for Aspect-Based Sentiment Analysis (ABSA) in Hungarian. | 0 |
novakat/nytk-nerkor-cars-ontonotespp | A large annotated dataset of Hungarian text with over 30 entity types derived from various sources and formats. | 1 |
eyurtsev/kor | Extracts structured data from unstructured text using large language models | 1,629 |
kateryna-bobrovnyk/ukr-twi-corpus | A collection of Ukrainian Twitter texts for linguistic analysis and research | 15 |
elte-dh/regenykorpusz | A large corpus of Hungarian novels with annotated texts and metadata, developed by the Department of Digital Humanities at Eötvös Loránd University. | 4 |
nytud/hadifogoly-adatbazis | An attempt to transcribe Cyrillic text into Hungarian script for a large dataset of WWII prisoner-of-war records | 23 |
nytud/emtsv | A text processing system designed to handle various tasks in Hungarian language processing using Python and TSV-based data exchange. | 27 |
irwin1985/hungaro | A programming language based on Hungarian notation with the aim of improving source code readability and avoiding ambiguities. | 8 |
nytud/panmorph | Harmonized tagset and annotation scheme for Hungarian morphological analysers | 4 |
kefirski/bytenet | A PyTorch implementation of a neural network model for machine translation | 47 |
brown-uk/corpus | A balanced, 1-million-word corpus of modern Ukrainian, based on the Brown Corpus model. | 110 |
huspacy/huspacy | An industrial-strength natural language processing library for Hungarian text analysis | 155 |
sedthh/lara-hungarian-nlp | A lightweight Python library for natural language processing in Hungarian | 29 |