hadifogoly-adatbazis

Transcription project

An attempt to transcribe Cyrillic text into Hungarian script for a large dataset of WWII prisoner-of-war records

Russian-Hungarian transcription of the database of Hungarian prisoners of war
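
To give a sense of the task, here is a minimal sketch of character-level Cyrillic-to-Hungarian transcription. It is not the repository's method: the `CYR_TO_HUN` table and the `transcribe` function are hypothetical names introduced for illustration, and the mapping is a simplified letter-by-letter approximation of the usual Hungarian rendering of Russian letters. It ignores context rules (for example, word-initial `е`), cannot restore Hungarian vowel accents, and does not attempt to recover the original Hungarian spelling of names, which is the hard part of the real problem.

```python
# Simplified, assumed letter table: Russian Cyrillic -> Hungarian orthography.
# Context-dependent rules and accent restoration are deliberately left out.
CYR_TO_HUN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ё": "jo", "ж": "zs", "з": "z", "и": "i", "й": "j", "к": "k",
    "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "sz", "т": "t", "у": "u", "ф": "f", "х": "h", "ц": "c",
    "ч": "cs", "ш": "s", "щ": "scs", "ъ": "", "ы": "i", "ь": "",
    "э": "e", "ю": "ju", "я": "ja",
}


def transcribe(text: str) -> str:
    """Transcribe Cyrillic text letter by letter, preserving capitalization."""
    out = []
    for ch in text:
        lower = ch.lower()
        repl = CYR_TO_HUN.get(lower)
        if repl is None:
            # Not a Cyrillic letter in the table: keep it unchanged.
            out.append(ch)
        elif ch != lower:
            # Uppercase input: capitalize only the first letter of a digraph.
            out.append(repl.capitalize())
        else:
            out.append(repl)
    return "".join(out)


print(transcribe("Ковач"))       # Kovacs
print(transcribe("Тот Иштван"))  # Tot Istvan
```

The second example shows the limitation: the expected Hungarian forms would be "Tóth István", which a plain letter table cannot produce without a name lexicon or additional rules.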

GitHub

23 stars

5 watching

4 forks

Language: Python

Last commit: about 5 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

oroszgy/awesome-hungarian-nlp

Related projects:

Repository	Description	Stars
nytud/emtsv	A text processing system for Hungarian language processing tasks, using Python and TSV-based data exchange	28
nytud/hucola	A collection of 9,076 annotated sentences in Hungarian to evaluate linguistic acceptability and grammaticality	1
nytud/panmorph	Harmonized tagset and annotation scheme for Hungarian morphological analysers	4
nytud/machine-translation	Provides machine translation models and a demo site for Hungarian language translations	5
nytud/hulu	A collection of linguistic datasets and benchmarks for natural language understanding tasks	8
nytud/hunlp-gate	A collection of Hungarian NLP tools integrated as GATE processing resources	8
nytud/emlam	Preprocessing and modeling scripts for Hungarian language modeling using Python and TensorFlow	3
nytud/huws	A dataset of manually curated Hungarian sentences with ambiguous wordings that require world knowledge and reasoning for resolution	1
nytud/happ	A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms	1
vadno/korkor_pilot	A large annotated corpus of Hungarian text with various linguistic annotations, split into development and test datasets for natural language processing tasks	2
nytud/pws	A collection of parallel corpora of Winograd schemata in multiple languages	0
dmort27/epitran	A tool for transcribing written text into the International Phonetic Alphabet (IPA)	668
nytud/quntoken	A C++ tokenizer for Hungarian text	14
nytud/nytk-nerkor	A Hungarian named-entity-annotated corpus of 1 million tokens with morphological annotation layers and various source files	15
ytsvetko/str2ipa	A tool for phonetic transcription of languages with close-to-phonetic writing systems	10