hadifogoly-adatbazis

Transcription project

An attempt to transcribe Cyrillic-script text into Hungarian spelling for a large dataset of WWII prisoner-of-war records

Russian-Hungarian transcription of the database of Hungarian prisoners of war
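To illustrate what letter-level Cyrillic-to-Hungarian transcription involves, here is a minimal sketch assuming the standard Hungarian transcription rules for Russian (for example с → sz, ч → cs, ш → s, and е → je at the start of a word or after a vowel, ъ or ь). It is not taken from the hadifogoly-adatbazis code; the names `CYR_TO_HU` and `transcribe` are hypothetical.

```python
# Hypothetical, simplified table following standard Hungarian transcription
# of Russian Cyrillic. Not the hadifogoly-adatbazis project's own method.
CYR_TO_HU = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ё": "jo", "ж": "zs", "з": "z", "и": "i", "й": "j", "к": "k",
    "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "sz", "т": "t", "у": "u", "ф": "f", "х": "h", "ц": "c",
    "ч": "cs", "ш": "s", "щ": "scs", "ъ": "", "ы": "i", "ь": "",
    "э": "e", "ю": "ju", "я": "ja",
}

VOWELS = set("аеёиоуыэюя")


def transcribe(word: str) -> str:
    """Transcribe a Cyrillic word letter by letter into Hungarian spelling."""
    out = []
    prev = ""  # previous Cyrillic letter; "" at a word boundary
    for ch in word.lower():
        if ch == "е" and (prev == "" or prev in VOWELS or prev in "ъь"):
            out.append("je")  # е is written je word-initially or after a vowel/ъ/ь
        else:
            out.append(CYR_TO_HU.get(ch, ch))  # non-Cyrillic characters pass through
        # Reset the context on anything that is not a Cyrillic letter.
        prev = ch if ch in CYR_TO_HU else ""
    result = "".join(out)
    # Keep only an initial capital, as in personal names.
    if word[:1].isupper():
        result = result[:1].upper() + result[1:]
    return result
```

For example, `transcribe("Сергей")` gives `Szergej` and `transcribe("Васильев")` gives `Vasziljev`. A pure lookup table like this cannot restore Hungarian-specific spellings of names that were written down in Russian: `transcribe("Шандор")` yields `Sandor`, not `Sándor`.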


23 stars
5 watching
4 forks
Language: Python
last commit: over 3 years ago
Linked from 1 awesome list



Related projects:

Repository Description Stars
nytud/emtsv A modular Hungarian language processing system written in Python, whose modules exchange data in TSV format. 27
nytud/hucola A dataset of Hungarian sentences annotated for their grammatical acceptability. 1
nytud/panmorph Harmonized tagset and annotation scheme for Hungarian morphological analysers 4
nytud/machine-translation Provides machine translation models and a demo site for Hungarian language translations 5
nytud/hulu A collection of linguistic datasets and benchmarks for natural language understanding tasks 9
nytud/hunlp-gate A collection of Hungarian NLP tools integrated as GATE processing resources 8
nytud/emlam Preprocessing and modeling scripts for Hungarian language modeling using Python and TensorFlow. 3
nytud/huws A dataset of manually curated Hungarian sentences with ambiguous wordings that require world knowledge and reasoning for resolution. 1
nytud/happ A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms 1
vadno/korkor_pilot A large annotated corpus of Hungarian text with various linguistic annotations, split into development and test datasets for natural language processing tasks. 2
nytud/pws A collection of parallel corpora of Winograd schemata in multiple languages 0
dmort27/epitran A tool for transliterating orthographic text into the International Phonetic Alphabet (IPA). 653
nytud/quntoken A tokenizer for Hungarian text, written in C++ 14
nytud/nytk-nerkor A Hungarian language named entity annotated corpus containing 1 million tokens with morphological annotation layers and various source files. 14
ytsvetko/str2ipa A tool for phonetic transcription of languages with close-to-phonetic writing systems 10