emLam

Language model script

Preprocessing and modeling scripts for Hungarian language modeling using Python and TensorFlow.

Preprocessing scripts for Hungarian Language Modeling

3 stars
1 watching
2 forks
Language: Python
Last commit: almost 5 years ago
Linked from 1 awesome list

language-modeling, paper, python, tensorflow

Related projects:

Repository | Description | Stars
nytud/emtsv | A text processing system for a range of Hungarian language processing tasks, using Python and TSV-based data exchange. | 28
nytud/emmorph | A Hungarian morphological analyzer built on a finite-state grammar. | 14
nytud/machine-translation | Machine translation models and a demo site for Hungarian translation. | 5
ppke-nlpg/emmorphpy | A Python wrapper and lemmatizer for emMorph, a Hungarian morphological analyzer. | 3
nytud/panmorph | A harmonized tagset and annotation scheme for Hungarian morphological analyzers. | 4
nytud/hucola | A collection of 9,076 annotated Hungarian sentences for evaluating linguistic acceptability and grammaticality. | 1
nytud/hadifogoly-adatbazis | An attempt to transcribe Cyrillic text into Hungarian script for a large dataset of WWII prisoner-of-war records. | 23
nytud/hunlp-gate | A collection of Hungarian NLP tools integrated as GATE processing resources. | 8
yfzhang114/slime | Large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 143
ermlab/politbert | A RoBERTa-architecture language model trained on high-quality Polish text data. | 33
nytud/quntoken | A C++ tokenizer for Hungarian text. | 14
jalammar/ecco | An interactive visualization library for exploring and understanding transformer-based language models. | 1,986
nytud/nytk-nerkor | A Hungarian named-entity annotated corpus of 1 million tokens, with morphological annotation layers and various source files. | 15
davidnemeskey/embert | Pre-trained transformer-based models and tools for natural language processing tasks. | 2
vhellendoorn/code-lms | A guide to using pre-trained large language models for source code analysis and generation. | 1,789