HuLU
Language datasets
A collection of linguistic datasets and benchmarks for natural language understanding tasks
Hungarian Language Understanding Benchmark Kit
9 stars
3 watching
0 forks
last commit: 4 months ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
nytud/huws | A dataset of manually curated Hungarian sentences with ambiguous wording that requires world knowledge and reasoning to resolve | 1 |
nytud/husst | A dataset and benchmarking kit for evaluating language understanding in Hungarian | 1 |
nytud/hucola | A dataset of Hungarian sentences annotated for their grammatical acceptability | 1 |
nytud/happ | A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms | 1 |
nytud/huwnli | A dataset and toolset for Hungarian anaphora resolution in natural language inference tasks | 0 |
nytud/pws | A collection of parallel corpora of Winograd schemata in multiple languages | 0 |
nytud/hunlp-gate | A collection of Hungarian NLP tools integrated as GATE processing resources | 8 |
nytud/panmorph | Harmonized tagset and annotation scheme for Hungarian morphological analysers | 4 |
nytud/machine-translation | Provides machine translation models and a demo site for Hungarian language translations | 5 |
xuefuzhao/instructionwild | Creating a large-scale user-based instruction dataset for natural language processing research and development | 453 |
alexa/massive | A collection of tools and modeling code for a large multilingual Natural Language Understanding dataset | 538 |
turkunlp/wikibert | Provides pre-trained language models derived from Wikipedia texts for natural language processing tasks | 34 |
karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models | 919 |
nytud/hadifogoly-adatbazis | An attempt to transcribe Cyrillic text into Hungarian script for a large dataset of WWII prisoner-of-war records | 23 |
novakat/nytk-nerkor-cars-ontonotespp | A large annotated dataset of Hungarian text with over 30 entity types derived from various sources and formats | 1 |