UD_Hungarian-Szeged
Hungarian text dataset
A Universal Dependencies treebank of annotated Hungarian text for machine learning and natural language processing tasks
Hungarian data
5 stars
133 watching
0 forks
Language: Shell
Last commit: 9 days ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
|---|---|---|
| universaldependencies/ud_galician-treegal | A treebank for the Galician language with annotated syntactic and morphological features | 6 |
| universaldependencies/ud_ukrainian-iu | A dataset of annotated text in Ukrainian with standardized formatting and annotation guidelines | 28 |
| universaldependencies/ud_galician-ctg | A collection of annotated text data for the Galician language | 1 |
| universaldependencies/ud_vietnamese-vtb | An annotated corpus of Vietnamese language structure | 36 |
| nytud/huws | A dataset of manually curated Hungarian sentences with ambiguous wordings that require world knowledge and reasoning for resolution | 1 |
| nytud/hucola | A dataset of Hungarian sentences annotated for their grammatical acceptability | 1 |
| nytud/husst | A dataset and benchmarking kit for evaluating language understanding in Hungarian | 1 |
| nytud/hulu | A collection of linguistic datasets and benchmarks for natural language understanding tasks | 9 |
| huspacy/huspacy | An industrial-strength natural language processing library for Hungarian language text analysis | 155 |
| nytud/pws | A collection of parallel corpora of Winograd schemata in multiple languages | 0 |
| nytud/panmorph | Harmonized tagset and annotation scheme for Hungarian morphological analysers | 4 |
| universaldependencies/docs | An online documentation repository providing detailed resources and guides for the Universal Dependencies project | 273 |
| nytud/hadifogoly-adatbazis | An attempt to transcribe Cyrillic text into Hungarian script for a large dataset of WWII prisoner-of-war records | 23 |
| mmihaltz/huwn.rdf | Hungarian WordNet data in RDF format for use in semantic web applications | 2 |
| novakat/nytk-nerkor-cars-ontonotespp | A large annotated dataset of Hungarian text with over 30 entity types derived from various sources and formats | 1 |