NYTK-NerKor-Cars-OntoNotesPP
Hungarian NER dataset
A large annotated dataset of Hungarian text with over 30 entity types derived from various sources and formats.
A Hungarian named entity dataset of over 1 million tokens with about 30 entity types, derived from NYTK-NerKor
1 star
1 watching
1 fork
last commit: almost 3 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
nytud/nytk-nerkor | A Hungarian corpus of 1 million tokens annotated for named entities, with morphological annotation layers and various source files. | 14 |
lang-uk/ner-uk | A Ukrainian NER corpus and annotation dataset for training and evaluating named entity recognition models. | 90 |
nytud/panmorph | Harmonized tagset and annotation scheme for Hungarian morphological analysers | 4 |
szegedai/hun_ner_checklist | Provides diagnostic test cases for evaluating Hungarian Named Entity Recognition models | 0 |
vadno/korkor_pilot | A large annotated corpus of Hungarian text with various linguistic annotations, split into development and test datasets for natural language processing tasks. | 2 |
nytud/happ | A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms | 1 |
karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 |
text-mining/persian-ner | A Persian named entity recognition system with a large, labeled dataset. | 224 |
nytud/husst | A dataset and benchmarking kit for evaluating language understanding in Hungarian | 1 |
itunlp/daner | A tool for identifying and categorizing named entities in Danish text using machine learning and natural language processing techniques. | 17 |
nytud/hulu | A collection of linguistic datasets and benchmarks for natural language understanding tasks | 9 |
deeppavlov/slavic-bert-ner | A shared BERT model for NER tasks in Slavic languages, pre-trained on Bulgarian, Czech, Polish, and Russian texts. | 73 |
kamalkraj/bert-ner | A Python implementation of named entity recognition using Google's BERT model, for the CoNLL-2003 dataset. | 1,211 |
nytud/hucola | A dataset of Hungarian sentences annotated for their grammatical acceptability. | 1 |
mirfan899/urdu | A collection of Urdu language datasets for various NLP tasks and applications | 71 |