NYTK-NerKor-Cars-OntoNotesPP

Hungarian NER dataset

A Hungarian named entity dataset of more than 1 million tokens with about 30 entity types, derived from NYTK-NerKor and other sources in various formats.

GitHub

1 star
1 watching
1 fork
last commit: almost 3 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
|---|---|---|
| nytud/nytk-nerkor | A Hungarian named entity annotated corpus containing 1 million tokens with morphological annotation layers and various source files. | 14 |
| lang-uk/ner-uk | A Ukrainian NER corpus and annotation dataset for training and evaluating named entity recognition models. | 90 |
| nytud/panmorph | Harmonized tagset and annotation scheme for Hungarian morphological analysers. | 4 |
| szegedai/hun_ner_checklist | Diagnostic test cases for evaluating Hungarian named entity recognition models. | 0 |
| vadno/korkor_pilot | A large annotated corpus of Hungarian text with various linguistic annotations, split into development and test datasets for natural language processing tasks. | 2 |
| nytud/happ | A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms. | 1 |
| karthikncode/nlp-datasets | A curated list of natural language processing datasets used to train and evaluate NLP models. | 919 |
| text-mining/persian-ner | A Persian named entity recognition system with a large, labeled dataset. | 224 |
| nytud/husst | A dataset and benchmarking kit for evaluating language understanding in Hungarian. | 1 |
| itunlp/daner | A tool for identifying and categorizing named entities in Danish text using machine learning and natural language processing techniques. | 17 |
| nytud/hulu | A collection of linguistic datasets and benchmarks for natural language understanding tasks. | 9 |
| deeppavlov/slavic-bert-ner | A shared BERT model for NER tasks in Slavic languages, pre-trained on Bulgarian, Czech, Polish, and Russian texts. | 73 |
| kamalkraj/bert-ner | A Python implementation of named entity recognition using Google's BERT model for the CoNLL-2003 dataset. | 1,211 |
| nytud/hucola | A dataset of Hungarian sentences annotated for their grammatical acceptability. | 1 |
| mirfan899/urdu | A collection of Urdu language datasets for various NLP tasks and applications. | 71 |