NYTK-NerKor-Cars-OntoNotesPP

Hungarian NER dataset

A large annotated dataset of Hungarian text with over 30 entity types derived from various sources and formats.

A Hungarian named entity dataset of over 1M tokens with ~30 entity types, derived from NYTK-NerKor.

GitHub

1 star

1 watching

1 fork

last commit: over 4 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

oroszgy/awesome-hungarian-nlp

Related projects:

Repository	Description	Stars
nytud/nytk-nerkor	A Hungarian language named entity annotated corpus containing 1 million tokens with morphological annotation layers and various source files.	15
lang-uk/ner-uk	A Ukrainian NER corpus and annotation dataset for training and evaluating named entity recognition models.	90
nytud/panmorph	Harmonized tagset and annotation scheme for Hungarian morphological analysers	4
szegedai/hun_ner_checklist	Provides diagnostic test cases for evaluating Hungarian Named Entity Recognition models	0
vadno/korkor_pilot	A large annotated corpus of Hungarian text with various linguistic annotations, split into development and test datasets for natural language processing tasks.	2
nytud/happ	A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms	1
karthikncode/nlp-datasets	A curated list of Natural Language Processing datasets used to train and evaluate NLP models.	919
text-mining/persian-ner	A Persian named entity recognition system with a large, labeled dataset.	225
nytud/husst	A dataset of annotated sentences for training and evaluating sentiment analysis models in the Hungarian language.	1
itunlp/daner	A tool for identifying and categorizing named entities in Danish text using machine learning and natural language processing techniques.	17
nytud/hulu	A collection of linguistic datasets and benchmarks for natural language understanding tasks	8
deeppavlov/slavic-bert-ner	A shared BERT model for NER tasks in Slavic languages, pre-trained on Bulgarian, Czech, Polish, and Russian texts.	73
kamalkraj/bert-ner	A Python implementation of named entity recognition using Google's BERT model on the CoNLL-2003 dataset.	1,220
nytud/hucola	A collection of 9,076 annotated sentences in Hungarian to evaluate linguistic acceptability and grammaticality	1
mirfan899/urdu	A collection of Urdu language datasets for various NLP tasks and applications	71