nlp-datasets

NLP datasets

A curated list of Natural Language Processing datasets used to train and evaluate NLP models.

A list of datasets/corpora for NLP tasks, in reverse chronological order.

919 stars
81 watching
253 forks
last commit: almost 5 years ago
Linked from 3 awesome lists

Related projects:

Repository: description (GitHub stars)

mirfan899/urdu: A collection of Urdu language datasets for various NLP tasks and applications (71)
louisowen6/nlp_bahasa_resources: A curated collection of NLP datasets and resources for Bahasa Indonesia (496)
balavenkatesh3322/nlp-pretrained-model: A collection of pre-trained natural language processing models (170)
chakki-works/chazutsu: A tool that simplifies preparing and manipulating natural language processing datasets (243)
dayyass/dayyass: A collection of libraries and tools for natural language processing and reinforcement learning (39)
kmkurn/id-nlp-resource: A collection of annotated NLP resources for the Indonesian language (279)
fido-ai/ua-datasets: A collection of datasets for natural language processing in Ukrainian (57)
kimtaro/ve: A linguistic framework for natural language processing tasks (216)
justfollowus/natural-language-processing: A comprehensive resource for learning natural language processing (NLP), with a structured course outline and recommended readings (834)
goru001/inltk: A comprehensive toolkit for natural language processing tasks in Indic languages, providing pre-trained models and datasets (825)
diasks2/ruby-nlp: A collection of Ruby natural language processing libraries and tools (1,272)
anoopkunchukuttan/indic_nlp_library: A Python library providing common text processing and natural language processing tools for Indian languages (561)
piskvorky/gensim-data: A repository of pre-trained NLP models and corpora for text processing (990)
web64/norwegian-nlp-resources: A collection of pre-trained NLP models and resources for the Norwegian language (178)
cltk/cltk: A Python library offering natural language processing capabilities for pre-modern languages (843)