nlp-datasets

Text datasets

A collection of text datasets for use in Natural Language Processing

Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP)

GitHub

6k stars
234 watching
963 forks
last commit: almost 2 years ago
Linked from 5 awesome lists


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 |
| sebastianruder/nlp-progress | A comprehensive repository tracking progress in NLP tasks and their corresponding datasets. | 22,715 |
| mirfan899/urdu | A collection of Urdu language datasets for various NLP tasks and applications. | 71 |
| louisowen6/nlp_bahasa_resources | A curated collection of NLP datasets and resources for Bahasa Indonesia. | 489 |
| axa-group/nlp.js | A comprehensive NLP library for building conversational AI systems with entity extraction, sentiment analysis, language identification, and more. | 6,283 |
| brightmart/text_classification | An NLP project offering various text classification models and techniques for deep learning exploration. | 7,861 |
| stanfordnlp/stanza | A Python library for natural language processing tasks in many human languages. | 7,294 |
| balavenkatesh3322/nlp-pretrained-model | A collection of pre-trained natural language processing models. | 170 |
| nltk/nltk | A comprehensive toolkit for natural language processing tasks in Python. | 13,646 |
| bigscience-workshop/promptsource | A toolkit for creating and using natural language prompts to enable large language models to generalize to new tasks. | 2,700 |
| adbar/german-nlp | A curated collection of German language resources and tools for natural language processing. | 451 |
| fido-ai/ua-datasets | Provides a collection of datasets for natural language processing in Ukrainian. | 56 |
| stanfordnlp/corenlp | A Java-based suite of tools for natural language processing and analysis. | 9,704 |
| mhagiwara/100-nlp-papers | A curated collection of 100 essential NLP papers for researchers and developers to understand the foundations of natural language processing. | 3,753 |
| pawangeek/deep-nlp-resources | A curated collection of natural language processing resources and libraries for developers to access and build upon. | 72 |