nlp-datasets
NLP datasets
A curated list of Natural Language Processing datasets used to train and evaluate NLP models.
A list of datasets/corpora for NLP tasks, in reverse chronological order.
919 stars
81 watching
253 forks
last commit: almost 5 years ago
Linked from 3 awesome lists
Related projects:
Repository | Description | Stars |
---|---|---|
mirfan899/urdu | A collection of Urdu language datasets for various NLP tasks and applications | 71 |
louisowen6/nlp_bahasa_resources | A curated collection of NLP datasets and resources for Bahasa Indonesia | 496 |
balavenkatesh3322/nlp-pretrained-model | A collection of pre-trained natural language processing models | 170 |
chakki-works/chazutsu | A tool that simplifies the process of preparing and manipulating natural language processing datasets | 243 |
dayyass/dayyass | A collection of libraries and tools for natural language processing and reinforcement learning | 39 |
kmkurn/id-nlp-resource | A collection of annotated NLP resources for the Indonesian language | 279 |
fido-ai/ua-datasets | A collection of datasets for natural language processing in Ukrainian | 57 |
kimtaro/ve | A linguistic framework for natural language processing tasks | 216 |
justfollowus/natural-language-processing | A comprehensive resource for learning natural language processing (NLP) with a structured course outline and recommended readings | 834 |
goru001/inltk | A comprehensive toolkit for Natural Language Processing tasks in Indic languages, providing pre-trained models and datasets | 825 |
diasks2/ruby-nlp | A collection of Ruby Natural Language Processing libraries and tools | 1,272 |
anoopkunchukuttan/indic_nlp_library | A Python-based library providing common text processing and Natural Language Processing tools for Indian languages | 561 |
piskvorky/gensim-data | A repository of pre-trained NLP models and corpora for text processing | 990 |
web64/norwegian-nlp-resources | A collection of pre-trained NLP models and resources for the Norwegian language | 178 |
cltk/cltk | A Python library offering natural language processing capabilities for pre-modern languages | 843 |