gensim-data

NLP datasets

A repository of pre-trained NLP models and corpora for text processing.

GitHub

988 stars
39 watching
133 forks
Language: Python
last commit: over 6 years ago
Linked from 1 awesome list

corpora, dataset, gensim, glove-model, lda-model, lsi-model, pretrained-models, word2vec-model

Related projects:

Repository | Description | Stars
nttcslab-nlp/doc_lm | This repository contains source files and training scripts for language models. | 12
karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919
rdspring1/pytorch_gbw_lm | Trains a large-scale PyTorch language model on the 1-Billion Word dataset. | 123
balavenkatesh3322/nlp-pretrained-model | A collection of pre-trained natural language processing models. | 170
shawn-ieitsystems/yuan-1.0 | Large-scale language model with improved performance on NLP tasks through distributed training and efficient data processing. | 591
fido-ai/ua-datasets | Provides a collection of datasets for natural language processing in Ukrainian. | 55
01-ai/yi | A series of large language models trained from scratch to excel in multiple NLP tasks. | 7,699
vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation. | 1,782
gmftbygmftby/science-llm | A large-scale language model for the scientific domain, trained on the RedPajama arXiv split. | 122
zhuiyitechnology/pretrained-models | A collection of pre-trained language models for natural language processing tasks. | 987
radi-cho/datasetgpt | A command-line interface to generate textual datasets with large language models. | 293
davidnemeskey/embert | Provides pre-trained transformer-based models and tools for natural language processing tasks. | 2
multimodal-art-projection/map-neo | A large language model designed for research and application in natural language processing tasks. | 877
eyurtsev/kor | Extracts structured data from unstructured text using large language models. | 1,629
da-southampton/redgpt | A library providing a pre-trained language model for natural language inference tasks using a transformer architecture. | 62