gensim-data

NLP datasets

A data repository of pre-trained NLP models and corpora for text processing.

GitHub

990 stars

39 watching

135 forks

Language: Python

last commit: over 8 years ago

Linked from 1 awesome list

Topics: corpora, dataset, gensim, glove-model, lda-model, lsi-model, pretrained-models, word2vec-model

rare-technologies.com/new-api-for-pretrained-nlp-models-and-datasets-in-gensim/

Backlinks from these awesome lists:

keon/awesome-nlp

Related projects:

Repository	Description	Stars
nttcslab-nlp/doc_lm	This repository contains source files and training scripts for language models.	12
karthikncode/nlp-datasets	A curated list of Natural Language Processing datasets used to train and evaluate NLP models.	919
rdspring1/pytorch_gbw_lm	Trains a large-scale PyTorch language model on the 1-Billion Word dataset.	123
balavenkatesh3322/nlp-pretrained-model	A collection of pre-trained natural language processing models	170
shawn-ieitsystems/yuan-1.0	Large-scale language model with improved performance on NLP tasks through distributed training and efficient data processing	591
fido-ai/ua-datasets	Provides a collection of datasets for natural language processing in Ukrainian.	57
01-ai/yi	A series of large language models trained from scratch to excel in multiple NLP tasks	7,743
vhellendoorn/code-lms	A guide to using pre-trained large language models in source code analysis and generation	1,789
gmftbygmftby/science-llm	A large-scale language model for the scientific domain, trained on the RedPajama arXiv split.	125
zhuiyitechnology/pretrained-models	A collection of pre-trained language models for natural language processing tasks	989
radi-cho/datasetgpt	A command-line interface to generate textual datasets with Large Language Models	293
davidnemeskey/embert	Provides pre-trained transformer-based models and tools for natural language processing tasks	2
multimodal-art-projection/map-neo	A large language model designed for research and application in natural language processing tasks.	887
eyurtsev/kor	An open-source wrapper around LLMs to extract structured data from text	1,638
da-southampton/redgpt	A library providing a pre-trained language model for natural language inference tasks using a transformer architecture.	61