gensim-data
NLP datasets
A repository of pre-trained NLP models and corpora for text processing.
Data repository for pretrained NLP models and NLP corpora.
988 stars
39 watching
133 forks
Language: Python
last commit: over 6 years ago
Linked from 1 awesome list
Topics: corpora, dataset, gensim, glove-model, lda-model, lsi-model, pretrained-models, word2vec-model
Related projects:
Repository | Description | Stars |
---|---|---|
nttcslab-nlp/doc_lm | This repository contains source files and training scripts for language models. | 12 |
karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 |
rdspring1/pytorch_gbw_lm | Trains a large-scale PyTorch language model on the 1-Billion Word dataset | 123 |
balavenkatesh3322/nlp-pretrained-model | A collection of pre-trained natural language processing models | 170 |
shawn-ieitsystems/yuan-1.0 | Large-scale language model with improved performance on NLP tasks through distributed training and efficient data processing | 591 |
fido-ai/ua-datasets | Provides a collection of datasets for natural language processing in Ukrainian. | 55 |
01-ai/yi | A series of large language models trained from scratch to excel in multiple NLP tasks | 7,699 |
vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation | 1,782 |
gmftbygmftby/science-llm | A large-scale language model for the scientific domain, trained on the RedPajama arXiv split | 122 |
zhuiyitechnology/pretrained-models | A collection of pre-trained language models for natural language processing tasks | 987 |
radi-cho/datasetgpt | A command-line interface to generate textual datasets with Large Language Models | 293 |
davidnemeskey/embert | Provides pre-trained transformer-based models and tools for natural language processing tasks | 2 |
multimodal-art-projection/map-neo | A large language model designed for research and application in natural language processing tasks. | 877 |
eyurtsev/kor | Extracts structured data from unstructured text using large language models | 1,629 |
da-southampton/redgpt | A library providing a pre-trained language model for natural language inference tasks using a transformer architecture. | 62 |