gensim-data
NLP datasets
A repository of pre-trained NLP models and corpora for text processing.
Data repository for pretrained NLP models and NLP corpora.
990 stars
39 watching
135 forks
Language: Python
last commit: almost 7 years ago
Linked from 1 awesome list
corpora, dataset, gensim, glove-model, lda-model, lsi-model, pretrained-models, word2vec-model
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | This repository contains source files and training scripts for language models. | 12 |
| | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 |
| | Trains a large-scale PyTorch language model on the 1-Billion Word dataset | 123 |
| | A collection of pre-trained natural language processing models | 170 |
| | Large-scale language model with improved performance on NLP tasks through distributed training and efficient data processing | 591 |
| | Provides a collection of datasets for natural language processing in Ukrainian. | 57 |
| | A series of large language models trained from scratch to excel in multiple NLP tasks | 7,743 |
| | A guide to using pre-trained large language models in source code analysis and generation | 1,789 |
| | A large-scale language model for scientific domain training on redpajama arXiv split | 125 |
| | A collection of pre-trained language models for natural language processing tasks | 989 |
| | A command-line interface to generate textual datasets with Large Language Models | 293 |
| | Provides pre-trained transformer-based models and tools for natural language processing tasks | 2 |
| | A large language model designed for research and application in natural language processing tasks. | 887 |
| | An open-source wrapper around LLMs to extract structured data from text | 1,638 |
| | A library providing a pre-trained language model for natural language inference tasks using a transformer architecture. | 61 |