GloVe

Word Vector Library

Provides pre-trained word vector representations and an implementation of the GloVe model for learning word embeddings

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
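
As a rough illustration of how such pre-trained vectors are typically used, the sketch below parses the plain-text layout commonly used for distributed word vectors (one word per line, followed by its space-separated components) and compares words by cosine similarity. The function names `load_vectors` and `cosine` are hypothetical, and the three-dimensional sample values are made up for the example; they are not taken from any released data file.

```python
import io
import math


def load_vectors(lines):
    """Parse 'word v1 v2 ... vn' lines into a dict of word -> list of floats."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy stand-in for a vector file; the numbers are invented for illustration.
sample = io.StringIO(
    "king 0.5 0.7 -0.1\n"
    "queen 0.4 0.8 -0.2\n"
    "apple -0.6 0.1 0.9\n"
)
vecs = load_vectors(sample)
```

With a real vector file, the `io.StringIO` object would be replaced by an open file handle; the parsing logic stays the same under the assumed layout.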

GitHub

Stars: 7k
Watching: 229
Forks: 2k
Language: C
Last commit: about 1 year ago
Linked from 2 awesome lists


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| cemoody/lda2vec | A framework for creating interpretable natural language models by combining word embeddings and topic modeling | 3,149 |
| plasticityai/magnitude | A fast and efficient utility package for utilizing vector embeddings in machine learning models | 1,627 |
| alexandres/lexvec | An implementation of a word embedding model that uses character n-grams and achieves state-of-the-art results in multiple NLP tasks | 803 |
| embedding/chinese-word-vectors | Provides pre-trained vectors with various properties for downstream tasks in natural language processing | 11,837 |
| jwieting/paragram-word | Trains word embeddings from a paraphrase database to represent semantic relationships between words | 30 |
| ynqa/wego | An open-source Go library for learning and manipulating vector representations of words | 474 |
| google/sentencepiece | An unsupervised text tokenizer that segments input text into subwords and detokenizes output based on a predefined vocabulary size | 10,284 |
| jwieting/iclr2016 | Code for training universal paraphrastic sentence embeddings and models on semantic similarity tasks | 193 |
| stanfordnlp/stanza | A Python library for natural language processing tasks in many human languages | 7,294 |
| princeton-nlp/simcse | An open source framework for learning sentence embeddings using contrastive learning | 3,423 |
| piskvorky/gensim-data | A repository of pre-trained NLP models and corpora for text processing | 988 |
| codertimo/bert-pytorch | An implementation of Google's 2018 BERT model in PyTorch, allowing pre-training and fine-tuning for natural language processing tasks | 6,222 |
| bigscience-workshop/promptsource | A toolkit for creating and using natural language prompts to enable large language models to generalize to new tasks | 2,696 |
| huggingface/tokenizers | A toolkit providing optimized tokenizers for natural language processing tasks in various programming languages | 9,051 |
| giuseppemarra/char-word-embeddings | This repository provides an unsupervised approach to learning character-aware word and context embeddings | 0 |