GloVe
Word Vector Library
Provides pre-trained word vector representations and an implementation of the GloVe model for learning word embeddings
Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
7k stars
229 watching
2k forks
Language: C
Last commit: about 1 year ago
Linked from 2 awesome lists
Related projects:
Repository | Description | Stars |
---|---|---|
cemoody/lda2vec | A framework for creating interpretable natural language models by combining word embeddings and topic modeling. | 3,149 |
plasticityai/magnitude | A fast, efficient package for using vector embeddings in machine learning models | 1,627 |
alexandres/lexvec | An implementation of a word embedding model that uses character n-grams and achieves state-of-the-art results in multiple NLP tasks | 803 |
embedding/chinese-word-vectors | Provides pre-trained vectors with various properties for downstream tasks in natural language processing | 11,837 |
jwieting/paragram-word | Trains word embeddings from a paraphrase database to represent semantic relationships between words. | 30 |
ynqa/wego | An open-source Go library for learning and manipulating vector representations of words | 474 |
google/sentencepiece | An unsupervised text tokenizer and detokenizer that segments input text into subwords using a predefined vocabulary size. | 10,284 |
jwieting/iclr2016 | Code for training universal paraphrastic sentence embeddings and models on semantic similarity tasks | 193 |
stanfordnlp/stanza | A Python library for natural language processing tasks in many human languages. | 7,294 |
princeton-nlp/simcse | An open source framework for learning sentence embeddings using contrastive learning. | 3,423 |
piskvorky/gensim-data | A repository of pre-trained NLP models and corpora for text processing. | 988 |
codertimo/bert-pytorch | An implementation of Google's 2018 BERT model in PyTorch, allowing pre-training and fine-tuning for natural language processing tasks | 6,222 |
bigscience-workshop/promptsource | A toolkit for creating and using natural language prompts to enable large language models to generalize to new tasks. | 2,696 |
huggingface/tokenizers | A toolkit providing optimized tokenizers for natural language processing tasks in various programming languages. | 9,051 |
giuseppemarra/char-word-embeddings | This repository provides an unsupervised approach to learning character-aware word and context embeddings. | 0 |