awesome-sentence-embedding

A curated list of pretrained sentence and word embedding models

Archived

GitHub

2k stars

78 watching

261 forks

Language: Python

last commit: about 5 years ago

Linked from 1 awesome list

awesome, awesome-list, bert, contextualized-representation, cross-lingual, embedding-models, language-model, natural-language, nlp, pretrained-embedding, pretrained-language-model, pretrained-models, sentence-embeddings, sentence-representations, subword-models, unsupervised-learning, word-embeddings, wordembedding

awesome-sentence-embedding / Word Embeddings
WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models
RusVectōrēs
Efficient Estimation of Word Representations in Vector Space
C	1,525	over 3 years ago
Word2Vec
Word Representations via Gaussian Embedding
Cython	190	over 8 years ago
A Probabilistic Model for Learning Multi-Prototype Word Embeddings
DMTK	116	about 10 years ago
Dependency-Based Word Embeddings
C++
word2vecf
GloVe: Global Vectors for Word Representation
C	6,908	over 1 year ago
GloVe	6,908	over 1 year ago
Sparse Overcomplete Word Vector Representations
C++	54	almost 9 years ago
From Paraphrase Database to Compositional Paraphrase Model and Back
Theano	30	over 10 years ago
PARAGRAM
Non-distributional Word Vector Representations
Python	62	almost 9 years ago
WordFeat	62	almost 9 years ago
Joint Learning of Character and Word Embeddings
C	299	almost 6 years ago
SensEmbed: Learning Sense Embeddings for Word and Relational Similarity
SensEmbed
Topical Word Embeddings
Cython	314	over 8 years ago

Swivel: Improving Embeddings by Noticing What's Missing
TF	77,258	over 1 year ago
Counter-fitting Word Vectors to Linguistic Constraints
Python	145	over 6 years ago
counter-fitting			(broken)
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec
Chainer	3,152	over 4 years ago
Siamese CBOW: Optimizing Word Embeddings for Sentence Representations
Theano
Siamese CBOW
Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations
Go	803	over 5 years ago
lexvec	803	over 5 years ago
Enriching Word Vectors with Subword Information
C++	25,979	over 2 years ago
fastText
Morphological Priors for Probabilistic Neural Word Embeddings
Theano	52	over 9 years ago
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
C++	23	over 2 years ago
charNgram2vec
ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
Python	1,296	about 4 years ago
Numberbatch	1,296	about 4 years ago
Learning Word Meta-Embeddings
Meta-Emb			(broken)
Offline bilingual word vectors, orthogonal transformations and the inverted softmax
Python	1,197	over 3 years ago
Multimodal Word Distributions
TF	283	about 7 years ago
word2gm	283	about 7 years ago
Poincaré Embeddings for Learning Hierarchical Representations
Pytorch	1,684	almost 2 years ago
Context encoders as a simple but powerful extension of word2vec
Python	20	about 6 years ago
Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints
TF	64	almost 9 years ago
Attract-Repel	64	almost 9 years ago
Learning Chinese Word Representations From Glyphs Of Characters
C	30	about 8 years ago
Making Sense of Word Embeddings
Python	212	about 5 years ago
sensegram
Hash Embeddings for Efficient Word Representations
Keras	42	over 8 years ago
BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages
Gensim	1,189	almost 2 years ago
BPEmb	1,189	almost 2 years ago
SPINE: SParse Interpretable Neural Embeddings
Pytorch	52	over 6 years ago
SPINE
AraVec: A set of Arabic Word Embedding Models for use in Arabic NLP
Gensim	395	over 5 years ago
AraVec	395	over 5 years ago
Ngram2vec: Learning Improved Word Representations from Ngram Co-occurrence Statistics
C	848	almost 7 years ago
Dict2vec : Learning Word Embeddings using Lexical Dictionaries
C++	115	over 5 years ago
Dict2vec	115	over 5 years ago
Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components
C	99	about 7 years ago
Representation Tradeoffs for Hyperbolic Embeddings
Pytorch	377	about 3 years ago
h-MDS	377	about 3 years ago
Dynamic Meta-Embeddings for Improved Sentence Representations
Pytorch	332	almost 6 years ago
DME/CDME	332	almost 6 years ago
Analogical Reasoning on Chinese Morphological and Semantic Relations
ChineseWordVectors	11,874	over 2 years ago
Probabilistic FastText for Multi-Sense Word Embeddings
C++	148	about 8 years ago
Probabilistic FastText	148	about 8 years ago
Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks
TF	291	over 3 years ago
SynGCN
FRAGE: Frequency-Agnostic Word Representation
Pytorch	118	about 7 years ago
Wikipedia2Vec: An Optimized Tool for Learning Embeddings of Words and Entities from Wikipedia
Cython	946	about 2 years ago
Wikipedia2Vec
Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings
ChineseEmbedding
cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information
C++	274	over 3 years ago
VCWE: Visual Character-Enhanced Word Embeddings
Pytorch	15	about 7 years ago
VCWE	15	about 7 years ago
Learning Cross-lingual Embeddings from Twitter via Distant Supervision
Text	14	about 6 years ago
An Unsupervised Character-Aware Neural Approach to Word and Context Representation Learning
TF	0	almost 8 years ago
ViCo: Word Embeddings from Visual Co-occurrences
Pytorch	25	almost 7 years ago
ViCo	25	almost 7 years ago
Spherical Text Embedding
C	175	over 2 years ago
Unsupervised word embeddings capture latent knowledge from materials science literature
Gensim	624	about 3 years ago
awesome-sentence-embedding / OOV Handling
ALaCarte	104	almost 8 years ago
Mimick	153	over 6 years ago
CompactReconstruction	9	about 3 years ago
awesome-sentence-embedding / Contextualized Word Embeddings
Language Models are Unsupervised Multitask Learners
TF	22,644	almost 2 years ago
117M	22,644	almost 2 years ago	GPT-2
Learned in Translation: Contextualized Word Vectors
Pytorch	473	over 4 years ago
CoVe	473	over 4 years ago
Universal Language Model Fine-tuning for Text Classification
Pytorch	26,390	over 1 year ago
English			ULMFit
Deep contextualized word representations
Pytorch	11,774	over 3 years ago
AllenNLP			ELMO
Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
Pytorch	147	over 6 years ago
LD-Net	147	over 6 years ago
Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation
Pytorch	1,462	about 5 years ago
ELMo	1,462	about 5 years ago
Direct Output Connection for a High-Rank Language Model
Pytorch	12	over 7 years ago
DOC
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TF	38,374	almost 2 years ago
BERT	38,374	almost 2 years ago	BERT
Contextual String Embeddings for Sequence Labeling
Pytorch	13,990	over 1 year ago
Flair	13,990	over 1 year ago
Improving Language Understanding by Generative Pre-Training
TF	2,167	over 7 years ago
GPT	2,167	over 7 years ago
Multi-Task Deep Neural Networks for Natural Language Understanding
Pytorch	2,238	over 2 years ago
MT-DNN	2,238	over 2 years ago
BioBERT: pre-trained biomedical language representation model for biomedical text mining
TF	1,970	almost 3 years ago
BioBERT	672	about 6 years ago
Cross-lingual Language Model Pretraining
Pytorch	2,893	over 3 years ago
XLM	2,893	over 3 years ago
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
TF	3,619	almost 4 years ago
Transformer-XL	3,619	almost 4 years ago
Efficient Contextual Representation Learning Without Softmax Layer
Pytorch	4	about 6 years ago
SciBERT: Pretrained Contextualized Embeddings for Scientific Text
Pytorch, TF	1,532	over 4 years ago
SciBERT	1,532	over 4 years ago
Publicly Available Clinical BERT Embeddings
Text	680	almost 6 years ago
clinicalBERT
ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission
Pytorch	386	almost 4 years ago
ClinicalBERT
ERNIE: Enhanced Language Representation with Informative Entities
Pytorch	1,413	over 2 years ago
ERNIE
Unified Language Model Pre-training for Natural Language Understanding and Generation
Pytorch	20,400	over 1 year ago
unilm1-large-cased			UniLMv1
HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
Pre-Training with Whole Word Masking for Chinese BERT
Pytorch, TF	9,746	almost 3 years ago
BERT-wwm	9,746	almost 3 years ago
XLNet: Generalized Autoregressive Pretraining for Language Understanding
TF	6,183	about 3 years ago
XLNet	6,183	about 3 years ago
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
PaddlePaddle	6,331	almost 2 years ago
ERNIE 2.0	6,331	almost 2 years ago
SpanBERT: Improving Pre-training by Representing and Predicting Spans
Pytorch	893	almost 3 years ago
SpanBERT	893	almost 3 years ago
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Pytorch	30,675	almost 2 years ago
RoBERTa	30,675	almost 2 years ago
Subword ELMo
Pytorch	12	over 6 years ago
Knowledge Enhanced Contextual Word Representations
TinyBERT: Distilling BERT for Natural Language Understanding
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Pytorch	10,804	over 1 year ago
BERT-345M			Megatron-LM
MultiFiT: Efficient Multi-lingual Language Model Fine-tuning
Pytorch	284	about 6 years ago
Extreme Language Model Compression with Optimal Subwords and Shared Projections
MULE: Multimodal Universal Language Embedding
Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks
K-BERT: Enabling Language Representation with Knowledge Graph
UNITER: Learning UNiversal Image-TExt Representations
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
TF	3,942	over 3 years ago
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Pytorch	30,675	almost 2 years ago
bart.base			BART
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Pytorch, TF2.0	136,357	over 1 year ago
DistilBERT	136,357	over 1 year ago
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
TF	6,215	almost 2 years ago
T5	6,215	almost 2 years ago
CamemBERT: a Tasty French Language Model
CamemBERT
ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations
Pytorch	645	almost 4 years ago
Unsupervised Cross-lingual Representation Learning at Scale
Pytorch	2,893	over 3 years ago
xlmr.large			XLM-R (XLM-RoBERTa)
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
Pytorch	694	almost 2 years ago
ProphetNet-large-16GB			ProphetNet
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Pytorch	2,281	about 3 years ago
CodeBERT
UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
Pytorch	20,400	over 1 year ago
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
TF	2,342	over 2 years ago
ELECTRA-Small			ELECTRA
MPNet: Masked and Permuted Pre-training for Language Understanding
Pytorch	288	almost 5 years ago
MPNet
ParsBERT: Transformer-based Model for Persian Language Understanding
Pytorch	341	about 3 years ago
ParsBERT
Language Models are Few-Shot Learners
InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training
Pytorch	20,400	over 1 year ago
awesome-sentence-embedding / Pooling Methods
SIF	1,084	almost 7 years ago
TF-IDF	9	over 3 years ago
P-norm	186	over 5 years ago
DisC	54	over 6 years ago
GEM	19	over 7 years ago
SWEM	284	over 3 years ago
VLAWE	10	about 7 years ago
Efficient Sentence Embedding using Discrete Cosine Transform
fse: Gensim add-on for fast sentence embeddings. Supports Mean, Max, SIF, uSIF	618	over 3 years ago
Efficient Sentence Embedding via Semantic Subspace Analysis
awesome-sentence-embedding / Encoders
Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings
Python	5	almost 7 years ago
Distributed Representations of Sentences and Documents
Pytorch	413	over 3 years ago
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
Theano	427	over 9 years ago
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
Theano	2,050	about 6 years ago
Order-Embeddings of Images and Language
Theano	186	almost 10 years ago
Towards Universal Paraphrastic Sentence Embeddings
Theano	193	over 10 years ago
From Word Embeddings to Document Distances
C, Python	538	about 2 years ago
Learning Distributed Representations of Sentences from Unlabelled Data
Python	124	over 9 years ago
Charagram: Embedding Words and Sentences via Character n-grams
Theano	125	about 10 years ago
Learning Generic Sentence Representations Using Convolutional Neural Networks
Theano	34	almost 9 years ago
Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features
C++	1,194	almost 4 years ago
Learning to Generate Reviews and Discovering Sentiment
TF	1,512	about 3 years ago
Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings
Theano	33	about 9 years ago
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
Pytorch	2,282	almost 5 years ago
VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
Pytorch	492	over 4 years ago
Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
Keras	1,525	almost 2 years ago
StarSpace: Embed All The Things!
C++	3,948	over 3 years ago
DisSent: Learning Sentence Representations from Explicit Discourse Relations
Pytorch	33	over 6 years ago
Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations
Theano	102	over 2 years ago
Dual-Path Convolutional Image-Text Embedding with Instance Loss
Matlab	287	about 3 years ago
An efficient framework for learning sentence representations
TF	205	about 7 years ago
Universal Sentence Encoder
TF-Hub
End-Task Oriented Textual Entailment via Deep Explorations of Inter-Sentence Interactions
Theano	16	about 8 years ago
Learning general purpose distributed sentence representations via large scale multi-task learning
Pytorch	311	almost 6 years ago
Embedding Text in Hyperbolic Spaces
TF	8	almost 9 years ago
Representation Learning with Contrastive Predictive Coding
Keras	527	about 7 years ago
Context Mover’s Distance & Barycenters: Optimal transport of contexts for building representations
Python	21	over 5 years ago
Learning Universal Sentence Representations with Mean-Max Attention Autoencoder
TF	16	over 7 years ago
Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model
TF-Hub
Improving Sentence Representations with Consensus Maximisation
BioSentVec: creating sentence embeddings for biomedical texts
Python	578	almost 3 years ago
Word Mover's Embedding: From Word2Vec to Document Embedding
C, Python	81	over 7 years ago
A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks
Pytorch	1,191	almost 3 years ago
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
Pytorch	3,604	about 2 years ago
Convolutional Neural Network for Universal Sentence Embeddings
Theano	2	about 8 years ago
No Training Required: Exploring Random Encoders for Sentence Classification
Pytorch	184	over 6 years ago
CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model
Pytorch	21	about 7 years ago
GLOSS: Generative Latent Optimization of Sentence Representations
Multilingual Universal Sentence Encoder
TF-Hub
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Pytorch	15,556	over 1 year ago
SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models
Pytorch	178	over 5 years ago
DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
Pytorch	380	over 3 years ago
Language-agnostic BERT Sentence Embedding
TF-Hub
On the Sentence Embeddings from Pre-trained Language Models
TF	530	about 5 years ago
awesome-sentence-embedding / Evaluation
decaNLP	2,345	over 2 years ago
SentEval	2,086	over 2 years ago
GLUE	779	almost 5 years ago
Exploring Semantic Properties of Sentence Embeddings
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
Word Embeddings Benchmarks	437	over 5 years ago
MLDoc	152	about 4 years ago
LexNET	77,258	over 1 year ago
wordvectors.net	120	over 5 years ago
jiant	1,650	about 3 years ago
Evaluation of sentence embeddings in downstream and linguistic probing tasks
QVEC	75	over 8 years ago
Grammatical Analysis of Pretrained Sentence Encoders with Acceptability Judgments
EQUATE : A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference
Evaluating Word Embedding Models: Methods and Experimental Results
How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions
Linguistic Knowledge and Transferability of Contextual Representations
LINSPECTOR	24	over 6 years ago
Pitfalls in the Evaluation of Sentence Embeddings
Probing Multilingual Sentence Representations With X-Probe
awesome-sentence-embedding / Misc
Word Embedding Dimensionality Selection	329	about 6 years ago
Half-Size	129	over 5 years ago
magnitude	1,635	almost 3 years ago
To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors
The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding Distillation with Ensemble Learning
Improving Distributional Similarity with Lessons Learned from Word Embeddings
Misspelling Oblivious Word Embeddings
Single Training Dimension Selection for Word Embedding with PCA
Compressing Word Embeddings via Deep Compositional Code Learning
UER: An Open-Source Toolkit for Pre-training Models
Situating Sentence Embedders with Nearest Neighbor Overlap
German BERT
awesome-sentence-embedding / Vector Mapping
Cross-lingual Word Vectors Projection Using CCA	56	almost 8 years ago
vecmap	648	about 3 years ago
MUSE	3,193	almost 4 years ago
CrossLingualELMo	99	over 6 years ago
awesome-sentence-embedding / Articles
Comparing Sentence Similarity Methods
The Current Best of Universal Word Embeddings and Sentence Embeddings
On sentence representations, pt. 1: what can you fit into a single #$!%@*&% blog post?
Deep-learning-free Text and Sentence Embedding, Part 1
Deep-learning-free Text and Sentence Embedding, Part 2
An Overview of Sentence Embedding Methods
Word embeddings in 2017: Trends and future directions
A Walkthrough of InferSent – Supervised Learning of Sentence Embeddings
A survey of cross-lingual word embedding models
Introducing state of the art text classification with universal language models
Document Embedding Techniques

Backlinks from these awesome lists:

0ex/more-awesome
