para-nmt-50m

Sentence embedding training toolkit

A collection of pre-trained models and code for training paraphrastic sentence embeddings from large machine translation datasets.

Pre-trained models, code, and data for training and using the models from "Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations".

GitHub

102 stars
5 watching
21 forks
Language: Python
Last commit: 12 months ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| jwieting/acl2017 | A codebase for training and using models of sentence embeddings. | 33 |
| jwieting/paragram-word | Trains word embeddings from a paraphrase database to represent semantic relationships between words. | 30 |
| jwieting/iclr2016 | Code for training universal paraphrastic sentence embeddings and models on semantic similarity tasks. | 193 |
| nlprinceton/text_embedding | A utility class for generating and evaluating document representations using word embeddings. | 54 |
| jwieting/charagram | A tool for training and using character n-gram based word and sentence embeddings in natural language processing. | 125 |
| johngiorgi/declutr | A tool for training and evaluating sentence embeddings using deep contrastive learning. | 379 |
| xiaoqijiao/coling2018 | Provides training and testing code for a CNN-based sentence embedding model. | 2 |
| binwang28/sbert-wk-sentence-embedding | A method to generate sentence embeddings from pre-trained language models. | 177 |
| antoine77340/howto100m | Provides code and tools for learning joint text-video embeddings using the HowTo100M dataset. | 250 |
| neulab/word-embeddings-for-nmt | An open source project that provides pre-trained word embeddings and a dataset for evaluating their usefulness in neural machine translation. | 121 |
| davidnemeskey/embert | Provides pre-trained transformer-based models and tools for natural language processing tasks. | 2 |
| zhanghang1989/pytorch-encoding | A Python framework for building deep learning models with optimized encoding layers and batch normalization. | 2,041 |
| microsoft/mpnet | Develops a method for pre-training language understanding models by combining masked and permuted techniques, and provides code for implementation and fine-tuning. | 288 |
| zhuiyitechnology/pretrained-models | A collection of pre-trained language models for natural language processing tasks. | 987 |
| huggingface/setfit | A framework for efficient few-shot learning with Sentence Transformers. | 2,236 |