word-embeddings-for-nmt

NMT Dataset

An open-source project that provides pre-trained word embeddings and a dataset for evaluating their usefulness in neural machine translation.

Supplementary material for the paper "When and Why Are Pre-trained Word Embeddings Useful for Neural Machine Translation?" (NAACL 2018).

121 stars
9 watching
19 forks
Language: Python
Last commit: over 4 years ago

Related projects:

Repository | Description | Stars
jwieting/para-nmt-50m | A collection of pre-trained models and code for training paraphrastic sentence embeddings from large machine translation datasets. | 102
nlprinceton/text_embedding | A utility class for generating and evaluating document representations using word embeddings. | 54
lmthang/nmt.matlab | Provides code to train state-of-the-art neural machine translation systems using MATLAB. | 105
jonsafari/nmt-list | A comprehensive catalog of various neural machine translation implementations using different deep learning frameworks. | 359
novakat/nytk-nerkor-cars-ontonotespp | A large annotated dataset of Hungarian text with over 30 entity types derived from various sources and formats. | 1
davidnemeskey/embert | Provides pre-trained transformer-based models and tools for natural language processing tasks. | 2
neulab/compare-mt | A tool for comparing the performance of different language generation systems. | 467
namisan/mt-dnn | A PyTorch package implementing multi-task deep neural networks for natural language understanding. | 2,238
harvardnlp/seq2seq-attn | An implementation of a sequence-to-sequence model with attention mechanism using LSTMs and character embeddings for neural machine translation. | 1,260
embeddings-benchmark/mteb | A benchmarking suite for evaluating text embedding models across various NLP tasks and datasets. | 1,952
microsoft/neuronblocks | A toolkit for building and deploying neural network models for natural language processing tasks. | 1,448
elbayadm/attn2d | A PyTorch implementation of 2D convolutional neural networks for sequence-to-sequence prediction in machine translation. | 501
moses-smt/nplm | A toolkit for training neural network language models. | 14
karthikncode/nlp-datasets | A curated list of natural language processing datasets used to train and evaluate NLP models. | 919
blackrockneurotech/npmk | A MATLAB-based toolkit for loading and processing data from Blackrock Microsystems' neuroscientific files. | 45