word-embeddings-for-nmt
NMT Dataset
An open-source project that provides pre-trained word embeddings and a dataset for evaluating their usefulness in neural machine translation.
Supplementary material for "When and Why Are Pre-trained Word Embeddings Useful for Neural Machine Translation?" at NAACL 2018.
121 stars
9 watching
19 forks
Language: Python
Last commit: over 4 years ago

Related projects:
Repository | Description | Stars |
---|---|---|
jwieting/para-nmt-50m | A collection of pre-trained models and code for training paraphrastic sentence embeddings from large machine translation datasets. | 102 |
nlprinceton/text_embedding | A utility class for generating and evaluating document representations using word embeddings. | 54 |
lmthang/nmt.matlab | Code for training state-of-the-art neural machine translation systems in MATLAB. | 105 |
jonsafari/nmt-list | A comprehensive catalog of various neural machine translation implementations using different deep learning frameworks. | 359 |
novakat/nytk-nerkor-cars-ontonotespp | A large annotated dataset of Hungarian text with over 30 entity types derived from various sources and formats. | 1 |
davidnemeskey/embert | Provides pre-trained transformer-based models and tools for natural language processing tasks. | 2 |
neulab/compare-mt | A tool for comparing the performance of different language generation systems. | 467 |
namisan/mt-dnn | A PyTorch package implementing multi-task deep neural networks for natural language understanding. | 2,238 |
harvardnlp/seq2seq-attn | An implementation of a sequence-to-sequence model with an attention mechanism, using LSTMs and character embeddings, for neural machine translation. | 1,260 |
embeddings-benchmark/mteb | A benchmarking suite for evaluating text embedding models across various NLP tasks and datasets. | 1,952 |
microsoft/neuronblocks | A toolkit for building and deploying neural network models for natural language processing tasks. | 1,448 |
elbayadm/attn2d | A PyTorch implementation of 2D convolutional neural networks for sequence-to-sequence prediction in machine translation. | 501 |
moses-smt/nplm | A toolkit for training neural network language models. | 14 |
karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 |
blackrockneurotech/npmk | A MATLAB-based toolkit for loading and processing data from Blackrock Microsystems' neuroscientific files. | 45 |