mgiza
Word aligner
A C++ implementation of a word alignment tool with multi-threading and incremental training capabilities for machine translation.
A word alignment tool based on the well-known GIZA++, extended to support multi-threading, resuming training, and incremental training.
161 stars
77 watching
60 forks
Language: C++
Last commit: over 3 years ago
Related projects:
Repository | Description | Stars |
---|---|---|
moses-smt/giza-pp | A toolkit for training statistical machine translation models and word alignment. | 264 |
moses-smt/mosesdecoder | A software toolkit for machine translation | 1,584 |
moses-smt/nplm | A toolkit for training neural network language models | 14 |
clab/fast_align | A fast and simple unsupervised word aligner for generating parallel corpus alignments. | 738 |
moses-smt/salm | A tool kit for working with suffix arrays and their applications in empirical language processing. | 11 |
braunefe/gargantua | Software tool for aligning sentences between multiple languages | 12 |
machinalis/yalign | Automates the process of extracting parallel sentences from comparable corpora to aid in statistical machine translation | 127 |
ldmt-muri/alignment-with-openfst | An implementation of a CRF autoencoder framework for aligning text data | 21 |
richardlitt/ldc-word-aligner | A tool for annotating manual word alignments in parallel texts | 2 |
jcgood/rosetta-pangloss | A Python library that uses machine learning and natural language processing to improve translation accuracy by aligning source and target languages | 0 |
martinsos/edlib | A lightweight library for calculating sequence alignment using edit distance | 512 |
egtwobits/mesh_mesh_align_plus | An add-on for Blender that allows precise alignment and transformation of 3D mesh objects | 582 |
trigeorgis/mdm | A TensorFlow implementation of a recurrent face alignment process | 124 |
cmesher/inuktitutalignerdata | Tools for aligning laboratory speech production data | 3 |
montrealcorpustools/montreal-forced-aligner | A command-line utility for aligning audio data with written text based on pronunciation rules. | 1,343 |