mgiza

Word aligner

A C++ implementation of a word alignment tool with multi-threading and incremental training capabilities for machine translation.

A word alignment tool based on the well-known GIZA++, extended to support multi-threading, resumable training, and incremental training.

GitHub

161 stars
77 watching
60 forks
Language: C++
Last commit: over 3 years ago

Related projects:

| Repository | Description | Stars |
|---|---|---|
| moses-smt/giza-pp | A toolkit for training statistical machine translation models and word alignment | 264 |
| moses-smt/mosesdecoder | A software toolkit for machine translation | 1,585 |
| moses-smt/nplm | A toolkit for training neural network language models | 14 |
| clab/fast_align | A fast and simple unsupervised word aligner for generating parallel corpus alignments | 740 |
| moses-smt/salm | A toolkit for creating and manipulating suffix arrays in empirical language processing | 11 |
| braunefe/gargantua | Software tool to align sentences across multiple languages using unsupervised alignment methods | 12 |
| machinalis/yalign | Automates the process of extracting parallel sentences from comparable corpora to aid in statistical machine translation | 127 |
| ldmt-muri/alignment-with-openfst | An implementation of the CRF autoencoder framework for tasks in natural language processing and machine translation | 21 |
| richardlitt/ldc-word-aligner | A Python-based tool for annotating manual word alignments in parallel texts | 2 |
| jcgood/rosetta-pangloss | A Python library that uses machine learning and natural language processing to improve translation accuracy by aligning source and target languages | 0 |
| martinsos/edlib | A lightweight library for calculating sequence alignment using edit distance | 517 |
| egtwobits/mesh_mesh_align_plus | An add-on for Blender that allows precise alignment and transformation of 3D mesh objects | 585 |
| trigeorgis/mdm | A TensorFlow implementation of a recurrent face alignment process | 124 |
| cmesher/inuktitutalignerdata | Scripts for aligning laboratory speech production data in Inuktitut | 3 |
| montrealcorpustools/montreal-forced-aligner | A command-line utility for aligning audio data with written text based on pronunciation rules | 1,364 |