mgiza

Word aligner

A C++ implementation of a word alignment tool with multi-threading and incremental training capabilities for machine translation.

A word alignment tool based on the well-known GIZA++, extended to support multi-threading, resuming interrupted training, and incremental training.

GitHub

161 stars

77 watching

60 forks

Language: C++

Last commit: about 5 years ago

Related projects:

Repository	Description	Stars
moses-smt/giza-pp	A toolkit for training statistical machine translation models and word alignment	264
moses-smt/mosesdecoder	A software toolkit for machine translation	1,585
moses-smt/nplm	A toolkit for training neural network language models	14
clab/fast_align	A fast and simple unsupervised word aligner for generating parallel corpus alignments	740
moses-smt/salm	A toolkit for creating and manipulating suffix arrays in empirical language processing	11
braunefe/gargantua	Software tool to align sentences across multiple languages using unsupervised alignment methods	12
machinalis/yalign	Automates the process of extracting parallel sentences from comparable corpora to aid in statistical machine translation	127
ldmt-muri/alignment-with-openfst	An implementation of the CRF autoencoder framework for tasks in natural language processing and machine translation	21
richardlitt/ldc-word-aligner	A Python-based tool for annotating manual word alignments in parallel texts	2
jcgood/rosetta-pangloss	A Python library that uses machine learning and natural language processing to improve translation accuracy by aligning source and target languages	0
martinsos/edlib	A lightweight library for calculating sequence alignment using edit distance	517
egtwobits/mesh_mesh_align_plus	An add-on for Blender that allows precise alignment and transformation of 3D mesh objects	585
trigeorgis/mdm	A TensorFlow implementation of a recurrent face alignment process	124
cmesher/inuktitutalignerdata	Scripts for aligning laboratory speech production data in Inuktitut	3
montrealcorpustools/montreal-forced-aligner	A command-line utility for aligning audio data with written text based on pronunciation rules	1,364