ldc-word-aligner

Word aligner

A tool for annotating manual word alignments in parallel texts

The LDC Word Aligner is a Python-based tool for manually annotating word alignments (also called gold-standard alignments) in parallel texts. It requires sentence-segmented parallel texts as input.
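To make the input requirement concrete, here is a minimal sketch of sentence-segmented parallel text and of one common way to record a manual word alignment. Everything in it is an illustrative assumption and not the LDC Word Aligner's actual interface or file format: the function names `pair_sentences` and `format_alignment`, the example sentences, and the `i-j` index-pair notation (the style that aligners such as fast_align output).

```python
def pair_sentences(source_lines, target_lines):
    """Pair sentence i of the source with sentence i of the target.

    Sentence-segmented parallel text means both sides have one sentence
    per entry and the same number of entries.
    """
    if len(source_lines) != len(target_lines):
        raise ValueError("parallel texts must have the same number of sentences")
    # Whitespace tokenization is an assumption made for this sketch.
    return [(s.split(), t.split()) for s, t in zip(source_lines, target_lines)]


def format_alignment(links):
    """Serialize (source_index, target_index) links as space-separated 'i-j' pairs."""
    return " ".join(f"{i}-{j}" for i, j in sorted(links))


# Hypothetical one-sentence parallel text.
source = ["the house is small"]
target = ["das Haus ist klein"]
pairs = pair_sentences(source, target)

# Hypothetical gold alignment an annotator might produce for that pair:
# each tuple links a source token index to a target token index.
gold_links = {(0, 0), (1, 1), (2, 2), (3, 3)}
alignment_line = format_alignment(gold_links)
print(alignment_line)
```

Storing links as index pairs keeps the annotation independent of the token strings, so the same record can be checked against the output of an automatic aligner.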

2 stars
3 watching
0 forks
Language: Python
last commit: over 6 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| clab/fast_align | A fast and simple unsupervised word aligner for generating parallel corpus alignments. | 738 |
| ldmt-muri/alignment-with-openfst | An implementation of a CRF autoencoder framework for aligning text data. | 21 |
| lowerquality/gentle | A tool for aligning speech with text by analyzing audio and providing an output transcript. | 1,453 |
| josefnpat/reflowprint | A library that enables character-by-character text alignment in real time. | 46 |
| machinalis/yalign | Automates the process of extracting parallel sentences from comparable corpora to aid in statistical machine translation. | 127 |
| guitarbum722/align | An application and library for aligning text with flexible formatting options. | 84 |
| nigel2392/wagtail_text_alignment | Enhances text alignment in Wagtail richtext editors with support for block entities. | 4 |
| thudm/longalign | A framework for training and evaluating large language models on long context inputs. | 217 |
| moses-smt/mgiza | A C++ implementation of a word alignment tool with multi-threading and incremental training capabilities for machine translation. | 161 |
| richardlitt/lrl | Developing tools and scripts to extract data from low-resource languages, focusing on language processing and machine learning applications. | 2 |
| lowresourcelanguages/champollion | A toolkit providing ready-to-use parallel text sentence alignment tools for multiple language pairs. | 18 |
| martinsos/edlib | A lightweight library for calculating sequence alignment using edit distance. | 512 |
| artificiai/multilingual-latent-dirichlet-allocation-lda | An LDA-based text clustering pipeline for multiple languages. | 82 |
| raphael-group/paste2 | A software framework for aligning and reconstructing spatial transcriptomics data from non-overlapping samples. | 29 |
| gao-lab/slat | A software package for aligning single-cell spatial omics data using deep learning and graph neural networks. | 80 |