ldc-word-aligner
Word aligner
A tool for annotating manual word alignments in parallel texts
The LDC Word Aligner is a Python-based tool for manually annotating word alignments (also called gold-standard alignments) in parallel texts. It requires sentence-segmented parallel texts as input.
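This listing does not describe the tool's own file formats. The sketch below is a hypothetical illustration of the input it requires and the kind of data an annotator produces. It assumes one sentence per line with whitespace tokenization, so line k of the source text pairs with line k of the target text. Each manual link is stored as a 0-based (source index, target index) pair and written in the `i-j` ("Pharaoh") style that clab/fast_align also emits. `SentencePair` and `load_parallel` are names invented for this example and are not part of the LDC tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SentencePair:
    """One sentence pair plus its manually annotated word links (hypothetical model)."""
    source: list[str]
    target: list[str]
    # Each link is a 0-based (source_index, target_index) pair; a set allows
    # one-to-many and many-to-one links and ignores duplicate clicks.
    links: set[tuple[int, int]] = field(default_factory=set)

    def link(self, i: int, j: int) -> None:
        """Record that source token i is aligned to target token j."""
        if not (0 <= i < len(self.source) and 0 <= j < len(self.target)):
            raise IndexError(f"link {i}-{j} is out of range")
        self.links.add((i, j))

    def to_pharaoh(self) -> str:
        """Serialize links as space-separated 'i-j' pairs, sorted for stable output."""
        return " ".join(f"{i}-{j}" for i, j in sorted(self.links))


def load_parallel(source_lines, target_lines) -> list[SentencePair]:
    """Pair sentence-segmented texts line by line (assumes whitespace tokenization)."""
    source_lines = list(source_lines)
    target_lines = list(target_lines)
    if len(source_lines) != len(target_lines):
        raise ValueError("parallel texts must have the same number of sentences")
    return [SentencePair(s.split(), t.split()) for s, t in zip(source_lines, target_lines)]


if __name__ == "__main__":
    pair = load_parallel(["I do not know"], ["je ne sais pas"])[0]
    # "not" links to both "ne" and "pas"; "do" is left unaligned.
    for i, j in [(0, 0), (2, 1), (2, 3), (3, 2)]:
        pair.link(i, j)
    print(pair.to_pharaoh())  # 0-0 2-1 2-3 3-2
```

Using a set of index pairs rather than a one-to-one mapping keeps unaligned words and one-to-many links (such as French "ne ... pas") representable, which manual gold-standard annotation typically needs.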
2 stars
3 watching
0 forks
Language: Python
Last commit: over 6 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
clab/fast_align | A fast and simple unsupervised word aligner for generating parallel corpus alignments. | 738 |
ldmt-muri/alignment-with-openfst | An implementation of a CRF autoencoder framework for aligning text data | 21 |
lowerquality/gentle | A forced aligner that aligns speech audio with its text transcript, producing word-level timings | 1,453 |
josefnpat/reflowprint | A library that enables character-by-character text alignment in real time | 46 |
machinalis/yalign | Automates the process of extracting parallel sentences from comparable corpora to aid in statistical machine translation | 127 |
guitarbum722/align | An application and library for aligning text with flexible formatting options. | 84 |
nigel2392/wagtail_text_alignment | Enhances text alignment in Wagtail richtext editors with support for block entities. | 4 |
thudm/longalign | A framework for training and evaluating large language models on long context inputs | 217 |
moses-smt/mgiza | A C++ implementation of a word alignment tool with multi-threading and incremental training capabilities for machine translation. | 161 |
richardlitt/lrl | Tools and scripts for extracting data from low-resource languages, with a focus on language processing and machine learning applications. | 2 |
lowresourcelanguages/champollion | A toolkit providing ready-to-use parallel text sentence alignment tools for multiple language pairs. | 18 |
martinsos/edlib | A lightweight library for calculating sequence alignment using edit distance | 512 |
artificiai/multilingual-latent-dirichlet-allocation-lda | An LDA-based text clustering pipeline for multiple languages | 82 |
raphael-group/paste2 | A software framework for aligning and reconstructing spatial transcriptomics data from non-overlapping samples | 29 |
gao-lab/slat | A software package for aligning single-cell spatial omics data using deep learning and graph neural networks | 80 |