yalign

Sentence aligner

Automates the extraction of parallel sentences from comparable corpora for use in statistical machine translation.

A sentence aligner for comparable corpora

127 stars
16 watching
31 forks
Language: Python
Last commit: over 8 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
|---|---|---|
| lowresourcelanguages/champollion | A toolkit providing ready-to-use parallel text sentence alignment tools for multiple language pairs | 18 |
| braunefe/gargantua | Software tool to align sentences across multiple languages using unsupervised alignment methods | 12 |
| jcgood/rosetta-pangloss | A Python library that uses machine learning and natural language processing to improve translation accuracy by aligning source and target languages | 0 |
| lowerquality/gentle | A tool for aligning speech with text by analyzing audio and providing an output transcript | 1,471 |
| lewisgaul/zig-nestedtext | A parser library for a human-readable data format based on YAML | 13 |
| prosodylab/prosodylab.alignertools | A package of scripts to prepare data for alignment in speech processing | 12 |
| montrealcorpustools/montreal-forced-aligner | A command-line utility for aligning audio data with written text based on pronunciation rules | 1,364 |
| tanloong/interlaced.nvim | A plugin for aligning bilingual parallel texts by re-positioning text and applying highlighting | 7 |
| prosodylab/prosodylab-aligner | Tools for forced audio alignment of laboratory speech production data using HTK and SoX | 333 |
| talschuster/crosslingualcontextualemb | Enables alignment of word embeddings across multiple languages to facilitate cross-lingual text analysis and machine learning tasks | 99 |
| thom1729/yaml-macros | A macro system for YAML files powered by Python | 21 |
| zhoux85/staligner | Tool for aligning and integrating spatially resolved transcriptomics data using machine learning algorithms | 29 |
| kaaaaaaaaaaai/paragraph-with-alignment | Provides a paragraph tool with alignment options for the Editor.js text editor framework | 44 |
| cmesher/inuktitutalignerdata | Scripts for aligning laboratory speech production data in Inuktitut | 3 |
| vchahun/gv-crawl | Automates text extraction and alignment from Global Voices articles to create parallel corpora for low-resource languages | 9 |