yalign

Sentence aligner

Automates the extraction of parallel sentences from comparable corpora, producing training data for statistical machine translation.

A sentence aligner for comparable corpora
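
To make "extracting parallel sentences from comparable corpora" concrete, here is a minimal sketch of the general idea: score every cross-language sentence pair and keep only the confident matches. It is a toy illustration and does not use yalign's API or claim to reproduce its algorithm. The names `TOY_LEXICON`, `similarity` and `extract_parallel`, the sample sentences, and the 0.5 threshold are all assumptions made for this example.

```python
# Toy illustration only: this is NOT yalign's API or algorithm.
# TOY_LEXICON, similarity() and extract_parallel() are hypothetical names.

TOY_LEXICON = {
    "el": "the", "gato": "cat", "duerme": "sleeps",
    "perro": "dog", "ladra": "barks",
}

PUNCTUATION = ".,;:!?"


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in sentence.split())
    return [w for w in words if w]


def similarity(src, tgt, lexicon):
    """Jaccard overlap between the word-by-word translated source and the target."""
    translated = {lexicon.get(w, w) for w in tokenize(src)}
    target = set(tokenize(tgt))
    if not translated or not target:
        return 0.0
    return len(translated & target) / len(translated | target)


def extract_parallel(src_sents, tgt_sents, lexicon, threshold=0.5):
    """Greedily pair sentences, best score first, using each sentence at most once."""
    scored = sorted(
        ((similarity(s, t, lexicon), i, j)
         for i, s in enumerate(src_sents)
         for j, t in enumerate(tgt_sents)),
        key=lambda item: (-item[0], item[1], item[2]),
    )
    used_src, used_tgt, pairs = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # remaining candidates are all below the threshold
        if i in used_src or j in used_tgt:
            continue
        used_src.add(i)
        used_tgt.add(j)
        pairs.append((src_sents[i], tgt_sents[j], score))
    return pairs


# Two "comparable" documents: same topic, different order, one unrelated sentence each.
spanish = ["El gato duerme.", "Hoy llueve mucho.", "El perro ladra."]
english = ["The dog barks.", "Stocks fell on Monday.", "The cat sleeps."]

pairs = extract_parallel(spanish, english, TOY_LEXICON)
for src, tgt, score in pairs:
    print(f"{score:.2f}  {src}  <->  {tgt}")
```

The sentences without a counterpart fall below the threshold and are dropped, which is the key difference between comparable corpora and strictly parallel ones. A real aligner would replace the toy lexicon and Jaccard score with a trained similarity model.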

GitHub

127 stars
16 watching
31 forks
Language: Python
Last commit: over 8 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| lowresourcelanguages/champollion | A toolkit providing ready-to-use parallel text sentence alignment tools for multiple language pairs. | 18 |
| braunefe/gargantua | Software tool for aligning sentences between multiple languages | 12 |
| jcgood/rosetta-pangloss | A Python library that uses machine learning and natural language processing to improve translation accuracy by aligning source and target languages | 0 |
| lowerquality/gentle | A tool for aligning speech with text by analyzing audio and providing an output transcript | 1,453 |
| lewisgaul/zig-nestedtext | A simple human-readable data format parser library written in Zig. | 13 |
| prosodylab/prosodylab.alignertools | A package of scripts to prepare data for use in Prosodylab-Aligner by cleaning and relabeling transcriptions and generating orthography-based dictionaries. | 12 |
| montrealcorpustools/montreal-forced-aligner | A command-line utility for aligning audio data with written text based on pronunciation rules. | 1,343 |
| tanloong/interlaced.nvim | Aligns bilingual parallel texts by repositioning lines. | 6 |
| prosodylab/prosodylab-aligner | A Python tool for aligning audio data from laboratory speech production experiments | 331 |
| talschuster/crosslingualcontextualemb | Enables alignment of word embeddings across multiple languages to facilitate cross-lingual text analysis and machine learning tasks | 98 |
| thom1729/yaml-macros | A macro system for YAML files powered by Python | 21 |
| zhoux85/staligner | Tool for aligning and integrating spatially resolved transcriptomics data using machine learning algorithms | 29 |
| kaaaaaaaaaaai/paragraph-with-alignment | Provides a paragraph tool with alignment options for the Editor.js text editor framework. | 44 |
| cmesher/inuktitutalignerdata | Tools for aligning laboratory speech production data | 3 |
| vchahun/gv-crawl | Automates text extraction and alignment from Global Voices articles to create parallel corpora for low-resource languages. | 9 |