yalign
Sentence aligner
Automates the extraction of parallel sentences from comparable corpora for use in statistical machine translation
A sentence aligner for comparable corpora
127 stars
16 watching
31 forks
Language: Python
Last commit: over 8 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
lowresourcelanguages/champollion | A toolkit providing ready-to-use parallel text sentence alignment tools for multiple language pairs. | 18 |
braunefe/gargantua | Software tool to align sentences across multiple languages using unsupervised alignment methods | 12 |
jcgood/rosetta-pangloss | A Python library that uses machine learning and natural language processing to improve translation accuracy by aligning source and target languages | 0 |
lowerquality/gentle | A tool for aligning speech with text by analyzing audio and providing an output transcript | 1,471 |
lewisgaul/zig-nestedtext | A parser library for NestedText, a human-readable data format inspired by YAML | 13 |
prosodylab/prosodylab.alignertools | A package of scripts to prepare data for alignment in speech processing | 12 |
montrealcorpustools/montreal-forced-aligner | A command-line utility for aligning audio data with written text based on pronunciation rules. | 1,364 |
tanloong/interlaced.nvim | A plugin for aligning bilingual parallel texts by re-positioning text and applying highlighting. | 7 |
prosodylab/prosodylab-aligner | Tools for forced alignment of laboratory speech production data with its transcripts, using HTK and SoX. | 333 |
talschuster/crosslingualcontextualemb | Enables alignment of word embeddings across multiple languages to facilitate cross-lingual text analysis and machine learning tasks | 99 |
thom1729/yaml-macros | A macro system for YAML files powered by Python | 21 |
zhoux85/staligner | Tool for aligning and integrating spatially resolved transcriptomics data using machine learning algorithms | 29 |
kaaaaaaaaaaai/paragraph-with-alignment | Provides a paragraph tool with alignment options for the Editor.js text editor framework. | 44 |
cmesher/inuktitutalignerdata | Scripts for aligning laboratory speech production data in Inuktitut | 3 |
vchahun/gv-crawl | Automates text extraction and alignment from Global Voices articles to create parallel corpora for low-resource languages. | 9 |