yalign
Automates the extraction of parallel sentences from comparable corpora for use in statistical machine translation
A sentence aligner for comparable corpora
127 stars
16 watching
31 forks
Language: Python
Last commit: over 8 years ago
Linked from 1 awesome list
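This page does not describe how yalign scores or aligns sentences, so the sketch below is not yalign's method or API. It is a generic illustration of the task named above: picking parallel sentence pairs out of two comparable documents. It assumes that corresponding sentences appear in roughly the same order. It also uses a character-length ratio as a crude stand-in for a real sentence-similarity model. The names `length_cost`, `align_sentences`, `skip_penalty`, and `max_cost` are hypothetical.

```python
from typing import List, Optional, Tuple


def length_cost(src: str, tgt: str) -> float:
    """Dissimilarity in [0, 1] from character lengths only (hypothetical proxy
    for a real similarity model)."""
    a, b = len(src), len(tgt)
    if a == 0 and b == 0:
        return 0.0
    return abs(a - b) / max(a, b)


def align_sentences(
    src: List[str],
    tgt: List[str],
    skip_penalty: float = 0.5,
    max_cost: float = 0.3,
) -> List[Tuple[str, str]]:
    """Monotonic dynamic-programming alignment allowing 1-1 matches and
    1-0 / 0-1 skips; keeps only matched pairs whose cost is <= max_cost."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j]: best total cost of aligning src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[str]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0 and j > 0:
                options.append(
                    (cost[i - 1][j - 1] + length_cost(src[i - 1], tgt[j - 1]), "match")
                )
            if i > 0:
                options.append((cost[i - 1][j] + skip_penalty, "skip_src"))
            if j > 0:
                options.append((cost[i][j - 1] + skip_penalty, "skip_tgt"))
            cost[i][j], back[i][j] = min(options)

    # Walk the back-pointers from the end to recover the chosen matches.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            if length_cost(src[i - 1], tgt[j - 1]) <= max_cost:
                pairs.append((src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif move == "skip_src":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs
```

For example, `align_sentences(["The cat sat.", "Short."], ["Le chat assis.", "Bref."])` pairs each English sentence with its French counterpart. A target sentence that has no counterpart on the source side is skipped at the cost of `skip_penalty`. A real aligner would replace `length_cost` with a trained similarity measure.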
Related projects:
Repository | Description | Stars |
---|---|---|
lowresourcelanguages/champollion | A toolkit providing ready-to-use parallel text sentence alignment tools for multiple language pairs. | 18 |
braunefe/gargantua | Software tool for aligning sentences between multiple languages. | 12 |
jcgood/rosetta-pangloss | A Python library that uses machine learning and natural language processing to improve translation accuracy by aligning source and target languages. | 0 |
lowerquality/gentle | A tool for aligning speech with text by analyzing audio and providing an output transcript. | 1,453 |
lewisgaul/zig-nestedtext | A simple human-readable data format parser library written in Zig. | 13 |
prosodylab/prosodylab.alignertools | A package of scripts to prepare data for use in Prosodylab-Aligner by cleaning and relabeling transcriptions and generating orthography-based dictionaries. | 12 |
montrealcorpustools/montreal-forced-aligner | A command-line utility for aligning audio data with written text based on pronunciation rules. | 1,343 |
tanloong/interlaced.nvim | Aligns bilingual parallel texts by repositioning lines. | 6 |
prosodylab/prosodylab-aligner | A Python tool for aligning audio data from laboratory speech production experiments. | 331 |
talschuster/crosslingualcontextualemb | Enables alignment of word embeddings across multiple languages to facilitate cross-lingual text analysis and machine learning tasks. | 98 |
thom1729/yaml-macros | A macro system for YAML files powered by Python. | 21 |
zhoux85/staligner | Tool for aligning and integrating spatially resolved transcriptomics data using machine learning algorithms. | 29 |
kaaaaaaaaaaai/paragraph-with-alignment | Provides a paragraph tool with alignment options for the Editor.js text editor framework. | 44 |
cmesher/inuktitutalignerdata | Tools for aligning laboratory speech production data. | 3 |
vchahun/gv-crawl | Automates text extraction and alignment from Global Voices articles to create parallel corpora for low-resource languages. | 9 |