wikipron

pronunciation scraper

A tool for extracting and processing multilingual pronunciation data from Wiktionary.

Massively multilingual pronunciation mining

GitHub

323 stars
18 watching
71 forks
Language: Python
Last commit: about 2 months ago
Linked from 1 awesome list

computational-linguistics, g2p, language, linguistics, nlp, phonetics, phonology, pronunciation, python-api, scraped-data, speech

Related projects:

| Repository | Description | Stars |
|---|---|---|
| fielddb/lex4all | Tool for automating pronunciation lexicon creation for low-resource languages using speech recognition and machine learning algorithms. | 1 |
| lex4all/lex4all | Software tool to generate pronunciation lexicons for low-resource languages using speech recognition and machine learning algorithms. | 21 |
| phonologicalcorpustools/corpustools | A collection of tools and libraries for analyzing and processing phonological data in various languages | 115 |
| macr0dev/audiobooks.bundle | A metadata agent that scrapes audiobook metadata from Audible.com and integrates it with Plex media servers. | 607 |
| ytsvetko/str2ipa | A tool for phonetic transcription of languages with close-to-phonetic writing systems | 10 |
| analyzeplatypus/translitkit | A Ruby framework for converting Hebrew text to English using phoneme maps | 7 |
| ukplab/linspector | A framework to interpret multilingual NLP models and understand their word representations. | 24 |
| ibm/max-chinese-phonetic-similarity-estimator | Estimates phonetic similarity between Chinese words and suggests similar-sounding candidates | 35 |
| huspacy/huspacy | An industrial-strength natural language processing library for Hungarian language text analysis | 158 |
| vchahun/gv-crawl | Automates text extraction and alignment from Global Voices articles to create parallel corpora for low-resource languages. | 9 |
| bgutter/cl-phonetic | Provides phonetic pattern matching functionality in Common Lisp to aid with natural language processing and text analysis. | 24 |
| khrystyna-skopyk/ukr_spell_check | Spelling correction system for the Ukrainian language using noisy channel model | 3 |
| prosodylab/prosodylab.alignertools | A package of scripts to prepare data for alignment in speech processing | 12 |
| eyurtsev/kor | An open-source wrapper around LLMs to extract structured data from text | 1,638 |
| synyi/poplar | A web-based annotation tool for natural language processing (NLP) | 520 |