wikipron

pronunciation scraper

A tool for extracting and processing multilingual pronunciation data from Wiktionary.

Massively multilingual pronunciation mining
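
As a rough illustration of what "processing pronunciation data" can look like downstream, the sketch below groups scraped entries into a lexicon. It assumes a two-column, tab-separated layout (word, then a pronunciation whose segments are separated by spaces). That layout, the sample entries, and the helper name `load_lexicon` are assumptions made for this example and are not taken from the project's documentation.

```python
from collections import defaultdict

# Hypothetical sample data: one "word<TAB>pronunciation" pair per line,
# with pronunciation segments separated by single spaces.
SAMPLE_TSV = "chat\tʃ a\nchien\tʃ j ɛ̃\nest\tɛ\nest\tɛ s t\n"


def load_lexicon(tsv_text):
    """Map each word to a list of pronunciations (each a list of segments)."""
    lexicon = defaultdict(list)
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, pron = line.split("\t", 1)
        lexicon[word].append(pron.split())
    return dict(lexicon)


lexicon = load_lexicon(SAMPLE_TSV)
for word, prons in lexicon.items():
    print(word, prons)
```

Keeping a list of pronunciations per word, rather than a single value, preserves entries that have more than one recorded pronunciation.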

GitHub

321 stars
18 watching
71 forks
Language: Python
Last commit: 2 months ago
Linked from 1 awesome list

computational-linguistics, g2p, language, linguistics, nlp, phonetics, phonology, pronunciation, python-api, scraped-data, speech

Related projects:

Repository | Description | Stars
fielddb/lex4all | Tool for automating pronunciation lexicon creation for low-resource languages using speech recognition and machine learning algorithms. | 1
lex4all/lex4all | Software tool to generate pronunciation lexicons for low-resource languages using speech recognition and machine learning algorithms. | 21
phonologicalcorpustools/corpustools | A collection of tools and libraries for analyzing and processing phonological data in various languages. | 113
macr0dev/audiobooks.bundle | A metadata agent that scrapes audiobook metadata from Audible.com and integrates it with Plex media servers. | 605
ytsvetko/str2ipa | A tool for phonetic transcription of languages with close-to-phonetic writing systems. | 10
analyzeplatypus/translitkit | A Ruby framework for converting Hebrew text to English using phoneme maps. | 7
ukplab/linspector | A framework to interpret multilingual NLP models and understand their word representations. | 23
ibm/max-chinese-phonetic-similarity-estimator | Estimates phonetic similarity between Chinese words and suggests similar-sounding candidates. | 35
huspacy/huspacy | An industrial-strength natural language processing library for Hungarian text analysis. | 155
vchahun/gv-crawl | Automates text extraction and alignment from Global Voices articles to create parallel corpora for low-resource languages. | 9
bgutter/cl-phonetic | Provides phonetic pattern matching functionality in Common Lisp to aid with natural language processing and text analysis. | 24
khrystyna-skopyk/ukr_spell_check | Spelling correction system for the Ukrainian language using a noisy channel model. | 3
prosodylab/prosodylab.alignertools | A package of scripts to prepare data for use in Prosodylab-Aligner by cleaning and relabeling transcriptions and generating orthography-based dictionaries. | 12
eyurtsev/kor | Extracts structured data from unstructured text using large language models. | 1,629
synyi/poplar | A web-based annotation tool for natural language processing (NLP). | 519