transfusion-asr

Speech Transcription Tool

An automatic speech recognition (ASR) project that uses diffusion models to transcribe speech.

Training code and models for "Transcribing Speech with Multinomial Diffusion".

76 stars
8 watching
5 forks
Language: Python
Last commit: over 1 year ago
Linked from 1 awesome list

Tags: asr, binomial-distribution, diffusion, discrete-diffusion, pytorch, speech-recognition

Related projects:

Repository | Description | Stars
shashikg/whispers2t | An optimized speech-to-text pipeline designed to improve inference speed and accuracy | 330
bnosac/audio.whisper | Provides an R interface to the Whisper automatic speech recognition model | 119
linto-ai/whisper-timestamped | An extension to the Whisper speech recognition model that adds word-level timestamps and confidence scores | 2,121
collabora/whisperlive | An implementation of Whisper's speech-to-text functionality in a real-time transcription application | 2,186
arthurfdlr/whisper-youtube | Transcribes YouTube videos using OpenAI's Whisper speech recognition model | 369
matlab-deep-learning/deepspeech | Enables speech-to-text transcription using a pre-trained Deep Speech model in MATLAB | 7
ytsvetko/str2ipa | A tool for phonetic transcription of languages with close-to-phonetic writing systems | 10
neso613/asr_tflite | Provides pre-trained ASR models for efficient inference using TFLite | 11
dodohow1011/speechadvreprogram | Develops low-resource speech command recognition systems using adversarial reprogramming and transfer learning | 18
birch-san/diffusers | A toolkit for creating and manipulating state-of-the-art diffusion models in PyTorch | 8
langtech/transcriber | An online transcription tool for a specific application, allowing users to input audio or video and receive a written text summary | 2
srijith-rkr/kaust-whisper-adapter | A tool for fine-tuning the OpenAI Whisper speech recognition model using residual adapters and parameter-efficient learning methods | 32
awni/speech | A PyTorch implementation of end-to-end speech recognition models | 756
r3gm/sonitranslate | Software for video translation with synchronized audio, using speech-to-text and text-to-speech technologies | 924
mybigday/whisper.rn | A React Native binding of Whisper's automatic speech recognition model | 408