transfusion-asr

Speech Transcription Tool

An automatic speech recognition (ASR) project that uses diffusion models to transcribe speech.

Training code and models for "Transcribing Speech with Multinomial Diffusion".

75 stars
8 watching
5 forks
Language: Python
Last commit: about 1 year ago
Linked from 1 awesome list

Topics: asr, binomial-distribution, diffusion, discrete-diffusion, pytorch, speech-recognition

Related projects:

Repository | Description | Stars
shashikg/whispers2t | An optimized speech-to-text pipeline designed to improve inference speed and accuracy | 310
bnosac/audio.whisper | Provides an R interface to the Whisper Automatic Speech Recognition model | 118
linto-ai/whisper-timestamped | An extension of the Whisper model to predict word timestamps and confidence scores with improved accuracy | 2,045
collabora/whisperlive | An implementation of Whisper's speech-to-text functionality in a real-time transcription application | 2,050
arthurfdlr/whisper-youtube | Transcribes YouTube videos using OpenAI's Whisper speech recognition model | 362
matlab-deep-learning/deepspeech | Enables speech-to-text transcription using a pre-trained Deep Speech model in MATLAB | 7
ytsvetko/str2ipa | A tool for phonetic transcription of languages with close-to-phonetic writing systems | 10
neso613/asr_tflite | Provides pre-trained ASR models for efficient inference using TFLite | 11
dodohow1011/speechadvreprogram | Developing low-resource speech command recognition systems using adversarial reprogramming and transfer learning | 18
birch-san/diffusers | A toolkit for creating and manipulating state-of-the-art diffusion models in PyTorch | 8
langtech/transcriber | An online transcription tool for a specific application, allowing users to input audio or video and receive a written text summary | 2
srijith-rkr/kaust-whisper-adapter | A tool for fine-tuning the OpenAI Whisper speech recognition model using residual adapters and parameter-efficient learning methods | 32
awni/speech | A PyTorch implementation of end-to-end speech recognition models | 754
r3gm/sonitranslate | Software that allows video translation with synchronized audio, utilizing speech-to-text and text-to-speech technologies | 869
mybigday/whisper.rn | A React Native binding of Whisper's automatic speech recognition model | 395