transfusion-asr

Speech Transcription Tool

An automatic speech recognition (ASR) project that uses diffusion models to transcribe speech.

Training code and models for "Transcribing Speech with Multinomial Diffusion".

GitHub

76 stars

8 watching

5 forks

Language: Python

Last commit: almost 3 years ago

Linked from 1 awesome list

Tags: asr, binomial-distribution, diffusion, discrete-diffusion, pytorch, speech-recognition

Backlinks from these awesome lists:

amrzv/awesome-colab-notebooks

Related projects:

Repository	Description	Stars
shashikg/whispers2t	An optimized speech-to-text pipeline designed to improve inference speed and accuracy	330
bnosac/audio.whisper	Provides an R interface to the Whisper Automatic Speech Recognition model	119
linto-ai/whisper-timestamped	An extension to the Whisper speech recognition model that adds word-level timestamps and confidence scores	2,121
collabora/whisperlive	An implementation of Whisper's speech-to-text functionality in a real-time transcription application	2,186
arthurfdlr/whisper-youtube	Transcribes YouTube videos using OpenAI's Whisper speech recognition model	369
matlab-deep-learning/deepspeech	Enables speech-to-text transcription using a pre-trained Deep Speech model in MATLAB	7
ytsvetko/str2ipa	A tool for phonetic transcription of languages with close-to-phonetic writing systems	10
neso613/asr_tflite	Provides pre-trained ASR models for efficient inference using TFLite	11
dodohow1011/speechadvreprogram	Developing low-resource speech command recognition systems using adversarial reprogramming and transfer learning	18
birch-san/diffusers	A toolkit for creating and manipulating state-of-the-art diffusion models in PyTorch	8
langtech/transcriber	An online transcription tool for a specific application, allowing users to input audio or video and receive a written text summary	2
srijith-rkr/kaust-whisper-adapter	A tool for fine-tuning the OpenAI Whisper speech recognition model using residual adapters and parameter-efficient learning methods	32
awni/speech	A PyTorch implementation of end-to-end speech recognition models	756
r3gm/sonitranslate	Software that allows video translation with synchronized audio, utilizing speech-to-text and text-to-speech technologies	924
mybigday/whisper.rn	A React Native binding of Whisper's automatic speech recognition model	408