transfusion-asr
Speech Transcription Tool
An ASR project that uses diffusion models to transcribe speech
Training code and models for "Transcribing Speech with Multinomial Diffusion".
75 stars
8 watching
5 forks
Language: Python
Last commit: about 1 year ago
Linked from 1 awesome list
asr, binomial-distribution, diffusion, discrete-diffusion, pytorch, speech-recognition
Related projects:
Repository | Description | Stars |
---|---|---|
shashikg/whispers2t | An optimized speech-to-text pipeline designed to improve inference speed and accuracy | 310 |
bnosac/audio.whisper | Provides an R interface to the Whisper Automatic Speech Recognition model | 118 |
linto-ai/whisper-timestamped | An extension of the Whisper model to predict word timestamps and confidence scores with improved accuracy | 2,045 |
collabora/whisperlive | An implementation of Whisper's speech-to-text functionality in a real-time transcription application | 2,050 |
arthurfdlr/whisper-youtube | Transcribes YouTube videos using OpenAI's Whisper speech recognition model | 362 |
matlab-deep-learning/deepspeech | Enables speech-to-text transcription using a pre-trained Deep Speech model in MATLAB | 7 |
ytsvetko/str2ipa | A tool for phonetic transcription of languages with close-to-phonetic writing systems | 10 |
neso613/asr_tflite | Provides pre-trained ASR models for efficient inference using TFLite | 11 |
dodohow1011/speechadvreprogram | Developing low-resource speech command recognition systems using adversarial reprogramming and transfer learning | 18 |
birch-san/diffusers | A toolkit for creating and manipulating state-of-the-art diffusion models in PyTorch | 8 |
langtech/transcriber | An online transcription tool for a specific application, allowing users to input audio or video and receive a written text summary | 2 |
srijith-rkr/kaust-whisper-adapter | A tool for fine-tuning the OpenAI Whisper speech recognition model using residual adapters and parameter-efficient learning methods | 32 |
awni/speech | A PyTorch implementation of end-to-end speech recognition models | 754 |
r3gm/sonitranslate | Software that allows video translation with synchronized audio, utilizing speech-to-text and text-to-speech technologies | 869 |
mybigday/whisper.rn | A React Native binding of Whisper's automatic speech recognition model | 395 |