distil-whisper
Speech recognition model
A machine learning model that transcribes audio input into text quickly and accurately.
Distilled variant of Whisper for speech recognition: 6x faster and 50% smaller, with a word error rate within 1% of the original model.
4k stars
65 watching
304 forks
Language: Python
Last commit: about 1 year ago
Topics: audio, speech-recognition, whisper
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A general-purpose speech recognition system trained on large-scale weak supervision | 72,752 |
| | An automatic speech recognition system with word-level timestamps and speaker diarization. | 12,894 |
| | A high-performance inference implementation of an automatic speech recognition model in C++ | 36,332 |
| | A fast speech-to-text implementation using CTranslate2 and optimized for inference on CPU and GPU. | 12,989 |
| | An optimized implementation of OpenAI's Whisper Model for speech recognition and speech-to-text tasks using JAX. | 4,467 |
| | Automates speaker diarization from audio recordings using OpenAI Whisper ASR and additional neural networks. | 3,874 |
| | An implementation of OpenAI's Whisper ASR model using DirectCompute for GPGPU inference | 8,617 |
| | A command-line tool for fast audio transcription using the Whisper AI model | 7,848 |
| | Provides standalone executables for OpenAI's Whisper & Faster-Whisper speech recognition and transcription tools | 1,405 |
| | An open-source speech recognition system built using machine learning models and JavaScript. | 2,651 |
| | A React Native binding of Whisper's automatic speech recognition model | 408 |
| | A UI component library for displaying messages and notifications in iOS apps with customizable sounds, colors, and fonts | 3,755 |
| | An AI-powered speech recognition and translation tool that uses CTranslate2 and Faster-Whisper for faster, more efficient processing. | 938 |
| | Provides a high-performance speech recognition system for Unity3D applications. | 445 |
| | An AI system for generating human-like voices from text inputs, using deep learning techniques and pre-trained models. | 36,977 |