whisper-at
Audio tagger
An audio processing model that adds audio event tagging capabilities to an existing speech recognition system with minimal additional computational cost.
Code and pretrained models for the Interspeech 2023 paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
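The description's claim of adding tagging at minimal extra cost can be illustrated with a toy sketch. It is not the repository's model or API: every name, size, and weight below is hypothetical, and the "encoder" is a single random linear layer standing in for a frozen speech recognition encoder. It only shows the general idea that a small tagging head can reuse features the recognizer has already computed.

```python
import random

def make_matrix(rows, cols, seed):
    """Deterministic pseudo-random weights (hypothetical, untrained)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def shared_encoder(frames, weights):
    """Stand-in for a frozen ASR encoder: one feature vector per frame."""
    return [matvec(weights, frame) for frame in frames]

def tagging_head(features, weights):
    """Mean-pool the features over time, then apply one linear layer."""
    pooled = [sum(column) / len(features) for column in zip(*features)]
    return matvec(weights, pooled)

# Toy sizes, chosen only for illustration.
N_FRAMES, N_INPUT, N_HIDDEN, N_TAGS = 50, 16, 32, 5

frames = make_matrix(N_FRAMES, N_INPUT, seed=0)  # fake audio frames
enc_w = make_matrix(N_HIDDEN, N_INPUT, seed=1)   # "frozen" encoder weights
tag_w = make_matrix(N_TAGS, N_HIDDEN, seed=2)    # lightweight tagging head

features = shared_encoder(frames, enc_w)    # computed once for the recognizer
tag_scores = tagging_head(features, tag_w)  # reuses the same features

# Multiplications in each linear layer (pooling additions not counted).
encoder_mults = N_FRAMES * N_HIDDEN * N_INPUT
head_mults = N_TAGS * N_HIDDEN
overhead = head_mults / encoder_mults
print(f"{len(tag_scores)} tag scores, head overhead ratio {overhead:.5f}")
```

In this toy setup the head's linear layer runs once per clip on pooled features, while the encoder runs on every frame, so the head's share of the multiplications is small. The real model's architecture and cost figures are described in the paper and repository, not here.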
321 stars
10 watching
27 forks
Language: Python
last commit: 9 months ago
Linked from 1 awesome list
Topics: audio, audio-classification, audio-processing, audio-tagging, speech-recognition
Related projects:
Repository | Description | Stars |
---|---|---|
yuangongnd/ltu | An audio and speech large language model implementation with pre-trained models, datasets, and inference options | 385 |
macoron/whisper.unity | Provides a high-performance speech recognition system for Unity3D applications. | 433 |
linto-ai/whisper-timestamped | An extension of the Whisper model to predict word timestamps and confidence scores with improved accuracy | 2,045 |
bnosac/audio.whisper | Provides an R interface to the Whisper Automatic Speech Recognition model | 118 |
jordipons/music-audio-tagging-at-scale-models | Research on end-to-end learning for music audio tagging using large datasets and different front-end paradigms. | 148 |
collabora/whisperlive | An implementation of Whisper's speech-to-text functionality in a real-time transcription application | 2,050 |
ggerganov/whisper.spm | A Swift package for a C implementation of a speech recognition system | 169 |
srijith-rkr/kaust-whisper-adapter | A tool for fine-tuning the OpenAI Whisper speech recognition model using residual adapters and parameter-efficient learning methods. | 32 |
tinytag/tinytag | A Python library for reading and extracting metadata from various audio file formats. | 703 |
ronggong/jingjusyllabicsegmentaion | An implementation of a score-informed method for segmenting jingju a cappella singing voice into syllables using convolutional neural networks and the Viterbi algorithm | 7 |
ronggong/jingjusingingphrasematching | This repository provides a software framework to match singing audio with corresponding music scores based on phonetic and duration information. | 27 |
arthurfdlr/whisper-youtube | Transcribes YouTube videos using OpenAI's Whisper speech recognition model | 362 |
bytedance/salmonn | A large language model that accepts speech, audio event, and music inputs and supports multilingual capabilities | 1,053 |
soerenab/audiomnist | This project provides an implementation of a deep learning framework to classify audio signals and offers insights into the model's decision-making process using Explainable Artificial Intelligence (AI) techniques. | 347 |
jongpillee/musictagging_msd | This project is an audio classification system trained on the MSD tagging dataset, enabling automatic tagging of music files with relevant genres and styles. | 7 |