whisper-at

Audio tagger

An audio model that adds audio event tagging to an existing speech recognition system (OpenAI's Whisper) at minimal additional computational cost.

Code and pretrained models for the Interspeech 2023 paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
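The description above can be made concrete with a toy sketch. Assuming, as the description implies, that the tagger reuses the intermediate encoder representations the speech recognizer already computes, the extra work is only a small head: pool each layer's frame features over time, mix the layers with softmax-normalized weights, apply a linear layer, and use a per-class sigmoid because audio events can co-occur (multi-label tagging). This is a simplified stand-in in plain Python, not the Whisper-AT implementation or the `whisper_at` package API; `tag_clip`, its arguments, and the toy weights are hypothetical, and the paper's actual tagging head is more elaborate.

```python
import math
from typing import Dict, List, Sequence

def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level feature vectors over time into one clip-level vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]

def tag_clip(
    layer_feats: Sequence[Sequence[Sequence[float]]],  # layers x frames x dim (hypothetical input)
    layer_weights: Sequence[float],                    # one score per layer (learned in practice)
    W: Sequence[Sequence[float]],                      # linear head, one row per label
    b: Sequence[float],                                # one bias per label
    labels: Sequence[str],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Hypothetical multi-label tagging head over reused encoder features."""
    # 1. Pool every layer's frames over time.
    pooled = [mean_pool(frames) for frames in layer_feats]
    # 2. Softmax-normalize the layer scores and take a weighted sum of layers.
    m = max(layer_weights)
    exps = [math.exp(w - m) for w in layer_weights]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(pooled[0])
    fused = [sum(a * p[d] for a, p in zip(alphas, pooled)) for d in range(dim)]
    # 3. Linear layer plus independent sigmoid per label (events can co-occur).
    probs: Dict[str, float] = {}
    for label, row, bias in zip(labels, W, b):
        logit = sum(w * x for w, x in zip(row, fused)) + bias
        probs[label] = 1.0 / (1.0 + math.exp(-logit))
    # 4. Keep every label whose probability reaches the threshold.
    return {k: v for k, v in probs.items() if v >= threshold}
```

For example, with two layers whose pooled features are `[1, 0]` and `[0, 1]`, equal layer scores, and a head that scores "Speech" on the first dimension and penalizes "Music" on the second, only "Speech" passes the 0.5 threshold. The point of the design is that steps 1 to 4 cost a handful of multiply-adds per label, while the expensive encoder pass is shared with transcription.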

GitHub

321 stars
10 watching
27 forks
Language: Python
Last commit: 9 months ago
Linked from 1 awesome list

audio, audio-classification, audio-processing, audio-tagging, speech-recognition

Related projects:

Repository | Description | Stars
yuangongnd/ltu | An audio and speech large language model implementation with pretrained models, datasets, and inference options | 385
macoron/whisper.unity | A high-performance speech recognition system for Unity3D applications | 433
linto-ai/whisper-timestamped | An extension of the Whisper model that predicts word timestamps and confidence scores with improved accuracy | 2,045
bnosac/audio.whisper | An R interface to the Whisper automatic speech recognition model | 118
jordipons/music-audio-tagging-at-scale-models | Models from research on end-to-end learning for music audio tagging with large datasets and different front-end paradigms | 148
collabora/whisperlive | A real-time transcription application built on Whisper's speech-to-text functionality | 2,050
ggerganov/whisper.spm | A Swift package for a C implementation of a speech recognition system | 169
srijith-rkr/kaust-whisper-adapter | A tool for fine-tuning OpenAI's Whisper speech recognition model with residual adapters and parameter-efficient learning methods | 32
tinytag/tinytag | A Python library for reading metadata from various audio file formats | 703
ronggong/jingjusyllabicsegmentaion | A score-informed method for segmenting jingju a cappella singing voice into syllables using convolutional neural networks and the Viterbi algorithm | 7
ronggong/jingjusingingphrasematching | A software framework for matching singing audio to the corresponding music scores based on phonetic and duration information | 27
arthurfdlr/whisper-youtube | Transcribes YouTube videos using OpenAI's Whisper speech recognition model | 362
bytedance/salmonn | A large language model that accepts speech, audio event, and music inputs, with multilingual capabilities | 1,053
soerenab/audiomnist | A deep learning framework for classifying audio signals, with Explainable AI (XAI) techniques for inspecting the model's decisions | 347
jongpillee/musictagging_msd | An audio classification system trained on the MSD tagging dataset that tags music files with relevant genres and styles | 7