whisper-at

Audio tagger

An audio processing model that adds audio event tagging capabilities to an existing speech recognition system with minimal additional computational cost.

Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
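As a rough illustration of why tagging on top of an existing recognizer can be cheap, the sketch below pools the features an encoder has already produced and passes them through a small linear head. This is not the whisper-at code or its API. Every name, size, and weight here (frozen_encoder, mean_pool, tagging_head, NUM_FRAMES, FEATURE_DIM, NUM_TAGS) is hypothetical, and the "encoder" is simply random numbers standing in for a frozen speech recognition model. It assumes the encoder output is computed once and shared, so only the small head counts as extra work.

```python
import math
import random

NUM_FRAMES = 50   # hypothetical number of encoder time steps
FEATURE_DIM = 8   # hypothetical encoder feature size
NUM_TAGS = 3      # hypothetical number of audio event classes


def frozen_encoder(num_frames, feature_dim, seed=0):
    """Stand-in for a frozen speech recognition encoder (random features)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(feature_dim)]
            for _ in range(num_frames)]


def mean_pool(features):
    """Average the frame-level features over time into one vector."""
    num_frames = len(features)
    return [sum(frame[d] for frame in features) / num_frames
            for d in range(len(features[0]))]


def tagging_head(pooled, weights, biases):
    """Linear layer plus sigmoid: one independent probability per tag."""
    probs = []
    for row, bias in zip(weights, biases):
        logit = sum(w * x for w, x in zip(row, pooled)) + bias
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


# The encoder runs once; its output could also feed a speech decoder.
features = frozen_encoder(NUM_FRAMES, FEATURE_DIM)

# Untrained, randomly initialised head weights (assumption for illustration).
rng = random.Random(1)
weights = [[rng.gauss(0.0, 0.1) for _ in range(FEATURE_DIM)]
           for _ in range(NUM_TAGS)]
biases = [0.0] * NUM_TAGS

tag_probs = tagging_head(mean_pool(features), weights, biases)
```

In this toy setup the head costs NUM_TAGS x FEATURE_DIM multiplications per clip, which is small next to running an encoder. The real project may combine features differently.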

GitHub

343 stars

11 watching

28 forks

Language: Python

last commit: over 2 years ago

Linked from 1 awesome list

audio, audio-classification, audio-processing, audio-tagging, speech-recognition

Backlinks from these awesome lists:

sindresorhus/awesome-whisper

Related projects:

Repository	Description	Stars
yuangongnd/ltu	An audio and speech large language model implementation with pre-trained models, datasets, and inference options	396
macoron/whisper.unity	Provides a high-performance speech recognition system for Unity3D applications	445
linto-ai/whisper-timestamped	An extension to the Whisper speech recognition model that adds word-level timestamps and confidence scores.	2,121
bnosac/audio.whisper	Provides an R interface to the Whisper Automatic Speech Recognition model	119
jordipons/music-audio-tagging-at-scale-models	Research on end-to-end learning for music audio tagging using large datasets and different front-end paradigms.	149
collabora/whisperlive	An implementation of Whisper's speech-to-text functionality in a real-time transcription application	2,186
ggerganov/whisper.spm	A Swift package for a C implementation of a speech recognition system	169
srijith-rkr/kaust-whisper-adapter	A tool for fine-tuning the OpenAI Whisper speech recognition model using residual adapters and parameter-efficient learning methods.	32
tinytag/tinytag	A Python library for reading and extracting metadata from various audio file formats.	714
ronggong/jingjusyllabicsegmentaion	An implementation of a score-informed method for segmenting jingju a cappella singing voice into syllables using convolutional neural networks and the Viterbi algorithm	7
ronggong/jingjusingingphrasematching	This repository provides a software framework to match singing audio with corresponding music scores based on phonetic and duration information.	27
arthurfdlr/whisper-youtube	Transcribes Youtube videos using OpenAI's Whisper speech recognition model	369
bytedance/salmonn	A large language model enabling speech, audio event perception and music inputs to achieve multilingual capabilities	1,091
soerenab/audiomnist	This project provides an implementation of a deep learning framework to classify audio signals and offers insights into the model's decision-making process using Explainable Artificial Intelligence (AI) techniques.	351
jongpillee/musictagging_msd	This project is an audio classification system trained on the MSD tagging dataset, enabling automatic tagging of music files with relevant genres and styles.	7