whisper-at

Audio tagger

An audio processing model that adds audio event tagging to an existing speech recognition system (Whisper) at minimal additional computational cost.

Code and pretrained models for the Interspeech 2023 paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers".
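To show why tagging can be added cheaply, here is a minimal sketch of the general idea, assuming the tagger is a small head that reads hidden states the speech encoder already computes for transcription. It is not the Whisper-AT model and does not use the whisper-at package API. The function names, tensor shapes, and the scheme itself (mean-pool each layer over time, mix layers with learned softmax weights, then a linear projection with a per-class sigmoid) are hypothetical simplifications. Because the encoder's forward pass is shared with ASR, the only extra work is this head.

```python
import math
from typing import List, Sequence

# Hypothetical layout: hidden_states[layer][frame][dim], as a frozen speech
# encoder might produce. Nothing here calls the real whisper-at package.


def mean_over_time(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average a (frames x dim) matrix over the time axis."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def tag_probabilities(hidden_states, layer_logits, weights, bias):
    """Independent (multi-label) tag probabilities from frozen encoder features.

    hidden_states: list of layers, each a list of frames of length dim.
    layer_logits:  one learnable scalar per layer, softmax-normalised.
    weights:       num_classes x dim projection matrix.
    bias:          one value per class.
    """
    pooled = [mean_over_time(layer) for layer in hidden_states]
    alphas = softmax(layer_logits)
    dim = len(pooled[0])
    # Weighted sum of per-layer pooled vectors.
    fused = [sum(a * p[d] for a, p in zip(alphas, pooled)) for d in range(dim)]
    # One sigmoid per class, so several tags can be active at once.
    return [sigmoid(sum(w * x for w, x in zip(row, fused)) + b)
            for row, b in zip(weights, bias)]


if __name__ == "__main__":
    # Toy input: 2 layers, 2 frames, dim 2, 2 classes.
    states = [
        [[1.0, 0.0], [3.0, 0.0]],  # layer 0 pools to [2.0, 0.0]
        [[0.0, 2.0], [0.0, 4.0]],  # layer 1 pools to [0.0, 3.0]
    ]
    probs = tag_probabilities(states, [0.0, 0.0],
                              [[1.0, 0.0], [0.0, -1.0]], [0.0, 1.5])
    print([round(p, 4) for p in probs])  # prints [0.7311, 0.5]
```

With equal layer weights the fused vector is [1.0, 1.5], so the two class logits are 1.0 and 0.0. The head's size is roughly num_layers + num_classes × (dim + 1) parameters, which is small next to a full encoder; that is the sense in which the added cost is minimal under this assumed design.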


341 stars
11 watching
28 forks
Language: Python
Last commit: 10 months ago
Linked from 1 awesome list

Topics: audio, audio-classification, audio-processing, audio-tagging, speech-recognition


Related projects:

yuangongnd/ltu (396 stars): An audio and speech large language model implementation with pretrained models, datasets, and inference options.
macoron/whisper.unity (445 stars): A high-performance speech recognition system for Unity3D applications.
linto-ai/whisper-timestamped (2,121 stars): An extension to the Whisper speech recognition model that adds word-level timestamps and confidence scores.
bnosac/audio.whisper (119 stars): An R interface to the Whisper automatic speech recognition model.
jordipons/music-audio-tagging-at-scale-models (149 stars): Research on end-to-end learning for music audio tagging using large datasets and different front-end paradigms.
collabora/whisperlive (2,186 stars): A real-time transcription application built on Whisper's speech-to-text functionality.
ggerganov/whisper.spm (169 stars): A Swift package for a C implementation of a speech recognition system.
srijith-rkr/kaust-whisper-adapter (32 stars): A tool for fine-tuning the OpenAI Whisper speech recognition model using residual adapters and parameter-efficient learning methods.
tinytag/tinytag (714 stars): A Python library for reading metadata from various audio file formats.
ronggong/jingjusyllabicsegmentaion (7 stars): A score-informed method for segmenting jingju a cappella singing voice into syllables using convolutional neural networks and the Viterbi algorithm.
ronggong/jingjusingingphrasematching (27 stars): A software framework for matching singing audio with the corresponding music scores based on phonetic and duration information.
arthurfdlr/whisper-youtube (369 stars): Transcribes YouTube videos using OpenAI's Whisper speech recognition model.
bytedance/salmonn (1,091 stars): A large language model that takes speech, audio event, and music inputs, with multilingual capabilities.
soerenab/audiomnist (350 stars): A deep learning framework for classifying audio signals that explains the model's decisions using Explainable Artificial Intelligence (XAI) techniques.
jongpillee/musictagging_msd (7 stars): An audio classification system trained on the Million Song Dataset (MSD) tagging data that automatically tags music files with relevant genres and styles.