SALMONN

Audio perceptron

A large language model that can perceive speech, audio event, and music inputs, with multilingual capabilities

SALMONN: Speech Audio Language Music Open Neural Network


1k stars
28 watching
85 forks
Language: Python
last commit: 11 months ago
Topics: audio, audio-processing, bytedance, iclr2024, icml-2024, large-language-models, multi-modal, music, research, speech, speech-recognition, tsinghua-university

Related projects:

Repository | Description | Stars
keunwoochoi/auralisation | Reconstructs audio features learned by convolutional neural networks into audible sounds | 42
soerenab/audiomnist | Implements a deep learning framework for classifying audio signals and explains the model's decision-making process using Explainable Artificial Intelligence (XAI) techniques | 351
soroushmehr/samplernn_iclr2017 | An unconditional end-to-end neural audio generation model built on a recurrent neural network architecture | 537
ibm/max-audio-classifier | Identifies sounds in short audio clips using machine learning and a PCA transformation | 154
kinwaicheuk/nnaudio | An audio processing toolkit that uses PyTorch convolutional neural networks to generate spectrograms from raw audio data | 1,036
yuangongnd/ltu | An audio and speech large language model implementation with pre-trained models, datasets, and inference options | 396
yongxuustc/dcase2017_task4_cvssp | A system for audio classification and detection using machine learning models | 4
balavenkatesh3322/audio-pretrained-model | A collection of pre-trained audio and speech models for various applications | 183
drscotthawley/audio-classifier-keras-cnn | An audio classification system that uses a convolutional neural network to classify audio data | 160
ksw0306/clarinet | An implementation of a neural vocoder using a parallel-WaveNet architecture and an autoregressive flow | 290
deepsound-project/samplernn-pytorch | An implementation of an audio generation model using PyTorch | 290
dodohow1011/speechadvreprogram | Develops low-resource speech command recognition systems using adversarial reprogramming and transfer learning | 18
xidongwu/d-auprc | Provides an implementation of a specific algorithm used in audio signal processing | 0
mlachmish/musicgenreclassification | Classifies music genre from a 10-second sound stream using a neural network | 565
microsoft/pengi | An audio language model framework that uses transfer learning to generate text from audio inputs | 295