SALMONN

Audio perceptron

A large language model that accepts speech, audio event, and music inputs and offers multilingual capabilities

SALMONN: Speech Audio Language Music Open Neural Network

GitHub

1k stars

28 watching

85 forks

Language: Python

last commit: over 1 year ago

audio, audio-processing, bytedance, iclr2024, icml-2024, large-language-models, multi-modal, music, research, speech, speech-recognition, tsinghua-university

bytedance.github.io/SALMONN/

Related projects:

Repository	Description	Stars
keunwoochoi/auralisation	Reconstructs audio features learned by convolutional neural networks into audible sounds	42
soerenab/audiomnist	This project provides an implementation of a deep learning framework to classify audio signals and offers insights into the model's decision-making process using Explainable Artificial Intelligence (AI) techniques.	351
soroushmehr/samplernn_iclr2017	An unconditional end-to-end neural audio generation model utilizing a recurrent neural network architecture.	537
ibm/max-audio-classifier	Identifies sounds in short audio clips using machine learning and PCA transformation	154
kinwaicheuk/nnaudio	An audio processing toolkit using PyTorch convolutional neural networks to generate spectrograms from raw audio data	1,036
yuangongnd/ltu	An audio and speech large language model implementation with pre-trained models, datasets, and inference options	396
yongxuustc/dcase2017_task4_cvssp	A system for audio classification and detection using machine learning models	4
balavenkatesh3322/audio-pretrained-model	A collection of pre-trained audio and speech models for various applications	183
drscotthawley/audio-classifier-keras-cnn	An audio classification system using a convolutional neural network to classify audio data	160
ksw0306/clarinet	An implementation of a neural network-based vocoder using parallel-wavenet architecture and autoregressive flow	290
deepsound-project/samplernn-pytorch	An implementation of an audio generation model using PyTorch	290
dodohow1011/speechadvreprogram	Developing low-resource speech command recognition systems using adversarial reprogramming and transfer learning	18
xidongwu/d-auprc	Provides an implementation of a specific algorithm used in audio signal processing	0
mlachmish/musicgenreclassification	Classifies music genre from a 10-second sound stream using a neural network	565
microsoft/pengi	An Audio Language Model framework that uses transfer learning to generate text from audio inputs	295