SALMONN
Audio perceptron
A large language model that perceives speech, audio events, and music, with multilingual capabilities
SALMONN: Speech Audio Language Music Open Neural Network
1k stars
26 watching
83 forks
Language: Python
Last commit: 15 days ago
Tags: audio, audio-processing, bytedance, iclr2024, icml-2024, large-language-models, multi-modal, music, research, speech, speech-recognition, tsinghua-university
Related projects:
Repository | Description | Stars |
---|---|---|
keunwoochoi/auralisation | Reconstructs audio features learned by convolutional neural networks into audible sounds | 42 |
soerenab/audiomnist | An implementation of a deep learning framework that classifies audio signals and explains the model's decisions using Explainable AI techniques | 347 |
soroushmehr/samplernn_iclr2017 | An unconditional end-to-end neural audio generation model built on a recurrent neural network architecture | 537 |
ibm/max-audio-classifier | Identifies sounds in short audio clips using machine learning and PCA transformation | 153 |
kinwaicheuk/nnaudio | An audio processing toolkit using PyTorch convolutional neural networks to generate spectrograms from raw audio data | 1,032 |
yuangongnd/ltu | An audio and speech large language model implementation with pre-trained models, datasets, and inference options | 385 |
yongxuustc/dcase2017_task4_cvssp | A system for audio classification and detection using machine learning models | 4 |
balavenkatesh3322/audio-pretrained-model | A collection of pre-trained audio and speech models for various applications | 182 |
drscotthawley/audio-classifier-keras-cnn | An audio classification system using a convolutional neural network to classify audio data | 160 |
ksw0306/clarinet | An implementation of a neural vocoder using the Parallel WaveNet architecture and autoregressive flow | 289 |
deepsound-project/samplernn-pytorch | An implementation of an audio generation model using PyTorch | 288 |
dodohow1011/speechadvreprogram | Developing low-resource speech command recognition systems using adversarial reprogramming and transfer learning | 18 |
xidongwu/d-auprc | Provides an implementation of a specific algorithm used in audio signal processing | 0 |
mlachmish/musicgenreclassification | Classifies music genre from a 10-second sound stream using a neural network | 562 |
microsoft/pengi | An Audio Language Model framework that uses transfer learning to generate text from audio inputs | 290 |