SALMONN
Audio perceptron
A large language model enabling speech, audio event perception and music inputs to achieve multilingual capabilities
SALMONN: Speech Audio Language Music Open Neural Network
1k stars
28 watching
85 forks
Language: Python
last commit: 3 months ago audioaudio-processingbytedanceiclr2024icml-2024large-language-modelsmulti-modalmusicresearchspeechspeech-recognitiontsinghua-university
Related projects:
Repository | Description | Stars |
---|---|---|
| Reconstructs audio features learned by convolutional neural networks into audible sounds | 42 |
| This project provides an implementation of a deep learning framework to classify audio signals and offers insights into the model's decision-making process using Explainable Artificial Intelligence (AI) techniques. | 351 |
| An unconditional end-to-end neural audio generation model utilizing a recurrent neural network architecture. | 537 |
| Identifies sounds in short audio clips using machine learning and PCA transformation | 154 |
| An audio processing toolkit using PyTorch convolutional neural networks to generate spectrograms from raw audio data | 1,036 |
| An audio and speech large language model implementation with pre-trained models, datasets, and inference options | 396 |
| A system for audio classification and detection using machine learning models | 4 |
| A collection of pre-trained audio and speech models for various applications | 183 |
| An audio classification system using a convolutional neural network to classify audio data | 160 |
| An implementation of a neural network-based vocoder using parallel-wavenet architecture and autoregressive flow | 290 |
| An implementation of an audio generation model using PyTorch | 290 |
| Developing low-resource speech command recognition systems using adversarial reprogramming and transfer learning | 18 |
| Provides an implementation of a specific algorithm used in audio signal processing | 0 |
| Classify music genre from a 10-second sound stream using a neural network. | 565 |
| An Audio Language Model framework that uses transfer learning to generate text from audio inputs | 295 |