gazelle
Audio responder
An implementation of a joint speech-language model that responds directly to audio input
Joint speech-language model - respond directly to audio!
357 stars
15 watching
34 forks
Language: Python
last commit: 6 months ago
Linked from 1 awesome list
Tags: audio, llm, multimodal, speech
Related projects:
Repository | Description | Stars |
---|---|---|
qwenlm/qwen2-audio | An audio-language model that can analyze or respond to speech instructions based on audio input | 1,306 |
microsoft/pengi | An Audio Language Model framework that uses transfer learning to generate text from audio inputs | 295 |
soerenab/audiomnist | An implementation of a deep learning framework that classifies audio signals and uses explainable AI techniques to offer insight into the model's decision-making | 350 |
balavenkatesh3322/audio-pretrained-model | A collection of pre-trained audio and speech models for various applications | 183 |
bytedance/salmonn | A large language model that accepts speech, audio event, and music inputs, with multilingual capabilities | 1,091 |
yongxuustc/dcase2017_task4_cvssp | A system for audio classification and detection using machine learning models | 4 |
yuangongnd/ltu | An audio and speech large language model implementation with pre-trained models, datasets, and inference options | 396 |
laion-ai/clap | A library for learning audio embeddings from text and audio data using contrastive language-audio pretraining | 1,457 |
kinwaicheuk/nnaudio | An audio processing toolkit using PyTorch convolutional neural networks to generate spectrograms from raw audio data | 1,036 |
ibm/max-audio-classifier | Identifies sounds in short audio clips using machine learning and PCA transformation | 154 |
awni/speech | A PyTorch implementation of end-to-end speech recognition models. | 756 |
keunwoochoi/auralisation | Reconstructs audio features learned by convolutional neural networks into audible sounds | 42 |
gen2brain/malgo | Provides a set of audio APIs for the Go programming language | 305 |
picovoice/rhino | A deep learning-based speech-to-intent engine for on-device voice interaction | 631 |
chrisguttandin/standardized-audio-context | A cross-browser wrapper for the Web Audio API aiming to closely follow the standard. | 687 |