gazelle

Audio responder

An implementation of a joint speech-language model that responds directly to audio input

GitHub

357 stars
15 watching
34 forks
Language: Python
Last commit: 6 months ago
Linked from 1 awesome list

Tags: audio, llm, multimodal, speech

Related projects:

Repository | Description | Stars
qwenlm/qwen2-audio | An audio-language model that can analyze or respond to speech instructions based on audio input | 1,306
microsoft/pengi | An Audio Language Model framework that uses transfer learning to generate text from audio inputs | 295
soerenab/audiomnist | This project provides an implementation of a deep learning framework to classify audio signals and offers insights into the model's decision-making process using Explainable Artificial Intelligence (AI) techniques. | 350
balavenkatesh3322/audio-pretrained-model | A collection of pre-trained audio and speech models for various applications | 183
bytedance/salmonn | A large language model enabling speech, audio event perception and music inputs to achieve multilingual capabilities | 1,091
yongxuustc/dcase2017_task4_cvssp | A system for audio classification and detection using machine learning models | 4
yuangongnd/ltu | An audio and speech large language model implementation with pre-trained models, datasets, and inference options | 396
laion-ai/clap | A library for learning audio embeddings from text and audio data using contrastive language-audio pretraining | 1,457
kinwaicheuk/nnaudio | An audio processing toolkit using PyTorch convolutional neural networks to generate spectrograms from raw audio data | 1,036
ibm/max-audio-classifier | Identifies sounds in short audio clips using machine learning and PCA transformation | 154
awni/speech | A PyTorch implementation of end-to-end speech recognition models. | 756
keunwoochoi/auralisation | Reconstructs audio features learned by convolutional neural networks into audible sounds | 42
gen2brain/malgo | Provides a set of audio APIs for Go programming language | 305
picovoice/rhino | A deep learning-based speech-to-intent engine for on-device voice interaction | 631
chrisguttandin/standardized-audio-context | A cross-browser wrapper for the Web Audio API aiming to closely follow the standard. | 687