ltu

Audio Model

An audio and speech large language model implementation with pre-trained models, datasets, and inference options

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

GitHub

385 stars
15 watching
36 forks
Language: Python
Last commit: 7 months ago
Topics: audio, audio-processing, deep-learning, large-language-models, speech-recognition

Related projects:

Repository | Description | Stars
microsoft/pengi | An Audio Language Model framework that uses transfer learning to generate text from audio inputs | 290
balavenkatesh3322/audio-pretrained-model | A collection of pre-trained audio and speech models for various applications | 182
shawn-ieitsystems/yuan-1.0 | Large-scale language model with improved performance on NLP tasks through distributed training and efficient data processing | 591
brightmart/xlnet_zh | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks | 230
ymcui/lert | A pre-trained language model designed to leverage linguistic features and outperform comparable baselines on Chinese natural language understanding tasks | 202
ieit-yuan/yuan2.0-m32 | A high-performance language model designed to excel in tasks like natural language understanding, mathematical computation, and code generation | 180
yunwentechnology/unilm | This project provides pre-trained models for natural language understanding and generation tasks using the UniLM architecture | 438
qwenlm/qwen-audio | A multimodal audio language model developed by Alibaba Cloud that supports various tasks and languages | 1,486
yuangongnd/whisper-at | An audio processing model that adds audio event tagging capabilities to an existing speech recognition system with minimal additional computational cost | 321
bytedance/salmonn | A large language model enabling speech, audio event perception and music inputs to achieve multilingual capabilities | 1,053
tencent/tencent-hunyuan-large | This project makes a large language model accessible for research and development | 1,114
thu-coai/opd | A large-scale pre-trained dialogue model for Chinese language | 74
renshuhuai-andy/timechat | A large language model designed to understand and process long videos with temporal information | 286
baai-wudao/model | A repository of pre-trained language models for various tasks and domains | 121
qwenlm/qwen2-audio | An audio-language model that can analyze or respond to speech instructions based on audio input | 1,229