ltu

Audio Model

An audio and speech large language model implementation with pre-trained models, datasets, and inference options

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

GitHub

396 stars
15 watching
38 forks
Language: Python
Last commit: 9 months ago
Topics: audio, audio-processing, deep-learning, large-language-models, speech-recognition

Related projects:

Repository | Description | Stars
microsoft/pengi | An Audio Language Model framework that uses transfer learning to generate text from audio inputs | 295
balavenkatesh3322/audio-pretrained-model | A collection of pre-trained audio and speech models for various applications | 183
shawn-ieitsystems/yuan-1.0 | Large-scale language model with improved performance on NLP tasks through distributed training and efficient data processing | 591
brightmart/xlnet_zh | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks | 230
ymcui/lert | A pre-trained language model designed to leverage linguistic features and outperform comparable baselines on Chinese natural language understanding tasks | 202
ieit-yuan/yuan2.0-m32 | A high-performance language model designed to excel in tasks like natural language understanding, mathematical computation, and code generation | 182
yunwentechnology/unilm | This project provides pre-trained models and tools for natural language understanding (NLU) and generation (NLG) tasks in Chinese | 439
qwenlm/qwen-audio | A multimodal audio language model developed by Alibaba Cloud that supports various tasks and languages | 1,515
yuangongnd/whisper-at | An audio processing model that adds audio event tagging capabilities to an existing speech recognition system with minimal additional computational cost | 343
bytedance/salmonn | A large language model enabling speech, audio event perception and music inputs to achieve multilingual capabilities | 1,091
tencent/tencent-hunyuan-large | This project makes a large language model accessible for research and development | 1,245
thu-coai/opd | A large-scale pre-trained dialogue model for Chinese language | 74
renshuhuai-andy/timechat | A large language model designed to understand long videos by binding visual content with timestamps and producing video token sequences of varying lengths | 314
baai-wudao/model | A repository of pre-trained language models for various tasks and domains | 121
qwenlm/qwen2-audio | An audio-language model that can analyze or respond to speech instructions based on audio input | 1,306