ltu

Audio Model

An audio and speech large language model implementation with pre-trained models, datasets, and inference options

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

GitHub

396 stars

15 watching

38 forks

Language: Python

Last commit: about 2 years ago

audio, audio-processing, deep-learning, large-language-models, speech-recognition

Related projects:

Repository	Description	Stars
microsoft/pengi	An Audio Language Model framework that uses transfer learning to generate text from audio inputs	295
balavenkatesh3322/audio-pretrained-model	A collection of pre-trained audio and speech models for various applications	183
shawn-ieitsystems/yuan-1.0	Large-scale language model with improved performance on NLP tasks through distributed training and efficient data processing	591
brightmart/xlnet_zh	Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks	230
ymcui/lert	A pre-trained language model designed to leverage linguistic features and outperform comparable baselines on Chinese natural language understanding tasks.	202
ieit-yuan/yuan2.0-m32	A high-performance language model designed to excel in tasks like natural language understanding, mathematical computation, and code generation	182
yunwentechnology/unilm	This project provides pre-trained models and tools for natural language understanding (NLU) and generation (NLG) tasks in Chinese.	439
qwenlm/qwen-audio	A multimodal audio language model developed by Alibaba Cloud that supports various tasks and languages	1,515
yuangongnd/whisper-at	An audio processing model that adds audio event tagging capabilities to an existing speech recognition system with minimal additional computational cost.	343
bytedance/salmonn	A large language model that perceives speech, audio events, and music inputs, with multilingual capabilities	1,091
tencent/tencent-hunyuan-large	This project makes a large language model accessible for research and development	1,245
thu-coai/opd	A large-scale pre-trained dialogue model for the Chinese language	74
renshuhuai-andy/timechat	A large language model designed to understand long videos by binding visual content with timestamps and producing video token sequences of varying lengths.	314
baai-wudao/model	A repository of pre-trained language models for various tasks and domains.	121
qwenlm/qwen2-audio	An audio-language model that can analyze or respond to speech instructions based on audio input	1,306