ltu
Audio Model
An audio and speech large language model implementation with pre-trained models, datasets, and inference options
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
385 stars
15 watching
36 forks
Language: Python
Last commit: 7 months ago
Topics: audio, audio-processing, deep-learning, large-language-models, speech-recognition
Related projects:
Repository | Description | Stars |
---|---|---|
microsoft/pengi | An Audio Language Model framework that uses transfer learning to generate text from audio inputs | 290 |
balavenkatesh3322/audio-pretrained-model | A collection of pre-trained audio and speech models for various applications | 182 |
shawn-ieitsystems/yuan-1.0 | Large-scale language model with improved performance on NLP tasks through distributed training and efficient data processing | 591 |
brightmart/xlnet_zh | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks | 230 |
ymcui/lert | A pre-trained language model designed to leverage linguistic features and outperform comparable baselines on Chinese natural language understanding tasks. | 202 |
ieit-yuan/yuan2.0-m32 | A high-performance language model designed to excel in tasks like natural language understanding, mathematical computation, and code generation | 180 |
yunwentechnology/unilm | This project provides pre-trained models for natural language understanding and generation tasks using the UniLM architecture. | 438 |
qwenlm/qwen-audio | A multimodal audio language model developed by Alibaba Cloud that supports various tasks and languages | 1,486 |
yuangongnd/whisper-at | An audio processing model that adds audio event tagging capabilities to an existing speech recognition system with minimal additional computational cost. | 321 |
bytedance/salmonn | A large language model that accepts speech, audio event, and music inputs and supports multilingual capabilities | 1,053 |
tencent/tencent-hunyuan-large | This project makes a large language model accessible for research and development | 1,114 |
thu-coai/opd | A large-scale pre-trained dialogue model for Chinese language | 74 |
renshuhuai-andy/timechat | A large language model designed to understand and process long videos with temporal information | 286 |
baai-wudao/model | A repository of pre-trained language models for various tasks and domains. | 121 |
qwenlm/qwen2-audio | An audio-language model that can analyze or respond to speech instructions based on audio input | 1,229 |