ltu
Audio Model
An audio and speech large language model implementation with pre-trained models, datasets, and inference options
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
396 stars
15 watching
38 forks
Language: Python
last commit: 10 months ago
audio, audio-processing, deep-learning, large-language-models, speech-recognition
Related projects:
Repository | Description | Stars |
---|---|---|
| An Audio Language Model framework that uses transfer learning to generate text from audio inputs | 295 |
| A collection of pre-trained audio and speech models for various applications | 183 |
| Large-scale language model with improved performance on NLP tasks through distributed training and efficient data processing | 591 |
| Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks | 230 |
| A pre-trained language model designed to leverage linguistic features and outperform comparable baselines on Chinese natural language understanding tasks. | 202 |
| A high-performance language model designed to excel in tasks like natural language understanding, mathematical computation, and code generation | 182 |
| This project provides pre-trained models and tools for natural language understanding (NLU) and generation (NLG) tasks in Chinese. | 439 |
| A multimodal audio language model developed by Alibaba Cloud that supports various tasks and languages | 1,515 |
| An audio processing model that adds audio event tagging capabilities to an existing speech recognition system with minimal additional computational cost. | 343 |
| A large language model enabling speech, audio event perception and music inputs to achieve multilingual capabilities | 1,091 |
| This project makes a large language model accessible for research and development | 1,245 |
| A large-scale pre-trained dialogue model for the Chinese language | 74 |
| A large language model designed to understand long videos by binding visual content with timestamps and producing video token sequences of varying lengths. | 314 |
| A repository of pre-trained language models for various tasks and domains. | 121 |
| An audio-language model that can analyze or respond to speech instructions based on audio input | 1,306 |