tortoise-tts
TTS system
A multi-voice text-to-speech system trained on high-quality data
A multi-voice TTS system trained with an emphasis on quality
13k stars
174 watching
2k forks
Language: Jupyter Notebook
last commit: 3 months ago
Linked from 2 awesome lists
Related projects:
Repository | Description | Stars |
---|---|---|
mozilla/tts | An open-source project providing a suite of deep learning models and tools for advanced text-to-speech synthesis. | 9,401 |
jasonppy/voicecraft | A neural codec model for speech editing and text-to-speech synthesis in real-time, using only a few seconds of reference audio. | 7,638 |
rvc-boss/gpt-sovits | An AI system for generating human-like voices from text inputs, using deep learning techniques and pre-trained models. | 35,728 |
tensorspeech/tensorflowtts | Real-time speech synthesis using state-of-the-art architectures | 3,839 |
coqui-ai/tts | A deep learning toolkit for generating human-like speech from text | 35,453 |
metavoiceio/metavoice-src | A deep learning model for generating human-like speech | 3,891 |
huggingface/text-generation-inference | A toolkit for deploying and serving Large Language Models. | 9,106 |
camb-ai/mars5-tts | A deep learning-based speech synthesis model that generates natural-sounding audio with controlled prosody. | 2,530 |
coqui-ai/stt | A toolkit for building and deploying speech-to-text models using deep learning techniques | 2,283 |
google/sentencepiece | An unsupervised text tokenizer that segments input text into subwords and detokenizes output based on a predefined vocabulary size. | 10,284 |
openai/whisper | A general-purpose speech recognition system trained on large-scale weak supervision | 71,257 |
nvidia/tacotron2 | A PyTorch implementation of the Tacotron 2 deep neural network architecture for speech synthesis. | 5,099 |
eleutherai/gpt-neox | A framework for training large-scale language models on GPUs with advanced features and optimizations. | 6,941 |
opennmt/ctranslate2 | A high-performance library for efficient inference with Transformer models on CPUs and GPUs. | 3,404 |
jaywalnut310/vits | An end-to-end text-to-speech system that aims to generate more natural-sounding audio than earlier models | 6,860 |