MARS5-TTS
Speech synthesizer
A deep learning-based speech synthesis model that generates natural-sounding audio with controlled prosody.
MARS5 speech model (TTS) from CAMB.AI
3k stars
33 watching
209 forks
Language: Jupyter Notebook
last commit: 5 months ago
Linked from 1 awesome list
prosody, speech, speech-synthesis, text-to-speech, voice-clone, ai, voice-cloning
Related projects:
Repository | Description | Stars |
---|---|---|
coqui-ai/tts | A deep learning toolkit for generating human-like speech from text | 36,118 |
mozilla/tts | An open-source project providing a suite of deep learning models and tools for advanced text-to-speech synthesis. | 9,466 |
suno-ai/bark | A text-to-audio model that generates realistic speech and other audio | 36,433 |
plachtaa/vall-e-x | A research implementation of Microsoft's VALL-E X zero-shot TTS model for multilingual text-to-speech synthesis and voice cloning | 7,719 |
jasonppy/voicecraft | A neural codec language model for speech editing and zero-shot text-to-speech synthesis that needs only a few seconds of reference audio. | 7,744 |
m-bain/whisperx | An automatic speech recognition system with word-level timestamps and speaker diarization. | 12,894 |
neonbjb/tortoise-tts | An open-source multi-voice text-to-speech system built with an emphasis on high-quality audio | 13,373 |
rvc-boss/gpt-sovits | An AI system for generating human-like voices from text inputs, using deep learning techniques and pre-trained models. | 36,977 |
metavoiceio/metavoice-src | A deep learning model for generating human-like speech | 3,936 |
speechbrain/speechbrain | A PyTorch-based toolkit for building conversational AI systems with advanced speech and text processing capabilities. | 9,066 |
mshumer/gpt-prompt-engineer | A tool for automating the process of generating and ranking effective prompts for AI models like GPT-4, GPT-3.5-Turbo, or Claude 3 Opus. | 9,411 |
dvlab-research/mgm | An open-source framework for training large language models with vision capabilities. | 3,229 |
dair-ai/prompt-engineering-guide | A comprehensive resource for guiding the development and optimization of prompts to use language models effectively in various applications. | 51,082 |
ai-forever/kandinsky-2 | A multilingual text2image latent diffusion model with improved aesthetics and controllability | 2,774 |
jaywalnut310/vits | An end-to-end text-to-speech system reported to generate more natural audio than earlier models | 6,947 |