MARS5-TTS
Speech synthesizer
A deep learning-based speech synthesis model that generates natural-sounding audio with controlled prosody.
MARS5 speech model (TTS) from CAMB.AI
3k stars
32 watching
207 forks
Language: Jupyter Notebook
last commit: 4 months ago
Linked from 1 awesome list
Topics: prosody, speech, speech-synthesis, text-to-speech, voice-clone, ai, voice-cloning
Related projects:
Repository | Description | Stars |
---|---|---|
coqui-ai/tts | A deep learning toolkit for generating human-like speech from text | 35,453 |
mozilla/tts | An open-source project providing a suite of deep learning models and tools for advanced text-to-speech synthesis. | 9,401 |
suno-ai/bark | A text-to-audio model that generates realistic speech and other audio | 36,126 |
plachtaa/vall-e-x | A research implementation of Microsoft's VALL-E X zero-shot TTS model for multilingual text-to-speech synthesis and voice cloning | 7,670 |
jasonppy/voicecraft | A neural codec language model for zero-shot speech editing and text-to-speech synthesis in the wild, using a few seconds of reference audio. | 7,638 |
m-bain/whisperx | An automatic speech recognition system with word-level timestamps and speaker diarization. | 12,489 |
neonbjb/tortoise-tts | A multi-voice text-to-speech system trained on high-quality data | 13,225 |
rvc-boss/gpt-sovits | An AI system for generating human-like voices from text inputs, using deep learning techniques and pre-trained models. | 35,728 |
metavoiceio/metavoice-src | A deep learning model for generating human-like speech | 3,891 |
speechbrain/speechbrain | A PyTorch-based toolkit for building conversational AI systems with advanced speech and text processing capabilities. | 8,922 |
mshumer/gpt-prompt-engineer | A tool for automating the process of generating and ranking effective prompts for AI models like GPT-4, GPT-3.5-Turbo, or Claude 3 Opus. | 9,368 |
dvlab-research/mgm | An open-source framework for training large language models with vision capabilities. | 3,211 |
dair-ai/prompt-engineering-guide | A comprehensive resource for designing and optimizing prompts to interact with language models | 50,262 |
ai-forever/kandinsky-2 | A multilingual text2image latent diffusion model with improved aesthetics and controllability | 2,766 |
jaywalnut310/vits | An end-to-end text-to-speech model (a conditional variational autoencoder with adversarial learning) reported to generate more natural-sounding audio than earlier systems | 6,860 |