MARS5-TTS

Speech synthesizer

A deep learning-based speech synthesis model that generates natural-sounding audio with controlled prosody.

MARS5 speech model (TTS) from CAMB.AI

GitHub

3k stars

33 watching

209 forks

Language: Jupyter Notebook

last commit: almost 2 years ago

Linked from 1 awesome list

prosody, speech, speech-synthesis, text-to-speech, voice-clone, ai, voice-cloning

www.camb.ai

Backlinks from these awesome lists:

amrzv/awesome-colab-notebooks

Related projects:

Repository	Description	Stars
coqui-ai/tts	A deep learning toolkit for generating human-like speech from text	36,118
mozilla/tts	An open-source project providing a suite of deep learning models and tools for advanced text-to-speech synthesis.	9,466
suno-ai/bark	A text-to-audio model that generates realistic speech and other audio	36,433
plachtaa/vall-e-x	A research implementation of Microsoft's VALL-E X zero-shot TTS model for multilingual text-to-speech synthesis and voice cloning	7,719
jasonppy/voicecraft	A neural codec model for real-time speech editing and text-to-speech synthesis, using a few seconds of reference audio.	7,744
m-bain/whisperx	An automatic speech recognition system with word-level timestamps and speaker diarization.	12,894
neonbjb/tortoise-tts	An open-source text-to-speech system built with an emphasis on high-quality audio	13,373
rvc-boss/gpt-sovits	An AI system for generating human-like voices from text inputs, using deep learning techniques and pre-trained models.	36,977
metavoiceio/metavoice-src	A deep learning model for generating human-like speech	3,936
speechbrain/speechbrain	A PyTorch-based toolkit for building conversational AI systems with advanced speech and text processing capabilities.	9,066
mshumer/gpt-prompt-engineer	A tool for automating the process of generating and ranking effective prompts for AI models like GPT-4, GPT-3.5-Turbo, or Claude 3 Opus.	9,411
dvlab-research/mgm	An open-source framework for training large language models with vision capabilities.	3,229
dair-ai/prompt-engineering-guide	A comprehensive resource for guiding the development and optimization of prompts to use language models effectively in various applications.	51,082
ai-forever/kandinsky-2	A multilingual text2image latent diffusion model with improved aesthetics and controllability	2,774
jaywalnut310/vits	An end-to-end text-to-speech system that aims to generate more natural audio than existing models	6,947