MARS5-TTS

Speech synthesizer

A deep learning-based speech synthesis model that generates natural-sounding audio with controlled prosody.

MARS5 speech model (TTS) from CAMB.AI

GitHub

3k stars
33 watching
209 forks
Language: Jupyter Notebook
last commit: 5 months ago
Linked from 1 awesome list

Tags: prosody, speech, speech-synthesis, text-to-speech, voice-clone, ai, voice-cloning

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| coqui-ai/tts | A deep learning toolkit for generating human-like speech from text | 36,118 |
| mozilla/tts | An open-source project providing a suite of deep learning models and tools for advanced text-to-speech synthesis | 9,466 |
| suno-ai/bark | A text-to-audio model that generates realistic speech and other audio | 36,433 |
| plachtaa/vall-e-x | A research implementation of Microsoft's VALL-E X zero-shot TTS model for multilingual text-to-speech synthesis and voice cloning | 7,719 |
| jasonppy/voicecraft | A neural codec model for speech editing and real-time text-to-speech synthesis, using a few seconds of reference audio | 7,744 |
| m-bain/whisperx | An automatic speech recognition system with word-level timestamps and speaker diarization | 12,894 |
| neonbjb/tortoise-tts | An open-source text-to-speech system trained for high-quality audio output | 13,373 |
| rvc-boss/gpt-sovits | An AI system for generating human-like voices from text inputs, using deep learning techniques and pre-trained models | 36,977 |
| metavoiceio/metavoice-src | A deep learning model for generating human-like speech | 3,936 |
| speechbrain/speechbrain | A PyTorch-based toolkit for building conversational AI systems with advanced speech and text processing capabilities | 9,066 |
| mshumer/gpt-prompt-engineer | A tool for automating the generation and ranking of effective prompts for AI models such as GPT-4, GPT-3.5-Turbo, or Claude 3 Opus | 9,411 |
| dvlab-research/mgm | An open-source framework for training large language models with vision capabilities | 3,229 |
| dair-ai/prompt-engineering-guide | A comprehensive guide to developing and optimizing prompts for using language models effectively in various applications | 51,082 |
| ai-forever/kandinsky-2 | A multilingual text2image latent diffusion model with improved aesthetics and controllability | 2,774 |
| jaywalnut310/vits | An end-to-end text-to-speech system that generates more natural audio than existing models | 6,947 |