vits
TTS system
Develops an end-to-end text-to-speech system that generates more natural audio than existing models
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
7k stars
54 watching
1k forks
Language: Python
last commit: about 1 year ago deep-learningpytorchspeech-synthesistext-to-speechtts
Related projects:
Repository | Description | Stars |
---|---|---|
mozilla/tts | An open-source project providing a suite of deep learning models and tools for advanced text-to-speech synthesis. | 9,466 |
jasonppy/voicecraft | A neural codec model for speech editing and text-to-speech synthesis in real-time, using few seconds of reference audio. | 7,744 |
rvc-boss/gpt-sovits | An AI system for generating human-like voices from text inputs, using deep learning techniques and pre-trained models. | 36,977 |
coqui-ai/tts | A deep learning toolkit for generating human-like speech from text | 36,118 |
plachtaa/vall-e-x | A research implementation of Microsoft's VALL-E X zero-shot TTS model for multilingual text-to-speech synthesis and voice cloning | 7,719 |
neonbjb/tortoise-tts | An open-source text-to-speech system trained with high-quality audio capabilities | 13,373 |
camb-ai/mars5-tts | A deep learning-based speech synthesis model that generates natural-sounding audio with controlled prosody. | 2,551 |
oxford-cs-deepnlp-2017/lectures | An open-source repository containing lecture slides and course materials for an advanced natural language processing course. | 15,702 |
mubertai/mubert-text-to-music | Generates music based on user input prompts using the Mubert API | 2,738 |
metavoiceio/metavoice-src | A deep learning model for generating human-like speech | 3,936 |
matlab-deep-learning/wav2vec-2.0 | Enables speech-to-text transcription using a pre-trained neural network model in MATLAB. | 7 |
coqui-ai/stt | A toolkit for building and deploying speech-to-text models using deep learning techniques | 2,302 |
facebookresearch/fairseq | A toolkit for training custom sequence-to-sequence models for various NLP tasks | 30,675 |
ai-forever/kandinsky-2 | A multilingual text2image latent diffusion model with improved aesthetics and controllability | 2,774 |
suno-ai/bark | A text-to-audio model that generates realistic speech and other audio | 36,433 |