vits

TTS system

A parallel end-to-end text-to-speech system that generates more natural-sounding audio than current two-stage TTS models.

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech


7k stars
54 watching
1k forks
Language: Python
last commit: about 1 year ago
Topics: deep-learning, pytorch, speech-synthesis, text-to-speech, tts

Related projects:

Repository | Description | Stars
mozilla/tts | An open-source project providing a suite of deep learning models and tools for advanced text-to-speech synthesis. | 9,466
jasonppy/voicecraft | A neural codec language model for real-time speech editing and text-to-speech synthesis from a few seconds of reference audio. | 7,744
rvc-boss/gpt-sovits | An AI system for generating human-like voices from text input, using deep learning techniques and pre-trained models. | 36,977
coqui-ai/tts | A deep learning toolkit for generating human-like speech from text. | 36,118
plachtaa/vall-e-x | A research implementation of Microsoft's VALL-E X zero-shot TTS model for multilingual text-to-speech synthesis and voice cloning. | 7,719
neonbjb/tortoise-tts | An open-source text-to-speech system focused on high-quality audio output. | 13,373
camb-ai/mars5-tts | A deep learning-based speech synthesis model that generates natural-sounding audio with controllable prosody. | 2,551
oxford-cs-deepnlp-2017/lectures | Lecture slides and course materials for an advanced natural language processing course. | 15,702
mubertai/mubert-text-to-music | Generates music from user prompts using the Mubert API. | 2,738
metavoiceio/metavoice-src | A deep learning model for generating human-like speech. | 3,936
matlab-deep-learning/wav2vec-2.0 | Speech-to-text transcription in MATLAB using a pre-trained neural network model. | 7
coqui-ai/stt | A toolkit for building and deploying speech-to-text models using deep learning techniques. | 2,302
facebookresearch/fairseq | A toolkit for training custom sequence-to-sequence models for various NLP tasks. | 30,675
ai-forever/kandinsky-2 | A multilingual text-to-image latent diffusion model with improved aesthetics and controllability. | 2,774
suno-ai/bark | A text-to-audio model that generates realistic speech and other audio. | 36,433