GPT-SoVITS

Voice Generator

An AI system for generating human-like voices from text inputs, using deep learning techniques and pre-trained models.

Just 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning).

GitHub

37k stars
218 watching
4k forks
Language: Python
Last commit: 2 months ago
Linked from 1 awesome list

Topics: text-to-speech, tts, vits, voice-clone, voice-cloneai, voice-cloning

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| jasonppy/voicecraft | A neural codec model for real-time speech editing and text-to-speech synthesis, using a few seconds of reference audio. | 7,744 |
| metavoiceio/metavoice-src | A deep learning model for generating human-like speech. | 3,936 |
| neonbjb/tortoise-tts | An open-source text-to-speech system trained with high-quality audio capabilities. | 13,373 |
| coqui-ai/tts | A deep learning toolkit for generating human-like speech from text. | 36,118 |
| plachtaa/vall-e-x | A research implementation of Microsoft's VALL-E X zero-shot TTS model for multilingual text-to-speech synthesis and voice cloning. | 7,719 |
| coqui-ai/stt | A toolkit for building and deploying speech-to-text models using deep learning techniques. | 2,302 |
| mozilla/tts | An open-source project providing a suite of deep learning models and tools for advanced text-to-speech synthesis. | 9,466 |
| mshumer/gpt-prompt-engineer | A tool for automating the process of generating and ranking effective prompts for AI models like GPT-4, GPT-3.5-Turbo, or Claude 3 Opus. | 9,411 |
| tensorspeech/tensorflowtts | Real-time speech synthesis using state-of-the-art architectures. | 3,855 |
| openai/whisper | A general-purpose speech recognition system trained on large-scale weak supervision. | 72,752 |
| instruction-tuning-with-gpt-4/gpt-4-llm | A project that generates instruction-following data using GPT-4 to fine-tune large language models for real-world tasks. | 4,244 |
| jaywalnut310/vits | An end-to-end text-to-speech system that generates more natural audio than existing models. | 6,947 |
| minimaxir/gpt-2-simple | A tool for retraining and fine-tuning the OpenAI GPT-2 text generation model on new datasets. | 3,398 |
| camb-ai/mars5-tts | A deep learning-based speech synthesis model that generates natural-sounding audio with controlled prosody. | 2,551 |
| rhasspy/piper | A fast local neural text-to-speech system optimized for small devices. | 7,002 |