GPT-SoVITS

Voice Generator

An AI system for generating human-like voices from text inputs, using deep learning techniques and pre-trained models.

As little as 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning).
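Since the project's pitch rests on having roughly one minute of reference audio, it can help to check a clip's length before training. The snippet below is a minimal sketch using only Python's standard `wave` module. It is not part of GPT-SoVITS: the helper names (`wav_duration_seconds`, `has_enough_audio`, `make_silent_wav`) and the 60-second default are assumptions made for illustration, and it handles uncompressed PCM WAV input only.

```python
import io
import wave


def wav_duration_seconds(source) -> float:
    """Return the duration in seconds of a PCM WAV file (path or file-like object)."""
    with wave.open(source, "rb") as wav:
        # Duration is the number of frames divided by frames per second.
        return wav.getnframes() / wav.getframerate()


def has_enough_audio(source, minimum_seconds: float = 60.0) -> bool:
    """Hypothetical check: is the clip at least `minimum_seconds` long?

    The 60-second default mirrors the "1 minute" figure in the project
    description; it is an assumption, not a limit enforced by GPT-SoVITS.
    """
    return wav_duration_seconds(source) >= minimum_seconds


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, used only for the demo."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)                    # rewind so the buffer can be read back
    return buffer


if __name__ == "__main__":
    clip = make_silent_wav(61)
    print("long enough for few-shot training:", has_enough_audio(clip))
```

In practice `source` would be the path to a recorded reference clip rather than generated silence; compressed formats such as MP3 would need a different reader.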

GitHub

36k stars
210 watching
4k forks
Language: Python
Last commit: 15 days ago
Linked from 1 awesome list

Topics: text-to-speech, tts, vits, voice-clone, voice-cloneai, voice-cloning

Backlinks from these awesome lists:

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| jasonppy/voicecraft | A neural codec model for speech editing and text-to-speech synthesis in real time, using a few seconds of reference audio. | 7,638 |
| metavoiceio/metavoice-src | A deep learning model for generating human-like speech. | 3,891 |
| neonbjb/tortoise-tts | A multi-voice text-to-speech system trained on high-quality data. | 13,225 |
| coqui-ai/tts | A deep learning toolkit for generating human-like speech from text. | 35,453 |
| plachtaa/vall-e-x | A research implementation of Microsoft's VALL-E X zero-shot TTS model for multilingual text-to-speech synthesis and voice cloning. | 7,670 |
| coqui-ai/stt | A toolkit for building and deploying speech-to-text models using deep learning techniques. | 2,283 |
| mozilla/tts | An open-source project providing a suite of deep learning models and tools for advanced text-to-speech synthesis. | 9,401 |
| mshumer/gpt-prompt-engineer | A tool for automating the process of generating and ranking effective prompts for AI models like GPT-4, GPT-3.5-Turbo, or Claude 3 Opus. | 9,368 |
| tensorspeech/tensorflowtts | Real-time speech synthesis using state-of-the-art architectures. | 3,839 |
| openai/whisper | A general-purpose speech recognition system trained on large-scale weak supervision. | 71,257 |
| instruction-tuning-with-gpt-4/gpt-4-llm | Generates instruction-following data using GPT-4 to fine-tune large language models for real-world tasks. | 4,210 |
| jaywalnut310/vits | An end-to-end text-to-speech system that generates more natural audio than existing models. | 6,860 |
| minimaxir/gpt-2-simple | A tool for retraining and fine-tuning the OpenAI GPT-2 text generation model on new datasets. | 3,397 |
| camb-ai/mars5-tts | A deep learning-based speech synthesis model that generates natural-sounding audio with controlled prosody. | 2,530 |
| rhasspy/piper | A fast local neural text-to-speech system optimized for small devices. | 6,576 |