GPT-SoVITS
Voice Generator
An AI system for generating human-like voices from text inputs, using deep learning techniques and pre-trained models.
As little as 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning).
36k stars
210 watching
4k forks
Language: Python
last commit: 14 days ago
Linked from 1 awesome list
text-to-speech, tts, vits, voice-clone, voice-cloneai, voice-cloning
Related projects:
Repository | Description | Stars |
---|---|---|
jasonppy/voicecraft | A neural codec model for speech editing and text-to-speech synthesis in real time, using a few seconds of reference audio. | 7,638 |
metavoiceio/metavoice-src | A deep learning model for generating human-like speech | 3,891 |
neonbjb/tortoise-tts | A multi-voice text-to-speech system trained on high-quality data | 13,225 |
coqui-ai/tts | A deep learning toolkit for generating human-like speech from text | 35,453 |
plachtaa/vall-e-x | A research implementation of Microsoft's VALL-E X zero-shot TTS model for multilingual text-to-speech synthesis and voice cloning | 7,670 |
coqui-ai/stt | A toolkit for building and deploying speech-to-text models using deep learning techniques | 2,283 |
mozilla/tts | An open-source project providing a suite of deep learning models and tools for advanced text-to-speech synthesis. | 9,401 |
mshumer/gpt-prompt-engineer | A tool for automating the process of generating and ranking effective prompts for AI models like GPT-4, GPT-3.5-Turbo, or Claude 3 Opus. | 9,368 |
tensorspeech/tensorflowtts | Real-time speech synthesis using state-of-the-art architectures | 3,839 |
openai/whisper | A general-purpose speech recognition system trained on large-scale weak supervision | 71,257 |
instruction-tuning-with-gpt-4/gpt-4-llm | This project generates instruction-following data using GPT-4 to fine-tune large language models for real-world tasks. | 4,210 |
jaywalnut310/vits | Develops an end-to-end text-to-speech system that generates more natural audio than existing models | 6,860 |
minimaxir/gpt-2-simple | A tool for retraining and fine-tuning the OpenAI GPT-2 text generation model on new datasets. | 3,397 |
camb-ai/mars5-tts | A deep learning-based speech synthesis model that generates natural-sounding audio with controlled prosody. | 2,530 |
rhasspy/piper | A fast local neural text-to-speech system optimized for small devices | 6,576 |