bark
Audio generator
A text-to-audio model that generates realistic speech and other audio
🔊 Text-Prompted Generative Audio Model
36k stars
330 watching
4k forks
Language: Jupyter Notebook
Last commit: 5 months ago
Linked from 3 awesome lists
Related projects:
Repository | Description | Stars |
---|---|---|
camb-ai/mars5-tts | A deep learning-based speech synthesis model that generates natural-sounding audio with controlled prosody. | 2,551 |
jasonppy/voicecraft | A neural codec language model for zero-shot speech editing and text-to-speech synthesis, using a few seconds of reference audio. | 7,744 |
speechbrain/speechbrain | A PyTorch-based toolkit for building conversational AI systems with advanced speech and text processing capabilities. | 9,066 |
haoheliu/audioldm | A Python-based audio generation tool that produces speech, sound effects, music, and more from text prompts. | 2,483 |
mozilla/tts | An open-source project providing a suite of deep learning models and tools for advanced text-to-speech synthesis. | 9,466 |
mshumer/gpt-prompt-engineer | A tool for automating the process of generating and ranking effective prompts for AI models like GPT-4, GPT-3.5-Turbo, or Claude 3 Opus. | 9,411 |
coqui-ai/tts | A deep learning toolkit for generating human-like speech from text. | 36,118 |
aigc-audio/audiogpt | An audio processing toolkit that provides pre-trained models and tools for tasks like speech synthesis, music generation, sound detection, and talking head creation. | 10,061 |
bigscience-workshop/promptsource | A toolkit for creating and using natural language prompts to enable large language models to generalize to new tasks. | 2,718 |
coqui-ai/stt | A toolkit for building and deploying speech-to-text models using deep learning techniques. | 2,302 |
plachtaa/vall-e-x | A research implementation of Microsoft's VALL-E X zero-shot TTS model for multilingual text-to-speech synthesis and voice cloning. | 7,719 |
hahahumble/speechgpt | An application that enables users to converse with ChatGPT via speech and text interfaces. | 2,752 |
huggingface/transformers | A library of pre-trained machine learning models for natural language and computer vision tasks that developers can fine-tune and deploy in their own projects. | 136,357 |
lucidrains/musiclm-pytorch | Implementation of Google's MusicLM model for music generation using attention networks and text-conditioning. | 3,189 |
nvidia/waveglow | A flow-based network that generates high-quality speech from mel-spectrograms. | 2,294 |