pyannote-audio
Speaker Diarization Toolkit
A PyTorch-based toolkit for speaker diarization, including speech activity detection.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
6k stars
71 watching
780 forks
Language: Jupyter Notebook
last commit: 10 days ago
Linked from 2 awesome lists
overlapped-speech-detection, pretrained-models, pytorch, speaker-change-detection, speaker-diarization, speaker-embedding, speaker-recognition, speaker-verification, speech-activity-detection, speech-processing, voice-activity-detection
Related projects:
Repository | Description | Stars |
---|---|---|
tyiannak/pyaudioanalysis | A comprehensive Python library for feature extraction, classification, segmentation, and applications of audio data. | 5,885 |
mahmoudashraf97/whisper-diarization | Automates speaker diarization from audio recordings using OpenAI Whisper ASR and additional neural networks. | 3,718 |
pytorch/audio | A PyTorch module providing tools and functions for audio signal processing | 2,538 |
speechbrain/speechbrain | A PyTorch-based toolkit for building conversational AI systems with advanced speech and text processing capabilities. | 8,922 |
lucidrains/musiclm-pytorch | Implementation of Google's MusicLM model for music generation using attention networks and text-conditioning. | 3,166 |
facebookresearch/audio2photoreal | Generating photorealistic avatars from audio | 2,709 |
mravanelli/pytorch-kaldi | A toolkit for developing state-of-the-art deep learning-based speech recognition systems using PyTorch and Kaldi | 2,367 |
facebookresearch/audiocraft | A deep learning library for generating high-quality audio | 20,969 |
ibab/tensorflow-wavenet | An implementation of a WaveNet generative neural network architecture for audio generation | 5,414 |
m-bain/whisperx | An automatic speech recognition system with word-level timestamps and speaker diarization. | 12,489 |
nvidia/waveglow | Generates high-quality speech from mel-spectrograms using a flow-based network architecture | 2,285 |
openai/whisper | A general-purpose speech recognition system trained on large-scale weak supervision | 71,257 |
enhuiz/vall-e | An implementation of VALL-E in PyTorch for text-to-speech synthesis | 2,964 |
lucidrains/dalle2-pytorch | An implementation of DALL-E 2's text-to-image synthesis neural network in PyTorch | 11,148 |
huggingface/transformers | A collection of pre-trained machine learning models for various natural language and computer vision tasks, enabling developers to fine-tune and deploy these models on their own projects. | 135,022 |