pyannote-audio

Speaker Diarization Toolkit

A PyTorch-based toolkit for speaker diarization, with components such as speech activity detection.
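Speech activity detection models typically output a speech probability for each short audio frame, and those scores then have to be turned into time segments. The sketch below shows one common way to do this: hysteresis thresholding with separate onset and offset thresholds. It is a plain-Python illustration, not pyannote-audio's own code. The function name `binarize`, the frame step, and the threshold defaults are assumptions chosen for the example.

```python
from typing import List, Tuple


def binarize(
    scores: List[float],
    frame_step: float,
    onset: float = 0.6,   # illustrative default, not a tuned value
    offset: float = 0.4,  # illustrative default, not a tuned value
) -> List[Tuple[float, float]]:
    """Turn frame-level speech probabilities into (start, end) segments in seconds.

    Hysteresis thresholding: a segment opens when the score rises above
    `onset` and closes only once it drops below `offset` (offset <= onset),
    so brief dips in the score do not fragment a speech segment.
    """
    segments: List[Tuple[float, float]] = []
    active = False
    start = 0.0
    for i, score in enumerate(scores):
        t = i * frame_step  # start time of frame i
        if not active and score > onset:
            active, start = True, t
        elif active and score < offset:
            segments.append((start, t))
            active = False
    if active:  # close a segment that is still open at the end of the signal
        segments.append((start, len(scores) * frame_step))
    return segments


if __name__ == "__main__":
    print(binarize([0.1, 0.7, 0.5, 0.45, 0.3, 0.2, 0.8, 0.9], frame_step=0.5))
```

With `frame_step=0.5`, the scores `[0.1, 0.7, 0.5, 0.45, 0.3, 0.2, 0.8, 0.9]` yield `[(0.5, 2.0), (3.0, 4.0)]`: the dips to 0.5 and 0.45 stay above the offset threshold, so the first segment is not split.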

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, and speaker embedding.
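Speaker embeddings map a speech segment to a fixed-size vector so that segments from the same speaker land close together, and a common way to compare two embeddings is cosine similarity. The following minimal sketch assumes the embeddings have already been extracted by some model and are given as plain lists of floats. `same_speaker` and its threshold of 0.5 are hypothetical; in practice such a threshold is tuned on held-out data.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero-norm embedding")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float], threshold: float = 0.5) -> bool:
    # Hypothetical decision rule: treat two segments as the same speaker
    # when their embeddings are similar enough.
    return cosine_similarity(a, b) >= threshold


if __name__ == "__main__":
    # Toy 2-D vectors; real speaker embeddings have hundreds of dimensions.
    print(same_speaker([1.0, 0.0], [0.9, 0.1]))
    print(same_speaker([1.0, 0.0], [0.0, 1.0]))
```

In a diarization pipeline, the same kind of comparison is typically applied across many segment embeddings and fed into a clustering step that groups segments by speaker.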

GitHub

6k stars
71 watching
780 forks
Language: Jupyter Notebook
last commit: 10 days ago
Linked from 2 awesome lists

Topics: overlapped-speech-detection, pretrained-models, pytorch, speaker-change-detection, speaker-diarization, speaker-embedding, speaker-recognition, speaker-verification, speech-activity-detection, speech-processing, voice-activity-detection

Related projects:

Repository | Description | Stars
tyiannak/pyaudioanalysis | A comprehensive Python library for feature extraction, classification, segmentation, and applications of audio data. | 5,885
mahmoudashraf97/whisper-diarization | Automates speaker diarization from audio recordings using OpenAI Whisper ASR and additional neural networks. | 3,718
pytorch/audio | A PyTorch library providing tools and functions for audio signal processing. | 2,538
speechbrain/speechbrain | A PyTorch-based toolkit for building conversational AI systems with advanced speech and text processing capabilities. | 8,922
lucidrains/musiclm-pytorch | An implementation of Google's MusicLM model for music generation using attention networks and text conditioning. | 3,166
facebookresearch/audio2photoreal | Generates photorealistic avatars from audio. | 2,709
mravanelli/pytorch-kaldi | A toolkit for developing state-of-the-art deep-learning-based speech recognition systems using PyTorch and Kaldi. | 2,367
facebookresearch/audiocraft | A deep learning library for generating high-quality audio. | 20,969
ibab/tensorflow-wavenet | An implementation of the WaveNet generative neural network architecture for audio generation. | 5,414
m-bain/whisperx | An automatic speech recognition system with word-level timestamps and speaker diarization. | 12,489
nvidia/waveglow | Generates high-quality speech from mel-spectrograms using a flow-based network architecture. | 2,285
openai/whisper | A general-purpose speech recognition system trained on large-scale weak supervision. | 71,257
enhuiz/vall-e | An implementation of VALL-E in PyTorch for text-to-speech synthesis. | 2,964
lucidrains/dalle2-pytorch | An implementation of DALL-E 2's text-to-image synthesis neural network in PyTorch. | 11,148
huggingface/transformers | A collection of pretrained machine learning models for various natural language and computer vision tasks, enabling developers to fine-tune and deploy these models in their own projects. | 135,022