pyannote-audio

Speaker Diarization Toolkit

A PyTorch toolkit for speaker diarization, with speech activity detection among its components.

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, and speaker embedding.

GitHub

7k stars

73 watching

797 forks

Language: Jupyter Notebook

Last commit: 8 months ago

Linked from 2 awesome lists

Topics: overlapped-speech-detection, pretrained-models, pytorch, speaker-change-detection, speaker-diarization, speaker-embedding, speaker-recognition, speaker-verification, speech-activity-detection, speech-processing, voice-activity-detection


pyannote.github.io


Related projects:

Repository	Description	Stars
tyiannak/pyaudioanalysis	A comprehensive Python library for feature extraction, classification, segmentation, and applications of audio data.	5,918
mahmoudashraf97/whisper-diarization	Automates speaker diarization from audio recordings using OpenAI Whisper ASR and additional neural networks.	3,874
pytorch/audio	A PyTorch module providing tools and functions for audio signal processing	2,561
speechbrain/speechbrain	A PyTorch-based toolkit for building conversational AI systems with advanced speech and text processing capabilities.	9,066
lucidrains/musiclm-pytorch	Implementation of Google's MusicLM model for music generation using attention networks and text-conditioning.	3,189
facebookresearch/audio2photoreal	Generating photorealistic avatars from audio	2,715
mravanelli/pytorch-kaldi	Develops state-of-the-art speech recognition systems using PyTorch and Kaldi toolkits	2,370
facebookresearch/audiocraft	A deep learning library for generating high-quality audio	21,134
ibab/tensorflow-wavenet	An implementation of a WaveNet generative neural network architecture for audio generation	5,417
m-bain/whisperx	An automatic speech recognition system with word-level timestamps and speaker diarization.	12,894
nvidia/waveglow	Generates high-quality speech from mel-spectrograms using a flow-based network architecture	2,294
openai/whisper	A general-purpose speech recognition system trained on large-scale weak supervision	72,752
enhuiz/vall-e	An implementation of VALL-E in PyTorch for text-to-speech synthesis	2,970
lucidrains/dalle2-pytorch	An implementation of DALL-E 2's text-to-image synthesis neural network in PyTorch	11,184
huggingface/transformers	A collection of pre-trained machine learning models for various natural language and computer vision tasks, enabling developers to fine-tune and deploy these models on their own projects.	136,357