audio2photoreal

Avatar generation

Generating photorealistic avatars from audio

Code and dataset for photorealistic Codec Avatars driven from audio

GitHub

3k stars
31 watching
261 forks
Language: Python
Last commit: 5 months ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| facebookresearch/imagebind | An AI framework that combines data from multiple sources into a single embedding space, enabling various applications such as cross-modal retrieval and generation. | 8,424 |
| sebastianstarke/ai4animation | A deep learning framework for data-driven character animation in Unity3D | 7,927 |
| zejun-yang/aniportrait | An open-source framework for generating photorealistic animations driven by audio and reference images. | 4,718 |
| facebookresearch/dinov2 | A PyTorch implementation of a self-supervised learning method for learning robust visual features without supervision. | 9,425 |
| facebookresearch/pytorch3d | A deep learning library for 3D data processing and computer vision research using PyTorch | 8,889 |
| pyannote/pyannote-audio | A PyTorch toolkit for speaker diarization and speech activity detection. | 6,508 |
| facebookresearch/sam2 | A software framework for segmenting objects in images and videos using AI models | 13,054 |
| facebookresearch/ca_body | A Python implementation of a neural network architecture for image avatar body generation | 47 |
| facebookresearch/audiocraft | A deep learning library for generating high-quality audio | 21,134 |
| huggingface/lerobot | A platform providing pre-trained models, datasets, and tools for robotics with focus on imitation learning and reinforcement learning. | 7,874 |
| facebookresearch/eft | Provides pseudo-GT 3D human pose data and pre-trained models for training 3D pose estimation algorithms | 379 |
| nvidia/vid2vid | A PyTorch implementation of a video-to-video translation method for generating photorealistic videos from semantic label maps or other input data. | 8,623 |
| tyiannak/pyaudioanalysis | A comprehensive Python library for feature extraction, classification, segmentation, and applications of audio data. | 5,918 |
| pytorch/audio | A PyTorch module providing tools and functions for audio signal processing | 2,561 |
| nvidia/waveglow | Generates high-quality speech from mel-spectrograms using a flow-based network architecture | 2,294 |