Learning / Courses |
Practical Deep Learning for Coders (fast.ai) | | | |
Deep Learning (NYU) | | | |
Introduction to Deep Learning (CMU) | | | |
Deep Learning for Computer Vision (Stanford CS231n) | | | |
Natural Language Processing with Deep Learning (Stanford CS224n) | | | |
Deep Generative Models (Stanford) | | | |
Deep Unsupervised Learning (UC Berkeley) | | | |
Differentiable Inference and Generative Models (Toronto) | | | |
Learning Discrete Latent Structure (Toronto) | | | |
From Deep Learning Foundations to Stable Diffusion (fast.ai) | | | |
Machine Learning for the Web (ITP/NYU) | 406 | 8 months ago | |
Art and Machine Learning (CMU) | | | |
New Media Installation: Art that Learns (CMU) | | | |
|
Media course | 1 | almost 2 years ago | |
Code course | 6 | almost 2 years ago | |
Learning / Videos |
I Created a Neural Network and Tried Teaching it to Recognize Doodles (Sebastian Lague) | | | |
Neural Network Series (3Blue1Brown) | | | |
Beginner's Guide to Machine Learning in JavaScript (Coding Train) | | | |
Two Minute Papers | | | |
Learning / Books |
Deep Learning (Goodfellow, Bengio, and Courville) | | | |
Computer Vision: Algorithms and Applications (Szeliski) | | | |
Procedural Content Generation in Games (Shaker, Togelius, and Nelson) | | | |
Generative Design (Benedikt Groß) | | | |
Learning / Tutorials and Blogs |
Tutorial on Deep Generative Models (IJCAI-ECAI 2018) | | | |
Tutorial on GANs (CVPR 2018) | | | |
Lil'Log (Lilian Weng) | | | |
Distill [on hiatus] | | | |
Book of Shaders: Generative Designs | | | |
Mike Bostock: Visualizing Algorithms | | | |
Generative Examples in Processing | 63 | almost 7 years ago | |
Generative Music | | | |
Learning / Papers/Methods |
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations | | | : Paper predating Stable Diffusion describing a method for image synthesis and editing with diffusion-based models |
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models | | | |
High-Resolution Image Synthesis with Latent Diffusion Models | | | : Original paper that introduced latent diffusion models, the basis of Stable Diffusion, and started it all |
Prompt-to-Prompt Image Editing with Cross-Attention Control | | | : Edit Stable Diffusion outputs by editing the original prompt |
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion | | | : Similar to Prompt-to-Prompt but instead takes an input image and a text description. Kinda like Style Transfer... but with Stable Diffusion |
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation | | | : Similar to Textual Inversion but instead focused on manipulating subject-based images |
Novel View Synthesis with Diffusion Models | | | |
AudioGen: Textually Guided Audio Generation | | | |
Make-A-Video: Text-to-Video Generation without Text-Video Data | | | |
Imagic: Text-Based Real Image Editing with Diffusion Models | | | |
MDM: Human Motion Diffusion Model | | | |
Soft Diffusion: Score Matching for General Corruptions | | | |
Multi-Concept Customization of Text-to-Image Diffusion | | | : Like DreamBooth but capable of synthesizing multiple concepts |
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers | | | |
Elucidating the Design Space of Diffusion-Based Generative Models (EDM) | 1,399 | 8 months ago | |
Tackling the Generative Learning Trilemma with Denoising Diffusion GANs | | | |
Imagen Video: High Definition Video Generation with Diffusion Models | | | |
Structure-from-Motion Revisited | | | : prior work on sparse modeling (still needed/useful for NeRF) |
Pixelwise View Selection for Unstructured Multi-View Stereo | | | : prior work on dense modeling (NeRF kinda replaces this) |
DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation | | | |
Deferred Neural Rendering: Image Synthesis using Neural Textures | | | |
Neural Volumes: Learning Dynamic Renderable Volumes from Images | | | |
Neural Radiance Fields for Unconstrained Photo Collections | | | : NeRF in the wild (alternative to MVS) |
Nerfies: Deformable Neural Radiance Fields | | | : Photorealistic NeRF from casual in-the-wild photos and videos (like from a cellphone) |
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields | | | : NeRF... but BETTER FASTER HARDER STRONGER |
Depth-supervised NeRF: Fewer Views and Faster Training for Free | | | : Train NeRF models faster with fewer images by leveraging depth information |
Instant Neural Graphics Primitives with a Multiresolution Hash Encoding | | | : caching for NeRF training to make it rlllly FAST |
Understanding Pure CLIP Guidance for Voxel Grid NeRF Models | | | : text-to-3D using CLIP |
NeRF-SLAM: Real-Time Dense Monocular SLAM with Neural Radiance Fields | | | : NeRF for robots (and cars) |
nerf2nerf: Pairwise Registration of Neural Radiance Fields | | | : pretrained NeRF |
The One Where They Reconstructed 3D Humans and Environments in TV Shows | | | |
ClimateNeRF: Physically-based Neural Rendering for Extreme Climate Synthesis | | | |
Realistic one-shot mesh-based head avatars | | | |
Neural Point Catacaustics for Novel-View Synthesis of Reflections | | | |
3D Moments from Near-Duplicate Photos | | | |
NeRDi: Single-View NeRF Synthesis with Language-Guided Diffusion as General Image Priors | | | |
Learning / 3D and point clouds |
DreamFusion: Text-to-3D using 2D Diffusion (Google) | | | |
ULIP: Learning Unified Representation of Language, Image and Point Cloud for 3D Understanding (Salesforce) | | | |
Extracting Triangular 3D Models, Materials, and Lighting From Images (NVIDIA) | | | |
GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images (NVIDIA) | | | |
3D Neural Field Generation using Triplane Diffusion | | | |
🎠 MagicPony: Learning Articulated 3D Animals in the Wild | | | |
ObjectStitch: Generative Object Compositing (Adobe) | | | |
LADIS: Language Disentanglement for 3D Shape Editing (Snap) | | | |
Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion (Microsoft) | | | |
SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation (Snap) | | | |
DiffRF: Rendering-guided 3D Radiance Field Diffusion (Meta) | | | |
Novel View Synthesis with Diffusion Models (Google) | | | |
Learning / Unconditional Image Synthesis |
Sampling Generative Networks | | | |
Neural Discrete Representation Learning (VQVAE) | | | |
Progressive Growing of GANs for Improved Quality, Stability, and Variation | | | |
A Style-Based Generator Architecture for Generative Adversarial Networks (StyleGAN) | | | |
Training Generative Adversarial Networks with Limited Data (StyleGAN2-ADA) | 4,126 | 6 months ago | |
Alias-Free Generative Adversarial Networks (StyleGAN3) | 6,438 | about 1 year ago | |
Generating Diverse High-Fidelity Images with VQ-VAE-2 | | | |
Taming Transformers for High-Resolution Image Synthesis (VQGAN) | | | |
Diffusion Models Beat GANs on Image Synthesis | | | |
StyleNAT: Giving Each Head a New Perspective | | | |
StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets | | | |
Learning / Conditional Image Synthesis (and inverse problems) |
Image-to-Image Translation with Conditional Adversarial Nets (pix2pix) | | | |
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (CycleGAN) | | | |
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs (pix2pixHD) | | | |
Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects (SESAME) | | | |
Semantic Image Synthesis with Spatially-Adaptive Normalization (SPADE) | 7,610 | over 1 year ago | |
You Only Need Adversarial Supervision for Semantic Image Synthesis (OASIS) | | | |
Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation | | | |
Multimodal Conditional Image Synthesis with Product-of-Experts GANs | | | |
Palette: Image-to-Image Diffusion Models | | | |
Sketch-Guided Text-to-Image Diffusion Models | | | |
HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation | 235 | 3 months ago | |
PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptative Semantic Segmentation | | | |
MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation | 269 | 3 months ago | |
Pretraining is All You Need for Image-to-Image Translation (PITI) | | | |
Learning / GAN inversion (and editing) |
Generative Visual Manipulation on the Natural Image Manifold (iGAN) | | | |
In-Domain GAN Inversion for Real Image Editing | | | |
Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space? | | | |
Designing an Encoder for StyleGAN Image Manipulation | 945 | over 1 year ago | |
Pivotal Tuning for Latent-based Editing of Real Images | 906 | 4 months ago | |
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery | 4,000 | over 1 year ago | |
High-Fidelity GAN Inversion for Image Attribute Editing | | | |
Swapping Autoencoder for Deep Image Manipulation | | | |
Sketch Your Own GAN | | | |
Rewriting Geometric Rules of a GAN | | | |
Anycost GANs for Interactive Image Synthesis and Editing | | | |
Third Time’s the Charm? Image and Video Editing with StyleGAN3 | | | |
Learning / Latent Space Interpretation |
Interpreting the Latent Space of GANs for Semantic Face Editing | | | |
GAN Dissection: Visualizing and Understanding Generative Adversarial Networks | | | |
Unsupervised Extraction of StyleGAN Edit Directions (CLIP2StyleGAN) | 86 | almost 2 years ago | |
Seeing What a GAN Cannot Generate | | | |
Learning / Image Matting |
Deep Image Matting | | | |
Background Matting: The World is Your Green Screen | | | |
Robust Video Matting | 8,610 | 8 months ago | |
Semantic Image Matting | | | |
Privacy-Preserving Portrait Matting | | | |
Deep Automatic Natural Image Matting | | | |
MatteFormer | 221 | about 1 year ago | |
MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition | 3,819 | 7 months ago | |
|
NVIDIA Imaginaire | 4,015 | almost 2 years ago | : 2D Image synthesis library |
NVIDIA Omniverse | | | : The platform for creating and operating metaverse applications |
mmgeneration | 1,909 | about 1 year ago | |
Modelverse | | | : Content-Based Search for Deep Generative Models |
PaddleGAN | 7,902 | 5 months ago | |
|
TensorFlow.js | | | |
ml5.js | | | |
MediaPipe | | | |
Wekinator | | | |
ofxAddons | | | |
|
Keras | | | |
TensorFlow | | | |
🤗 Transformers | | | |
🤗 Diffusers | 26,223 | 4 days ago | |
JAX | 30,499 | 7 days ago | |
dlib | | | |
Darknet | | | |
|
FFCV: an Optimized Data Pipeline for Accelerating ML Training | | | |
ONNX Runtime | | | |
DeepSpeed (training, inference, compression) | 35,463 | 7 days ago | |
TensorRT | | | |
TensorFlow Lite | | | |
TorchScript | | | |
TorchServe | | | |
AITemplate | 4,561 | 29 days ago | |
Tools / Text-to-Image |
Imagen | 8,088 | about 2 months ago | |
DALLE 2 | 11,148 | 6 months ago | |
VQGAN+CLIP | 350 | over 2 years ago | |
Parti | 1,548 | over 2 years ago | |
Muse: Text-To-Image Generation via Masked Generative Transformers | | | : More efficient than diffusion or autoregressive text-to-image models; uses masked image modeling w/ transformers |
|
Dream Studio | | | : Official cloud hosted service |
features | 142,886 | 15 days ago | ⭐️ : A user-friendly UI for SD with additional features to make common workflows easy |
AI render (Blender) | | | : Render scenes in Blender using a text prompt |
Dream Textures (Blender) | 7,832 | 3 months ago | : Plugin to render textures, reference images, and backgrounds with SD |
lexica.art | | | SD Prompt Search |
koi (Krita) | 445 | over 1 year ago | : SD plugin for img2img generation |
Alpaca (Photoshop) | | | : Photoshop plugin (beta) |
Christian Cantrell's Plugin (Photoshop) | | | : Another Photoshop plugin |
Stable Diffusion Studio | 478 | about 2 years ago | : Animation focused frontend for SD |
DeepSpeed-MII | 1,898 | 13 days ago | : Low-latency and high-throughput inference for a variety of (20,000+) models/tasks, including SD |
|
COLMAP | | | |
NVlabs/instant-ngp | 16,033 | 14 days ago | |
NerfAcc | | | |
|
openFrameworks (C++) | | | |
Cinder (C++) | | | |
nannou (Rust) | | | |
vvvv | | | |
Max/MSP/Jitter | | | |
Pure Data | | | |
Datasets / Permissively Licensed/Open Access |
LAION Datasets | 235 | about 2 years ago | : Various very large-scale image-text pair datasets (notably used to train the open source models) |
LAION-Face | 278 | almost 2 years ago | |
Unsplash Images | | | |
Pixabay | | | |
Pexels | | | |
Open Images | | | : Open Images is a dataset of ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives |
Mozilla Common Voice | | | : 17,127 validated hours of transcribed speech covering 104 languages. Additionally, many of the recorded hours in the dataset include demographic metadata like age, sex, and accent that can help improve the accuracy of speech recognition engines |
Flickr Commons | | | : Flickr Commons is a unique collection of historical photography from over 100 cultural institutions from all around the world, all with no known copyright restrictions |
Internet Archive | | | : Internet Archive is a non-profit library of millions of free books, movies, software, music, websites, and more |
Wikimedia Commons | | | : a collection of 106,323,506 freely usable media files to which anyone can contribute |
Prelinger Archives | | | |
Getty Library Open Content Program | | | : Making images from Getty’s collections freely available for study, teaching, and enjoyment |
Smithsonian Open Access | | | |
Public Domain Review | | | : Focused on works now fallen into the public domain, the vast commons of out-of-copyright material that everyone is free to enjoy, share, and build upon without restrictions |
Library of Congress | | | |
Biodiversity Heritage Library | | | |
The Met Open Access | | | |
The National Gallery of Art Open Access | | | |
Art Institute of Chicago Open Access | | | |
NY Public Library Public Domain Collections | | | |
Museum für Kunst und Gewerbe Hamburg Steintorplatz | | | |
FairFace | | | |
Conceptual Captions | | | |
Quick, Draw! | | | |
Visual Question Answering | | | |
TensorFlow Flowers | | | |
Stanford Online Products dataset | 342 | about 5 years ago | |
DeepMind 3d Shapes | 135 | 8 months ago | |
PASS | | | : An ImageNet replacement for self-supervised pretraining without humans which can be used for high-quality pretraining while significantly reducing privacy concerns |
Datasets / Faces/People (restricted licenses) |
Labeled Faces in the Wild (LFW) | | | |
CelebA | | | |
LFWA+ | | | |
CelebAMask-HQ | 2,123 | 5 months ago | |
CelebA-Spoof | 535 | over 3 years ago | |
UTKFace | | | |
SHHQ | | | : full body 1024 x 512px |
Datasets / Other |
Brutus Light Field | 38 | almost 2 years ago | |
Products/Apps |
Artbreeder | | | |
Midjourney | | | |
DALL-E 2 (OpenAI) | | | |
Runway | | | AI powered video editor |
Facet AI | | | AI powered image editor |
Adobe Sensei | | | AI powered features for the Creative Cloud suite |
NVIDIA AI Demos | | | |
ClipDrop | | | |
Artists |
Memo Akten | | | |
Neural Bricolage (helena sarin) | | | |
Sofia Crespo | | | |
Lauren McCarthy | | | |
Philipp Schmitt | | | |
Anna Ridler | | | |
Tom White | | | |
Ivona Tau | | | |
Trevor Paglen | | | |
Sasha Stiles | | | |
Mario Klingemann | | | |
Tega Brain | | | |
Mimi Onuoha | | | |
Allison Parrish | | | |
Caroline Sinders | | | |
Robbie Barrat | | | |
Kyle McDonald | | | |
Golan Levin | | | |
Institutions/Places |
STUDIO for Creative Inquiry | | | |
ITP @ NYU | | | |
Gray Area Foundation for the Arts | | | |
Stability AI (Eleuther, LAION, et al.) | | | |
Goldsmiths @ University of London | | | |
UCLA Design Media Arts | | | |
Berkeley Center for New Media | | | |
Google Artists and Machine Intelligence | | | |
Google Creative Lab | | | |
The Lab at the Google Cultural Institute | | | |
Sony CSL (Tokyo) | | | |
|
Machine Learning for Art | | | |
Tools and Resources for AI Art (pharmapsychotic) | | | Big list of Google Colab notebooks for generative text-to-image techniques as well as general tools and resources |
Awesome Generative Deep Art | 2,538 | 8 days ago | A curated list of Generative Deep Art / Generative AI projects, tools, artworks, and models |