Generative Models: PixelRNN and PixelCNN; Variational Autoencoders (VAE); Generative Adversarial Networks (GAN)
Generative Models
Given training data, generate new samples from the same distribution: learn $p_{model}(x)$ that is similar to $p_{data}(x)$.
Several flavors:
- Explicit density estimation: explicitly define and solve for $p_{model}(x)$
- Implicit density estimation: learn model that can sample from $p_{model}(x)$ w/o explicitly defining it
Why Generative Models?
- Realistic samples for artwork, super-resolution, colorization, etc.
- Generative models of time-series data can be used for simulation and planning (reinforcement learning applications!)
- Training generative models can also enable inference of latent representations that can be useful as general features
PixelRNN and PixelCNN
PixelRNN
- Generate image pixels starting from corner
- Dependency on previous pixels modeled using an RNN (LSTM)
Drawback: sequential generation is slow!
PixelCNN
- Still generate image pixels starting from corner
- Dependency on previous pixels now modeled using a CNN over context region
Training: maximize likelihood of training images
Training is faster than PixelRNN: the convolutions can be parallelized, since the context-region values are known from the training images. Generation must still proceed sequentially => still slow.
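The chain-rule factorization behind both models can be sketched in a few lines. This is a toy stand-in, not the lecture's model: each conditional $p(x_i \mid x_{<i})$ is an assumed logistic function of the previously generated pixels, playing the role of the masked convolutions / LSTM in PixelCNN / PixelRNN.

```python
import numpy as np

# Toy autoregressive model over 4 binary pixels in raster order.
# cond_prob is a made-up conditional; a real PixelCNN/PixelRNN would
# compute p(x_i | x_<i) with a neural network over the context region.
rng = np.random.default_rng(0)

def cond_prob(prev_pixels):
    # p(x_i = 1 | x_<i): more "on" context pixels -> higher probability
    return 1.0 / (1.0 + np.exp(-(np.sum(prev_pixels) - 1.0)))

def sample_image(n_pixels=4):
    pixels = []
    for _ in range(n_pixels):          # sequential generation: one pixel at a time
        p = cond_prob(pixels)
        pixels.append(int(rng.random() < p))
    return pixels

def log_likelihood(pixels):
    # log p(x) = sum_i log p(x_i | x_<i)   (chain rule factorization)
    ll = 0.0
    for i, x in enumerate(pixels):
        p = cond_prob(pixels[:i])
        ll += np.log(p if x == 1 else 1.0 - p)
    return ll

img = sample_image()
print(img, log_likelihood(img))
```

Note that computing the likelihood of a *given* image can evaluate all conditionals in parallel (fast training), while sampling must loop pixel by pixel (slow generation).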
Variational Autoencoders (VAE)
PixelCNNs define a tractable density function and optimize the likelihood of training data:
$$p_\theta(x) = \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \ldots, x_{i-1})$$
VAEs define an intractable density function with latent z:
$$p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz$$
Cannot optimize this directly; derive and optimize a lower bound on the likelihood instead.
Background: Autoencoders
Unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data:
How to learn this feature representation?
Train such that the features can be used to reconstruct the original data. "Autoencoding": encoding the input into itself.
L2 Loss function:
$$||x - \widetilde{x}||^2$$
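The encode -> decode -> L2 loss pipeline can be sketched with a minimal linear autoencoder. The shapes (4-d input, 2-d code) and the linear layers are illustrative assumptions; real autoencoders use deep nonlinear networks.

```python
import numpy as np

# Minimal linear autoencoder sketch: x (4-d) -> z (2-d) -> x_tilde (4-d).
# Weights are random placeholders for trained parameters.
rng = np.random.default_rng(1)
W_enc = rng.normal(size=(2, 4))   # encoder: maps input to lower-dim features
W_dec = rng.normal(size=(4, 2))   # decoder: reconstructs input from features

def reconstruct(x):
    z = W_enc @ x          # lower-dimensional feature representation z
    return W_dec @ z       # reconstruction x_tilde

def l2_loss(x):
    x_tilde = reconstruct(x)
    return np.sum((x - x_tilde) ** 2)   # ||x - x_tilde||^2

x = rng.normal(size=4)
print(l2_loss(x))
```

Training would adjust `W_enc` and `W_dec` by gradient descent on this loss; no labels are needed, which is why this is an unsupervised approach.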
Variational Autoencoders
Assume training data $\{x^{(i)}\}_{i=1}^{N}$ is generated from underlying unobserved (latent) representation z :
_Intuition (remember from autoencoders!): x is an image, z is latent factors used to generate x: attributes, orientation, etc._
We want to estimate the true parameters of this generative model. How should we represent this model?
Choose prior p(z) to be simple, e.g. Gaussian. Conditional p(x|z) is complex (generates image) => represent with neural network.
How to train the model?
Remember the strategy for training generative models from fully visible belief networks (FVBNs): learn model parameters to maximize the likelihood of the training data.
- $p_θ(z)$ : Simple Gaussian prior
- $p_θ(x|z)$ : Decoder neural network
Solution: in addition to the decoder network modeling $p_θ(x|z)$, define an additional encoder network $q_φ(z|x)$ that approximates the intractable posterior $p_θ(z|x)$. This allows us to derive a tractable lower bound on the data likelihood, which we can optimize.
Putting it all together: maximizing the likelihood lower bound
$$\log p_θ(x) \geq \mathbb{E}_{z \sim q_φ(z|x)}\big[\log p_θ(x|z)\big] - D_{KL}\big(q_φ(z|x)\,\|\,p_θ(z)\big)$$
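The two terms of the lower bound can be computed concretely for one datapoint. This sketch assumes a diagonal-Gaussian encoder output (made-up `mu`, `log_var`) and a made-up linear "decoder"; the KL term between a diagonal Gaussian and the unit-Gaussian prior has a closed form.

```python
import numpy as np

# ELBO sketch for one datapoint, assuming q(z|x) = N(mu, sigma^2 I)
# and prior p(z) = N(0, I). All numeric values are illustrative.
rng = np.random.default_rng(2)
mu = np.array([0.5, -0.3])        # encoder mean output (assumed)
log_var = np.array([-0.2, 0.1])   # encoder log-variance output (assumed)

# Closed-form KL(q(z|x) || p(z)) for diagonal Gaussian vs. N(0, I):
kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

# Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
eps = rng.normal(size=2)
z = mu + np.exp(0.5 * log_var) * eps

# Stand-in decoder reconstruction term E[log p(x|z)] (up to a constant,
# a Gaussian decoder gives a negative squared error):
x = np.array([1.0, 0.0, 1.0])
x_hat = np.tanh(np.array([[0.4, -0.1], [0.2, 0.3], [-0.5, 0.6]]) @ z)
recon_log_lik = -np.sum((x - x_hat) ** 2)

elbo = recon_log_lik - kl   # maximize this lower bound on log p(x)
print(kl, elbo)
```

The reconstruction term pushes the decoder to explain the data; the KL term keeps the approximate posterior close to the prior, which is what makes sampling from the prior at generation time sensible.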
Generating Data
Use decoder network. Now sample z from prior!
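Generation needs only the decoder: sample z from the prior and decode. The linear "decoder" weights below are placeholders standing in for a trained network.

```python
import numpy as np

# Generate a new sample: z ~ p(z) = N(0, I), then decode.
# W_dec is a random placeholder for trained decoder parameters.
rng = np.random.default_rng(3)
W_dec = rng.normal(size=(4, 2))

z = rng.standard_normal(2)        # sample latent code from the prior
x_new = np.tanh(W_dec @ z)        # decoded sample (the "generated image")
print(x_new.shape)
```

Because the encoder was trained to keep $q_φ(z|x)$ close to the prior, latent codes drawn from $N(0, I)$ decode to plausible samples.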
Summary
Probabilistic spin on traditional autoencoders => allows generating data
Defines an intractable density => derive and optimize a (variational) lower bound
- Pros:
  - Principled approach to generative models
  - Allows inference of q(z|x), which can be a useful feature representation for other tasks
- Cons:
  - Maximizes a lower bound of the likelihood: okay, but not as good an evaluation as PixelRNN/PixelCNN
  - Samples blurrier and lower quality compared to state-of-the-art (GANs)
Generative Adversarial Networks (GAN)
Don’t work with any explicit density function! Instead, take game-theoretic approach: learn to generate from training distribution through 2-player game.
Generator network: try to fool the discriminator by generating real-looking images
Discriminator network: try to distinguish between real and fake images
Training GANs: Two-player game
Minimax objective function:
$$\min_{θ_g} \max_{θ_d} \Big[ \mathbb{E}_{x \sim p_{data}} \log D_{θ_d}(x) + \mathbb{E}_{z \sim p(z)} \log\big(1 - D_{θ_d}(G_{θ_g}(z))\big) \Big]$$
- Discriminator ($θ_d$) wants to maximize the objective such that D(x) is close to 1 (real) and D(G(z)) is close to 0 (fake)
- Generator ($θ_g$) wants to minimize the objective such that D(G(z)) is close to 1 (discriminator is fooled into thinking the generated G(z) is real)
Alternate between:
- Gradient ascent on discriminator
- Gradient descent on generator
_In practice, optimizing this generator objective does not work well: when a sample is likely fake, the gradient of $\log(1 - D(G(z)))$ is relatively flat, so learning is slowest exactly when the generator is bad._
Instead of minimizing likelihood of discriminator being correct, now maximize likelihood of discriminator being wrong.
Gradient ascent on generator, different objective:
$$\max_{θ_g} \mathbb{E}_{z \sim p(z)} \log D_{θ_d}\big(G_{θ_g}(z)\big)$$
Putting it together: GAN training algorithm
After training, use generator network to generate new images.
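The alternating-update algorithm can be sketched on a toy 1-D problem with hand-derived gradients. Everything here is an illustrative assumption, not the lecture's setup: real data is $N(2, 1)$, the generator $G(z) = θ_g + z$ shifts unit-Gaussian noise, and the discriminator is $D(x) = \sigma(ax + b)$.

```python
import numpy as np

# Toy 1-D GAN with manual gradients (all values illustrative).
rng = np.random.default_rng(4)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

a, b = 0.1, 0.0      # discriminator parameters (theta_d)
theta_g = 0.0        # generator parameter; data mean is 2.0
lr = 0.05

for step in range(500):
    x_real = rng.normal(2.0, 1.0, size=32)
    z = rng.standard_normal(32)
    x_fake = theta_g + z                      # G(z) = theta_g + z

    # Gradient ASCENT on the discriminator objective:
    #   E[log D(x_real)] + E[log(1 - D(x_fake))]
    d_real = sigmoid(a * x_real + b)
    d_fake = sigmoid(a * x_fake + b)
    a += lr * (np.mean((1 - d_real) * x_real) - np.mean(d_fake * x_fake))
    b += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Gradient ASCENT on the non-saturating generator objective:
    #   E[log D(G(z))]   (maximize likelihood of discriminator being WRONG)
    d_fake = sigmoid(a * (theta_g + z) + b)
    theta_g += lr * np.mean((1 - d_fake) * a)

print(round(theta_g, 2))   # should drift toward the data mean
```

Note the structure mirrors the algorithm: each iteration does one discriminator ascent step, then one generator step on the flipped (non-saturating) objective; the generator parameter is pulled toward the real data distribution without ever seeing a density.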
Summary
Don’t work with an explicit density function.
Take game-theoretic approach: learn to generate from training distribution through 2-player game
Pros:
- Beautiful, state-of-the-art samples!
Cons:
- Trickier / more unstable to train
- Can’t solve inference queries such as p(x), p(z|x)
Active areas of research:
- Better loss functions, more stable training (Wasserstein GAN, LSGAN, many others)
- Conditional GANs, GANs for all kinds of applications
Recap
Generative Models
- PixelRNN and PixelCNN : Explicit density model, optimizes exact likelihood, good samples. But inefficient sequential generation.
- Variational Autoencoders (VAE): Optimize variational lower bound on likelihood. Useful latent representation, inference queries. But current sample quality not the best.
- Generative Adversarial Networks (GANs): Game-theoretic approach, best samples! But can be tricky and unstable to train, no inference queries.