# Machine Learning

Generative Models: PixelRNN and PixelCNN; Variational Autoencoders (VAE); Generative Adversarial Networks (GAN)

## Generative Models

Given training data, generate new samples from the same distribution.

Several flavors:

• Explicit density estimation: explicitly define and solve for $p_{model}(x)$
• Implicit density estimation: learn model that can sample from $p_{model}(x)$ w/o explicitly defining it

### Why Generative Models?

• Realistic samples for artwork, super-resolution, colorization, etc.
• Generative models of time-series data can be used for simulation and planning (reinforcement learning applications!)
• Training generative models can also enable inference of latent representations that can be useful as general features

## PixelRNN and PixelCNN

### PixelRNN

• Generate image pixels starting from corner
• Dependency on previous pixels modeled using an RNN (LSTM)

Drawback: sequential generation is slow!

### PixelCNN

• Still generate image pixels starting from corner
• Dependency on previous pixels now modeled using a CNN over context region

Training: maximize likelihood of training images

Training is faster than PixelRNN, since the convolutions can be parallelized (the context-region values are known from the training images). Generation must still proceed sequentially => still slow.
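The "context region" is enforced with masked convolutions: the kernel is zeroed out wherever it would see pixels that have not been generated yet in raster order. A minimal sketch of such a mask (the function name and the "A"/"B" mask-type convention follow the original PixelCNN paper; this is an illustration, not the lecture's code):

```python
import numpy as np

def causal_mask(k, mask_type="A"):
    """Build a k x k mask for a PixelCNN-style masked convolution.

    Pixels are generated in raster order, so the kernel may only see
    positions strictly above the centre row, plus those to the left
    within it. A type-"A" mask (first layer) also hides the centre
    pixel itself; type-"B" masks (later layers) allow it.
    """
    mask = np.zeros((k, k), dtype=np.float32)
    c = k // 2
    mask[:c, :] = 1.0      # all rows strictly above the centre
    mask[c, :c] = 1.0      # left of the centre in the centre row
    if mask_type == "B":
        mask[c, c] = 1.0   # later layers may see the current pixel
    return mask

# A 3x3 type-A mask exposes only the 4 already-generated neighbours.
print(causal_mask(3, "A"))
```

Multiplying a convolution kernel elementwise by this mask before applying it guarantees that the prediction for each pixel depends only on previously generated pixels.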

## Variational Autoencoders (VAE)

PixelCNNs define a tractable density function and optimize the likelihood of the training data:

$$p_\theta(x) = \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \ldots, x_{i-1})$$

VAEs define an intractable density function with latent variable $z$:

$$p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz$$

This integral cannot be optimized directly, so we derive and optimize a lower bound on the likelihood instead.

### Background: Autoencoders

Unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data.

How to learn this feature representation?
Train such that the features can be used to reconstruct the original data. "Autoencoding" means encoding the input so that it can be decoded back into itself.

L2 Loss function:
$$||x - \widetilde{x}||^2$$
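A minimal sketch of this idea, using a tiny linear encoder/decoder trained by gradient descent on the L2 reconstruction loss (the toy data, dimensions, and learning rate here are illustrative assumptions, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples in 10-D that actually live on a 3-D subspace.
basis = rng.normal(size=(3, 10))
x = rng.normal(size=(100, 3)) @ basis

# Linear encoder/decoder: one weight matrix each, no biases.
d_in, d_hidden = 10, 3
W_enc = rng.normal(scale=0.1, size=(d_in, d_hidden))
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_in))

def l2_loss(x, x_rec):
    return np.mean(np.sum((x - x_rec) ** 2, axis=1))

lr = 0.01
losses = []
for step in range(200):
    z = x @ W_enc          # encode to lower-dimensional features
    x_rec = z @ W_dec      # decode back to input space
    losses.append(l2_loss(x, x_rec))
    # Gradients of the mean L2 reconstruction loss.
    g = 2.0 * (x_rec - x) / x.shape[0]
    g_z = g @ W_dec.T      # backprop through the decoder
    W_dec -= lr * (z.T @ g)
    W_enc -= lr * (x.T @ g_z)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The features `z` are exactly the lower-dimensional representation the reconstruction objective forces the encoder to learn.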

### Variational Autoencoders

Assume training data $\{x^{(i)}\}_{i=1}^{N}$ is generated from an underlying unobserved (latent) representation $z$.

_Intuition (remember from autoencoders!): x is an image, z is latent factors used to generate x: attributes, orientation, etc._

We want to estimate the true parameters of this generative model. How should we represent this model?
Choose prior p(z) to be simple, e.g. Gaussian. Conditional p(x|z) is complex (generates image) => represent with neural network.

How to train the model?
Recall the strategy for training generative models from fully visible belief networks (FVBNs): learn model parameters to maximize the likelihood of the training data.

• $p_θ(z)$ : Simple Gaussian prior
• $p_θ(x|z)$ : Decoder neural network

Solution: In addition to the decoder network modeling $p_θ(x|z)$, define an additional encoder network $q_\phi(z|x)$ that approximates $p_θ(z|x)$. This allows us to derive a lower bound on the data likelihood that is tractable, which we can optimize.
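Concretely, the encoder outputs a mean and (log-)variance per latent dimension, and sampling is made differentiable with the reparameterization trick. A small sketch under assumed shapes (the encoder outputs below are random placeholders standing in for a real network's predictions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder output for a batch of 8 inputs: q(z|x) predicts
# a mean and log-variance for each of 4 latent dimensions.
mu = rng.normal(size=(8, 4))
log_var = rng.normal(scale=0.1, size=(8, 4))

# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
# so the sampling step is differentiable w.r.t. mu and log_var.
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# KL divergence between N(mu, sigma^2) and the N(0, I) prior, closed form.
kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)
print(z.shape, kl.shape)
```

Because the Gaussian prior and Gaussian approximate posterior both have simple forms, the KL term of the lower bound is available in closed form, as computed above.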

Putting it all together: maximizing the likelihood lower bound
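The bound being maximized (the evidence lower bound, ELBO) decomposes into a reconstruction term and a regularization term:

$$\log p_\theta(x^{(i)}) \;\geq\; \mathbb{E}_{z \sim q_\phi(z \mid x^{(i)})}\big[\log p_\theta(x^{(i)} \mid z)\big] \;-\; D_{KL}\big(q_\phi(z \mid x^{(i)})\,\|\,p_\theta(z)\big)$$

The first term encourages faithful reconstruction of the input; the second keeps the approximate posterior close to the prior.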

#### Generating Data

Use decoder network. Now sample z from prior!
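A minimal sketch of generation, assuming a small two-layer decoder with sigmoid pixel outputs (the weights here are random placeholders; in a trained VAE they come from maximizing the lower bound):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical decoder weights -- in a real VAE these are learned;
# here they are random placeholders just to show the sampling path.
z_dim, h_dim, x_dim = 4, 32, 28 * 28
W1, b1 = rng.normal(scale=0.1, size=(z_dim, h_dim)), np.zeros(h_dim)
W2, b2 = rng.normal(scale=0.1, size=(h_dim, x_dim)), np.zeros(x_dim)

def decode(z):
    """Decoder p(x|z): maps a latent code to pixel means in (0, 1)."""
    h = np.tanh(z @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid pixel means

# Generation needs no encoder: just sample z from the N(0, I) prior.
z = rng.standard_normal((16, z_dim))
samples = decode(z)
print(samples.shape)  # a batch of 16 generated 28x28 "images"
```

Note the encoder is only needed during training; at generation time the prior replaces it entirely.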

#### Summary

A probabilistic spin on traditional autoencoders that allows generating data. Defines an intractable density, so we derive and optimize a (variational) lower bound.

• Pros:
  - Principled approach to generative models
  - Allows inference of q(z|x), which can be a useful feature representation for other tasks
• Cons:
  - Maximizes a lower bound of the likelihood: okay, but not as good an evaluation as PixelRNN/PixelCNN
  - Samples are blurrier and lower quality compared to the state of the art (GANs)

## GAN

Don’t work with any explicit density function! Instead, take game-theoretic approach: learn to generate from training distribution through 2-player game.

• Generator network: try to fool the discriminator by generating real-looking images

• Discriminator network: try to distinguish between real and fake images

### Training GANs: Two-player game

• Discriminator $θ_d$ wants to maximize objective such that D(x) is close to 1 (real) and D(G(z)) is close to 0 (fake)
• Generator $θ_g$ wants to minimize objective such that D(G(z)) is close to 1 (discriminator is fooled into thinking generated G(z) is real)
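Together, the two players optimize a single minimax objective (as in the original GAN formulation):

$$\min_{\theta_g} \max_{\theta_d} \; \Big[ \mathbb{E}_{x \sim p_{data}} \log D_{\theta_d}(x) \;+\; \mathbb{E}_{z \sim p(z)} \log\big(1 - D_{\theta_d}(G_{\theta_g}(z))\big) \Big]$$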

Alternate between:

1. Gradient ascent on the discriminator
2. Gradient descent on the generator

_In practice, optimizing this generator objective does not work well!_

Instead of minimizing the likelihood of the discriminator being correct, maximize the likelihood of the discriminator being wrong.
Gradient ascent on the generator, with a different (non-saturating) objective:

$$\max_{\theta_g} \; \mathbb{E}_{z \sim p(z)}\big[\log D_{\theta_d}(G_{\theta_g}(z))\big]$$

Putting it together: GAN training algorithm

After training, use generator network to generate new images.
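The full alternating loop can be sketched on a 1-D toy problem. Everything here is an illustrative assumption (scalar data from a Gaussian, an affine generator, a logistic-regression discriminator), chosen so the gradients can be written by hand; real GANs use deep networks for both players:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: "data" are scalars from N(3, 0.5); the generator is an
# affine map of noise; the discriminator is logistic regression.
def d_forward(x, wd, bd):
    return 1.0 / (1.0 + np.exp(-(wd * x + bd)))  # D(x) in (0, 1)

def g_forward(z, wg, bg):
    return wg * z + bg                           # G(z)

wd, bd = 0.1, 0.0   # discriminator params (theta_d)
wg, bg = 1.0, 0.0   # generator params (theta_g)
lr, batch = 0.05, 64

for step in range(2000):
    x_real = rng.normal(3.0, 0.5, batch)
    z = rng.standard_normal(batch)
    x_fake = g_forward(z, wg, bg)

    # --- discriminator step: ascend log D(x) + log(1 - D(G(z))) ---
    d_real = d_forward(x_real, wd, bd)
    d_fake = d_forward(x_fake, wd, bd)
    g_real = 1.0 - d_real    # d/dlogit of log D(x)
    g_fake = -d_fake         # d/dlogit of log(1 - D(G(z)))
    wd += lr * np.mean(g_real * x_real + g_fake * x_fake)
    bd += lr * np.mean(g_real + g_fake)

    # --- generator step: ascend log D(G(z)) (non-saturating) ---
    d_fake = d_forward(x_fake, wd, bd)
    g_x = (1.0 - d_fake) * wd   # d/dG of log D(G(z)), chain rule
    wg += lr * np.mean(g_x * z)
    bg += lr * np.mean(g_x)

print(f"generated mean ~ {bg:.2f}, target 3.0")
```

After training, only the generator is kept: sampling `z` and applying `g_forward` produces new samples, with the generated mean pulled toward the data mean.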

### Summary

Don’t work with an explicit density function.
Take game-theoretic approach: learn to generate from training distribution through 2-player game
Pros:

• Beautiful, state-of-the-art samples!

Cons:

• Trickier / more unstable to train
• Can’t solve inference queries such as p(x), p(z|x)

Active areas of research:

• Better loss functions, more stable training (Wasserstein GAN, LSGAN, many others)
• Conditional GANs, GANs for all kinds of applications

## Recap

Generative Models

• PixelRNN and PixelCNN : Explicit density model, optimizes exact likelihood, good samples. But inefficient sequential generation.
• Variational Autoencoders (VAE) : Optimize variational lower bound on likelihood. Useful latent representation, inference queries. But current sample quality not the best.

• Generative Adversarial Networks (GANs) : Game-theoretic approach, best samples! But can be tricky and unstable to train, no inference queries.