# Machine Learning

Generative Models: PixelRNN and PixelCNN; Variational Autoencoders (VAE); Generative Adversarial Networks (GAN)

## Generative Models

Given training data, generate new samples from the same distribution.

Several flavors:

• Explicit density estimation: explicitly define and solve for $p_{model}(x)$
• Implicit density estimation: learn model that can sample from $p_{model}(x)$ w/o explicitly defining it

### Why Generative Models?

• Realistic samples for artwork, super-resolution, colorization, etc.
• Generative models of time-series data can be used for simulation and planning (reinforcement learning applications!)
• Training generative models can also enable inference of latent representations that can be useful as general features

## PixelRNN and PixelCNN

### PixelRNN

• Generate image pixels starting from corner
• Dependency on previous pixels modeled using an RNN (LSTM)

Drawback: sequential generation is slow!

### PixelCNN

• Still generate image pixels starting from corner
• Dependency on previous pixels now modeled using a CNN over context region

Training: maximize likelihood of training images

Training is faster than PixelRNN, since the convolutions can be parallelized (the context-region values are known from the training images). Generation must still proceed sequentially => still slow.
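The "context region" is enforced with masked convolutions: the kernel is zeroed out wherever it would see pixels that have not been generated yet in raster order. A minimal sketch of such a mask (the function name and the "A"/"B" mask-type convention follow the original PixelCNN paper; this is an illustration, not the lecture's code):

```python
import numpy as np

def causal_mask(k, mask_type="A"):
    """Build a k x k mask for a PixelCNN-style masked convolution.

    Pixels are generated in raster order, so the kernel may only see
    positions strictly above the centre row, plus those to the left
    within it. A type-"A" mask (first layer) also hides the centre
    pixel itself; type-"B" masks (later layers) allow it.
    """
    mask = np.zeros((k, k), dtype=np.float32)
    c = k // 2
    mask[:c, :] = 1.0      # all rows strictly above the centre
    mask[c, :c] = 1.0      # left of the centre in the centre row
    if mask_type == "B":
        mask[c, c] = 1.0   # later layers may see the current pixel
    return mask

# A 3x3 type-A mask exposes only the 4 already-generated neighbours.
print(causal_mask(3, "A"))
```

Multiplying a convolution kernel elementwise by this mask before applying it guarantees that the prediction for each pixel depends only on previously generated pixels.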

## Variational Autoencoders (VAE)

PixelCNNs define a tractable density function and optimize the likelihood of the training data:

$$p_\theta(x) = \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \ldots, x_{i-1})$$

VAEs define an intractable density function with latent variable $z$:

$$p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz$$

This integral cannot be optimized directly, so we derive and optimize a lower bound on the likelihood instead.

### Background: Autoencoders

Unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data.

How to learn this feature representation?
Train such that the features can be used to reconstruct the original data. "Autoencoding" means encoding the input so that it can be decoded back into itself.

L2 Loss function:
$$||x - \widetilde{x}||^2$$
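A minimal sketch of this idea, using a tiny linear encoder/decoder trained by gradient descent on the L2 reconstruction loss (the toy data, dimensions, and learning rate here are illustrative assumptions, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples in 10-D that actually live on a 3-D subspace.
basis = rng.normal(size=(3, 10))
x = rng.normal(size=(100, 3)) @ basis

# Linear encoder/decoder: one weight matrix each, no biases.
d_in, d_hidden = 10, 3
W_enc = rng.normal(scale=0.1, size=(d_in, d_hidden))
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_in))

def l2_loss(x, x_rec):
    return np.mean(np.sum((x - x_rec) ** 2, axis=1))

lr = 0.01
losses = []
for step in range(200):
    z = x @ W_enc          # encode to lower-dimensional features
    x_rec = z @ W_dec      # decode back to input space
    losses.append(l2_loss(x, x_rec))
    # Gradients of the mean L2 reconstruction loss.
    g = 2.0 * (x_rec - x) / x.shape[0]
    g_z = g @ W_dec.T      # backprop through the decoder
    W_dec -= lr * (z.T @ g)
    W_enc -= lr * (x.T @ g_z)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The features `z` are exactly the lower-dimensional representation the reconstruction objective forces the encoder to learn.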

### Variational Autoencoders

Assume training data $\{x^{(i)}\}_{i=1}^{N}$ is generated from an underlying unobserved (latent) representation $z$.

_Intuition (remember from autoencoders!): x is an image, z is latent factors used to generate x: attributes, orientation, etc._

We want to estimate the true parameters of this generative model. How should we represent this model?
Choose prior p(z) to be simple, e.g. Gaussian. Conditional p(x|z) is complex (generates image) => represent with neural network.

How to train the model?
Recall the strategy for training generative models from fully visible belief networks (FVBNs): learn model parameters to maximize the likelihood of the training data.

• $p_θ(z)$ : Simple Gaussian prior
• $p_θ(x|z)$ : Decoder neural network

Solution: In addition to the decoder network modeling $p_θ(x|z)$, define an additional encoder network $q_\phi(z|x)$ that approximates $p_θ(z|x)$. This allows us to derive a lower bound on the data likelihood that is tractable, which we can optimize.
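Concretely, the encoder outputs a mean and (log-)variance per latent dimension, and sampling is made differentiable with the reparameterization trick. A small sketch under assumed shapes (the encoder outputs below are random placeholders standing in for a real network's predictions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder output for a batch of 8 inputs: q(z|x) predicts
# a mean and log-variance for each of 4 latent dimensions.
mu = rng.normal(size=(8, 4))
log_var = rng.normal(scale=0.1, size=(8, 4))

# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
# so the sampling step is differentiable w.r.t. mu and log_var.
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# KL divergence between N(mu, sigma^2) and the N(0, I) prior, closed form.
kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)
print(z.shape, kl.shape)
```

Because the Gaussian prior and Gaussian approximate posterior both have simple forms, the KL term of the lower bound is available in closed form, as computed above.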

Putting it all together: maximizing the likelihood lower bound
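The bound being maximized (the evidence lower bound, ELBO) decomposes into a reconstruction term and a regularization term:

$$\log p_\theta(x^{(i)}) \;\geq\; \mathbb{E}_{z \sim q_\phi(z \mid x^{(i)})}\big[\log p_\theta(x^{(i)} \mid z)\big] \;-\; D_{KL}\big(q_\phi(z \mid x^{(i)})\,\|\,p_\theta(z)\big)$$

The first term encourages faithful reconstruction of the input; the second keeps the approximate posterior close to the prior.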

#### Generating Data

Use decoder network. Now sample z from prior!
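A minimal sketch of generation, assuming a small two-layer decoder with sigmoid pixel outputs (the weights here are random placeholders; in a trained VAE they come from maximizing the lower bound):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical decoder weights -- in a real VAE these are learned;
# here they are random placeholders just to show the sampling path.
z_dim, h_dim, x_dim = 4, 32, 28 * 28
W1, b1 = rng.normal(scale=0.1, size=(z_dim, h_dim)), np.zeros(h_dim)
W2, b2 = rng.normal(scale=0.1, size=(h_dim, x_dim)), np.zeros(x_dim)

def decode(z):
    """Decoder p(x|z): maps a latent code to pixel means in (0, 1)."""
    h = np.tanh(z @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid pixel means

# Generation needs no encoder: just sample z from the N(0, I) prior.
z = rng.standard_normal((16, z_dim))
samples = decode(z)
print(samples.shape)  # a batch of 16 generated 28x28 "images"
```

Note the encoder is only needed during training; at generation time the prior replaces it entirely.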

#### Summary

A probabilistic spin on traditional autoencoders that allows generating data. Defines an intractable density, so we derive and optimize a (variational) lower bound.

• Pros:
  - Principled approach to generative models
  - Allows inference of q(z|x), which can be a useful feature representation for other tasks
• Cons:
  - Maximizes a lower bound of the likelihood: okay, but not as good an evaluation as PixelRNN/PixelCNN
  - Samples are blurrier and lower quality compared to the state of the art (GANs)

## GAN

Don’t work with any explicit density function! Instead, take game-theoretic approach: learn to generate from training distribution through 2-player game.

• Generator network: try to fool the discriminator by generating real-looking images

• Discriminator network: try to distinguish between real and fake images

### Training GANs: Two-player game

• Discriminator $θ_d$ wants to maximize objective such that D(x) is close to 1 (real) and D(G(z)) is close to 0 (fake)
• Generator $θ_g$ wants to minimize objective such that D(G(z)) is close to 1 (discriminator is fooled into thinking generated G(z) is real)
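Together, the two players optimize a single minimax objective (as in the original GAN formulation):

$$\min_{\theta_g} \max_{\theta_d} \; \Big[ \mathbb{E}_{x \sim p_{data}} \log D_{\theta_d}(x) \;+\; \mathbb{E}_{z \sim p(z)} \log\big(1 - D_{\theta_d}(G_{\theta_g}(z))\big) \Big]$$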

Alternate between:

1. Gradient ascent on the discriminator
2. Gradient descent on the generator

_In practice, optimizing this generator objective does not work well!_

Instead of minimizing the likelihood of the discriminator being correct, maximize the likelihood of the discriminator being wrong.
Gradient ascent on the generator, with a different (non-saturating) objective:

$$\max_{\theta_g} \; \mathbb{E}_{z \sim p(z)}\big[\log D_{\theta_d}(G_{\theta_g}(z))\big]$$

Putting it together: GAN training algorithm

After training, use generator network to generate new images.
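The full alternating loop can be sketched on a 1-D toy problem. Everything here is an illustrative assumption (scalar data from a Gaussian, an affine generator, a logistic-regression discriminator), chosen so the gradients can be written by hand; real GANs use deep networks for both players:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: "data" are scalars from N(3, 0.5); the generator is an
# affine map of noise; the discriminator is logistic regression.
def d_forward(x, wd, bd):
    return 1.0 / (1.0 + np.exp(-(wd * x + bd)))  # D(x) in (0, 1)

def g_forward(z, wg, bg):
    return wg * z + bg                           # G(z)

wd, bd = 0.1, 0.0   # discriminator params (theta_d)
wg, bg = 1.0, 0.0   # generator params (theta_g)
lr, batch = 0.05, 64

for step in range(2000):
    x_real = rng.normal(3.0, 0.5, batch)
    z = rng.standard_normal(batch)
    x_fake = g_forward(z, wg, bg)

    # --- discriminator step: ascend log D(x) + log(1 - D(G(z))) ---
    d_real = d_forward(x_real, wd, bd)
    d_fake = d_forward(x_fake, wd, bd)
    g_real = 1.0 - d_real    # d/dlogit of log D(x)
    g_fake = -d_fake         # d/dlogit of log(1 - D(G(z)))
    wd += lr * np.mean(g_real * x_real + g_fake * x_fake)
    bd += lr * np.mean(g_real + g_fake)

    # --- generator step: ascend log D(G(z)) (non-saturating) ---
    d_fake = d_forward(x_fake, wd, bd)
    g_x = (1.0 - d_fake) * wd   # d/dG of log D(G(z)), chain rule
    wg += lr * np.mean(g_x * z)
    bg += lr * np.mean(g_x)

print(f"generated mean ~ {bg:.2f}, target 3.0")
```

After training, only the generator is kept: sampling `z` and applying `g_forward` produces new samples, with the generated mean pulled toward the data mean.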

### Summary

Don’t work with an explicit density function.
Take game-theoretic approach: learn to generate from training distribution through 2-player game
Pros:

• Beautiful, state-of-the-art samples!

Cons:

• Trickier / more unstable to train
• Can’t solve inference queries such as p(x), p(z|x)

Active areas of research:

• Better loss functions, more stable training (Wasserstein GAN, LSGAN, many others)
• Conditional GANs, GANs for all kinds of applications

## Recap

Generative Models

• PixelRNN and PixelCNN : Explicit density model, optimizes exact likelihood, good samples. But inefficient sequential generation.
• Variational Autoencoders (VAE) : Optimize variational lower bound on likelihood. Useful latent representation, inference queries. But current sample quality not the best.

• Generative Adversarial Networks (GANs) : Game-theoretic approach, best samples! But can be tricky and unstable to train, no inference queries.