Generative Adversarial Networks (GANs)

Edison Mucllari, 2021


What is a GAN?
A Generative Adversarial Network (GAN) is a machine learning setup with two neural networks: a Generator (makes fakes) and a Discriminator (spots fakes). They compete, and both get better over time.

GAN Architecture Explained

| Component         | Role                                                                            | Analogy                     |
|-------------------|---------------------------------------------------------------------------------|-----------------------------|
| Generator (G)     | Creates fake data (like fake images) from random noise.                         | Forger (makes counterfeits) |
| Discriminator (D) | Tries to tell if data is real (from the dataset) or fake (from the generator).  | Detective (spots fakes)     |
How it works:
  • Generator makes fakes from random noise.
  • Discriminator sees both real and fake data, and tries to guess which is which.
  • Both networks learn and improve by competing with each other (a minimal code sketch of the two networks follows this list).
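To make the two roles concrete, here is a minimal sketch of a generator and a discriminator in PyTorch. The layer sizes, the 100-dimensional noise vector, and the flattened 28x28 data dimension are illustrative assumptions, not details from the original post.

```python
# A minimal sketch of the two GAN networks, assuming PyTorch and flattened
# 28x28 images; layer sizes and the noise dimension are illustrative choices.
import torch
import torch.nn as nn

NOISE_DIM = 100      # size of the random noise vector z (illustrative)
DATA_DIM = 28 * 28   # size of a flattened image (illustrative)

# Generator: maps random noise z to a fake sample G(z)
generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, DATA_DIM),
    nn.Tanh(),  # outputs in [-1, 1], matching real data normalized to that range
)

# Discriminator: maps a sample x to the probability D(x) that it is real
discriminator = nn.Sequential(
    nn.Linear(DATA_DIM, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),  # outputs a probability in (0, 1)
)

# Quick shape check: one batch of fakes and the discriminator's verdict
z = torch.randn(16, NOISE_DIM)      # random noise for 16 samples
fake = generator(z)                 # fake batch, shape (16, 784)
prob_real = discriminator(fake)     # probabilities, shape (16, 1)
```

In practice, convolutional architectures (as in DCGAN) are more common for images, but the adversarial setup is the same.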

The Math Behind GANs (Simple Version)

The GAN Game:
$$ \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)} [\log D(x)] + \mathbb{E}_{z \sim p_z(z)} [\log(1 - D(G(z)))] $$
  • $x$: Real data from the dataset
  • $z$: Random noise input
  • $G(z)$: Fake data made by the generator
  • $D(x)$: Probability that $x$ is real
  • $D(G(z))$: Probability that the fake is real
What does this mean?
  • The discriminator tries to say "real" for real data and "fake" for fakes.
  • The generator tries to make fakes so good that the discriminator thinks they're real (see the code sketch below).
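As a hedged illustration, the batch estimate of $V(D, G)$ can be written straight from the formula above. This assumes the `generator` and `discriminator` modules sketched earlier; real implementations usually prefer the numerically safer binary cross-entropy with logits over taking logs of sigmoid outputs.

```python
# A sketch of the value function V(D, G) for one batch, assuming the
# `generator` and `discriminator` modules defined earlier.
import torch

def gan_value(discriminator, generator, real_batch, noise_dim=100):
    z = torch.randn(real_batch.size(0), noise_dim)
    fake_batch = generator(z)
    d_real = discriminator(real_batch)   # D(x): probability real data is real
    d_fake = discriminator(fake_batch)   # D(G(z)): probability a fake is real
    # E[log D(x)] + E[log(1 - D(G(z)))], estimated by batch averages
    return torch.log(d_real).mean() + torch.log(1.0 - d_fake).mean()

# The discriminator does gradient ascent on this value (maximize it),
# while the generator does gradient descent on it (minimize it).
```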

Why Does This Work?

  • At first, the generator makes poor fakes, and the discriminator easily spots them.
  • Over time, the generator learns to make better fakes, and the discriminator gets pickier (see the training-loop sketch after this section).
  • Eventually, the generator's fakes are so good that the discriminator can't tell real from fake (it guesses 50/50).
Fun Fact: When GANs are trained well, even humans can have trouble telling the fakes from the real data!
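Below is a minimal sketch of the alternating training loop this back-and-forth describes, assuming the networks and dimensions defined earlier. The Adam learning rate and the "non-saturating" generator loss (maximizing log D(G(z)) rather than minimizing log(1 - D(G(z)))) are common practical choices, not part of this post's derivation.

```python
# A minimal alternating GAN training step, assuming the `generator` and
# `discriminator` modules defined earlier; hyperparameters are illustrative.
import torch
import torch.nn.functional as F

opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)

def train_step(real_batch, noise_dim=100):
    batch_size = real_batch.size(0)
    ones = torch.ones(batch_size, 1)    # label 1 = "real"
    zeros = torch.zeros(batch_size, 1)  # label 0 = "fake"

    # Discriminator step: push D(x) toward 1 and D(G(z)) toward 0
    z = torch.randn(batch_size, noise_dim)
    fake = generator(z).detach()        # don't backprop into G on this step
    d_loss = F.binary_cross_entropy(discriminator(real_batch), ones) \
           + F.binary_cross_entropy(discriminator(fake), zeros)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: push D(G(z)) toward 1 (non-saturating loss)
    z = torch.randn(batch_size, noise_dim)
    g_loss = F.binary_cross_entropy(discriminator(generator(z)), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```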

Mathematical Proof: Why GANs Work

1. The Optimal Discriminator
For a fixed generator $G$, the best possible discriminator $D^*$ is:

$$ D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)} $$
where $p_{data}(x)$ is the real data distribution and $p_g(x)$ is the generator's distribution.
Proof of the Optimal Discriminator
Given a fixed generator $G$, the GAN value function is:

$$ V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] $$

Let $p_g$ be the distribution of $G(z)$. Then:

$$ V(D, G) = \int_x p_{\text{data}}(x) \log D(x) \, dx + \int_x p_g(x) \log(1 - D(x)) \, dx $$

We maximize $V(D, G)$ with respect to $D(x)$ for each $x$. Define:

$$ f(D(x)) = p_{\text{data}}(x) \log D(x) + p_g(x) \log(1 - D(x)) $$

Take the derivative with respect to $D(x)$ and set it to zero:

$$ \frac{d}{dD(x)} f(D(x)) = \frac{p_{\text{data}}(x)}{D(x)} - \frac{p_g(x)}{1 - D(x)} = 0 $$

Solve for $D(x)$:

$$ \frac{p_{\text{data}}(x)}{D(x)} = \frac{p_g(x)}{1 - D(x)} $$
$$ p_{\text{data}}(x)(1 - D(x)) = p_g(x) D(x) $$
$$ p_{\text{data}}(x) = D(x)\,(p_{\text{data}}(x) + p_g(x)) $$

So, the optimal discriminator is:

$$ \boxed{ D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)} } $$

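As a quick numerical sanity check (not part of the original proof), fixing illustrative density values $p_{\text{data}}(x) = 0.7$ and $p_g(x) = 0.3$ at a single point $x$ and maximizing $f(D(x))$ over a grid recovers the closed-form optimum $D^*(x) = 0.7$:

```python
# Numerical check of the optimal-discriminator formula at one point x,
# using illustrative density values p_data(x) = 0.7 and p_g(x) = 0.3.
import numpy as np

p_data, p_g = 0.7, 0.3
d_grid = np.linspace(1e-6, 1 - 1e-6, 100_000)
f = p_data * np.log(d_grid) + p_g * np.log(1.0 - d_grid)

d_numeric = d_grid[np.argmax(f)]          # grid-search maximizer of f(D)
d_closed_form = p_data / (p_data + p_g)   # D*(x) from the proof
print(d_numeric, d_closed_form)           # both are approximately 0.7
```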
2. The Value Function at the Optimum
Plugging $D^*(x)$ back into the GAN objective, we get:

$$ V(G, D^*) = \mathbb{E}_{x \sim p_{data}} \left[ \log \frac{p_{data}(x)}{p_{data}(x) + p_g(x)} \right] + \mathbb{E}_{x \sim p_g} \left[ \log \frac{p_g(x)}{p_{data}(x) + p_g(x)} \right] $$
This can be shown to be equal to:
$$ -\log(4) + 2 \cdot \text{JSD}(p_{data} \| p_g) $$ where JSD is the Jensen-Shannon divergence, a way of measuring how different two distributions are.

Proof of the Value Function at the Optimum
Plug $D^*(x)$ into the value function:

$$ V(G, D^*) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D^*(x)] + \mathbb{E}_{x \sim p_g}[\log(1 - D^*(x))] $$

Recall:
  • $D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$
  • $1 - D^*(x) = \frac{p_g(x)}{p_{\text{data}}(x) + p_g(x)}$

So,

$$ V(G, D^*) = \int_x p_{\text{data}}(x) \log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)} \, dx + \int_x p_g(x) \log \frac{p_g(x)}{p_{\text{data}}(x) + p_g(x)} \, dx $$

Let $m(x) = \frac{1}{2}(p_{\text{data}}(x) + p_g(x))$. The Jensen-Shannon divergence is:

$$ \mathrm{JSD}(p_{\text{data}} \| p_g) = \frac{1}{2} \int_x p_{\text{data}}(x) \log \frac{p_{\text{data}}(x)}{m(x)} \, dx + \frac{1}{2} \int_x p_g(x) \log \frac{p_g(x)}{m(x)} \, dx $$

Notice that:

$$ \log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)} = \log \frac{p_{\text{data}}(x)}{2\, m(x)} = \log \frac{p_{\text{data}}(x)}{m(x)} - \log 2 $$

(and similarly for $p_g(x)$). Therefore,

$$ V(G, D^*) = 2 \cdot \mathrm{JSD}(p_{\text{data}} \| p_g) - 2 \log 2 $$

or, equivalently,

$$ V(G, D^*) = -\log 4 + 2 \cdot \mathrm{JSD}(p_{\text{data}} \| p_g) $$

Since the Jensen-Shannon divergence is non-negative and equals zero exactly when the two distributions match, the generator's objective $V(G, D^*)$ is minimized when $p_g = p_{\text{data}}$, and its minimum value is $-\log 4$.

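As another small numerical check (with two made-up discrete distributions, so the integrals become sums), evaluating $V(G, D^*)$ directly and comparing it with $-\log 4 + 2 \cdot \mathrm{JSD}(p_{\text{data}} \| p_g)$ gives matching values:

```python
# Numerical check of V(G, D*) = -log 4 + 2 * JSD(p_data || p_g)
# for two illustrative discrete distributions over three outcomes.
import numpy as np

p_data = np.array([0.5, 0.3, 0.2])
p_g    = np.array([0.2, 0.3, 0.5])
m = 0.5 * (p_data + p_g)

# V(G, D*) with D*(x) = p_data(x) / (p_data(x) + p_g(x))
v = np.sum(p_data * np.log(p_data / (p_data + p_g))) \
  + np.sum(p_g * np.log(p_g / (p_data + p_g)))

# Jensen-Shannon divergence (natural log, matching the formulas above)
jsd = 0.5 * np.sum(p_data * np.log(p_data / m)) \
    + 0.5 * np.sum(p_g * np.log(p_g / m))

print(v, -np.log(4) + 2 * jsd)   # the two numbers agree
```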
Conclusion:
  • The generator gets better by making $p_g$ (the fake data) as close as possible to $p_{data}$ (the real data).
  • When $p_g = p_{data}$, the discriminator can't tell the difference and always outputs 0.5.
  • This is what it means for the GAN to "work" – the generator has learned to make perfect fakes!
