Paper Notes: Generative Adversarial Nets

Generative Adversarial Nets

NIPS 2014

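　　Abstract: This paper proposes a new framework for estimating generative models via an adversarial process, in which two models are trained simultaneously: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than from G. The training procedure for G is to maximize the probability of D making a mistake, so that D cannot tell whether an image was generated or drawn from the training data. This framework corresponds to a minimax two-player game: one side's gain is necessarily the other side's loss, and there is no outcome in which both sides win; that is the defining rule of the game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. When G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. No Markov chains or unrolled approximate inference networks are needed during either training or sample generation.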

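　　Introduction: The promise of deep learning is to discover rich, hierarchical models that represent probability distributions over the kinds of data encountered in artificial intelligence applications, such as natural images, audio waveforms containing speech, and symbols in natural language corpora. So far, the most striking successes in deep learning have involved discriminative models, usually those that map a high-dimensional, rich input to a class label. Deep generative models have had less impact, because of the difficulty of approximating the many intractable probabilistic computations that arise in maximum likelihood estimation and related strategies, and because of the difficulty of leveraging the benefits of piecewise linear units in the generative context. We propose a new generative model estimation procedure that sidesteps these difficulties.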

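　　In the proposed adversarial nets framework, the generative model is pitted against an adversary: a discriminative model that learns to determine whether a sample comes from the model distribution or the data distribution. The generative model can be thought of as a team of counterfeiters trying to produce fake currency, while the discriminative model is analogous to the police, trying to detect the counterfeit currency. Competition in this game drives both teams to improve their methods until the counterfeits are indistinguishable from the genuine articles.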

　　This framework can yield specific training algorithms for many kinds of models and optimization algorithms. We explore one special case, referred to as adversarial nets.

　　Adversarial nets :

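　　The adversarial modeling framework is most straightforward to apply when the models are both multilayer perceptrons. To learn the generator's distribution $p_g$ over data x, we define a prior $p_z(z)$ on input noise variables and represent a mapping to data space as $G(z; \theta_g)$, where G is a differentiable function represented by a multilayer perceptron with parameters $\theta_g$. We also define a second multilayer perceptron $D(x; \theta_d)$ that outputs a single scalar. D(x) represents the probability that x came from the data rather than from $p_g$. We train D to maximize the probability of assigning the correct label to both training examples and samples from G. We simultaneously train G to minimize $\log(1-D(G(z)))$.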

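　　In other words, D and G play the following two-player minimax game with value function V(G, D):

　　$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$　　(1)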
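　　As a rough illustration, the value function can be estimated from samples: it is the average of log D(x) over data samples plus the average of log(1 − D(G(z))) over generated samples. The sketch below assumes the discriminator outputs are already available as plain lists of probabilities; the function name `value_function` and the sample values are made up for illustration.

```python
import math

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(G, D).

    d_real: values D(x) for samples x drawn from the data, each in (0, 1).
    d_fake: values D(G(z)) for generated samples, each in (0, 1).
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell the two apart outputs 1/2 everywhere.
v_confused = value_function([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])

# A confident, correct discriminator pushes V toward its upper bound of 0.
v_confident = value_function([0.99, 0.98], [0.01, 0.02])
```

　　With D = 1/2 everywhere the estimate equals −log 4, which is the value the note later associates with the equilibrium $p_g = p_{data}$.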

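　　The next section presents a theoretical analysis of adversarial nets, essentially showing that the training criterion allows one to recover the data generating distribution as G and D are given enough capacity, i.e. in the non-parametric limit. Figure 1 gives a less formal, more intuitive illustration. In practice, the game must be implemented with an iterative, numerical approach. Optimizing D to completion in the inner loop of training is computationally prohibitive, and on finite datasets would result in overfitting. Instead, we alternate between k steps of optimizing D and one step of optimizing G. This keeps D near its optimal solution, so long as G changes slowly enough. The strategy is analogous to SML/PCD training, and the procedure is summarized in Algorithm 1.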
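　　The alternating schedule (k discriminator steps, then one generator step) can be sketched on a toy one-dimensional problem. Everything here is an assumption made for illustration and not the paper's setup: the data are Gaussian with mean 4, the generator is just a shift G(z) = z + b, the discriminator is a logistic regression D(x) = sigmoid(w·x + c), gradients are derived by hand, and plain SGD is used.

```python
import math
import random

def sigmoid(a):
    # Numerically stable logistic function.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def train_toy_gan(iterations=200, k=1, batch=16, lr=0.05, seed=0):
    rng = random.Random(seed)
    w, c = 0.0, 0.0      # discriminator D(x) = sigmoid(w * x + c)
    b = 0.0              # generator G(z) = z + b, with z ~ N(0, 1)
    d_updates = g_updates = 0

    for _ in range(iterations):
        # k steps of gradient ASCENT on mean log D(x) + mean log(1 - D(G(z))).
        for _ in range(k):
            xs = [rng.gauss(4.0, 1.0) for _ in range(batch)]      # data samples
            gs = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]  # generated samples
            grad_w = (sum((1.0 - sigmoid(w * x + c)) * x for x in xs)
                      - sum(sigmoid(w * g + c) * g for g in gs)) / batch
            grad_c = (sum(1.0 - sigmoid(w * x + c) for x in xs)
                      - sum(sigmoid(w * g + c) for g in gs)) / batch
            w += lr * grad_w
            c += lr * grad_c
            d_updates += 1

        # One step of gradient DESCENT on mean log(1 - D(G(z))) with respect to b.
        gs = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        grad_b = -sum(sigmoid(w * g + c) * w for g in gs) / batch
        b -= lr * grad_b
        g_updates += 1

    return {"w": w, "c": c, "b": b,
            "d_updates": d_updates, "g_updates": g_updates}

result = train_toy_gan(iterations=50, k=3)
```

　　In this toy setting the shift b moves from 0 toward the data mean as long as the discriminator weight w stays positive; the sketch makes no claim about convergence.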

　　In practice, Equation 1 may not provide sufficient gradient for G to learn well. Early in learning, when G is poor, D can reject samples with high confidence because they are clearly different from the training data. In this case, $\log(1-D(G(z)))$ saturates. Rather than training G to minimize $\log(1-D(G(z)))$, we can train G to maximize $\log D(G(z))$. This objective results in the same fixed point of the dynamics of G and D but provides much stronger gradients early in learning.
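　　The saturation can be made concrete by differentiating both objectives with respect to the discriminator's logit. The sketch assumes D = sigmoid(logit), in which case the derivative of log(1 − D) is −D and the derivative of log D is 1 − D; the function names and the logit value −6 are illustrative choices.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def saturating_grad(logit):
    """d/d(logit) of log(1 - D), with D = sigmoid(logit); G minimizes this term."""
    return -sigmoid(logit)

def non_saturating_grad(logit):
    """d/d(logit) of log D, with D = sigmoid(logit); G maximizes this term."""
    return 1.0 - sigmoid(logit)

# Early in training D rejects fakes confidently: D(G(z)) = sigmoid(-6), about 0.0025.
early_logit = -6.0
g_sat = saturating_grad(early_logit)     # close to 0: almost no learning signal
g_ns = non_saturating_grad(early_logit)  # close to 1: strong learning signal
```

　　When D(G(z)) is near zero, the original objective passes back a gradient of about the same tiny magnitude, while the alternative passes back a gradient close to one.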

　　Figure 1. The four panels illustrate the adversarial training process. The lines and markings mean the following:

　　—— the discriminative distribution (D, blue, dashed line);

　　—— the data generating distribution $p_x$ (black, dotted line);

　　—— the generative distribution $p_g$ (G, green, solid line);

　　—— the lower horizontal line is the domain from which z is sampled;

　　—— the horizontal line above is part of the domain of x;

　　—— the upward arrows show how the mapping x = G(z) imposes the non-uniform distribution $p_g$ on transformed samples.

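　　(a) Consider an adversarial pair near convergence: $p_g$ is similar to $p_{data}$, and D is a partially accurate classifier.

　　(b) In the inner loop of the algorithm, D is trained to discriminate samples from data, converging to $D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$.

　　(c) After an update to G, the gradient of D has guided G(z) to flow to regions that are more likely to be classified as data.

　　(d) After several steps of training, if G and D have enough capacity, they reach a point at which neither can improve, because $p_g = p_{data}$. The discriminator can then no longer differentiate between the two distributions, i.e. D(x) = 1/2.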
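　　Panels (b) and (d) can be checked numerically with the formula $D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$. The sketch assumes, purely for illustration, that both distributions are one-dimensional Gaussians with unit standard deviation; the means and evaluation points are arbitrary.

```python
import math

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def optimal_discriminator(x, data_mean, gen_mean, std=1.0):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for two Gaussian densities."""
    p_data = gaussian_pdf(x, data_mean, std)
    p_g = gaussian_pdf(x, gen_mean, std)
    return p_data / (p_data + p_g)

# Panel (b): the generator is still off target, so D* is informative.
d_near_data = optimal_discriminator(1.0, data_mean=1.0, gen_mean=-1.0)
d_near_gen = optimal_discriminator(-1.0, data_mean=1.0, gen_mean=-1.0)

# Panel (d): p_g = p_data, so D* collapses to 1/2 at every x.
d_converged = [optimal_discriminator(x, 0.0, 0.0) for x in (-2.0, 0.0, 3.0)]
```

　　Where the data density dominates, D* is above 1/2; where the generator density dominates, it is below 1/2; once the two densities coincide, it is 1/2 everywhere.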

　　Theoretical Results .

　　The authors show that the minimax game has a global optimum at $p_g = p_{data}$.

　　Global Optimality of $p_g = p_{data}$:

　　For any given generator G, we first consider the optimal discriminator D.

　　Proposition 1. For G fixed, the optimal discriminator D is:

　　$D^*_G(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$

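　　Proof. The training criterion for the discriminator D, given any generator G, is to maximize the quantity V(G, D):

　　$V(G, D) = \int_x p_{data}(x) \log(D(x)) \, dx + \int_z p_z(z) \log(1 - D(G(z))) \, dz = \int_x \left[ p_{data}(x) \log(D(x)) + p_g(x) \log(1 - D(x)) \right] dx$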

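　　For any $(a, b) \in \mathbb{R}^2 \setminus \{0, 0\}$, the function $y \to a \log(y) + b \log(1-y)$ achieves its maximum in [0, 1] at $\frac{a}{a+b}$ (setting the derivative $\frac{a}{y} - \frac{b}{1-y}$ to zero gives this point). The discriminator does not need to be defined outside of $Supp(p_{data}) \cup Supp(p_g)$, which concludes the proof.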
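　　The maximizer a/(a+b) can be sanity-checked with a simple grid search. The sketch assumes positive a and b (standing in for the densities $p_{data}(x)$ and $p_g(x)$ at a single point x); the values 2 and 3 and the grid resolution are arbitrary.

```python
import math

def objective(y, a, b):
    return a * math.log(y) + b * math.log(1.0 - y)

def grid_argmax(a, b, steps=1000):
    # Search the open interval (0, 1); the logarithms diverge at the endpoints.
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda y: objective(y, a, b))

a, b = 2.0, 3.0
best_y = grid_argmax(a, b)
closed_form = a / (a + b)
```

　　Applying this pointwise, with a = $p_{data}(x)$ and b = $p_g(x)$, gives the optimal discriminator of Proposition 1.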

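　　Note that the training objective for D can be interpreted as maximizing the log-likelihood for estimating the conditional probability $P(Y = y|x)$, where Y indicates whether x comes from $p_{data}$ (with y = 1) or from $p_g$ (with y = 0). The minimax game in Equation 1 can now be reformulated as:

　　$C(G) = \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}}[\log D^*_G(x)] + \mathbb{E}_{x \sim p_g}[\log(1 - D^*_G(x))] = \mathbb{E}_{x \sim p_{data}}\left[\log \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}\right] + \mathbb{E}_{x \sim p_g}\left[\log \frac{p_g(x)}{p_{data}(x) + p_g(x)}\right]$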
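　　Substituting the optimal discriminator into V(G, D) leaves a quantity that depends only on G, written C(G) here. The paper's Theorem 1 states that its global minimum is −log 4, attained only when $p_g = p_{data}$. The sketch below checks this on small discrete distributions; the probability vectors and the helper name `c_of_g` are made up for illustration.

```python
import math

def c_of_g(p_data, p_g):
    """C(G) = V(G, D*) for discrete distributions given as lists of probabilities."""
    total = 0.0
    for pd, pg in zip(p_data, p_g):
        mix = pd + pg
        if pd > 0.0:
            total += pd * math.log(pd / mix)   # E_{x ~ p_data}[log D*(x)]
        if pg > 0.0:
            total += pg * math.log(pg / mix)   # E_{x ~ p_g}[log(1 - D*(x))]
    return total

p_data = [0.1, 0.2, 0.3, 0.4]
c_match = c_of_g(p_data, p_data)                   # generator equals the data
c_mismatch = c_of_g(p_data, [0.4, 0.3, 0.2, 0.1])  # generator is off target
```

　　When the two distributions match, every term contributes log(1/2), so the total is −log 4; any mismatch raises the value above that.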

　　Experiments :

