About the main GAN models

Yann LeCun, one of the main founders of CNNs (convolutional neural networks), has described GANs as follows:

“This, and the variations that are now being proposed, is the most interesting idea in the last 10 years in ML, in my opinion.”

A GAN is a machine learning algorithm that learns to generate new data, such as images.

I’m learning machine learning with the book Python Machine Learning Programming: Theory and Practice by Master Data Scientists. GANs came up in chapter 17. They are much harder to understand than the previous algorithms, and there are many derivative models, so I have tried to summarize them in my own way.

I’ve left out the mathematical formulas and theoretical parts because I don’t understand many of them myself.
I’ll update this page when I understand them.

For now, I’ve summarized the major models.
GAN models are catalogued here and here, so I hope to read through them and summarize them little by little.

I’m a beginner, so there is a good chance that this page contains mistakes. Please feel free to point them out to me.
What is GAN?
GAN stands for Generative Adversarial Network.

The main feature of GAN is that it consists of two networks: Generator (Generative Network) and Discriminator (Discriminative Network).

The Generator takes the noise Z as input and generates a fake image (Fake).
The Discriminator receives either a Fake image generated by the Generator or a Real image, and determines whether it is real or fake.

The Generator is trained to generate images that look so lifelike that the Discriminator cannot detect them, while the Discriminator is trained to detect such difficult-to-detect fakes more accurately.

This is called adversarial because the two networks are trained with conflicting objectives: each repeatedly tries to outdo the other, and both improve as a result.
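
The conflicting objectives can be sketched with a few lines of Python. This is only an illustration under assumptions: it uses binary cross-entropy and the common "non-saturating" Generator loss, and the Discriminator outputs (probabilities that an input is real) are made-up numbers, not results from a trained model.

```python
import math

def bce(prediction, target):
    """Binary cross-entropy for one probability and one 0/1 label."""
    eps = 1e-12  # keeps log() away from zero
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def discriminator_loss(d_real, d_fake):
    # The Discriminator wants D(real) close to 1 and D(fake) close to 0.
    return bce(d_real, 1) + bce(d_fake, 0)

def generator_loss(d_fake):
    # The Generator wants the Discriminator to output 1 for its fakes.
    return bce(d_fake, 1)

# Hypothetical Discriminator outputs (probability that the input is real).
print(discriminator_loss(d_real=0.9, d_fake=0.1))  # D separates real from fake
print(generator_loss(d_fake=0.1))                  # G is being caught
print(generator_loss(d_fake=0.8))                  # G is fooling D
```

When the Discriminator gets better at spotting fakes, the Generator's loss goes up, and the other way around, which is the tug of war described above.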

In the original version of GAN, the Generator and Discriminator are two fully connected networks with one or more hidden layers. This version is sometimes called a vanilla GAN.

What is DCGAN?
DCGAN stands for Deep Convolutional Generative Adversarial Network, or Deep Convolutional GAN for short.

In short, it is a GAN model that incorporates convolutional layers into the Generator and Discriminator of a vanilla GAN.
The official paper proposes using them in both networks.

Looking at the Generator, the 20-dimensional noise is upsampled step by step into a 28×28 image using transposed convolution (sometimes called deconvolution).

Here is an animated GIF of a transposed convolution. ⇩
Outputting a 6×6 image from a 4×4 image with a kernel size of 3×3 and a stride of 1
It may be easier to understand if you read the picture from bottom to top.
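
The 4×4 to 6×6 example can be checked with a few lines of Python. The following is a minimal, unoptimized sketch without padding; the function name transposed_conv2d is made up for illustration, and real frameworks provide their own (much faster) layers.

```python
def transposed_conv2d(image, kernel, stride=1):
    """Naive 2D transposed convolution (no padding) on nested lists."""
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (in_h - 1) * stride + k_h
    out_w = (in_w - 1) * stride + k_w
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(in_h):
        for j in range(in_w):
            # Each input pixel "stamps" a scaled copy of the kernel
            # onto the output, starting at (i * stride, j * stride).
            for a in range(k_h):
                for b in range(k_w):
                    out[i * stride + a][j * stride + b] += image[i][j] * kernel[a][b]
    return out

image = [[1.0] * 4 for _ in range(4)]   # 4x4 input
kernel = [[1.0] * 3 for _ in range(3)]  # 3x3 kernel
result = transposed_conv2d(image, kernel, stride=1)
print(len(result), len(result[0]))  # 6 6
```

The output size follows (input size - 1) × stride + kernel size, so (4 - 1) × 1 + 3 = 6. Overlapping stamps are added together, which is why the values in the middle of the output are larger than those at the corners.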

Batch Normalization is also applied along the way.
The goal of Batch Normalization is to normalize the inputs of a layer within each mini-batch, which keeps their distribution stable, helps reduce overfitting, and speeds up convergence.
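
As a rough sketch of the normalization step, here is the calculation for a single feature across one mini-batch. The names gamma, beta and eps follow common convention but are assumptions here; in a real layer gamma and beta are learned during training, and running averages are used at inference time.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    # eps avoids division by zero when the variance is tiny.
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

activations = [2.0, 4.0, 6.0, 8.0]  # one feature, mini-batch of 4
normalized = batch_norm(activations)
print(normalized)  # values centered on 0 with variance close to 1
```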

As for the activation functions of the Generator, the official paper uses Tanh (hyperbolic tangent) only for the output layer, and ReLU for the rest.
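
Tanh outputs values between -1 and 1, so real images are usually rescaled to the same range before they are shown to the Discriminator. The helper names below are made up for illustration, and the 0 to 255 pixel range is an assumption about how the images are stored.

```python
import math

def to_tanh_range(pixel):
    """Map a pixel value in [0, 255] to [-1, 1]."""
    return pixel / 127.5 - 1.0

def from_tanh_range(value):
    """Map a Generator output in [-1, 1] back to [0, 255]."""
    return (value + 1.0) * 127.5

print(to_tanh_range(0), to_tanh_range(255))   # -1.0 1.0
print(from_tanh_range(math.tanh(0.0)))        # 127.5
```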

For the Discriminator, the structure is almost the same as an ordinary CNN classifier.

As its activation function, it uses Leaky ReLU. Compared to ReLU, the gradient of Leaky ReLU is not zero when z is negative. With ReLU, the gradient becomes zero when z is negative, which can cause learning to stop. Leaky ReLU is used to avoid this problem.
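
To make the difference concrete, here is a small sketch comparing ReLU with Leaky ReLU, the variant whose gradient stays non-zero for negative z. The slope alpha=0.2 is an assumed value chosen for illustration.

```python
def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # Zero gradient for negative inputs: no learning signal passes through.
    return 1.0 if z > 0 else 0.0

def leaky_relu(z, alpha=0.2):
    return z if z > 0 else alpha * z

def leaky_relu_grad(z, alpha=0.2):
    # A small but non-zero gradient survives for negative inputs.
    return 1.0 if z > 0 else alpha

for z in (-2.0, 3.0):
    print(z, relu(z), relu_grad(z), leaky_relu(z), leaky_relu_grad(z))
```

For z = -2.0 the ReLU gradient is 0.0 while the Leaky ReLU gradient is 0.2, so the weights feeding that unit can still be updated.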

There are other important points in the implementation of DCGAN, but these are just the ones that stand out.

What is CGAN?
CGAN stands for Conditional GAN, a model that also takes label information into account.
To illustrate with the MNIST example, the two models above generate digits from 0 to 9 at random, but a CGAN can generate whichever digit you specify.

Label information is given to the Generator as an input along with the noise.
The label information is also given to the Discriminator as an input along with the image data.

However, the role of the Discriminator does not change from “determining whether an image is real or fake”.
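
One simple way to pass the label to both networks is to one-hot encode it and concatenate it with the other input. This is only a sketch of that idea: the function names are hypothetical, the 20-dimensional noise and 28×28 image sizes are borrowed from the DCGAN example, and actual implementations may instead use an embedding layer or extra image channels.

```python
import random

NUM_CLASSES = 10  # MNIST digits 0-9
NOISE_DIM = 20    # assumed noise size, as in the DCGAN example

def one_hot(label, num_classes=NUM_CLASSES):
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def generator_input(label, noise_dim=NOISE_DIM):
    # Noise decides "how" the digit looks, the label decides "which" digit.
    noise = [random.uniform(-1.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label)

def discriminator_input(image_pixels, label):
    # The Discriminator sees the flattened image together with its label.
    return list(image_pixels) + one_hot(label)

g_in = generator_input(label=7)
d_in = discriminator_input([0.0] * (28 * 28), label=7)
print(len(g_in), len(d_in))  # 30 794
```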

Official Paper

Conclusion
That’s it for now.
There are still other models such as CycleGAN and LightWeightGAN, so I will summarize them as needed.
I’d like to summarize the mathematical part as well, once this clunky brain of mine catches up with its understanding.
