[Module 6] Deep Learning: CNN and Image Classification

Artificial Intelligence/LG Aimers: AI Expert Course

Hyo__ni 2024. 1. 12. 14:07

Part 3. Convolutional Neural Networks and Image Classification

How does a ConvNet work?

Trickier cases: translation, scaling, rotation, weight (stroke thickness)

→ ConvNets match pieces of the image

Filtering: the math behind the match

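→ Overlap a 3*3 patch on the image and express whether each pixel matches as the product of the two values. This gives one match value for each pixel position in the 3*3 patch. Add these values up and divide by the total number of pixels in the patch (9 for a 3*3 patch) to get the degree of match.

Recording this value at the position of the center pixel, and filling in the matrix this way for every position, yields an image that shows the degree of match.

→ This resulting image is called an activation map.

In other words, an activation map is the image obtained by overlapping the 3*3 patch on every possible position of the input image and recording the degree of match at each position (the result of the convolution operation).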

To get an activation map,

Align the filter with the corresponding image patch, multiply the pixel values pairwise, add up the products, and finally divide by the total number of pixels.

1. Overlap the convolution filter and the image patch.

2. Multiply each image pixel by the corresponding filter coefficient.

3. Add them up.

4. Divide by the total number of pixels in the feature. (Optional; this step is usually omitted.)
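
The four steps above can be sketched in plain Python. This is a minimal illustration rather than the course's implementation: the function name `activation_map`, the toy 4*4 image, and the choice to evaluate only positions where the patch fits entirely inside the image (no padding) are all assumptions made for the example.

```python
def activation_map(image, kernel, normalize=True):
    """Slide `kernel` over `image` and record the match score at each position."""
    k = len(kernel)                      # square kernel, e.g. 3 for a 3*3 patch
    out_h = len(image) - k + 1           # only positions where the patch fully fits
    out_w = len(image[0]) - k + 1
    result = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Steps 1-3: overlap, multiply each pixel by its coefficient, add up.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            # Step 4 (optional): divide by the number of pixels in the patch.
            row.append(total / (k * k) if normalize else total)
        result.append(row)
    return result


# Toy 4*4 image with pixel values in {-1, +1} and a 3*3 diagonal feature.
image = [
    [ 1, -1, -1, -1],
    [-1,  1, -1, -1],
    [-1, -1,  1, -1],
    [-1, -1, -1,  1],
]
diagonal = [
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
]
scores = activation_map(image, diagonal)
```

Here `scores` is a 2*2 activation map. The two positions where the patch lies exactly on the diagonal score 1.0 (a perfect match), and the two off-diagonal positions score -1/9.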

Convolution Layer

Convolve the filter with the image, i.e., 'slide over the image spatially, computing dot products'.

(Filters always extend the full depth of the input volume; depth here means the number of channels.)
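
To make the "full depth" point concrete, here is a small sketch in which each filter has one 2-D slice per input channel and each filter produces exactly one output channel. The names `conv_at` and `conv_layer`, the `volume[channel][row][col]` layout, and the all-ones input are assumptions for illustration; stride 1 and no padding are assumed as well.

```python
def conv_at(volume, filt, r, c):
    """Dot product of a full-depth filter with the patch whose top-left corner is (r, c).

    volume[ch][row][col] is the input; filt[ch][i][j] has the same depth as volume.
    """
    assert len(filt) == len(volume), "filter depth must equal input depth"
    k = len(filt[0])
    return sum(
        volume[ch][r + i][c + j] * filt[ch][i][j]
        for ch in range(len(volume))
        for i in range(k)
        for j in range(k)
    )


def conv_layer(volume, filters):
    """Apply every filter at every valid position: one output channel per filter."""
    k = len(filters[0][0])
    out_h = len(volume[0]) - k + 1
    out_w = len(volume[0][0]) - k + 1
    return [
        [[conv_at(volume, f, r, c) for c in range(out_w)] for r in range(out_h)]
        for f in filters
    ]


# Hypothetical 3-channel 4*4 input (all ones) and two 2*2 filters of depth 3.
volume = [[[1] * 4 for _ in range(4)] for _ in range(3)]
filters = [
    [[[1, 1], [1, 1]] for _ in range(3)],    # sums all 2*2*3 = 12 inputs
    [[[1, -1], [1, -1]] for _ in range(3)],  # left column minus right column
]
out = conv_layer(volume, filters)
```

With 2 filters the output has 2 channels of size 3*3, regardless of the input depth of 3. On this constant input the first filter gives 12 everywhere and the second gives 0.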

Pooling Layer: Shrinking the image stack

- Pick a window size (usually 2)

- Pick a stride (usually 2)

- Walk your window across your filtered images

- From each window, take the maximum value

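Max pooling keeps only the single largest value in each window, so with a window size of 2 and a stride of 2 it halves both the width and the height of the map. (Condensing the information this way still gives credit whenever the pattern being searched for appears somewhere in the window.)

Max pooling is applied to each channel separately, so every channel yields an activation map whose width and height are both halved.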

→ A stack of images becomes a stack of smaller images.
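
The pooling procedure above can be sketched as follows. The function names and the sample values are made up for the example, and windows that would run past the edge of the map are simply dropped, which is one possible convention among several.

```python
def max_pool(channel, window=2, stride=2):
    """Max-pool one 2-D activation map."""
    out = []
    for r in range(0, len(channel) - window + 1, stride):
        row = []
        for c in range(0, len(channel[0]) - window + 1, stride):
            # Keep only the largest value inside the current window.
            row.append(max(
                channel[r + i][c + j]
                for i in range(window)
                for j in range(window)
            ))
        out.append(row)
    return out


def max_pool_stack(stack, window=2, stride=2):
    """Pool each channel independently; the number of channels is unchanged."""
    return [max_pool(ch, window, stride) for ch in stack]


stack = [
    [[1, 3, 2, 0],
     [4, 2, 1, 1],
     [0, 0, 5, 6],
     [1, 2, 7, 8]],
]
pooled = max_pool_stack(stack)
```

The single 4*4 channel becomes a 2*2 channel, `[[4, 2], [2, 8]]`: each output value is the maximum of one non-overlapping 2*2 window.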

ReLU Layer

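Passing the result of the linear convolution operation (input * weights) through an activation function makes the function more flexible, so that it can represent a wider variety of patterns.

The ReLU layer takes the output activation map of the preceding convolution layer as input and applies the ReLU function to each value: positive values pass through unchanged and negative values are clipped to 0, producing a modified output activation map.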

→ A stack of images becomes a stack of images with no negative values.
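
As a minimal sketch, ReLU on a stack of activation maps is just an elementwise `max(0, v)`. The function name `relu_stack` and the sample values are illustrative assumptions.

```python
def relu_stack(stack):
    """Apply ReLU elementwise: keep positive values, replace negatives with 0."""
    return [[[max(0, v) for v in row] for row in channel] for channel in stack]


stack = [[[0.77, -0.11], [-0.33, 1.0]]]
rectified = relu_stack(stack)
```

The shape of the stack is unchanged; only the two negative entries are replaced by 0.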

Deep stacking

The stacked form of a single neural network (conv - ReLU - conv - ReLU - max pooling ...)

→ Deep learning layers keep being stacked in this pattern.

(Layers can be repeated several or many times)
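
One way to see what repeated stacking does is to track the spatial size of the activation maps through the layers. The sketch below assumes square maps, unpadded stride-1 convolutions, a 32*32 input, and 3*3 filters; none of these numbers come from the lecture, and the tuple-based layer description is a hypothetical convention for this example only.

```python
def output_size(size, layers):
    """Track the side length of a square activation map through a layer stack.

    Each layer is a tuple: ("conv", kernel), ("relu",) or ("pool", window, stride).
    Assumes unpadded, stride-1 convolutions; these choices are illustrative.
    """
    sizes = [size]
    for layer in layers:
        kind = layer[0]
        if kind == "conv":
            size = size - layer[1] + 1
        elif kind == "pool":
            window, stride = layer[1], layer[2]
            size = (size - window) // stride + 1
        elif kind != "relu":          # ReLU is elementwise, so the size is unchanged
            raise ValueError(f"unknown layer type: {kind}")
        sizes.append(size)
    return sizes


block = [("conv", 3), ("relu",), ("conv", 3), ("relu",), ("pool", 2, 2)]
sizes = output_size(32, block + block)
```

Under these assumptions the side length goes 32, 30, 30, 28, 28, 14 through the first block and ends at 5 after the second, which shows how quickly pooling shrinks the maps as the block is repeated.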

Fully connected Layer

After the conv - ReLU - conv - ReLU - max pooling ... structure has been stacked to some depth, a fully connected (FC) layer follows.

- Every value gets a vote.

- Vote depends on how strongly a value predicts X or O.

- A list of feature values becomes a list of votes.
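
The voting picture can be sketched as a weighted sum per class. The weights below are invented for illustration (in a real network they are learned), and the helper names `flatten` and `fully_connected` are assumptions.

```python
def flatten(stack):
    """Turn a stack of small activation maps into one flat list of feature values."""
    return [v for channel in stack for row in channel for v in row]


def fully_connected(features, weights, biases):
    """Each feature value casts a weighted vote for each class."""
    return [
        sum(w * x for w, x in zip(class_weights, features)) + b
        for class_weights, b in zip(weights, biases)
    ]


features = flatten([[[1.0, 0.0], [0.0, 1.0]]])   # hypothetical pooled output
weights = [
    [0.9, -0.5, -0.5, 0.9],   # made-up weights for class "X"
    [-0.4, 0.8, 0.8, -0.4],   # made-up weights for class "O"
]
votes = fully_connected(features, weights, biases=[0.0, 0.0])
labels = ["X", "O"]
prediction = labels[votes.index(max(votes))]
```

Every feature value contributes to every class score, scaled by how strongly it predicts that class. With these made-up weights, the diagonal-shaped features collect more votes for "X" than for "O".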

Hyperparameters

Hyperparameter values that must be chosen for each layer:

- Convolution : Number of filters, Size of filters

- Pooling : Window size, Window stride

- Fully Connected : Number of layers, Number of neurons
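
To illustrate how these choices translate into model size, the sketch below counts learnable parameters for a convolution layer and a fully connected layer. It assumes square filters, one bias per filter or neuron, and example values (16 filters of size 3*3 on a 3-channel input, a 5*5*16 input to the FC layer) that are not from the lecture. Pooling layers have no learnable parameters, so their hyperparameters only affect the output size.

```python
def conv_params(num_filters, filter_size, in_channels):
    """Weights plus one bias per filter, for square filters spanning the full input depth."""
    return num_filters * (filter_size * filter_size * in_channels + 1)


def fc_params(num_inputs, num_neurons):
    """One weight per input per neuron, plus one bias per neuron."""
    return num_neurons * (num_inputs + 1)


conv_count = conv_params(num_filters=16, filter_size=3, in_channels=3)
fc_count = fc_params(num_inputs=5 * 5 * 16, num_neurons=10)
```

With these example values the convolution layer has 448 parameters and the fully connected layer has 4010, so even a small FC layer can hold far more weights than a convolution layer.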

Advanced CNN Architectures

VGG, GoogLeNet, and ResNet are all in wide use. ResNet is currently the best default. Recent trends are moving toward extremely deep networks.

VGGNet

Main Idea of ResNet :

Use network layers to fit a residual mapping instead of directly trying to fit the desired underlying mapping.
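
In symbols, if the desired mapping is H(x), the layers are asked to fit the residual F(x) = H(x) - x, and the block outputs F(x) + x through a shortcut connection. The sketch below shows only that addition; `residual_fn` is a stand-in for the block's convolution layers, and `zero_residual` is a hypothetical case in which those layers output all zeros.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x, where the layers only have to model the residual F."""
    fx = residual_fn(x)
    assert len(fx) == len(x), "the shortcut needs matching shapes"
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x):
    """Stand-in for layers whose weights are all zero, so F(x) = 0."""
    return [0.0 for _ in x]


x = [0.5, -1.0, 2.0]
y = residual_block(x, zero_residual)
```

When the residual is zero the block reduces to the identity (`y` equals `x`), which is one reason this formulation makes very deep stacks easier to train: an extra block can fall back to passing its input through unchanged.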