juooo1117

Building Autoencoders in Keras

Hyo__ni 2023. 12. 18. 11:26

What are autoencoders?

"Autoencoding" is a data compression algorithm where the compression and decompression functions are 

    1) data-specific

    2) lossy

    3) learned automatically from examples rather than engineered by a human.

Additionally, in almost all contexts where the term "autoencoder" is used, the compression and decompression functions are implemented with neural networks.

 

  -  Autoencoders are data-specific, which means that they can only compress data similar to what they were trained on.
  -  Autoencoders are lossy, which means that the decompressed outputs are degraded compared to the original inputs: information is lost on the way through the encoder and decoder.
  -  Autoencoders are learned automatically from data examples rather than designed by hand.

 

Vanilla autoencoder on a fully connected network

encoding_dim = 32  →  sets the size of the encoded representation to 32. (Each image has 28*28 = 784 pixels, so the input shape is set to 784.)

We configure the model to use a per-pixel binary crossentropy loss and the Adam optimizer.

import keras
from keras import layers

encoding_dim = 32 
input_img = keras.Input(shape=(784,))    # input image

encoded = layers.Dense(encoding_dim, activation='relu')(input_img)
decoded = layers.Dense(784, activation='sigmoid')(encoded)
autoencoder = keras.Model(input_img, decoded)

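encoder = keras.Model(input_img, encoded)           # encoder model: image -> 32-d code
encoded_input = keras.Input(shape=(encoding_dim,))  # placeholder for an encoded (32-d) input
decoder_layer = autoencoder.layers[-1]              # reuse the last layer of the autoencoder
decoder = keras.Model(encoded_input, decoder_layer(encoded_input))  # decoder model: code -> image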

autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
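
To make "per-pixel binary crossentropy" concrete, here is a minimal NumPy sketch of the quantity being minimized, assuming both the pixels and the reconstructions lie in [0, 1]. The function name `per_pixel_bce` and the clipping constant `eps` are illustrative choices, not part of Keras, and Keras' own implementation may differ in numerical details.

```python
import numpy as np

def per_pixel_bce(x, x_hat, eps=1e-7):
    """Mean binary crossentropy over all pixels of all images.

    x     : original pixels in [0, 1], shape (n_images, n_pixels)
    x_hat : reconstructed pixels in [0, 1], same shape (sigmoid outputs)
    """
    x = np.asarray(x, dtype=np.float64)
    # Clip to avoid log(0) when a prediction saturates at 0 or 1.
    x_hat = np.clip(np.asarray(x_hat, dtype=np.float64), eps, 1.0 - eps)
    loss = -(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))
    return float(loss.mean())

# Toy example: a 4-pixel "image" and two candidate reconstructions.
x = np.array([[0.0, 1.0, 1.0, 0.0]])
good = np.array([[0.1, 0.9, 0.8, 0.2]])   # close to x
bad = np.array([[0.9, 0.1, 0.2, 0.8]])    # far from x

print(per_pixel_bce(x, good))
print(per_pixel_bce(x, bad))
```

The reconstruction that is closer to the original gets the lower loss, which is why a sigmoid output layer pairs naturally with inputs scaled to [0, 1].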

Let's look at the results on the MNIST digits dataset.

We only want to check that the encoder and decoder work well, so the dataset's labels are not needed.

Normalize all values to between 0 and 1.

Flatten the 28x28 images into vectors of size 784.

from keras.datasets import mnist
import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
print(x_train.shape)   # (60000, 784)
print(x_test.shape)    # (10000, 784)

autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))
                
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)

 

Result (top row): the original digits

Result (bottom row): the reconstructed digits produced by the autoencoder

We are losing quite a bit of detail with this basic approach.
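
The loss of detail can also be quantified rather than judged by eye. The sketch below computes a per-image mean squared error between originals and reconstructions. `reconstruction_mse` is a hypothetical helper name, and the small random arrays merely stand in for `x_test` and `decoded_imgs`.

```python
import numpy as np

def reconstruction_mse(originals, reconstructions):
    """Return one mean-squared-error value per image."""
    originals = np.asarray(originals, dtype=np.float64)
    reconstructions = np.asarray(reconstructions, dtype=np.float64)
    if originals.shape != reconstructions.shape:
        raise ValueError("originals and reconstructions must have the same shape")
    # Flatten each image so the mean runs over its own pixels only.
    diff = (originals - reconstructions).reshape(len(originals), -1)
    return (diff ** 2).mean(axis=1)

# Stand-in data: 5 fake "images" of 784 pixels and slightly perturbed copies.
rng = np.random.default_rng(0)
fake_x_test = rng.random((5, 784))
fake_decoded = np.clip(fake_x_test + 0.05 * rng.standard_normal((5, 784)), 0.0, 1.0)

errors = reconstruction_mse(fake_x_test, fake_decoded)
print(errors)            # one error per image
print(errors.argmax())   # index of the worst reconstruction
```

With the real arrays, `reconstruction_mse(x_test, decoded_imgs)` would give one score per test digit, which makes it easy to compare this model against the variants that follow.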

Adding a sparsity constraint on the encoded representations

In the vanilla autoencoder, the representation was constrained only by the size of the hidden layer (32). In such a situation, the hidden layer typically learns an approximation of PCA (principal component analysis).
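
For comparison, here is what a PCA compression to 32 dimensions looks like in plain NumPy. This is a sketch of the PCA baseline that the hidden layer is said to approximate, not of the autoencoder itself. The function names and the random stand-in data are assumptions made for illustration.

```python
import numpy as np

def pca_fit(x, n_components):
    """Return the data mean and the top principal directions (as rows)."""
    mean = x.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_encode(x, mean, components):
    return (x - mean) @ components.T          # (n, d) -> (n, k)

def pca_decode(codes, mean, components):
    return codes @ components + mean          # (n, k) -> (n, d)

rng = np.random.default_rng(0)
x = rng.random((200, 784))                    # stand-in for flattened images

mean, components = pca_fit(x, n_components=32)
codes = pca_encode(x, mean, components)
x_hat = pca_decode(codes, mean, components)

print(codes.shape)                            # 32 numbers per image
print(float(((x - x_hat) ** 2).mean()))       # nonzero: the compression is lossy
```

Like the single-hidden-layer autoencoder, this encoder and decoder are both linear maps through a 32-dimensional bottleneck; the autoencoder adds nonlinear activations and learns its weights by gradient descent.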

To make the representation compact, we can add a sparsity constraint on the activity of the hidden representations, so that fewer units are active at any given time.

  →  In Keras, this can be done by adding an activity_regularizer to our Dense layer:

from keras import regularizers

encoding_dim = 32
input_img = keras.Input(shape=(784,))

# Add a Dense layer with a L1 activity regularizer
encoded = layers.Dense(encoding_dim, activation='relu',
                activity_regularizer=regularizers.l1(10e-5))(input_img)
decoded = layers.Dense(784, activation='sigmoid')(encoded)

autoencoder = keras.Model(input_img, decoded)
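
Conceptually, an L1 activity regularizer adds a term proportional to the sum of absolute activations of the encoded layer to the training loss. The NumPy sketch below illustrates that term only. `l1_activity_penalty` and `active_fraction` are hypothetical helpers, and averaging over the batch is an assumption about normalization that may not match Keras exactly.

```python
import numpy as np

def l1_activity_penalty(activations, l1=1e-4):
    """L1 penalty on a batch of hidden activations, averaged over the batch.

    activations : array of shape (batch_size, encoding_dim)
    l1          : regularization strength (10e-5 in the code above equals 1e-4)
    """
    activations = np.asarray(activations, dtype=np.float64)
    per_sample = np.abs(activations).sum(axis=1)
    return float(l1 * per_sample.mean())

def active_fraction(activations, threshold=1e-6):
    """Fraction of hidden units whose activation exceeds the threshold."""
    return float((np.abs(activations) > threshold).mean())

dense_codes = np.full((4, 32), 0.5)    # every unit fires
sparse_codes = np.zeros((4, 32))
sparse_codes[:, :4] = 0.5              # only 4 of 32 units fire

print(l1_activity_penalty(dense_codes), active_fraction(dense_codes))
print(l1_activity_penalty(sparse_codes), active_fraction(sparse_codes))
```

The sparse codes incur a much smaller penalty, so minimizing reconstruction loss plus this term pushes the encoder toward representations in which most units stay at zero.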

Deep autoencoder

Let's stack more layers in the encoder and the decoder.

input_img = keras.Input(shape=(784,))
encoded = layers.Dense(128, activation='relu')(input_img)
encoded = layers.Dense(64, activation='relu')(encoded)
encoded = layers.Dense(32, activation='relu')(encoded)

decoded = layers.Dense(64, activation='relu')(encoded)
decoded = layers.Dense(128, activation='relu')(decoded)
decoded = layers.Dense(784, activation='sigmoid')(decoded)

autoencoder = keras.Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x_train, x_train,
                epochs=100,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))
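
A quick way to compare the shallow and the deep model is to count their trainable parameters. The sketch below does this arithmetic in plain Python for a chain of Dense layers (weights plus biases). `dense_param_count` is an illustrative helper, not a Keras function.

```python
def dense_param_count(layer_sizes):
    """Total weights + biases for Dense layers chained through layer_sizes."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out   # weight matrix + bias vector
    return total

vanilla = [784, 32, 784]
deep = [784, 128, 64, 32, 64, 128, 784]

print(dense_param_count(vanilla))   # 50992
print(dense_param_count(deep))      # 222384
print(784 / 32)                     # compression factor of the bottleneck: 24.5
```

Both models squeeze the image through the same 32-dimensional bottleneck; the deep version spends roughly four times as many parameters on getting there and back.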

Convolutional autoencoder

Since our inputs are images, it makes sense to use convolutional neural networks (convnets) as encoders and decoders. In practical settings, autoencoders applied to images are almost always convolutional autoencoders, because they perform better.

 

The encoder is built by stacking Conv2D and MaxPooling2D layers (max pooling being used for spatial down-sampling).

The decoder is built by stacking Conv2D and UpSampling2D layers.

input_img = keras.Input(shape=(28, 28, 1))  # image = 28*28

x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)

x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
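# No padding='same' here on purpose: the unpadded 3x3 convolution shrinks 16x16 to 14x14,
# so the final upsampling yields 28x28 again.
x = layers.Conv2D(16, (3, 3), activation='relu')(x)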
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = keras.Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
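
It helps to trace the spatial size of the feature maps through this model, because one decoder convolution omits padding='same' and that is what brings the output back to 28x28. The sketch below reproduces the size arithmetic in plain Python. The helper names are illustrative, and the formulas assume stride-1 convolutions and pooling with stride equal to the pool size, as in the code above.

```python
import math

def pool_same(size, pool=2):
    """Output size of MaxPooling2D with padding='same' and stride == pool."""
    return math.ceil(size / pool)

def conv_valid(size, kernel=3):
    """Output size of a stride-1 Conv2D without padding (padding='valid')."""
    return size - kernel + 1

def upsample(size, factor=2):
    """Output size of UpSampling2D."""
    return size * factor

def trace_conv_autoencoder(size=28):
    """Spatial size after each size-changing layer ('same' convs keep the size)."""
    sizes = [size]
    for _ in range(3):                    # encoder: three pooling steps
        sizes.append(pool_same(sizes[-1]))
    sizes.append(upsample(sizes[-1]))     # decoder: 4 -> 8
    sizes.append(upsample(sizes[-1]))     # 8 -> 16
    sizes.append(conv_valid(sizes[-1]))   # unpadded 3x3 conv: 16 -> 14
    sizes.append(upsample(sizes[-1]))     # 14 -> 28
    return sizes

print(trace_conv_autoencoder())  # [28, 14, 7, 4, 8, 16, 14, 28]
```

The encoded representation is therefore 4x4 with 8 channels, i.e. 128 values per image. Had the third decoder convolution used padding='same', the output would be 32x32 and would no longer match the 28x28 targets.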

Application to image denoising

Let's put our convolutional autoencoder to work on an image denoising problem.

  →  We train the autoencoder to map noisy digit images to clean digit images.

To create the noisy data, we add a Gaussian noise matrix to the images and clip the results to between 0 and 1.

(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))

noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape) 
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape) 

x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)

# input data
input_img = keras.Input(shape=(28, 28, 1))

x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)

x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = keras.Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
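
The key point of the denoising setup is the training pair: noisy images go in as inputs, and the clean images are the targets. A minimal sketch of that training step follows. It wraps the same architecture in a hypothetical `build_denoiser` function, and `train_denoiser`, the epoch count, and the batch size are assumptions, since the text above does not state them.

```python
import numpy as np
import keras
from keras import layers

def build_denoiser():
    """Same architecture as the denoising model above, wrapped in a function."""
    input_img = keras.Input(shape=(28, 28, 1))
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    x = layers.MaxPooling2D((2, 2), padding='same')(x)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    encoded = layers.MaxPooling2D((2, 2), padding='same')(x)   # 7x7x32
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
    x = layers.UpSampling2D((2, 2))(x)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = layers.UpSampling2D((2, 2))(x)
    decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
    model = keras.Model(input_img, decoded)
    model.compile(optimizer='adam', loss='binary_crossentropy')
    return model

def train_denoiser(model, noisy, clean, noisy_val, clean_val,
                   epochs=10, batch_size=128):
    """Fit with noisy images as inputs and clean images as targets."""
    return model.fit(noisy, clean,
                     epochs=epochs,          # assumed value, tune as needed
                     batch_size=batch_size,  # assumed value, tune as needed
                     shuffle=True,
                     validation_data=(noisy_val, clean_val),
                     verbose=0)
```

With the arrays prepared above, a call such as `train_denoiser(build_denoiser(), x_train_noisy, x_train, x_test_noisy, x_test)` would train the model, after which `model.predict(x_test_noisy)` returns the denoised digits.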

 

[Practice Code]