Generating MNIST images from an autoencoder model in Keras

Autoencoders are a type of model trained to reconstruct an output identical to the input after reducing it to a lower dimension inside the model. That lower-dimensional vector is called the latent vector, and the space it lives in is the latent space. Because the model has to squeeze the data through this bottleneck, autoencoders tend to be good at generalizing tasks such as removing noise from images.

A Dense-based autoencoder

MNIST images are 28×28 grayscale images of handwritten digits.

Images are best handled by convolutional layers, but autoencoders are useful in more than one way. Dense networks also train faster, so I tried a network of dense layers first.

Using only Keras Sequential models, we can build an autoencoder by stacking an encoder and a decoder inside a third Sequential model. A model can be used just like a layer in Keras.

As our model is an autoencoder, we use x_train both as the input and as the expected output.

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# flatten each 28x28 image to 784 values and scale pixels to [0, 1],
# the range binary cross-entropy expects
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255

# create an encoder model
encoder = Sequential()
encoder.add(Dense(128, activation='relu', input_shape=(784,)))
encoder.add(Dense(64, activation='relu'))
encoder.add(Dense(64, activation='relu'))

# create a decoder model
decoder = Sequential()
decoder.add(Dense(64, activation='relu', input_shape=(64,)))
decoder.add(Dense(128, activation='relu'))
# sigmoid gives each pixel its own value in [0, 1];
# softmax would force all 784 outputs to sum to 1
decoder.add(Dense(784, activation='sigmoid'))

# models behave like layers. We can build a model from other models
autoencoder = Sequential()
autoencoder.add(encoder)
autoencoder.add(decoder)

autoencoder.compile(loss=keras.losses.binary_crossentropy,
                    optimizer=keras.optimizers.Adadelta(),
                    metrics=['accuracy'])

autoencoder.fit(x_train, x_train,
                batch_size=100,
                epochs=10,
                verbose=1,
                shuffle=True)

Above I defined the autoencoder with the Keras Sequential API, as this is what the Keras documentation explains first and I find it slightly more readable at the beginning.

However, most autoencoder tutorials use the functional API to define models. Let's see how to do it with the functional API. Both versions define the same model, just in different ways. Writing the code both ways helped me understand how the functional API works in Keras.

import keras
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Input, Dense

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# flatten each 28x28 image to 784 values and scale pixels to [0, 1],
# the range binary cross-entropy expects
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255

# create an encoder model
i = Input((784,))
x = Dense(128, activation='relu')(i)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
encoder = Model(inputs=i, outputs=x)

# create a decoder model
i = Input((64,))
x = Dense(64, activation='relu')(i)
x = Dense(128, activation='relu')(x)
# sigmoid gives each pixel its own value in [0, 1];
# softmax would force all 784 outputs to sum to 1
x = Dense(784, activation='sigmoid')(x)
decoder = Model(inputs=i, outputs=x)

# models behave like layers. We can build a model from other models
i = Input((784,))
x = encoder(i)
x = decoder(x)
autoencoder = Model(inputs=i, outputs=x)

autoencoder.compile(loss=keras.losses.binary_crossentropy,
                    optimizer=keras.optimizers.Adadelta(),
                    metrics=['accuracy'])

autoencoder.fit(x_train, x_train,
                batch_size=100,
                epochs=10,
                verbose=1,
                shuffle=True)

Here are the images the autoencoder produces. The Sequential and functional APIs of course give similar results. The top images are the input data, which the encoder reduces to a low-dimensional latent vector and the decoder then reconstructs into the bottom images. Given a latent vector (64 numbers), we can reconstruct all 784 pixels from those 64 numbers.
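As a minimal sketch of how such a comparison could be built, the two halves can be run separately: the encoder turns images into latent vectors, and the decoder turns those vectors back into pixels. The helper names `encode_decode` and `side_by_side` are my own and not part of Keras; they only assume objects with a `predict` method, such as the `encoder` and `decoder` models defined earlier.

```python
import numpy as np

def encode_decode(encoder, decoder, images):
    """Run flattened images through the encoder, then the decoder."""
    latent = encoder.predict(images)          # shape (n, 64) for the dense model
    reconstructed = decoder.predict(latent)   # shape (n, 784)
    return latent, reconstructed

def side_by_side(originals, reconstructed, n=10):
    """Build one 56 x (28*n) array: inputs on top, reconstructions below."""
    top = np.hstack([img.reshape(28, 28) for img in originals[:n]])
    bottom = np.hstack([img.reshape(28, 28) for img in reconstructed[:n]])
    return np.vstack([top, bottom])
```

With the trained models, `latent, rec = encode_decode(encoder, decoder, x_test[:10])` followed by `plt.imshow(side_by_side(x_test, rec), cmap='gray')` (assuming matplotlib is imported as `plt`) should show the inputs on the top row and the reconstructions below.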

A convolution-based autoencoder

In the dense model above, we flattened each image into a vector of 784 numbers. By doing that we removed all information about the spatial organisation of pixels in MNIST images. Effectively the network does not know that one pixel is above another, or to its left or right. Each output is simply a function of all the input pixels.
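One way to see this, as a small NumPy sketch rather than anything taken from the models above: if the 784 pixels are shuffled with a fixed permutation and the rows of the weight matrix are shuffled the same way, a dense layer gives the same output. The layer has no notion of which pixels are neighbours. The function `dense_relu` and the random weights are illustrative assumptions, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_relu(x, weights, bias):
    """A dense layer with ReLU activation, written out by hand."""
    return np.maximum(x @ weights + bias, 0.0)

# five fake flattened images and one randomly initialised dense layer
x = rng.random((5, 784))
weights = rng.normal(size=(784, 128))
bias = rng.normal(size=128)

# shuffle the pixel order, and reorder the weight rows to match
perm = rng.permutation(784)
x_shuffled = x[:, perm]
weights_shuffled = weights[perm, :]

out_original = dense_relu(x, weights, bias)
out_shuffled = dense_relu(x_shuffled, weights_shuffled, bias)

# the layer computes the same thing whatever the pixel arrangement
same = np.allclose(out_original, out_shuffled)
print(same)
```

A convolution does not have this property: its 3×3 kernel only looks at adjacent pixels, so shuffling the pixels changes what it sees.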

Convolutional layers are a way to bring spatial information into the network. Remember that a network can only predict something if the input contains information about the prediction.

Here is the model with the functional API:

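from keras.datasets import mnist
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# keep the 28x28 layout, add a channel axis and scale pixels to [0, 1]
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255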

i = Input(shape=(28, 28, 1))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(i)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
Encoder = Model(i, x)

i = Input(shape=(4, 4, 8)) # 8 conv2d features
x = Conv2D(8, (3, 3), activation='relu', padding='same')(i)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
# no padding here: 16x16 shrinks to 14x14, so the last upsampling gives 28x28
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
Decoder = Model(i, x)

# define input to the model:
x = Input(shape=(28, 28, 1))

# make the model
autoencoder = Model(x, Decoder(Encoder(x)))

# compile the model:
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

autoencoder.fit(x_train, x_train,
                batch_size=100,
                epochs=1,
                verbose=1,
                shuffle=True,
                validation_data=(x_test, x_test))
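To follow how the image size changes through the layers above, here is a small sketch in plain Python (no Keras needed) that traces the width and height at each stage. The helper functions are my own and assume the standard output-size rules: a 3×3 convolution with 'same' padding keeps the size, without padding it loses 2 pixels, and 2×2 pooling with 'same' padding rounds the halved size up.

```python
import math

def conv_out(size, kernel=3, padding='same'):
    """Output size of a stride-1 convolution."""
    return size if padding == 'same' else size - kernel + 1

def pool_out(size, pool=2):
    """Output size of max pooling with padding='same' (rounds up)."""
    return math.ceil(size / pool)

def upsample_out(size, factor=2):
    """Output size of UpSampling2D."""
    return size * factor

# encoder: three (conv, pool) pairs
encoder_sizes = [28]
for _ in range(3):
    s = conv_out(encoder_sizes[-1])
    encoder_sizes.append(pool_out(s))

# decoder: three (conv, upsample) pairs, the third conv without padding,
# followed by the final 1-channel conv with padding='same'
decoder_sizes = [4]
for padding in ('same', 'same', 'valid'):
    s = conv_out(decoder_sizes[-1], padding=padding)
    decoder_sizes.append(upsample_out(s))
decoder_sizes.append(conv_out(decoder_sizes[-1]))

latent_numbers = 4 * 4 * 8  # height * width * channels of the encoder output

print(encoder_sizes)   # [28, 14, 7, 4]
print(decoder_sizes)   # [4, 8, 16, 28, 28]
print(latent_numbers)  # 128
```

Note that the latent representation here holds 128 numbers (4×4×8), twice as many as the 64 of the dense model.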

Here is what the convolutional autoencoder produces as output. The principle is the same as for the dense autoencoder, but it produces digits that are more sharply defined. The digits are still a bit fuzzy, which is a tendency of autoencoders because of the simplification that happens in the latent space.

Randomly sampling numbers

Having done that, it seems it should be possible to sample a random vector from the latent space and use it to generate an image.

Sadly this does not work, because the latent space is not dense: a random vector is unlikely to produce a valid image.
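The experiment can be sketched as follows. The function name `decode_random_vectors` is hypothetical, and sampling uniformly in [0, 1) is purely my assumption: the real latent vectors come out of a ReLU layer and may sit in a very different region, which is part of the problem. The function only assumes a `decoder` object with a `predict` method that returns 784 values per vector, like the dense decoder above.

```python
import numpy as np

def decode_random_vectors(decoder, n=10, latent_dim=64, seed=0):
    """Feed random latent vectors to a decoder and return 28x28 images."""
    rng = np.random.default_rng(seed)
    # ASSUMPTION: uniform values in [0, 1); nothing guarantees that the
    # encoder ever produces vectors in this range
    z = rng.random((n, latent_dim)).astype('float32')
    images = np.asarray(decoder.predict(z)).reshape(n, 28, 28)
    return z, images
```

Calling `z, images = decode_random_vectors(decoder)` with the trained dense decoder and plotting `images` is a quick way to check how far the outputs are from recognisable digits.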

To do that we need a way to pack the latent space more densely, so that any random vector produces a valid digit. This is what a variational autoencoder does.