An autoencoder reduces a high-dimensional input to a vector in a space of fewer dimensions, then recomputes the lost dimensions from that limited intermediate vector. The space these intermediate vectors live in is called the latent space.

An autoencoder is good at tasks like filtering noise. However, it is difficult to make it generate new images because its latent space is sparse: we can’t sample randomly from that space and expect to get a vector that produces a valid image. A lot of points in that space produce nothing of value. A variational autoencoder (VAE) forces the latent space to be continuous, so that we can pick a random vector and get a meaningful image from it. We are going to explore how to do this.

Beyond that case, this article can also be useful if you want to understand why and how to use a custom loss function in Keras.

Forcing the VAE to generate only humanly interpretable images can be understood as requiring every dot in the latent space to produce an interpretable image. To do that, we need to somehow sample all the dots of the latent space and force them to produce some expected result.

Sampling all the dots of the latent space is of course impossible, because we would need an infinity of them to fill it. So we’d prefer to use something with a surface. We could imagine filling the space with squares or disks, but as they are shapes with hard edges, it would be hard to cover the whole space with them. It would really be like a game of Tetris. They would also be hard to move, as this would create conflicts with the other pieces.

So, instead of a dot or a hard-edged surface, we’ll use a normal distribution, which is described by a probability density function (PDF). In two dimensions it looks a bit like a disk with a variable density. As we get further from its center it gets softer and softer (less probable), so it’s easier to stitch other PDFs next to it.

We now need to know how to place our PDFs across the latent space to cover it entirely. However, backpropagation optimizes the latent space one sample at a time, each sample covering some surface, so we can’t just distribute them across the space ourselves. The way to do it is to grow each element as large as possible as long as this doesn’t break the quality of the results. We do it again at the next epoch and grow the elements even further if possible, until it’s not possible to do better.

Pointing the backpropagation algorithm in the right direction is the role of the loss function.

We will define the loss function as the sum of two non-negative real numbers. That way it’s easy to see that to minimize A+B, with A and B both non-negative, we have to make both A and B as small as possible.

For the first part, we simply use the binary cross-entropy loss function from Keras. This part grows when the output differs from what is expected. It causes the latent space to be organized into clusters of dots (it’s the same thing we do for the ordinary autoencoder).

For the second part, we want to cover as much of the latent space as possible. To do that, we have to decide what the latent space will be. Because we are going to sample from a normal distribution, it’s practical to decide that the latent space will follow a normal distribution as well. What is the best cover we can imagine for a normal distribution? Of course, it’s the distribution itself. The measure of how much two distributions are alike is the Kullback–Leibler divergence. It’s expressed as:

$$D_{KL}(P\|Q) = E_{x\sim P}\left[\log\frac{P(x)}{Q(x)}\right]$$

We can observe that the divergence between two identical distributions is \(E(log(1)) = 0\).

In practice we are interested in the specific case of the divergence between two normal distributions. Before getting to it, here is a bit of intuition.

The KL divergence is the expectation, taken under some distribution P, of the difference between the quantity of information an event provides under some other distribution Q and the quantity of information the same event provides under P. It seems complicated, but what it means is that for each event we measure the gap between what we expected and what we got, then we take the mean of those gaps weighted by the probability of each event under P.

$$D_{KL}(P\|Q) = E_{x\sim P}\left[\log\frac{P(x)}{Q(x)}\right] = E_{x\sim P}\left[\log P(x)-\log Q(x)\right]$$
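To make the formula concrete, here is a minimal sketch for discrete distributions given as lists of probabilities. The function name `kl_divergence` and the two coin distributions are my own illustration, not part of any library.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions given as lists."""
    # Events with p_i == 0 contribute nothing (0 * log 0 is taken as 0).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

fair = [0.5, 0.5]     # a fair coin
biased = [0.9, 0.1]   # a biased coin

kl_same = kl_divergence(fair, fair)            # identical distributions
kl_fair_biased = kl_divergence(fair, biased)   # expectation taken under "fair"
kl_biased_fair = kl_divergence(biased, fair)   # expectation taken under "biased"
print(kl_same, kl_fair_biased, kl_biased_fair)
```

The two last values differ: the divergence is not symmetric, which is why it is not a true distance.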

For the VAE, the distribution we use is the normal distribution. We want the NN to arrange the distributions of the encoded inputs so that they are tightly packed around the origin. So we optimize so that each distribution P looks as much as possible like the N(0,1) distribution (a Gaussian distribution centered on the origin). In the case of the VAE we are interested in the KL divergence:

$$D_{KL}(N(\mu, \sigma) || N(0,1) )$$

We use the multivariate form of it (found on Wikipedia):

$$D_{KL}\left(N\left((\mu_1,\dots,\mu_k)^T, diag(\sigma_1^2,\dots,\sigma_k^2)\right) \,\|\, N(0,I)\right) = \frac{1}{2}\sum_{i=1}^k\left(\sigma_i^2+\mu_i^2-ln(\sigma_i^2)-1\right)$$

The sample code below uses a standard implementation for Keras. It actually implements something a little different. As the NN can learn to produce any function, it is trained to output \(log(\sigma^2)\) instead of \(\sigma^2\). This is for numerical stability: the logarithm maps variances between 0 and 1 to the whole range of negative numbers, and any real number the network outputs is a valid log variance, whereas a variance itself has to stay positive [stackoverflow].

$$D_{KL} = -\frac{1}{2}\sum_{i=1}^k\left(-e^{log(\sigma_i^2)}-\mu_i^2+log(\sigma_i^2)+1\right)$$
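As a sanity check of this closed form, here is a small sketch in plain Python. The function name `kl_to_standard_normal` is hypothetical; it takes the per-dimension means and log variances as lists, the way the encoder would output them.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """Closed-form D_KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

# Already a standard normal: nothing to pay.
kl_zero = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
# Mean moved away from the origin in one dimension.
kl_shifted = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])
# Variance shrunk to 0.25 in one dimension.
kl_narrow = kl_to_standard_normal([0.0, 0.0], [math.log(0.25), 0.0])
print(kl_zero, kl_shifted, kl_narrow)
```

Both moving the mean away from the origin and shrinking the variance are penalized, which is what pushes the encoded distributions to overlap around the origin.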

Backpropagation needs to be able to determine how changing the mean and variance of each latent dimension affects the output of the loss function. However, we can’t just sample from \(N(\mu,\sigma)\), as it would not be possible to estimate how a change to the mean or variance would have changed the sample.

Instead of sampling from \(N(\mu,\sigma)\), we sample from N(0,1), then reparametrize the sample to move it to its correct position. This turns the problem into a linear function of the mean and standard deviation, which is differentiable.

For a fixed epsilon, we can now compute how changes to the mean and variance change the loss function.
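Here is a minimal sketch of the trick in plain Python, for a single latent dimension. The helper name `sample_z` and the values of `mu` and `log_var` are assumptions chosen for illustration. It checks that the reparametrized samples follow the intended distribution and that, for a fixed epsilon, the sample is an ordinary differentiable function of the mean.

```python
import math
import random

def sample_z(mu, log_var, eps):
    """Reparametrized sample: z = mu + sigma * eps, sigma = exp(0.5 * log_var)."""
    return mu + math.exp(0.5 * log_var) * eps

rng = random.Random(0)
mu, log_var = 2.0, math.log(0.25)          # sigma = 0.5
eps_values = [rng.gauss(0.0, 1.0) for _ in range(20000)]
samples = [sample_z(mu, log_var, e) for e in eps_values]

mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))

# With eps fixed, z is a plain function of mu: its derivative is 1.
eps, h = eps_values[0], 1e-6
dz_dmu = (sample_z(mu + h, log_var, eps) - sample_z(mu - h, log_var, eps)) / (2 * h)
print(mean, std, dz_dmu)
```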

from optparse import OptionParser

import numpy as np
import matplotlib.pyplot as plt
from keras import backend as K
from keras.datasets import mnist
from keras.layers import Dense, Input, Lambda
from keras.losses import binary_crossentropy
from keras.models import Model

# reparameterization trick
# instead of sampling from Q(z|X), sample eps = N(0,I)
# z = z_mean + sqrt(var)*eps
def sampling(args):
    z_mean, z_log_var = args
    batch = K.shape(z_mean)[0]
    dim = K.int_shape(z_mean)[1]
    # by default, random_normal has mean=0 and std=1.0
    epsilon = K.random_normal(shape=(batch, dim))
    # exp(0.5 * log(var)) is the standard deviation
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

# MNIST dataset (only the first 1000 images, to train quickly)
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train[0:1000]
image_size = x_train.shape[1]
original_dim = image_size * image_size
x_train = np.reshape(x_train, [-1, original_dim])
x_test = np.reshape(x_test, [-1, original_dim])
# scale pixels to [0, 1]
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# network parameters
input_shape = (original_dim, )
intermediate_dim = 128
batch_size = 100
latent_dim = 2
epochs = 40

# create an encoder model
inputs = Input(shape=input_shape)
x = Dense(intermediate_dim, activation='relu')(inputs)
z_mean = Dense(latent_dim, name='z_mean')(x)
z_log_var = Dense(latent_dim, name='z_log_var')(x)
z = Lambda(sampling, output_shape=(latent_dim,), name='z')([z_mean, z_log_var])
encoder = Model(inputs=inputs, outputs=[z_mean, z_log_var, z], name='encoder')

# create a decoder model
i = Input((latent_dim,))
x = Dense(intermediate_dim, activation='relu')(i)
# sigmoid keeps every pixel in [0, 1] independently, as binary_crossentropy
# expects (softmax would force the 784 pixels to sum to 1)
x = Dense(original_dim, activation='sigmoid')(x)
decoder = Model(inputs=i, outputs=x, name='decoder')

# instantiate VAE model: decode the sampled z (third output of the encoder)
outputs = decoder(encoder(inputs)[2])
vae = Model(inputs, outputs, name='vae_mlp')

# VAE loss = mse_loss or xent_loss + kl_loss
reconstruction_loss = binary_crossentropy(inputs, outputs)
reconstruction_loss *= original_dim
kl_loss = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
kl_loss = K.sum(kl_loss, axis=-1)
kl_loss *= -0.5
vae_loss = K.mean(reconstruction_loss + kl_loss)
vae.add_loss(vae_loss)
vae.compile(optimizer='adam', loss=None)

parser = OptionParser()
parser.add_option("-t", "--train", action="store_true")
(options, args) = parser.parse_args()

# train the autoencoder, or reload the weights of a previous training
if options.train:
    vae.fit(x_train,
            epochs=epochs,
            batch_size=batch_size)
    vae.save_weights('mnist_variational_autoencoder.h5')
else:
    vae.load_weights('mnist_variational_autoencoder.h5')

# generate some images from latent vectors
# random alternative: lsv = np.random.normal(size=(5, latent_dim))
# here: a regular 10x10 grid covering [-1, 1) on both latent axes
lsv = np.array([[x/10.0, y/10.0] for x in range(-10, 10, 2) for y in range(-10, 10, 2)])
imgs = decoder.predict(lsv)
print(imgs)

iplot = 1
for img in imgs:
    img = img.reshape(28, 28)
    plt.subplot(10, 10, iplot)
    iplot += 1
    plt.imshow(img)
plt.show()

I needed a few reminders about probability and information theory before understanding the intuition behind the KL divergence. Here they are:

A bit of information theory and probability:

- An event of probability \(\frac{1}{2^n}\) provides a quantity of information of \(-log_2(\frac{1}{2^n}) = n\) bits. If something has a probability of \(\frac{1}{2^3}\), its quantity of information is \(-log_2(\frac{1}{2^3})=3\). The generalization of this is:

$$I = -log(p(x))$$

- Mean and variance of continuous random variables

The mean is the integral of x times the probability density function at that point.

$$\mu = \int_{-\infty}^{\infty} x\,p(x)\,dx$$

The variance is the integral of the squared distance to the mean times the probability density function at that point.

$$\sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2\,p(x)\,dx$$

- The expectation of a random variable is the value we could expect to get on average if we repeated some experiment a lot of times. It’s the sum of the values of the events weighted by the probability of each event occurring.

$$E(X)=\sum_{i=1}^{n} p(x_i) x_i$$

- The information entropy is the expectation of the information of a random variable.

$$H(X)=-E(log(P(X)))=-\sum_{i=1}^{n} p(x_i)log(p(x_i))$$

We use the special case of a diagonal multivariate normal, and a standard normal distribution as stated in : https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Multivariate_normal_distributions

- The normal distribution

$$f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left (\frac{x-\mu}{\sigma}\right)^2}$$

- Multivariate normal (Gaussian) distribution, with mean vector \(\mu\) and covariance matrix \(\Sigma\)

$$p(x)=\frac{1}{(2\pi)^{n/2}det(\Sigma)^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$

The standard normal distribution N(0,1) is the univariate formula above with \(\mu=0\) and \(\sigma=1\).
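The information and entropy reminders above can be checked with a few lines of plain Python. This is only a sketch: the function names `information_bits` and `entropy_bits` are mine, and base-2 logarithms are used so the results are in bits.

```python
import math

def information_bits(p):
    """Quantity of information, in bits, of an event of probability p."""
    return -math.log2(p)

def entropy_bits(probs):
    """Expected information, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

bits_coin = information_bits(1 / 2)         # one coin flip
bits_eighth = information_bits(1 / 2 ** 3)  # one outcome among 8 equally likely

h_fair = entropy_bits([0.5, 0.5])      # a fair coin
h_biased = entropy_bits([0.9, 0.1])    # a biased coin is less surprising
print(bits_coin, bits_eighth, h_fair, h_biased)
```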

A few readings that were interesting in understanding the VAE:

- KL divergence between two univariate Gaussians (stats.stackexchange.com)
- Kullback-Leibler Divergence Explained (www.countbayesie.com)
- Intuitively Understanding Variational Autoencoders (towardsdatascience.com)
- Variational Autoencoder: Intuition and Implementation (Agustinus Kristiadi’s Blog, wiseodd.github.io)
- Tutorial on variational autoencoder : https://arxiv.org/pdf/1606.05908.pdf

MNIST images are 28×28 grayscale images of handwritten digits.

Images are best handled by convolution layers, but autoencoders are useful in more than one way. Dense NNs also train faster, so I tried a dense layer network first.

Using only sequential Keras models, we can build an autoencoder by stacking an encoder and a decoder inside a sequential model. A model can be used just like a layer in Keras.

As our model is an autoencoder, we use x_train both as the input and as the expected output.

import keras
from keras.datasets import mnist
from keras.layers import Dense
from keras.models import Sequential

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# flatten each 28x28 image to 784 values and scale pixels to [0, 1]
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255

# create an encoder model
encoder = Sequential()
encoder.add(Dense(128, activation='relu', input_shape=(784,)))
encoder.add(Dense(64, activation='relu'))
encoder.add(Dense(64, activation='relu'))

# create a decoder model
decoder = Sequential()
decoder.add(Dense(64, activation='relu', input_shape=(64,)))
decoder.add(Dense(128, activation='relu'))
# sigmoid keeps each pixel in [0, 1], as binary_crossentropy expects
decoder.add(Dense(784, activation='sigmoid'))

# models behave like layers. We can build a model from other models
autoencoder = Sequential()
autoencoder.add(encoder)
autoencoder.add(decoder)

autoencoder.compile(loss=keras.losses.binary_crossentropy,
                    optimizer=keras.optimizers.Adadelta(),
                    metrics=['accuracy'])

autoencoder.fit(x_train, x_train,
                batch_size=100,
                epochs=10,
                verbose=1,
                shuffle=True)

Above I presented how to define the autoencoder using the Keras Sequential API, as this is what the Keras documentation explains first and I find it slightly more readable at the beginning.

However, most autoencoder tutorials use the functional API to define models. Let’s see how to do it with the functional API. These two pieces of code define the same model, just in different ways. Writing the code both ways was helpful for me in understanding how the functional API works in Keras.

import keras
from keras.datasets import mnist
from keras.layers import Dense, Input
from keras.models import Model

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# flatten each 28x28 image to 784 values and scale pixels to [0, 1]
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255

# create an encoder model
i = Input((784,))
x = Dense(128, activation='relu')(i)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
encoder = Model(inputs=i, outputs=x)

# create a decoder model
i = Input((64,))
x = Dense(64, activation='relu')(i)
x = Dense(128, activation='relu')(x)
# sigmoid keeps each pixel in [0, 1], as binary_crossentropy expects
x = Dense(784, activation='sigmoid')(x)
decoder = Model(inputs=i, outputs=x)

# models behave like layers. We can build a model from other models
i = Input((784,))
x = encoder(i)
x = decoder(x)
autoencoder = Model(inputs=i, outputs=x)

autoencoder.compile(loss=keras.losses.binary_crossentropy,
                    optimizer=keras.optimizers.Adadelta(),
                    metrics=['accuracy'])

autoencoder.fit(x_train, x_train,
                batch_size=100,
                epochs=10,
                verbose=1,
                shuffle=True)

Here is what the autoencoder produces as images. The sequential and the functional APIs of course provide similar results. The top images are the input data, which are reduced to a low-dimensional latent space by the encoder, then reconstructed into the bottom images by the decoder. If we have some latent space vector (64 numbers), we can reconstruct all 784 pixels from those 64 numbers.

In the dense model above, we converted every image to a flat list of 784 numbers. By doing that, we removed all information about the spatial organisation of the pixels in MNIST images. Effectively, the network ignores that some pixel is above some other pixel, or to the right or left of it. The output is a function of each pixel of the input.

Convolutional layers are a way to include spatial information in the network. Remember that a network can only predict something if the input contains information about the prediction.

Here is the model with the functional API:

from keras.datasets import mnist
from keras.layers import Conv2D, Input, MaxPooling2D, UpSampling2D
from keras.models import Model

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# keep the 2D layout, add a channel axis and scale pixels to [0, 1]
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255

# encoder: 28x28x1 -> 14x14x16 -> 7x7x8 -> 4x4x8
i = Input(shape=(28, 28, 1))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(i)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
Encoder = Model(i, x)

# decoder: 4x4x8 -> 8x8 -> 16x16 -> 14x14 -> 28x28x1
i = Input(shape=(4, 4, 8))  # 8 conv2d features
x = Conv2D(8, (3, 3), activation='relu', padding='same')(i)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
# no padding here on purpose: 16x16 shrinks to 14x14, so the last upsampling gives 28x28
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
Decoder = Model(i, x)

# define input to the model:
x = Input(shape=(28, 28, 1))

# make the model
autoencoder = Model(x, Decoder(Encoder(x)))

# compile the model:
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

autoencoder.fit(x_train, x_train,
                batch_size=100,
                epochs=1,
                verbose=1,
                shuffle=True,
                validation_data=(x_test, x_test))

Here is what the convolutional autoencoder produces as output. The principle is the same as for the dense autoencoder, but it produces digits that are more sharply defined. The digits are still a bit fuzzy, which is a tendency of autoencoders due to the simplification they perform.

Having done that, it seems that it could be possible to sample some random vector from the latent space and use it to generate an image.

Sadly this is not possible, because the latent space is not dense, which means that a random vector is unlikely to produce a valid image.

To do that, we need to find a way to pack the latent space more densely so that any random vector will produce a valid digit. This is what a variational autoencoder does.

I could not find a description of Keras tensors; however, Keras is implemented on top of TensorFlow and shares the same concepts. This paragraph on the TensorFlow website was what made tensors clearer:

"A tensor is a generalization of vectors and matrices to potentially higher dimensions."

That one was clear from the beginning. Tensors are matrices of many dimensions. All right. Then I read:

"A `tf.Tensor` object represents a partially defined computation that will eventually produce a value."

That one was what I missed. The tf.Tensor object is the result of a function that is not yet evaluated.

I feel that it is like *f(x)* in the mathematical function *f(x) = x²*. If we provide some value for the input x, then *f(x)* evaluates to something. As long as there is no value, it is just *f(x)*, a partially defined computation (a function).

"TensorFlow programs work by first building a graph of `tf.Tensor` objects, detailing how each tensor is computed based on the other available tensors and then by running parts of this graph to achieve the desired results."

The tf.Tensor object is not just a matrix of many dimensions; it also links to other tensors through the way it is computed. The way a tf.Tensor is computed is a function that transforms a tensor A into a tensor B. I suppose we recurse from the output tensor until we reach all the necessary inputs, then we evaluate everything forward.

All of this is about TensorFlow, but I feel that it is correct for Keras as well.
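To illustrate the idea of "a partially defined computation", here is a toy sketch in plain Python. It is not how TensorFlow or Keras are implemented: the `Node` class and its `evaluate` method are hypothetical names I made up to show a value being described first and computed later by walking back to the inputs.

```python
class Node:
    """A value that is described now and computed later (toy illustration)."""

    def __init__(self, fn=None, inputs=(), name=None):
        self.fn = fn          # how to compute this node from its inputs
        self.inputs = inputs  # the nodes this node depends on
        self.name = name      # only used by placeholders

    def evaluate(self, feed):
        # A placeholder has no function: its value comes from the feed dict.
        if self.fn is None:
            return feed[self.name]
        # Otherwise recurse to the inputs, then apply this node's function.
        return self.fn(*(node.evaluate(feed) for node in self.inputs))

# Building the graph computes nothing yet.
x = Node(name="x")
square = Node(fn=lambda v: [e * e for e in v], inputs=(x,))
mean_of_square = Node(fn=lambda v: sum(v) / len(v), inputs=(square,))

# Only now is a value provided and the graph walked.
result = mean_of_square.evaluate({"x": [2.0, 2.0, 2.0, 2.0]})
print(result)
```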

import numpy as np
from keras import backend as K

i = K.placeholder(shape=(4,), name="input")
f = K.function([i], [i])

ival = np.ones((4,))
print( f([ival]) )

> [array([ 1., 1., 1., 1.], dtype=float32)]

A useless function that takes an input and returns it. i is a tensor. f is a function. It takes its inputs and outputs as tensors. When we evaluate it with f([ival]), the tensor graph is walked from the i output to the i input. Quite easy here :). i is evaluated with its value, then the output is returned by the function.

import numpy as np
from keras import backend as K

i = K.placeholder(shape=(4,), name="input")
square = K.square(i)
f = K.function([i], [square])

ival = np.ones((4,))*2
print( f([ival]) )

> [array([ 4., 4., 4., 4.], dtype=float32)]

A function that returns the square of each value of the input. It is just the same as before, but we walk the graph through the square function before getting to the input values.

import numpy as np
from keras import backend as K

i = K.placeholder(shape=(4,), name="input")
square = K.square(i)
mean = K.mean(i)
mean_of_square = K.mean(K.square(i))
f = K.function([i], [i, square, mean, mean_of_square])

# inputs of 2, to match the output shown below
ival = np.ones((4,))*2
print( f([ival]) )

> [array([ 2., 2., 2., 2.], dtype=float32), array([ 4., 4., 4., 4.], dtype=float32), 2.0, 4.0]

A function that returns the input, the square, the mean and the mean of the square. We can compose functions.

import numpy as np
from keras import backend as K

i = K.placeholder(shape=(4,), name="input")
square = K.square(i)
# K.gradients returns a list of tensors, one per variable
grad = K.gradients([square], [i])
f = K.function([i], [i, square] + grad)

ival = np.ones((4,))*3
print( f([ival]) )

> [array([ 3., 3., 3., 3.], dtype=float32), array([ 9., 9., 9., 9.], dtype=float32), array([ 6., 6., 6., 6.], dtype=float32)]

A function that computes the gradient of square relative to the variable i. The gradient is a tensor computed from the composition of all the functions between square and i. square(x) = x², so d square(x)/dx = 2x. For the input 3, the derivative is 6. We can compute the derivative between two related tensors.
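The value 6 can be double-checked without Keras using a finite difference. This is only a numerical sketch in plain Python; `numerical_gradient` is a hypothetical helper, and it approximates the derivative rather than computing it symbolically the way the backend does.

```python
def square(x):
    return x * x

def numerical_gradient(f, x, h=1e-5):
    """Central finite difference approximation of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

grad_at_3 = numerical_gradient(square, 3.0)
print(grad_at_3)  # close to 2 * 3 = 6
```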

I think a few of those examples might be useful in the Keras documentation. I’ll propose them when my understanding has improved a bit. At least this article will help me avoid the curse of knowledge later and remind me of the difficulties along the way.

So for testing, I looked for what could be a simple enough project. At work, I had a lot of news articles sorted by category. It seemed nice to create a NN that would be able to automatically classify individual news articles into each category.

It would provide a little bit of value, possibly by automatically proposing an appropriate category for a feed, or by proposing an alternative classification for individual news articles for better navigation.

I had read a few NN tutorials, but going from explanations to a working example on my own data was not trivial. As often when you learn a new topic, there is a lot of vocabulary, and articles are a bit painful to read until you master enough of it. One of the things that was difficult too was preparing the data for the NN.

In this article I’m going quickly to the implementation. This is a very beginner-oriented article written mostly to consolidate my own understanding. I’ll start with the NN itself, then we’ll prepare the data for training it.

The NN we are going to use is defined with Keras this way:

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(512, input_dim=10000))
model.add(Activation('relu'))
model.add(Dense(8))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

I didn’t include the inputs and outputs so that we can have a look at how the NN is built first. So what do we have here?

**Sequential model**

model = Sequential()

Sequential is the simplest way to define a NN with Keras. It means we are going to stack each layer over the previous one.

**First Dense**

model.add(Dense(512, input_dim=10000))

Dense is a classical NN layer. Because it is the first layer, it’s also considered the input layer. The input_dim parameter is the dimension of the data the NN is going to expect as input. That’s also called the dimension of the data.

That layer has 512 neurons. Those 512 neurons will output 512 values which are going to be the inputs of the next layer. We don’t need to specify the size of the input of next layer because Keras does it automatically.

512 is also the dimension that the NN is going to use to represent your data internally. Larger values learn more slowly. They are also more subject to overfitting, which means that the network, instead of generalizing the problem, will over-specialize on the examples provided, reducing its ability to predict good outputs for unknown inputs. Smaller values may not be enough to capture the problem.

Eventually, everybody tries different values to find something that performs well.

**Role of the first Activation**

model.add(Activation('relu'))

Activation is what is called the activation function. NNs use activation functions because stacking a linear operation over a linear operation ends up being a linear operation itself, which would make the whole network equivalent to a single layer. Activation takes the name of the function to use as its parameter. For internal layers *relu* is often a good pick.
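The claim that stacked linear layers collapse into one can be seen on a toy example with one input and one output. This is a sketch in plain Python, with made-up weights; `linear` and `relu` are hypothetical helpers, not Keras functions.

```python
def linear(a, b):
    """A one-neuron linear layer: x -> a * x + b."""
    return lambda x: a * x + b

def relu(x):
    return max(0.0, x)

layer1 = linear(2.0, 1.0)
layer2 = linear(-3.0, 0.5)

# Two stacked linear layers...
stacked = lambda x: layer2(layer1(x))
# ...are exactly one linear layer with a = -3 * 2 and b = -3 * 1 + 0.5.
collapsed = linear(-6.0, -2.5)

# With a relu in between, the result is no longer a single linear layer.
with_relu = lambda x: layer2(relu(layer1(x)))

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
same = all(abs(stacked(x) - collapsed(x)) < 1e-12 for x in xs)
# A linear function has a constant step between evenly spaced inputs.
steps = [with_relu(b) - with_relu(a) for a, b in zip(xs, xs[1:])]
print(same, steps)
```

The steps of the relu version are not constant, so that network can bend its output in a way no single linear layer can.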

The ConvNetJS demo is great for seeing how the activation function changes the shape of what is learned. Try changing the tanh to relu to experiment.

**The second Dense layer**

model.add(Dense(8))

This is the last layer of the NN, so that’s its output. As I have 8 categories, the NN needs to output 8 values, so that layer has 8 neurons. It’ll output an array such as [2,6,0,1,0,0,0.5,0].

**Softmax Activation**

model.add(Activation('softmax'))

The previous Dense layer outputs real numbers. Looking at a single output, it’s not easy to tell what a 6 means, for example. Its interpretation depends on the other values.

Softmax squashes the value of each output between 0 and 1 and normalises the whole vector so that the probabilities of the classes sum to 1. For a single specific output, softmax tells you the probability that that class is the right one.
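Here is a minimal softmax sketch in plain Python applied to the example output above. The function is my own illustration of the formula, not the Keras implementation; subtracting the maximum before the exponential is a common trick to avoid overflow and doesn’t change the result.

```python
import math

def softmax(values):
    """Turn a list of real numbers into probabilities that sum to 1."""
    m = max(values)                          # for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2, 6, 0, 1, 0, 0, 0.5, 0])
best = probs.index(max(probs))   # index of the most probable category
print(best, [round(p, 3) for p in probs])
```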

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

We are done defining our NN. categorical_crossentropy is the loss function we use when sorting into categories. It expects the output to be a vector of category probabilities, such as the one softmax produces.

I didn’t research the other parameters much. Some optimisers may work better on different types of data, but for my use case most of them worked well.

I found model.summary() to be very helpful at the beginning. I didn’t see a lot of tutorials use it, probably because it doesn’t provide a lot of information.

Still, seeing how parameters affect the dimensions of the layers of the NN does help understanding. It also helps with debugging because, when you have an error, Keras uses the name of the layer to tell you where the issue is. When I see an index of 1 or 2, I always have difficulty knowing whether the count starts at 0 or 1. It had me scratching my head until I found summary().

Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 512)               5120512
_________________________________________________________________
activation_1 (Activation)    (None, 512)               0
_________________________________________________________________
dense_2 (Dense)              (None, 8)                 4104
_________________________________________________________________
activation_2 (Activation)    (None, 8)                 0
=================================================================
Total params: 5,124,616
Trainable params: 5,124,616
Non-trainable params: 0
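The parameter counts in this summary can be recomputed by hand: a Dense layer has one weight per (input, neuron) pair plus one bias per neuron. A short sketch, where `dense_params` is a hypothetical helper name:

```python
def dense_params(n_inputs, n_units):
    """Number of trainable parameters of a Dense layer: weights + biases."""
    return n_inputs * n_units + n_units

first = dense_params(10000, 512)   # dense_1: 10000 inputs, 512 neurons
second = dense_params(512, 8)      # dense_2: 512 inputs, 8 neurons
total = first + second
print(first, second, total)
```

The Activation layers have no parameters, which is why they show 0.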

Training is the part where all the coefficients are chosen to have the NN predict the closest output given an input. Keras does it in one line but it’s actually where most of the NN magic happens.

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=2,
          verbose=1,
          validation_split=0.1)

The training itself is done by an algorithm named backpropagation. A very clear explanation of how backpropagation works can be found on Matt Mazur’s blog: A Step by Step Backpropagation Example (mattmazur.com).

x_train and y_train are the inputs and outputs of the NN used for training. Both are multidimensional arrays. The first row of x_train is the first example of input. The first row of y_train is the first example of expected output. Each row of x_train is given to the NN, and the neurons’ coefficients are corrected toward the corresponding y_train value. Doing that once for all the data is called an epoch.

epochs=2 means to do that process twice. If you have a lot of data, you may use only 1, but most of the time using more epochs makes better use of the data.

batch_size is how many samples are used for one forward pass and one backward pass (from the Keras description). I find that a little bit unclear; here is my guess, which might be wrong. Batches exist to solve a performance problem: updating the weights of each neuron after each sample is slow. It may not be very slow on a CPU, but on a GPU that would mean transferring weights to the model after each forward pass. Also, on many-core architectures, batching allows passes to be run in parallel.

My guess is that Keras does many forward passes and computes the new weights, but only updates them once every batch_size samples. The nicest explanation I found was on machinelearningmastery, which actually calls the algorithm mini-batch: A Gentle Introduction to Mini-Batch Gradient Descent and How to Configure Batch Size (machinelearningmastery.com).
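To picture how one epoch is cut into batches, here is a sketch in plain Python that only computes the slices, without any training. The sample count of 9000 and the batch size of 128 are assumptions for illustration, and `iterate_minibatches` is a hypothetical helper, not a Keras function.

```python
def iterate_minibatches(n_samples, batch_size):
    """Yield (start, end) row indices of each mini-batch of one epoch."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

batches = list(iterate_minibatches(9000, 128))
n_updates = len(batches)             # one weight update per batch
last_start, last_end = batches[-1]   # the last batch is usually smaller
print(n_updates, last_end - last_start)
```

With these numbers, one epoch means 71 weight updates instead of 9000.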

CSV is often used as an input format for NN. My data format is basically composed of title and text of the article in the first field and category in the second field. It’s possible to read them that way :

import pandas

dataframe = pandas.read_csv('articles.csv', header=None, escapechar='\\', na_filter=False)
dataset = dataframe.values

# first 9000 rows: training data
texts = dataset[:9000,0]
categories = dataset[:9000,1]

# remaining rows: validation data
test_texts = dataset[9000:,0]
test_categories = dataset[9000:,1]

To evaluate the training, ML practitioners often set aside a part of the data and check against it how good the trained NN is at predicting a result.

With Python it’s really easy to do. dataset[:9000] returns the first 9000 rows and dataset[9000:] returns all the rows after them. The first ones are the training data, the last ones are the validation data.
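The slicing can be tried on a tiny made-up dataset in plain Python. The ten rows and the split point of 7 below are assumptions for illustration only.

```python
# Ten fake (text, category) rows standing in for the CSV content.
rows = [("text %d" % i, "category %d" % (i % 3)) for i in range(10)]

split = 7
train_rows = rows[:split]   # rows 0..6
test_rows = rows[split:]    # rows 7..9

# Texts and categories must come from the SAME rows to stay aligned.
train_texts = [text for text, _ in train_rows]
train_categories = [category for _, category in train_rows]
print(len(train_rows), len(test_rows))
```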

This was my most head-scratching issue at the beginning. A NN expects numbers as inputs, but it wasn’t evident to me how to convert a list of words into a list of numbers.

We could pick anything like a dictionary and assign an index to each word, but that wouldn’t be a great idea (or at least not in this case). Not all transformations are equal.

What we actually want is a transformation that preserves some of the meaning of the information. We also want a transformation with a fixed output dimension, because the NN expects inputs of a fixed size. Sentences as they are can’t be used as input because their length varies. Preserving the order of the words in the sentence is also only useful if we are going to exploit this order. The NN we defined doesn’t know how to use it, so preserving that information is useless.

The representation I chose is to present each sentence as a row of a matrix where each column represents one word. In the simplest version of this, we could flag 1 if the word is present and 0 if it isn’t. Here we use a *tfidf* representation, which is a statistically more representative version of the data. I chose that one because it gave better results.
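The simplest present/absent version described above can be sketched in a few lines of plain Python. This is not the tfidf weighting and not the Keras Tokenizer: `build_vocabulary` and `to_binary_matrix` are hypothetical helpers, and the two sentences are made up.

```python
def build_vocabulary(texts):
    """Assign a column index to each distinct word, in order of appearance."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def to_binary_matrix(texts, vocab):
    """One row per text, one column per word: 1 if present, else 0."""
    matrix = []
    for text in texts:
        row = [0] * len(vocab)
        for word in text.lower().split():
            if word in vocab:
                row[vocab[word]] = 1
        matrix.append(row)
    return matrix

texts = ["the cat sat", "the dog sat down today"]
vocab = build_vocabulary(texts)
matrix = to_binary_matrix(texts, vocab)
print(vocab)
print(matrix)
```

Both rows have the same length whatever the length of the sentence, which is what the fixed-size input of the NN requires.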

from keras.preprocessing.text import Tokenizer

tk = Tokenizer(num_words=10000)
# the vocabulary is built from the training texts only
tk.fit_on_texts(texts)
x_train = tk.texts_to_matrix(texts, mode="tfidf")
x_test = tk.texts_to_matrix(test_texts, mode="tfidf")

What is important to understand is that the way we present data to the NN is going to orient how the NN learns. There are different ways of representing data. I’ll try to write an article about a few of them. Each representation of the data also requires an appropriate NN structure.

As we have exactly one label for each text, we can represent this as a vector as well. It’s usually called a one-hot vector, which means it’s a vector where each category is represented by a column. Here is a one-hot vector for the second category:

[0,1,0,0,0,0,0,0]

It is very close to what the Keras Tokenizer and texts_to_matrix do, but the Keras Tokenizer reserves the index 0, which adds an extra useless column. Most people prefer to use LabelEncoder from sklearn.

import keras
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
# map each category name to an integer index
encoder.fit(categories)
num_categories = encoder.transform(categories)
# turn each index into a one-hot vector
y_train = keras.utils.to_categorical(num_categories)
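What these two steps do can be reproduced on a toy list of labels in plain Python. This is a sketch of the idea only, not the sklearn or Keras implementation: `one_hot_labels` is a hypothetical helper and the category names are made up.

```python
def one_hot_labels(labels):
    """Return the sorted class names and one one-hot row per label."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    rows = [[1 if index[label] == i else 0 for i in range(len(classes))]
            for label in labels]
    return classes, rows

classes, y = one_hot_labels(["health", "sport", "health", "tech"])
print(classes)
print(y)
```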

That NN, as simple as it is, performed quite well. It is able to classify the category of articles with 75% accuracy, which is not too bad given that there are 8 categories. It could probably be improved by giving it more articles for each category.

One thing I hadn’t thought about was that the differences between the classification I had and what the NN output would be even more interesting than a perfect 100% match. 100% wouldn’t actually teach me anything. Correct matches often had a score very close to 100% in the correct category. Invalid matches more often had average scores, such as a news article being part current affairs, part health, for example if it was talking about a new drug. Through the softmax, the network actually provides the probability for a news article to be in each category, and this could be helpful for proposing alternative categories.
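Proposing alternative categories from the softmax output can be as simple as ranking the probabilities. A sketch in plain Python, where `top_categories`, the category names and the probabilities are all made up for illustration:

```python
def top_categories(probabilities, names, k=2):
    """Return the k most probable (name, probability) pairs, best first."""
    ranked = sorted(zip(probabilities, names), reverse=True)
    return [(name, p) for p, name in ranked[:k]]

names = ["news", "health", "sport"]
# An ambiguous article: two categories are almost equally likely.
suggestions = top_categories([0.48, 0.45, 0.07], names)
print(suggestions)
```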

]]>*Nginx allows to do that with auth_request.*

I had a bit of trouble connecting all the dots, but in the end it only takes two lines of configuration on any route that you want to secure :

- Checking if the request is authorized with *auth_request*.
- Returning the error message from the auth_request on a 401, by proxying that request and displaying it as a custom error page with *error_page 401*.

Both requests are made to a /auth route that proxies the auth server.

auth_request checks the authentication status and expects 401 (unauthorized) or 200 (authorized). We need to pass the Authorization header so that the bearer token is provided to the auth server. We do this with :

proxy_pass_header Authorization;

proxy_set_header Authorization $http_authorization;

We probably don’t need the POST data when checking auth, so we do not forward the request body to the auth server. We still pass the original URI along in a header :

proxy_pass_request_body off;

proxy_set_header Content-Length "";

proxy_set_header X-Original-URI $request_uri;

If the auth_request response was a 401 then we show the result of the /auth request as an error page.

error_page 401 =401 /auth;

server {

listen 80 default_server;

server_name _;

index index.php;

rewrite_log on;

root /var/www/html/public;

location / {

auth_request /auth;

error_page 401 =401 /auth;

try_files $uri $uri/ /index.php?$query_string;

}

location ~ \.php$ {

# With php5-fpm:

fastcgi_index index.php;

fastcgi_connect_timeout 3s;

fastcgi_read_timeout 60s;

fastcgi_pass unix:/var/run/php5-fpm.sock;

fastcgi_param SCRIPT_NAME $fastcgi_script_name;

fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;

include /etc/nginx/fastcgi_params;

}

location = /auth {

proxy_pass http://api/v1/oauth2/user;

proxy_pass_header Authorization;

proxy_set_header Authorization $http_authorization;

proxy_pass_request_body off;

proxy_set_header Content-Length "";

proxy_set_header X-Original-URI $request_uri;

}

}

RUN ln -sf /dev/stdout /var/log/nginx/access.log \
    && ln -sf /dev/stderr /var/log/nginx/error.log

However, this is not sufficient, because supervisor captures the output of the processes it manages. Nginx logs now end up in /var/logs/supervisor/nginx-*.log.

We need to tell supervisor to redirect logs it captures to /dev/stdout.

[program:nginx]

command=nginx -g 'daemon off;'

autostart=true

autorestart=true

stopsignal=QUIT

exitcodes=0

numprocs=1

startsecs=10

startretries=3

stdout_logfile=/dev/stdout

stdout_logfile_maxbytes=0

stderr_logfile=/dev/stderr

stderr_logfile_maxbytes=0

We can now run docker-compose up and watch the nginx output.

**let** allows declaring block-scoped variables

**function** declarations are limited in scope to the block where they are defined

The **spread** operator expands an array into individual values :

f(...[1,2,3]); // expands to f(1,2,3)

**default** parameter values are used when an argument is undefined :

function f(x=1,y=2) { return x+y; }

f(undefined,3); // returns 4
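As a small sketch of how spread and default parameters combine, the snippet below uses a hypothetical `sum` helper (not from the original article). It also illustrates that only `undefined` triggers a default, while `null` does not.

```javascript
// Hypothetical helper, used only for this illustration.
function sum(x = 1, y = 2, z = 0) {
  return x + y + z;
}

const values = [1, 2, 3];

// Spread: each array element becomes one argument, same as sum(1, 2, 3).
const spreadResult = sum(...values);

// Defaults: undefined falls back to the default value of x.
const withDefault = sum(undefined, 3);

// null is a real value, so the default is not used (null + 3 + 0).
const withNull = sum(null, 3);

console.log(spreadResult, withDefault, withNull);
```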

With array :

var [x,y,z] = [1,2,3,4]

var [y,x] = [x,y]

var [,y,z] = [1,2,3,4]

var [x,...y] = [1,2,3,4]

With objects :

var {x,y,z} = {x:1,y:2,z:3}

var {x:a,y:b,z:c} = {x:1,y:2,z:3}

// Assigning to existing properties or array slots is an assignment, not a declaration :

({x:o.a,y:o.b,z:o.c} = {x:1,y:2,z:3})

({x:a[0],y:a[1],z:a[2]} = {x:1,y:2,z:3})

var {x=10,y=11,z=12} = {x:1}

Nested :

var [a,{x,y}] = [1,{x:1,y:2}]

function f({x,y}) {}

f({x:1,y:2})
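To make the effect of these patterns concrete, here is a minimal sketch that records what each form binds. All variable names are made up for the illustration.

```javascript
// Array patterns
const [a, b, c] = [1, 2, 3, 4];          // the extra element is ignored
const [, second, third] = [1, 2, 3, 4];  // a hole skips the first element
const [head, ...tail] = [1, 2, 3, 4];    // the rest collects what is left

// Swapping two variables without a temporary
let left = "L";
let right = "R";
[left, right] = [right, left];

// Object patterns: renaming and default values
const { x: px, y: py = 11, z: pz = 12 } = { x: 1 };

// Assigning into an existing object needs surrounding parentheses
const target = {};
({ x: target.a, y: target.b } = { x: 1, y: 2 });

// Nested pattern inside a parameter list
function area({ size: { w, h } }) {
  return w * h;
}
const rectArea = area({ size: { w: 2, h: 3 } });

console.log(a, b, c, second, third, head, tail, left, right, px, py, pz, target, rectArea);
```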

var o = {

x() {},

y() {}

}

Take care that this can’t work, not because of the obvious recursion problem, but simply because x is not defined inside the function scope (you would have to call this.x()) :

var o = {

x() { x(); }

}
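A minimal sketch of the workaround, assuming the method is always called on its object: a shorthand method creates no local binding for its own name, so the recursive call has to go through `this`. The `counter` and `probe` objects are hypothetical.

```javascript
const counter = {
  // Shorthand method: its body has no local binding named "countdown".
  countdown(n) {
    // A bare countdown(n - 1) would throw a ReferenceError here,
    // so the recursive call goes through this.
    return n <= 0 ? [0] : [n, ...this.countdown(n - 1)];
  }
};

const probe = {
  // typeof does not throw on an unknown identifier, which lets us check the scope.
  lookup() { return typeof lookup; }
};

const sequence = counter.countdown(3);
const nameInScope = probe.lookup();

console.log(sequence, nameInScope);
```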

var o = {

get id() { return this.__id; },

set id(id) { this.__id = id; }

}

console.log(o.id);

var o = {

[prefix+"_var"] : 1,

[prefix+"_func"]() {}

}
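As an illustration of computed property names, the sketch below builds the keys at runtime. The `prefix` value and the `record` object are assumptions chosen for the example.

```javascript
const prefix = "user"; // hypothetical value

const record = {
  // The expression in brackets is evaluated to produce the key.
  [prefix + "_var"]: 1,
  // The same works for shorthand methods.
  [prefix + "_func"]() { return "called"; }
};

const keys = Object.keys(record);
const callResult = record.user_func();

console.log(keys, callResult);
```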

Object.setPrototypeOf(obj,proto)

`my name is ${name}`

`my name is ${f(name)}`

Arrow functions are great for short inline functions, but as the function grows they become less and less readable.

var f1 = () => 1

var f2 = x => x

var f3 = (x,y) => { return x+y; }

Inside an arrow function, **this** keeps pointing to the original object, like a bind(this) would, so arrow functions are quite practical as callbacks.

var o = {

listen: function() {

btn.addEventListener("click", () => { this.hello() })

},

hello:function() {

console.log("hello");

}

}

But be careful if you nest arrow functions: **this** will still be the one of the outermost enclosing regular function, that is, the original object.
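The behaviour can be checked without a browser. This is a sketch with a hypothetical `greeter` object that uses Array.prototype.map as the source of callbacks instead of a DOM event.

```javascript
const greeter = {
  greeting: "hello",

  // Arrow callback: this is the same as in the enclosing method.
  withArrow(names) {
    return names.map(name => `${this.greeting} ${name}`);
  },

  // Nested arrows all share the this of the nearest regular function.
  nestedArrows(names) {
    return names.map(name => [name].map(inner => `${this.greeting} ${inner}`)[0]);
  },

  // A regular function callback gets its own this, which is not the greeter.
  withFunction(names) {
    return names.map(function (name) {
      return this === greeter;
    });
  }
};

const arrowResult = greeter.withArrow(["Ada"]);
const nestedResult = greeter.nestedArrows(["Ada"]);
const functionResult = greeter.withFunction(["Ada"]);

console.log(arrowResult, nestedResult, functionResult);
```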

for(var v of ["a","b","c"]) { console.log(v); } // will print "a","b","c"

**for...of** also works with iterators and strings, and the loop variable can be a destructuring pattern

The start position of a regex match can be set manually with lastIndex, so you don’t have to split a long text into pieces to test a regex against part of it. This works with the sticky flag **y**, which only matches exactly at lastIndex.
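Here is a minimal sketch of the sticky flag in action. The sample text and the pattern are made up for the illustration.

```javascript
const text = "key=value; other=thing";
const word = /[a-z]+/y; // sticky: must match exactly at lastIndex

// Position the match start on the "v" of "value".
word.lastIndex = 4;
const atFour = word.exec(text);

// After a successful match, lastIndex points just past the match.
const afterMatch = word.lastIndex;

// The character at that position is ";", so the sticky match fails
// instead of scanning ahead, and lastIndex is reset to 0.
const atSemicolon = word.exec(text);
const afterFailure = word.lastIndex;

console.log(atFour[0], afterMatch, atSemicolon, afterFailure);
```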

Flags can be queried on a regex

var r = /test/gi

console.log(r.flags); // will print "gi".

The order of the flags is always **“gimuy”**

**Octal is explicit in strict mode**

0o52 // equals 42

**Unicode**

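Unicode now works in regexes, in strings and in variable names. String.prototype.codePointAt is the equivalent of String.prototype.charCodeAt for full Unicode code points. String iteration (for example with for...of) handles characters outside the Basic Multilingual Plane correctly.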
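The difference shows up with a character outside the Basic Multilingual Plane. The sketch below uses U+1F600 as an arbitrary example.

```javascript
const smile = "\u{1F600}"; // one code point, stored as two UTF-16 code units
const text = "a" + smile + "b";

// length still counts UTF-16 code units, spreading iterates by code point.
const unitCount = text.length;
const codePointCount = [...text].length;

// charCodeAt returns one half of the surrogate pair, codePointAt the full code point.
const unit = text.charCodeAt(1);
const point = text.codePointAt(1);

// Without the u flag, "." matches a single code unit only.
const withoutU = /^.$/.test(smile);
const withU = /^.$/u.test(smile);

console.log(unitCount, codePointCount, unit.toString(16), point.toString(16), withoutU, withU);
```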

Symbols are meant to be used for constants. They are like the :label syntax of Ruby, if you know it.

var CONSTANT = Symbol("APP.CONSTANT")

The text inside Symbol is just a description of the symbol. To recall a constant from another place in the code, it has to go through the global symbol registry, both where it is created and where it is retrieved, with :

var CONSTANT = Symbol.for("APP.CONSTANT")
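A short sketch of the difference between the two calls, reusing the "APP.CONSTANT" description from above: Symbol() always creates a fresh symbol, whereas Symbol.for() looks the key up in the global registry.

```javascript
// Two symbols with the same description are still different values.
const local1 = Symbol("APP.CONSTANT");
const local2 = Symbol("APP.CONSTANT");

// Symbol.for returns the same symbol for the same key, wherever it is called.
const shared1 = Symbol.for("APP.CONSTANT");
const shared2 = Symbol.for("APP.CONSTANT");

const localsEqual = local1 === local2;
const sharedEqual = shared1 === shared2;
const localIsShared = local1 === shared1;

// keyFor only knows symbols that live in the registry.
const sharedKey = Symbol.keyFor(shared1);
const localKey = Symbol.keyFor(local1);

console.log(localsEqual, sharedEqual, localIsShared, sharedKey, localKey);
```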

sudo docker exec -i $(sudo docker-compose ps -q mysql) mysql -uusername -ppassword -Ddatabase < ../export.sql

A quick way to expose a website is to expose port 8080 on the VirtualBox NAT instead, since ports above 1024 are not a problem, and then to redirect port 80 to port 8080 with the OS X NAT. It can be done simply this way :

echo "
rdr pass inet proto tcp from any to any port 80 -> 127.0.0.1 port 8080
" | sudo pfctl -ef -

You can then disable it with :

sudo pfctl -F all -f /etc/pf.conf

*That’s a fairy tale, guys, OK? So it’s OK to say that C++ is beautiful there.*

Its sharp moves were legendary, and troubadours like Merlin Mann and others sang its accomplishments.

Many nights came and went and that proud coder sadly never came. Quicksilver was still locked in its walled garden, where no apple ever got eaten by more than one bite. I was in love with these three panes and dreamed of deploying them in more dimensions.

[NANY](http://www.donationcoder.com/forum/index.php?board=304.0), the challenge to release a new app for the new year, was coming, asking for an ever larger price of new, young and innocent apps. Each year, the price was higher. I saw that the piece would be to the taste of the beast. I threw my forces into the battle, building and crafting bytes and pixels out of the keyboard. [Iconfinder](http://www.iconfinder.com/) proved to be a strong ally, bringing its carefully referenced free icons and adding flavor to the project.

Legions were sacrificed to the monster, but Qatapult was so large that the beast of the new year couldn’t eat it all alive.

Here I am, NANY is over, and Qatapult, my sweet three-pane launcher, is pursuing its quest of being the sweet graphical launcher that Quicksilver is in some other land. There walks Qatapult, grabbing new features as its creator feels the need for them.

Qatapult is the Quicksilver of the 99%; it’s a rewrite for Windows rather than a port. If you’re a fan of keyboard launchers and have tried all of them on Windows, you’ll probably like it.