June 16, 2014 Simon Raper

Picturing the output of a neural net


Some time ago during a training session a colleague asked me what a surface plot of a two input neural net would look like. That is, if you have two inputs x_1 and x_2 and plot the output y as a surface what do you get? We thought it might look like a set of waterfalls stacked on each other.

Tip

For this post I’m going to use draw.io, Wolfram Alpha and some JavaScript. Check out other tools in the toolbox.

Since neural nets are often considered a black-box solution, I thought it would be interesting to check this. If you can picture the output for a simple case, it makes things less mysterious in the more general case.

Let’s look at a neural net with a single hidden layer of two nodes and a sigmoid activation function. In other words, something that looks like this. (If you need to catch up on the theory, please see this brilliant book by David MacKay, which is free to view online.)

Neural net with a single hidden layer

Drawn using the lovely draw.io


Output for a single node

We can break the task down a little by doing a surface plot of a single node, say node 1. We are using the sigmoid activation function, so what we are plotting is the sigmoid function applied to what is essentially the equation of a plane.

The activation (i.e. the plane) is  \alpha_1 = w_4x_1+w_3x_2

Applying the sigmoid function we get

 y_1=\frac{1}{1+e^{-\alpha_1}}

Another good tool I use now and then when testing out some maths is Wolfram Alpha. I’m going to pretend that the weights w4 and w3 have been fitted as 0.1 and 0.2, and so the equation I input is plot y=1/(1+exp(-(0.1*x1+0.2*x2)))
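For anyone who would rather reproduce the plot locally, here is a minimal Python sketch using numpy and matplotlib (the grid range is my own choice; the weights are the pretend fitted values above):

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Grid over the two inputs x1 and x2 (range chosen just to show the shape)
x1, x2 = np.meshgrid(np.linspace(-40, 40, 100), np.linspace(-40, 40, 100))

# Node 1: the activation is a plane, the output is the sigmoid of that plane
alpha1 = 0.1 * x1 + 0.2 * x2   # w4 = 0.1, w3 = 0.2
y1 = sigmoid(alpha1)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(x1, x2, y1)
ax.set_xlabel("x1"); ax.set_ylabel("x2"); ax.set_zlabel("y1")
plt.show()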


Surface plot using the sigmoid function

What we see is a sort of waterfall shape angled in the direction of the plane. This is what we’d expect: as the alpha values get larger or smaller, the output of the sigmoid function tends to one or zero respectively.


Output for multiple nodes

Let’s now look at the output for the whole network. The outputs from nodes 1 and 2 feed into node 3, creating a new activation:

 \alpha_3 = w_1y_1+w_2y_2


This is just a weighted sum of the outputs of nodes 1 and 2 and so, as predicted, should look like one waterfall stacked on another (sort of). We can check this in Wolfram Alpha by submitting the following (picking, say, w1=0.8, w2=0.7, w5=0.3, w6=-0.2): plot y=0.8/(1+exp(-(0.1*x1+0.2*x2)))+0.7/(1+exp(-(-0.2*x1+0.3*x2)))

Activation for node 3
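As a cross-check away from the browser, here is a short Python sketch of the same node 3 activation surface (the weights are the ones picked above; the grid range is my own):

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x1, x2 = np.meshgrid(np.linspace(-40, 40, 100), np.linspace(-40, 40, 100))

# Hidden-node outputs with the example weights
y1 = sigmoid(0.1 * x1 + 0.2 * x2)    # node 1: w4 = 0.1, w3 = 0.2
y2 = sigmoid(-0.2 * x1 + 0.3 * x2)   # node 2: w6 = -0.2, w5 = 0.3

# Activation of node 3: a weighted sum of the hidden-node outputs
alpha3 = 0.8 * y1 + 0.7 * y2         # w1 = 0.8, w2 = 0.7

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(x1, x2, alpha3)
ax.set_xlabel("x1"); ax.set_ylabel("x2"); ax.set_zlabel("alpha3")
plt.show()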

Adding the final sigmoid function to the activation of node 3 doesn’t change things much. It just puts things back on the scale from zero to one.

plot y=1/(1+exp(-(0.8/(1+exp(-(0.1*x1+0.2*x2)))+0.7/(1+exp(-(-0.2*x1+0.3*x2))))))
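Putting the pieces together, the whole network is just a short forward pass. The sketch below is my own framing (the function name forward and its default arguments simply mirror the example weights used in this post):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x1, x2, w1=0.8, w2=0.7, w3=0.2, w4=0.1, w5=0.3, w6=-0.2):
    y1 = sigmoid(w4 * x1 + w3 * x2)   # hidden node 1
    y2 = sigmoid(w6 * x1 + w5 * x2)   # hidden node 2
    alpha3 = w1 * y1 + w2 * y2        # activation of node 3
    return sigmoid(alpha3)            # final output, back on the zero-to-one scale

So, for example, forward(10, -5) gives the network output for the inputs x1 = 10 and x2 = -5.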


Exploring changes in the weights

It would be nice now to see what happens when we change the weights for the inputs. To do this I’ve used the JavaScript library javascript-surface-plot. The sliders explore only weight ranges between 0.1 and 1, but this is enough to give us an idea of the range of shapes the output can take. We can see that the two-node hidden layer has the potential to divide the input space into roughly four zones of activation, which gives us much more complexity compared to, say, logistic regression. If we imagine the effect of stacking more of these waterfall shapes then we see the potential of additional nodes in the hidden layer.

[Interactive surface plot with sliders for the weights w1 to w6]
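If you are reading this somewhere without the interactive sliders, a rough desktop stand-in (my own sketch, not the javascript-surface-plot code) is matplotlib’s Slider widget. Only two of the six weights are wired up here; the others follow the same pattern:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x1, x2 = np.meshgrid(np.linspace(-40, 40, 60), np.linspace(-40, 40, 60))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
fig.subplots_adjust(bottom=0.25)

# Sliders for two of the input weights; the remaining weights stay fixed
s_w3 = Slider(fig.add_axes([0.25, 0.10, 0.55, 0.03]), "w3", 0.1, 1.0, valinit=0.2)
s_w4 = Slider(fig.add_axes([0.25, 0.05, 0.55, 0.03]), "w4", 0.1, 1.0, valinit=0.1)

def redraw(_=None):
    ax.clear()
    y1 = sigmoid(s_w4.val * x1 + s_w3.val * x2)             # hidden node 1
    y2 = sigmoid(-0.2 * x1 + 0.3 * x2)                      # hidden node 2 (fixed)
    ax.plot_surface(x1, x2, sigmoid(0.8 * y1 + 0.7 * y2))   # output node (fixed)
    fig.canvas.draw_idle()

s_w3.on_changed(redraw)
s_w4.on_changed(redraw)
redraw()
plt.show()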



