Perceptrons got a lot of folks excited back in the day. However, there was a bit of a problem. Say we're trying to classify digits correctly using a network of perceptrons. Our model is given an image of the digit "1" and incorrectly classifies it as a "2". Why don't we just go ahead and tweak the weights and bias to improve our model's classification? Well, we can, but doing so gives unpredictable results. Because a perceptron's output flips abruptly between 0 and 1, a small tweak to one weight can completely change the behaviour of the rest of the network. Say we do tune the parameters and correctly predict "1" on the next training run, but now our model performs poorly on the other digits. We've essentially kicked the can down the road, and our model ends up juggling different weights and biases with no significant gain in accuracy across all digits.

This sensitivity to change quickly becomes unwieldy when tackling machine learning tasks: even the slightest change to the parameters can produce a completely different output. Instead, we need a model that gives us greater control over the output, so we can fine-tune it for better performance. In other words, for any small change in the weights and bias, we want only a small change in the output.

With this simple idea, the sigmoid neuron was born. The math can be kinda scary at first, but some generous folks have broken it down into digestible chunks for us. I'll summarise for you. The sigmoid neuron has similar qualities to a perceptron, with a few key differences. Like its sibling, the sigmoid neuron still takes inputs $x_1, x_2, ..., x_j$; however, each $x_j$ can be any value between $0$ and $1$ (e.g. 0.7456). Weights are still assigned to each input, $w_1, w_2, ..., w_j$, and the overall bias remains. However, the output is no longer just a 0 or 1; instead it's any number between 0 and 1, depending on the magnitude of $w \cdot x + b$. This is because the sigmoid function $\sigma$ is applied to get $\sigma(w \cdot x + b)$. So what does the sigmoid part actually do? It does this: $$\sigma(z) \equiv \frac{1}{1+e^{-z}}$$
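That formula translates directly into a few lines of Python. Here's a quick sketch (my own, not from the book) you can play with:

```python
import math

def sigmoid(z):
    """The sigmoid function: squashes any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))  # 0.5, exactly halfway between the two extremes
```

Try feeding it big and small values of `z` to get a feel for how it squashes the number line.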

Now let's go another step further to make things more explicit. The output of a sigmoid neuron with the inputs, weights and bias detailed above is: $$\frac{1}{1 + \exp(-\sum_jw_jx_j-b)}$$
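Putting the weighted sum and the bias together, the full output of a sigmoid neuron might be sketched like this (the function name and the example numbers are made up for illustration):

```python
import math

def neuron_output(weights, inputs, bias):
    """Output of a sigmoid neuron: sigma(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Example: three inputs with some made-up weights and a bias.
print(neuron_output([0.5, -1.0, 2.0], [0.7, 0.2, 0.9], -0.5))
```

Whatever the weights and inputs, the result always lands strictly between 0 and 1.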

Wowza. It's starting to look pretty funky. Despite our function looking more complex, the sigmoid neuron still behaves a lot like the perceptron. For instance, say $z$, or $w \cdot x + b$, is a big positive number like 100. Then $e^{-z} = 2.718281828^{-100} \approx 0$, and finishing the calculation gives an output of approximately 1 - feel free to pause here and calculate this yourself. This demonstrates that when $w \cdot x + b$ is large and positive, the output will be close to 1. On the other hand, if $z$ is large and negative, $e^{-z}$ becomes huge and the output will be close to 0. This is awfully similar to the perceptron, right? Where the sigmoid differentiates itself is when $z$ lands between these two extremes: there, subtle tweaks to the weights and biases produce small corresponding changes in the output.
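The saturation behaviour at the extremes, and the smoothness in between, is easy to check numerically. A quick sketch (again my own, using the sigmoid definition above):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Large positive z: the output saturates near 1, like a perceptron firing.
print(sigmoid(100))
# Large negative z: the output saturates near 0.
print(sigmoid(-100))
# In between, a small change in z produces only a small change in the output.
print(sigmoid(0.50))
print(sigmoid(0.51))
```

The last two values differ only slightly, which is exactly the "small change in, small change out" property we wanted.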

This post was developed with the help of:

Michael Nielsen, Neural Networks and Deep Learning link