Let's start with one of the earliest forms of machine learning, the perceptron. It makes sense to start here given that modern-day neural networks evolved from this point. The perceptron was inspired by neurons in the brain, the idea being that a neuron's functionality can be represented by a function that takes a set of inputs and generates an output. So how does it work? Let's visualise it first:

Perceptron

Here we can see a perceptron takes a set of binary inputs, $x_1, x_2, \ldots, x_j$, multiplies each input by an assigned weight, $w_1, w_2, \ldots, w_j$, and sums the results. Each input's weight is based on importance; the larger the weight, the greater the importance of the input. To represent this algebraically, we get the following:

$$ y = \sum_j w_jx_j$$

If the weighted sum, $y$, is greater than the defined threshold, the perceptron outputs $1$; otherwise it outputs $0$. So let's extend our equation to represent this:

$$ \text{output} = \begin{cases} 1 & \text{if } \sum_j w_jx_j > \text{threshold} \\ 0 & \text{if } \sum_j w_jx_j \le \text{threshold} \end{cases} $$
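
To make the rule above concrete, here is a minimal Python sketch of it. The function name `perceptron` and its argument names are my own choices for illustration, not part of any library.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    # Multiply each input by its weight and add everything up.
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    # Strictly greater than: a sum equal to the threshold yields 0.
    return 1 if weighted_sum > threshold else 0


# Illustrative values only: weights 2 and 3 with a threshold of 4.
print(perceptron([1, 1], [2, 3], 4))  # weighted sum 5 > 4
print(perceptron([1, 0], [2, 3], 4))  # weighted sum 2 <= 4
```

Note the strict inequality: a weighted sum that lands exactly on the threshold produces a 0, matching the second case of the equation.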

I think at this juncture we can cement this knowledge with an example. Let's use a perceptron as a decision-making model for deciding whether to go and watch the West Coast Eagles versus the North Melbourne Kangaroos. Our model will take in three binary inputs:

  1. Is it raining or not?
  2. Are any friends or family available to join us?
  3. Is the opponent a competitive side (i.e. top 8)?

I don't like heading to the footy in the rain, so we assign a weight of 6 to #1. Moreover, I like to have company when I head to the footy, so I assign a weight of 4 to #2. Lastly, I prefer to go to the important games against top 8 sides, so we assign a weight of 2 to #3. We now have the following weights: $w_1=6, w_2=4, w_3=2$. For my decision threshold I will assign 5; in other words, if the weighted sum is greater than 5, I will attend the footy.

We are just informed that the weather is going to be sunny; however, no friends or family are available to join me. And well, North Melbourne aren't the side they used to be, sitting last on the ladder. Treating $x_1 = 1$ as "it is not raining", this provides the following inputs to my model: $x_1 = 1, x_2=0, x_3=0$. Let's go ahead and calculate the output:

$$ 6 \cdot 1 + 4 \cdot 0 + 2 \cdot 0 = 6 $$

6 is greater than my threshold of 5, so we get a 1 as output, and I'm heading to the footy! The model clearly appreciates sunny weather and the availability of company: if neither holds, the weighted sum can be at most 2, and the model will suggest I stay home. I'm a little more passionate about the Eagles than this example lets on, but you get the gist. It's important to note that by changing the weights and threshold, we get different models of decision making.
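
If you want to play with the footy model yourself, here is a small Python sketch of it. The weights (6, 4, 2) and the threshold (5) come from the example; the function name `go_to_footy` and the argument names are just labels I made up, and I am assuming $x_1 = 1$ means the weather is fine.

```python
from itertools import product

WEIGHTS = (6, 4, 2)  # fine weather, company available, top 8 opponent
THRESHOLD = 5


def go_to_footy(fine_weather, company, top8_opponent):
    """Return 1 (go) if the weighted sum beats the threshold, else 0 (stay home)."""
    inputs = (fine_weather, company, top8_opponent)
    weighted_sum = sum(w * x for w, x in zip(WEIGHTS, inputs))
    return 1 if weighted_sum > THRESHOLD else 0


# Walk through every combination of the three binary inputs.
for inputs in product((0, 1), repeat=3):
    print(inputs, "->", go_to_footy(*inputs))
```

Running through all eight combinations shows which situations get me off the couch. Try changing `WEIGHTS` or `THRESHOLD` to see how a different decision maker would behave.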

Perceptrons provide a building block for thinking about and building multi-layer networks. Like a web of Lego, perceptrons can be connected to build multi-layered models, like so:

Multi-layer Network

The above example illustrates an initial layer taking a set of inputs and producing a set of $n$ decisions or outputs (4 in this case). The subsequent layer then takes those outputs as its inputs to determine a final decision. We can think of the first layer as making a set of simple decisions, which then allows the next layer to make more subtle or complex decisions. This layered approach can be extended in both the number of layers and the number of nodes per layer to achieve complex and sophisticated decision-making models.
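
Here is a rough Python sketch of that layering idea: four perceptrons in the first layer feeding a single perceptron in the second. Every weight and threshold below is a made-up value purely for illustration, and I am assuming three inputs since the example only fixes the four first-layer outputs.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0


def layer(inputs, weight_rows, thresholds):
    """Run several perceptrons over the same inputs, one output per perceptron."""
    return [perceptron(inputs, w, t) for w, t in zip(weight_rows, thresholds)]


# Hypothetical first layer: 4 perceptrons, each looking at 3 inputs.
FIRST_WEIGHTS = [
    [6, 4, 2],
    [1, 1, 1],
    [0, 5, 5],
    [3, 0, 3],
]
FIRST_THRESHOLDS = [5, 1, 4, 2]

# Hypothetical second layer: 1 perceptron looking at the 4 first-layer outputs.
SECOND_WEIGHTS = [2, 1, 1, 2]
SECOND_THRESHOLD = 3


def network(inputs):
    """Feed the inputs through the first layer, then make the final decision."""
    first_outputs = layer(inputs, FIRST_WEIGHTS, FIRST_THRESHOLDS)
    return perceptron(first_outputs, SECOND_WEIGHTS, SECOND_THRESHOLD)


print(layer([1, 0, 0], FIRST_WEIGHTS, FIRST_THRESHOLDS))  # the 4 simple decisions
print(network([1, 0, 0]))                                 # the final decision
```

The key point is that the second layer never sees the raw inputs. It only weighs up the decisions already made by the first layer.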

To finish, let's put the final touches on how we represent the perceptron mathematically. We will move the threshold to the other side of the inequality and introduce the bias, $b = -\text{threshold}$. Lastly, we will exchange the sum for the dot product, $w \cdot x = \sum_j w_j x_j$: enter linear algebra. This prepares us nicely for some of the notation coming up in deep learning.

$$ \text{output} = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{if } w \cdot x + b \le 0 \end{cases} $$
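
As a quick sanity check that nothing has changed except the notation, here is a Python sketch comparing the threshold form with the bias form, taking the bias to be the negative of the threshold. The function names are my own, and I am reusing the footy weights (6, 4, 2) and threshold (5) as test values.

```python
from itertools import product


def dot(w, x):
    """Dot product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def perceptron_threshold(x, w, threshold):
    """Threshold form: fire when w . x > threshold."""
    return 1 if dot(w, x) > threshold else 0


def perceptron_bias(x, w, b):
    """Bias form: fire when w . x + b > 0."""
    return 1 if dot(w, x) + b > 0 else 0


w = (6, 4, 2)
threshold = 5
b = -threshold  # moving the threshold across the inequality flips its sign

# The two forms should agree on every combination of binary inputs.
agree = all(
    perceptron_threshold(x, w, threshold) == perceptron_bias(x, w, b)
    for x in product((0, 1), repeat=3)
)
print(agree)
```

A large positive bias makes it easy for the perceptron to output a 1, while a very negative bias makes it hard.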

This post was developed with the help of: