Notes on Neural Networks

Neural networks imitate the cognitive function of the brain to approximate intelligent machines [^n002]. The neuron, shown in the figure below, is the fundamental building block of the human brain. The neuron is a special type of cell that processes and transmits information electrochemically. Individually, each neuron receives nerve impulses from preceding connected neurons through connections called dendrites. Each input received is amplified or reduced according to the receiving neuron's learned sensitivity to inputs originating from each connected sender neuron. Within the cell body, the adjusted input signals are aggregated and an output signal is calculated. The output signal is transmitted along the axon. The axon, in turn, is the connection point for multiple successive neurons.

A biological neuron

The ability of the human brain to encode knowledge is realized by signals sent within a complex network of roughly 100 billion neurons, each neuron connected to several thousand other neurons [^book_olson]. Each neuron's ability to alter its sensitivity to the various inputs it receives contributes a tiny bit to the overall network's computational capacity.

Artificial neural networks originated from attempts to mimic the learning ability of the human brain [^nn001]. In this section artificial neural networks are introduced. The first subsection outlines the perceptron and the multilayer perceptron (MLP) neural network model. The second subsection introduces the back propagation algorithm, a parameter optimization method used to induce neural network classifiers. The third subsection introduces the concept of overfitting, as well as techniques that can be used to prevent overfitting. The last subsection lists advantages and disadvantages of MLP neural network classifiers.

Perceptrons and multilayer perceptrons

In the same manner as the human brain, an artificial neural network is composed of nodes connected by directed network links [^book_russels]. The net-input $u_j$ to some node $j$ is computed by combining the input signals $x_1$ to $x_r$ in a weighted sum and adding a node bias $-\theta_j$. That is,

$$u_j = \sum_{i=1}^{r} w_{ij} x_i - \theta_j$$

where the connection weight from input node $i$ to node $j$ is $w_{ij}$ and the bias is $\theta_j$. By introducing an extra dummy input signal $x_0$ equal to one, the combination function can be written as [^book_bishop]

$$u_j = \sum_{i=0}^{r} w_{ij} x_i$$

with $w_{0j} = -\theta_j$.
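As a quick sanity check, the short Python sketch below verifies that folding the bias into the weight vector leaves the net-input unchanged; all weight and input values here are illustrative assumptions, not taken from the source.

```python
import numpy as np

# Illustrative values: a node with r = 3 inputs and bias theta_j = 0.5.
theta_j = 0.5
w = np.array([0.2, -0.7, 1.1])                 # weights w_1j .. w_rj
x = np.array([1.5, 0.3, -0.4])                 # input signals x_1 .. x_r

u_explicit = np.dot(w, x) - theta_j            # u_j = sum_i w_ij x_i - theta_j
u_dummy = np.dot(np.concatenate(([-theta_j], w)),   # w_0j = -theta_j
                 np.concatenate(([1.0], x)))        # dummy input x_0 = 1
print(np.isclose(u_explicit, u_dummy))         # True: both forms agree
```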

The node output $a_j$ is $1$ if the net-input $u_j$ is greater than or equal to $0$; otherwise the node output is $0$. This hard limit function, or step function, can be written as

$$a_j = \begin{cases} 0 & \text{if } u_j < 0 \Rightarrow \sum_{i=1}^{r} w_{ij} x_i < \theta_j \\ 1 & \text{if } u_j \ge 0 \Rightarrow \sum_{i=1}^{r} w_{ij} x_i \ge \theta_j \end{cases}.$$

Such a node, called a perceptron, is depicted in the figure below.
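A minimal Python sketch of such a perceptron may help make this concrete; the AND-gate weights below are an illustrative choice, not taken from the source.

```python
import numpy as np

def perceptron(x, w):
    """Perceptron output: step function applied to the net-input u_j."""
    x = np.concatenate(([1.0], x))   # prepend the dummy input x_0 = 1
    u = np.dot(w, x)                 # u_j = sum_{i=0}^{r} w_ij x_i
    return 1 if u >= 0 else 0        # hard limit (step) activation

# Illustrative example: weights realizing a logical AND with theta = 1.5.
w = np.array([-1.5, 1.0, 1.0])       # w_0 = -theta, then w_1, w_2
inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
print([perceptron(np.array(p), w) for p in inputs])   # -> [0, 0, 0, 1]
```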

A feed-forward neural network is composed of multiple perceptrons interconnected in such a way as to allow the output of one perceptron to be the input of another perceptron [^book_bishop]. The connections in a feed-forward network only allow signals to pass in one direction. The multilayer perceptron (MLP) neural network is the most widely studied and used feed-forward neural network model [^nn001]. Such a neural network is structured as three or more distinct layers: an input layer, one or more hidden layers, and an output layer. Each layer consists of a set of nodes, with nodes from different layers being highly interconnected in a feed-forward manner: all nodes in the input layer are connected to all nodes in the first hidden layer, all nodes in the first hidden layer are connected to all nodes in the second hidden layer, and so forth, until all nodes in the last hidden layer are connected to all nodes in the output layer. The neural network maintains a vector of connection weights used to adjust the input signal as it propagates through the network towards the output layer. The structure of a three-layer MLP neural network is shown in the figure below.

This MLP neural network receives $r+1$ input signals at the input layer nodes. The input vector $\vec{x} \in R^{r+1}$ represents the $r$ input features $x_1$ to $x_r$ and the bias signal $x_0$ set equal to one. The input signal is propagated forward to $J$ hidden layer nodes. Each hidden layer node functions like a perceptron. The node input is computed as the weighted sum of inputs received from all preceding nodes. That is, $u_j = \sum_{i=0}^{r} w_{ij} x_i$, where $w_{ij}$ is the weight of the connection between input node $x_i$ and hidden node $z_j$.

The hidden layer's output is computed by passing the weighted sum of inputs $u_j$ through an activation function $f^h(u_j)$. This activation function can be any monotonically increasing function, such as the symmetric sigmoid function [^book_steeb],

$$z_j = f^h(u_j) = \tanh(u_j) = \frac{1 - e^{-2u_j}}{1 + e^{-2u_j}}.$$

The output of the hidden layer is a vector $\vec{z} \in R^{J+1}$ with the bias signal $z_0$ set to one. The vector $\vec{z}$ supplies inputs to the output layer with $L$ nodes.

Similar to the nodes in the hidden layer, each output layer node functions like a perceptron. Input signals from the hidden layer nodes are combined in a weighted sum $u_l = \sum_{j=0}^{J} w_{jl} z_j$, where $w_{jl}$ is the weight of the connection from hidden node $z_j$ to output layer node $\hat{y}_l$. The output layer produces a vector $\hat{\vec{y}} \in R^{L}$. Each output node produces output $\hat{y}_l$ by passing the weighted sum of inputs $u_l$ through a monotonically increasing activation function $f^o(u_l)$. For example, a logistic activation function can be used,

$$\hat{y}_l = f^o(u_l) = \frac{1}{1 + e^{-u_l}}.$$
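Both activation functions are easy to sketch in Python. The helper names `f_h` and `f_o` below are assumptions for illustration, and the check confirms that the rational form given above is simply $\tanh$:

```python
import numpy as np

def f_h(u):
    """Hidden activation: tanh written as (1 - e^{-2u}) / (1 + e^{-2u})."""
    return (1.0 - np.exp(-2.0 * u)) / (1.0 + np.exp(-2.0 * u))

def f_o(u):
    """Output activation: the logistic function 1 / (1 + e^{-u})."""
    return 1.0 / (1.0 + np.exp(-u))

u = np.linspace(-2.0, 2.0, 5)
print(np.allclose(f_h(u), np.tanh(u)))   # True: rational form equals tanh
print(f_o(0.0))                          # 0.5: logistic output at zero net-input
```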

The entire MLP model can be represented as the following non-linear function,

$$\hat{y}_l = h_l(\vec{w}, \vec{x}) = f^o\left(\sum_{j=0}^{J} w_{jl}\, f^h\left(\sum_{i=0}^{r} w_{ij} x_i\right)\right)$$

for outputs $l = 1 \dots L$, where $f^h$ and $f^o$ are the activation functions in the hidden layer and the output layer respectively, and $\vec{w} \in R^{J(r+1)+L(J+1)}$ is the vector of connection weights [^book_russels].
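Putting the pieces together, a minimal NumPy sketch of this forward computation might look as follows; the function name, random weights, and layer sizes are illustrative assumptions, not from the source.

```python
import numpy as np

def mlp_forward(x, W_hidden, W_output):
    """Three-layer MLP forward pass, following h_l(w, x) above.

    x        : input vector including the bias signal x_0 = 1, shape (r + 1,)
    W_hidden : hidden layer weights w_ij, shape (J, r + 1)
    W_output : output layer weights w_jl, shape (L, J + 1)
    """
    u_hidden = W_hidden @ x                          # u_j = sum_i w_ij x_i
    z = np.concatenate(([1.0], np.tanh(u_hidden)))   # hidden outputs, z_0 = 1
    u_output = W_output @ z                          # u_l = sum_j w_jl z_j
    return 1.0 / (1.0 + np.exp(-u_output))           # logistic outputs y_hat_l

# Illustrative sizes: r = 2 features, J = 3 hidden nodes, L = 2 outputs.
rng = np.random.default_rng(0)
r, J, L = 2, 3, 2
W_hidden = rng.normal(size=(J, r + 1))   # J(r + 1) = 9 weights
W_output = rng.normal(size=(L, J + 1))   # L(J + 1) = 8 weights
x = np.concatenate(([1.0], rng.normal(size=r)))   # x_0 = 1 plus r features
print(mlp_forward(x, W_hidden, W_output))         # y_hat in R^L
# The weight vector w has J(r+1) + L(J+1) = 17 components, as stated above.
```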

Each connection weight $w \in \vec{w}$ is used to adjust the input signal as it propagates through the network towards the output layer. By maintaining a weight for each connection in the network, a signal originating from a node can be very important in generating the output of one node and unimportant in generating the output of another. It is through this process of connection-specific signal weighting that a neural network gains its predictive power. It has been shown that a three-layer MLP neural network with enough hidden layer nodes can approximate any continuous function to any desired degree of accuracy [^hornik].

The process by which the optimal weight vector $\vec{w}^*$ is determined, hence the way the network "learns", is commonly referred to as network training. The back propagation algorithm is a popular technique used for network training. Gaining a general understanding of this algorithm is important due to its significance in the neural network literature and because it is used in the empirical study of this dissertation. The back propagation algorithm is outlined in the next subsection.

[^n002]: coming soon
[^book_olson]: coming soon
[^nn001]: coming soon
[^book_russels]: coming soon
[^book_bishop]: coming soon
[^book_steeb]: coming soon
[^hornik]: coming soon