Order in the Black Box: Consistency and Robustness of Hidden Neuron Activation of Feed Forward Neural Networks and Its Use in Efficient Optimization of Network Structure

Artificial Neural Network Modelling

Part of the book series: Studies in Computational Intelligence (SCI, volume 628)

Abstract

Neural networks are widely used for nonlinear pattern recognition and regression. However, they are considered black boxes because their internal workings are not transparent and their structure has no direct relevance to the problem being addressed, which makes it difficult to gain insights from them. Furthermore, the structure of a neural network must be optimized, and this remains a challenge. Many existing structure optimization approaches require either extensive multi-stage pruning or subjective thresholds for pruning parameters. Knowledge of any internal consistency in the behavior of neurons could help develop simpler, more systematic and more efficient approaches to optimizing network structure. This chapter addresses in detail the issue of internal consistency in relation to redundancy and robustness of the structure of three-layer feed-forward networks, which are widely used for nonlinear regression. It first investigates whether there is a recognizable consistency in neuron activation patterns under all conditions of network operation, such as noise and different initial weights. If such consistency exists, it points to a recognizable optimum network structure for the given data. The results show that such a pattern does exist and that it is most clearly evident not at the level of hidden neuron activation but at the level of hidden neuron input to the output neuron (i.e., weighted hidden neuron activation). It is shown that when a network has more than the optimum number of hidden neurons, the redundant neurons form clearly distinguishable correlated patterns in their weighted outputs. This correlation structure is exploited to extract the required number of neurons using correlation-distance-based self-organising maps clustered by the Ward method, which optimally groups correlated weighted hidden neuron activity patterns without any user-defined criteria or thresholds, thus automatically optimizing the network structure in one step. The number of Ward clusters on the SOM is the required optimum number of neurons. The SOM/Ward-based optimum network is compared with networks obtained using two documented pruning methods, optimal brain damage and variance nullity measure, to show that the correlation approach provides equivalent results. The robustness of the network with the optimum structure is also tested against perturbation of its weights, and confidence intervals for the weights are illustrated. Finally, the approach is tested on two practical problems: a breast cancer diagnostic system and river flow forecasting.

Author information

Correspondence to Sandhya Samarasinghe.

Appendix: Algorithm for Optimising Hidden Layer of MLP Based on SOM/Ward Clustering of Correlated Weighted Hidden Neuron Outputs

I. Train an MLP with a relatively large number of hidden neurons.

1. For input vector X, the weighted input $u_j$ and output $y_j$ of hidden neuron $j$ are:

    $$\begin{aligned} u_{j} & = a_{0\,j} + \sum\limits_{i = 1}^{n} {a_{ij} x_{i} } \\ y_{j} & = f\left( {u_{j} } \right) \\ \end{aligned}$$

where $a_{0j}$ is the bias weight, $a_{ij}$ are the input-to-hidden-neuron weights, and $f$ is the transfer function.

2. The net input $v_k$ and output $z_k$ of output neuron $k$ are:

    $$\begin{aligned} v_{k} & = b_{0k} + \sum\limits_{j = 1}^{m} {b_{jk} y_{j} } \\ z_{k} & = f\left( {v_{k} } \right) \\ \end{aligned}$$

where $b_{0k}$ is the bias weight and $b_{jk}$ are the hidden-to-output weights.

3. The mean square error (MSE) over the whole data set is:

    $$MSE = \frac{1}{2N}\left[ {\sum\limits_{i = 1}^{N} {\left( {t_{i} - z_{i} } \right)^{2} } } \right]$$

where $t_i$ is the target for sample $i$ and $N$ is the sample size.

4. Weights are updated using a chosen least-squares error minimisation method, such as the Levenberg-Marquardt method:

    $$w_{m} = w_{m - 1} - \varepsilon \,Rd_{m}$$

where $d_m$ is the sum of the error gradients with respect to the weights for epoch $m$, $R$ is the inverse of the curvature matrix, and $\varepsilon$ is the learning rate.

5. Repeat steps 1 to 4 until the minimum MSE is reached, using training, calibration (testing) and validation data sets. A minimal code sketch of these steps follows this list.
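
Below is a minimal NumPy sketch of steps 1 to 4 for a single-output regression network. All names and data are illustrative, the output neuron is taken to be linear, and a plain gradient-descent update is shown in place of the Levenberg-Marquardt step for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: N samples, n inputs, m hidden neurons.
N, n, m = 200, 3, 8
X = rng.normal(size=(N, n))
t = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(N, 1))

# Weights: input-to-hidden (a, a0) and hidden-to-output (b, b0).
a, a0 = rng.normal(scale=0.5, size=(n, m)), np.zeros(m)
b, b0 = rng.normal(scale=0.5, size=(m, 1)), np.zeros(1)

def forward(X):
    u = X @ a + a0                      # step 1: weighted input u_j
    y = np.tanh(u)                      # step 1: hidden output y_j = f(u_j)
    v = y @ b + b0                      # step 2: net input v_k
    z = v                               # linear output neuron for regression
    return y, z

def mse(t, z):                          # step 3: MSE = (1/2N) sum (t_i - z_i)^2
    return np.sum((t - z) ** 2) / (2 * len(t))

# Step 4 (simplified): gradient-descent updates of all weights; a
# Levenberg-Marquardt step would additionally use curvature information (R).
eps = 0.1
for epoch in range(3000):
    y, z = forward(X)
    d_v = (z - t) / N                   # dMSE/dv at the output neuron
    d_u = (d_v @ b.T) * (1.0 - y ** 2)  # back-propagated through tanh
    b -= eps * (y.T @ d_v)
    b0 -= eps * d_v.sum(axis=0)
    a -= eps * (X.T @ d_u)
    a0 -= eps * d_u.sum(axis=0)

print("final MSE:", mse(t, forward(X)[1]))
```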

II. SOM clustering of weighted hidden neuron outputs

Inputs to SOM

An input vector $X_j$ into the SOM is:

$$X_{j} = y_{j} b_{j}$$

where $y_j$ is the vector of outputs of hidden neuron $j$ over all samples and $b_j$ is its weight to the output neuron in the MLP. The length $n$ of the vector $X_j$ is equal to the number of samples in the original dataset.

Normalise $X_j$ to unit length.
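
A minimal sketch of assembling these SOM inputs, assuming `hidden_outputs` is an (N, m) array of hidden neuron outputs over all N samples and `b` is the length-m vector of hidden-to-output weights from the trained MLP (both names are illustrative):

```python
import numpy as np

def som_inputs(hidden_outputs, b):
    """Build one SOM input vector per hidden neuron, X_j = y_j * b_j,
    normalised to unit length. Returns an (m, N) array, one row per neuron."""
    X = (hidden_outputs * b).T                      # weighted hidden outputs, (m, N)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)     # unit-length rows
```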

SOM training

1. Project the weighted outputs of the hidden neurons onto a Self-Organising Map:

    $$u_{j} = \sum\limits_{i = 1}^{n} {w_{ij} x_{i} }$$

where $u_j$ is the output of SOM neuron $j$ and $w_{ij}$ is its weight for input component $x_i$.

2. Winner selection: select the winning neuron based on the minimum correlation distance between an input vector and the SOM neuron weight vectors (equivalent to the Euclidean distance for unit-length input vectors):

$$d_{j} = \left\| {\text{x}} - {\text{w}}_{j} \right\| = \sqrt {\sum\limits_{i = 1}^{n} {\left( {x_{i} - w_{ij} } \right)^{2} } }$$
3. Update the weights of the winner and its neighbours at iteration t:

Select a neighbourhood function NS(d, t) (such as a Gaussian) and a learning-rate function β(t) (such as exponential or linear decay), where d is the distance on the map from the winner to a neighbouring neuron and t is the iteration.

    $${\text{w}}_{\text{j}} \left( t \right) = {\text{w}}_{\text{j}} \left( {t - 1} \right) + \beta \left( t \right)NS\left( {d,t} \right)\left[ {{\text{x}}\left( t \right) - {\text{w}}_{\text{j}} \left( {t - 1} \right)} \right]$$
4. Repeat the process until the total squared distance D between the weights $\text{w}_i$ and the inputs $\text{x}_n$ is at a minimum (a code sketch of this SOM training loop is given after this list):

    $$D = \sum\limits_{i = 1}^{k} {\sum\limits_{{n \in c_{i} }} {\left( {{\text{x}}_{n} - {\text{w}}_{i} } \right)^{2} } }$$

where $k$ is the number of SOM neurons and $c_i$ is the cluster of inputs represented by neuron $i$.
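
A minimal sketch of this SOM training loop; the map size, learning-rate schedule and Gaussian neighbourhood width are illustrative choices, and `inputs` is assumed to be the (m, N) array of unit-length vectors $X_j$ built above.

```python
import numpy as np

def train_som(inputs, rows=5, cols=5, iterations=2000,
              beta0=0.5, sigma0=2.0, seed=0):
    """Train a small 2-D SOM: winner selection by Euclidean distance
    (equivalent to correlation distance for unit-length inputs) and a
    Gaussian neighbourhood update of the winner and its neighbours."""
    rng = np.random.default_rng(seed)
    n_features = inputs.shape[1]
    # Grid coordinates of the SOM neurons and their weight vectors w_j.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    W = rng.normal(scale=0.1, size=(rows * cols, n_features))

    for t in range(iterations):
        beta = beta0 * np.exp(-t / iterations)       # learning rate beta(t)
        sigma = sigma0 * np.exp(-t / iterations)     # neighbourhood width
        x = inputs[rng.integers(len(inputs))]        # pick an input vector

        # Winner: neuron with minimum distance d_j = ||x - w_j||.
        winner = np.argmin(np.sum((W - x) ** 2, axis=1))

        # Gaussian neighbourhood NS(d, t) over grid distance to the winner.
        d_grid = np.sum((grid - grid[winner]) ** 2, axis=1)
        ns = np.exp(-d_grid / (2.0 * sigma ** 2))

        # w_j(t) = w_j(t-1) + beta(t) NS(d, t) [x(t) - w_j(t-1)]
        W += beta * ns[:, None] * (x - W)

    return W, grid
```

The returned rows of `W` (the codebook vectors) are what the Ward clustering in the next stage operates on.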

III. Clustering of SOM neurons

The Ward method minimises the increase in the within-group sum of squared distances that results from joining two candidate clusters. The within-group sum of squares is the sum of squared distances between all objects in a cluster and its centroid. The two clusters whose merger produces the smallest increase are merged at each step of the clustering. This distance measure is called the Ward distance ($d_{ward}$) and is expressed as:

$$d_{ward} = \frac{{n_{r} n_{s} }}{{n_{r} + n_{s} }}\left\| {{\text{x}}_{r} - {\text{x}}_{s} } \right\|^{2}$$

where $\text{x}_r$ and $\text{x}_s$ are the centres of gravity of the two clusters, and $n_r$ and $n_s$ are the numbers of data points in them.

The centre of gravity of the merged cluster, $\text{x}_{r(new)}$, is calculated as:

$${\text{x}}_{{r\left( {new} \right)}} = \frac{1}{{n_{r} + n_{s} }}\left( {n_{r} {\text{x}}_{r} + n_{s} {\text{x}}_{s} } \right)$$

The relative likelihood of different numbers of clusters is assessed with a WardIndex defined as:

$$WardIndex = \frac{1}{NC}\left( {\frac{{d_{t} - d_{t - 1} }}{{d_{t - 1} - d_{t - 2} }}} \right) = \frac{1}{NC}\left( {\frac{{\Delta d_{t} }}{{\Delta d_{t - 1} }}} \right)$$

where $d_t$ is the distance between the centres of the two clusters merged at the current step, $d_{t-1}$ and $d_{t-2}$ are the corresponding distances at the previous two steps, and NC is the number of clusters remaining.

The number of clusters with the highest WardIndex is selected as the optimum. A minimal sketch of this Ward clustering and WardIndex calculation is given below.
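
The sketch below follows the formulas above directly; `codebook` stands for the array of trained SOM weight vectors (one row per SOM neuron), $d_t$ is taken to be the Ward distance of the merge performed at step t, and NC is taken as the number of clusters remaining after that merge (both interpretations are assumptions).

```python
import numpy as np

def ward_cluster_count(codebook):
    """Agglomerative Ward clustering of SOM codebook vectors and selection
    of the number of clusters by the highest WardIndex. Assumes at least
    four codebook vectors so that the index ratio is defined."""
    centroids = [row.astype(float) for row in codebook]   # cluster centres x_r
    sizes = [1] * len(centroids)                          # cluster sizes n_r
    merge_dists = []                                      # d_ward at each merge
    ward_index = {}                                       # NC -> WardIndex

    while len(centroids) > 1:
        # Pair (r, s) with the smallest Ward distance
        # d_ward = n_r * n_s / (n_r + n_s) * ||x_r - x_s||^2.
        best = None
        for r in range(len(centroids)):
            for s in range(r + 1, len(centroids)):
                d = (sizes[r] * sizes[s]) / (sizes[r] + sizes[s]) \
                    * np.sum((centroids[r] - centroids[s]) ** 2)
                if best is None or d < best[0]:
                    best = (d, r, s)
        d, r, s = best
        merge_dists.append(d)

        # Merge s into r: x_r(new) = (n_r x_r + n_s x_s) / (n_r + n_s).
        centroids[r] = (sizes[r] * centroids[r] + sizes[s] * centroids[s]) \
                       / (sizes[r] + sizes[s])
        sizes[r] += sizes[s]
        del centroids[s], sizes[s]

        # WardIndex = (1/NC) * (d_t - d_{t-1}) / (d_{t-1} - d_{t-2}).
        nc = len(centroids)
        if len(merge_dists) >= 3:
            dt, dt1, dt2 = merge_dists[-1], merge_dists[-2], merge_dists[-3]
            if dt1 > dt2:
                ward_index[nc] = (dt - dt1) / (dt1 - dt2) / nc

    # The number of clusters with the highest WardIndex is the optimum.
    return max(ward_index, key=ward_index.get)
```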

IV. Optimum number of hidden neurons in the MLP

The optimum number of hidden neurons in the original MLP is equal to this optimum number of clusters on the SOM.

Retrain the MLP with the optimum number of hidden neurons selected above. A compressed end-to-end illustration of the whole procedure is sketched below.
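
The following end-to-end sketch is illustrative only and makes two simplifications relative to the procedure above: scikit-learn's MLPRegressor stands in for the MLP of stage I, and the cluster count is taken from a Ward linkage of the normalised weighted hidden outputs themselves, using the largest jump in merge distance as a crude stand-in for the SOM projection and the WardIndex. All data and parameter choices are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(300, 2))
t = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.normal(size=300)

# Stage I: train a deliberately oversized single-hidden-layer network.
mlp = MLPRegressor(hidden_layer_sizes=(12,), activation="tanh",
                   max_iter=5000, random_state=1).fit(X, t)

# Weighted hidden outputs y_j * b_j, one unit-length row per hidden neuron.
hidden = np.tanh(X @ mlp.coefs_[0] + mlp.intercepts_[0])
weighted = (hidden * mlp.coefs_[1].ravel()).T
weighted /= np.linalg.norm(weighted, axis=1, keepdims=True)

# Stages II-III (compressed): Ward-link the rows and take the cluster count
# just before the largest jump in merge distance.
Z = linkage(weighted, method="ward")
jumps = np.diff(Z[:, 2])
n_opt = int(weighted.shape[0] - (np.argmax(jumps) + 1))

# Stage IV: retrain with the selected number of hidden neurons.
mlp_opt = MLPRegressor(hidden_layer_sizes=(n_opt,), activation="tanh",
                       max_iter=5000, random_state=1).fit(X, t)
print("selected hidden neurons:", n_opt)
```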

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Samarasinghe, S. (2016). Order in the Black Box: Consistency and Robustness of Hidden Neuron Activation of Feed Forward Neural Networks and Its Use in Efficient Optimization of Network Structure. In: Shanmuganathan, S., Samarasinghe, S. (eds) Artificial Neural Network Modelling. Studies in Computational Intelligence, vol 628. Springer, Cham. https://doi.org/10.1007/978-3-319-28495-8_2

  • DOI: https://doi.org/10.1007/978-3-319-28495-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28493-4

  • Online ISBN: 978-3-319-28495-8

  • eBook Packages: Engineering (R0)
