Soft Computing, Volume 21, Issue 3, pp 597–609

Closed determination of the number of neurons in the hidden layer of a multi-layered perceptron network

Abstract

Multi-layered perceptron networks (MLP) have been proven to be universal approximators. However, to take advantage of this theoretical result, we must determine the smallest number H of units in the hidden layer. Two basic theoretically established requirements are that an adequate activation function be selected and a proper training algorithm be applied. We must also guarantee that (a) the training data comply with the demands of the universal approximation theorem (UAT) and (b) the amount of information present in the training data be determined. We discuss how to preprocess the data in order to meet such demands. Once this is done, a closed formula to determine H may be applied. Knowing H implies that any unknown function associated with the training data may, in practice, be approximated arbitrarily well by an MLP. We take advantage of previous work in which a complexity regularization approach sought to minimize the RMS training error. In that work, an algebraic expression for H is attempted by sequential trial and error. In contrast, here we find a closed formula \(H=f(m_{O}, N)\), where \(m_{O}\) is the number of units in the input layer and N is the effective size of the training data. The algebraic expression we derive stems from statistically determined lower bounds of H in a range of interest of the \((m_{O}, N)\) pairs. The resulting sequence of 4250 triples \((H, m_{O}, N)\) is replaced by a single 12-term bivariate polynomial. To determine its 12 coefficients and the degrees of the 12 associated terms, a genetic algorithm was applied. The validity of the resulting formula is tested by determining the architecture of twelve MLPs for as many problems and verifying that the RMS error is minimized when the formula is used to determine H.
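The closed formula itself is given in the body of the paper; from the description above it is a 12-term bivariate polynomial, presumably of the form \(H=\sum_{k=1}^{12} c_k\, m_{O}^{a_k} N^{b_k}\), whose coefficients \(c_k\) and exponents \((a_k, b_k)\) are fitted by a genetic algorithm. The sketch below only illustrates how such a formula would be evaluated once those values are known; the coefficients and exponents it uses are placeholders, not the values reported in the paper.

```python
# Minimal sketch: evaluating a 12-term bivariate polynomial H = f(m_O, N).
# The (coefficient, degree in m_O, degree in N) triples below are PLACEHOLDERS
# chosen only to illustrate the functional form; the actual values are those
# fitted by the genetic algorithm described in the paper.
import math

PLACEHOLDER_TERMS = [
    # (c_k, a_k, b_k) represents the term c_k * m_O**a_k * N**b_k
    (1.0, 0, 0), (0.5, 1, 0), (0.5, 0, 1), (0.1, 1, 1),
    (0.05, 2, 0), (0.05, 0, 2), (0.01, 2, 1), (0.01, 1, 2),
    (0.005, 3, 0), (0.005, 0, 3), (0.001, 2, 2), (0.001, 3, 1),
]

def hidden_units(m_o: int, n_effective: int, terms=PLACEHOLDER_TERMS) -> int:
    """Evaluate H = sum_k c_k * m_O**a_k * N**b_k and round up to an integer >= 1."""
    h = sum(c * (m_o ** a) * (n_effective ** b) for c, a, b in terms)
    return max(1, math.ceil(h))

# Example: a data set with 10 input attributes and an effective size of 500 tuples.
print(hidden_units(10, 500))
```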

Keywords

Neural networks · Perceptrons · Information theory · Genetic algorithms

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Instituto Tecnológico Autónomo de México, Mexico City, Mexico
