
Architectural Parameter-Independent Network Initialization Scheme for Sigmoidal Feedforward ANNs

  • Sarfaraz Masood
  • M. N. Doja
  • Pravin Chandra
Research Article - Computer Engineering and Computer Science

Abstract

The choice of initial network weights is known to be a key factor affecting the convergence of artificial neural networks that use sigmoidal activation functions. This paper proposes a new network initialization scheme that sets the weights so that the activation functions in the network are not saturated initially. The proposed method ensures that the initial outputs of the hidden neurons lie in the active region, which positively impacts the network’s rate of convergence. Unlike most earlier initialization schemes, this method does not depend on architectural parameters such as the size of the input layer or the hidden layer. The performance of the proposed scheme is compared against eight well-known weight initialization routines on six real-world benchmark problems. The results show that the proposed routine enables the network to achieve better performance within the same number of training epochs. A right-tailed t-test also shows that the proposed scheme is significantly better than the other techniques in most cases and statistically similar in a few, but never worse. Hence, it may be considered a strong alternative to conventional neural network initialization techniques.
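
The abstract does not state the initialization rule itself, so the following is only a minimal sketch of the saturation effect it refers to, not the authors' scheme. It assumes inputs scaled to [-1, 1], uniform random weights, a logistic sigmoid, and a hypothetical threshold `tol` for calling an output saturated; the function name `saturated_fraction` and all parameter values are illustrative choices.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def saturated_fraction(n_inputs, n_hidden, weight_range,
                       n_samples=200, tol=0.05, seed=0):
    """Fraction of hidden-unit outputs within `tol` of 0 or 1.

    Weights and biases are drawn uniformly from
    [-weight_range, weight_range]; inputs are drawn uniformly from
    [-1, 1]. Both choices are assumptions made for illustration.
    """
    rng = random.Random(seed)
    # One weight vector per hidden neuron; index 0 holds the bias.
    weights = [
        [rng.uniform(-weight_range, weight_range) for _ in range(n_inputs + 1)]
        for _ in range(n_hidden)
    ]
    saturated = 0
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        for w in weights:
            net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            out = sigmoid(net)
            if out < tol or out > 1.0 - tol:
                saturated += 1
    return saturated / (n_samples * n_hidden)


# Same architecture, two different weight ranges.
small = saturated_fraction(n_inputs=20, n_hidden=10, weight_range=0.1)
large = saturated_fraction(n_inputs=20, n_hidden=10, weight_range=5.0)
print(f"range 0.1: {small:.3f} saturated, range 5.0: {large:.3f} saturated")
```

With the small range, the net input can never exceed 0.1 × 21 = 2.1 in magnitude, so no output reaches the 0.05 or 0.95 threshold. With the large range, many hidden units start out on the flat part of the sigmoid, where gradients are close to zero and early training is slow.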

Keywords

Neural networks; Sigmoidal feedforward neural networks; Backpropagation; Network weight initialization

Copyright information

© King Fahd University of Petroleum & Minerals 2019

Authors and Affiliations

  1. Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India
  2. USICT, Guru Gobind Singh Indraprastha University, Sector 16C, Dwarka, Delhi, India
