The Study of Architecture MLP with Linear Neurons in Order to Eliminate the “vanishing Gradient” Problem

  • Janusz Kolbusz
  • Pawel Rozycki
  • Bogdan M. Wilamowski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10245)


Research on deep neural networks is becoming increasingly popular in artificial intelligence. Although such networks are very powerful, they are difficult to train. The main reason for these training difficulties is the vanishing gradient problem, which worsens as the number of layers increases. The paper discusses the capabilities of different neural network architectures and proposes a new multilayer architecture with additional linear neurons that is much easier to train than a traditional MLP network and reduces the effect of vanishing gradients. The efficiency of the suggested approach has been confirmed by several experiments.
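The vanishing gradient effect described above can be illustrated with a toy calculation. The sketch below is not the architecture from the paper, whose details are not given here. It assumes a chain of single-neuron sigmoid layers and, as a purely hypothetical stand-in for "additional linear neurons", an optional linear term added in parallel to each sigmoid. The names `chain_gradient`, `weight`, and `linear_weight` are illustrative only.

```python
import math


def sigmoid(x):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))


def chain_gradient(depth, weight=1.0, linear_weight=0.0, x0=0.5):
    """Return |d a_depth / d a_0| for a chain of one-neuron layers.

    Each layer computes a_next = sigmoid(weight * a) + linear_weight * a.
    With linear_weight = 0 this is a plain sigmoid chain. A non-zero
    linear_weight adds a hypothetical parallel linear neuron (an
    assumption made for illustration, not the authors' design).
    """
    a = x0
    grad = 1.0
    for _ in range(depth):
        s = sigmoid(weight * a)
        # Chain rule: the local derivative is sigmoid'(z) * weight + linear_weight,
        # and sigmoid'(z) = s * (1 - s) never exceeds 0.25.
        grad *= weight * s * (1.0 - s) + linear_weight
        a = s + linear_weight * a
    return abs(grad)


if __name__ == "__main__":
    for depth in (2, 5, 10, 20):
        plain = chain_gradient(depth)
        with_linear = chain_gradient(depth, linear_weight=1.0)
        print(f"depth={depth:2d}  sigmoid only: {plain:.3e}  "
              f"with linear path: {with_linear:.3e}")
```

In the plain chain every layer multiplies the gradient by at most 0.25 (with unit weights), so the magnitude shrinks geometrically with depth. In the variant with a linear term each factor is at least `linear_weight`, so the gradient does not collapse in this toy setting. Whether this matches the behaviour of the proposed architecture is not something the sketch can confirm.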


Deep neural networks, Vanishing gradient, Nonlinearity



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Janusz Kolbusz (1)
  • Pawel Rozycki (1)
  • Bogdan M. Wilamowski (2)
  1. University of Information Technology and Management in Rzeszow, Rzeszow, Poland
  2. Auburn University, Auburn, USA
