Journal of Global Optimization, Volume 73, Issue 2, pp 239–277

Global optimization issues in deep network regression: an overview

  • Laura Palagi


The paper presents an overview of global issues in optimization methods for training feedforward neural networks (FNN) in a regression setting. We first recall the learning optimization paradigm for FNN and briefly discuss global schemes for the joint choice of the network topology and the network parameters. The main part of the paper focuses on the core subproblem, namely the continuous unconstrained (regularized) weights optimization problem, with the aim of reviewing global methods that arise specifically in both multilayer perceptron/deep networks and radial basis networks. We review some recent results on the existence of non-global stationary points of the unconstrained nonlinear problem and on the role of determining a global solution in a supervised learning paradigm. Local algorithms that are widely used to solve the continuous unconstrained problem are addressed, with a focus on possible improvements that exploit global properties. Hybrid global methods that embed local algorithms and are specifically devised for FNN training optimization problems are also discussed.
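The hybrid idea mentioned above, a global strategy wrapped around a local algorithm, can be illustrated with a minimal multistart sketch. It is not taken from the paper: the one-hidden-layer tanh network with scalar input, the L2 regularization weight `rho`, the Armijo backtracking gradient descent used as the local algorithm, and all function names are assumptions chosen purely for illustration.

```python
import math
import random


def unpack(w, n_hidden):
    """Split the flat weight vector into input weights a, biases b, output weights v."""
    return w[:n_hidden], w[n_hidden:2 * n_hidden], w[2 * n_hidden:]


def predict(w, x, n_hidden):
    """Output of a one-hidden-layer tanh network with scalar input x (assumed model)."""
    a, b, v = unpack(w, n_hidden)
    return sum(v[j] * math.tanh(a[j] * x + b[j]) for j in range(n_hidden))


def objective(w, data, n_hidden, rho):
    """Mean squared error (halved) plus an L2 penalty on the weights (assumed regularizer)."""
    err = sum((predict(w, x, n_hidden) - y) ** 2 for x, y in data) / (2 * len(data))
    return err + 0.5 * rho * sum(wi * wi for wi in w)


def gradient(w, data, n_hidden, rho):
    """Exact gradient of the objective above.

    For this particular model the all-zero weight vector has zero gradient,
    so a local method started there would not move.
    """
    a, b, v = unpack(w, n_hidden)
    g = [rho * wi for wi in w]
    for x, y in data:
        h = [math.tanh(a[j] * x + b[j]) for j in range(n_hidden)]
        r = (sum(v[j] * h[j] for j in range(n_hidden)) - y) / len(data)
        for j in range(n_hidden):
            dh = 1.0 - h[j] ** 2
            g[j] += r * v[j] * dh * x
            g[n_hidden + j] += r * v[j] * dh
            g[2 * n_hidden + j] += r * h[j]
    return g


def local_search(w, data, n_hidden, rho, iters=200):
    """Gradient descent with Armijo backtracking; the objective never increases."""
    f = objective(w, data, n_hidden, rho)
    for _ in range(iters):
        g = gradient(w, data, n_hidden, rho)
        g2 = sum(gi * gi for gi in g)
        if g2 < 1e-16:
            break  # (numerically) stationary point reached
        t = 1.0
        while t > 1e-12:
            trial = [wi - t * gi for wi, gi in zip(w, g)]
            f_trial = objective(trial, data, n_hidden, rho)
            if f_trial <= f - 1e-4 * t * g2:
                w, f = trial, f_trial
                break
            t *= 0.5
        else:
            break  # no acceptable step length found
    return w


def multistart(data, n_hidden=3, rho=1e-3, n_starts=5, iters=200, seed=0):
    """Run the local search from several random points and keep the best result."""
    rng = random.Random(seed)
    best_w, best_f = None, float("inf")
    for _ in range(n_starts):
        w0 = [rng.uniform(-1.0, 1.0) for _ in range(3 * n_hidden)]
        w = local_search(w0, data, n_hidden, rho, iters)
        f = objective(w, data, n_hidden, rho)
        if f < best_f:
            best_w, best_f = w, f
    return best_w, best_f


if __name__ == "__main__":
    # Hypothetical toy regression data: samples of sin(3x) on [-1, 1].
    data = [(x / 10.0, math.sin(3.0 * x / 10.0)) for x in range(-10, 11)]
    w_best, f_best = multistart(data)
    print("best regularized objective over all starts:", f_best)
```

Multistart gives no guarantee of reaching a global solution; it only reports the best stationary point found among the starts. It is used here because it is the simplest scheme in which a global phase (random sampling of starting points) embeds a local algorithm.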


Keywords: Supervised learning, Deep networks, Feedforward neural networks, Global optimization, Weights optimization, Hybrid algorithms



Many thanks to the two anonymous referees, who read the paper carefully and gave useful suggestions that substantially improved it. Thanks to Marianna De Santis and to the Ph.D. students at DIAG, who commented on a first version of the paper. Finally, I wish to thank Prof. Luigi Grippo for pleasant and fruitful conversations on optimization topics, not only about ML, since the time of my Ph.D.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Dip. di Ingegneria informatica automatica e gestionale A. Ruberti, Sapienza - University of Rome, Rome, Italy
