Abstract
Learning in artificial neural networks is usually based on local minimization methods, which have no mechanism for escaping the influence of an undesired local minimum. This chapter presents strategies for developing globally convergent modifications of local search methods and investigates the use of popular global search methods in neural network learning. The proposed methods tend to lead to desirable weight configurations and allow the network to learn the entire training set; in that sense, they improve the efficiency of the learning process. Simulation experiments on learning problems that are notorious for their local minima are presented, and an extensive comparison of several learning algorithms is provided.
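The chapter itself contains no code, but the two strategies named in the abstract can be illustrated concretely. Below is a minimal Python/NumPy sketch, not the authors' exact algorithms: plain gradient descent stands in for the local method, the XOR problem serves as the classic example of stagnation, and a simplified DE/rand/1/bin variant of Differential Evolution (Storn and Price, 1997) plays the role of the global search invoked when the local method stalls. The network size (2-2-1), finite-difference gradients (in place of backpropagation), learning rate, error threshold, and DE parameters are all illustrative assumptions.

```python
import numpy as np

# XOR training set: the standard example of a task on which plain
# gradient descent can stagnate far from a desirable weight configuration.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(w):
    # 2-2-1 network: 9 parameters in total, biases included.
    W1 = w[0:4].reshape(2, 2); b1 = w[4:6]
    W2 = w[6:8].reshape(2, 1); b2 = w[8:9]
    return W1, b1, W2, b2

def error(w):
    # Mean squared error over the whole training set.
    W1, b1, W2, b2 = unpack(w)
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    return np.mean((y - T) ** 2)

def numgrad(w, eps=1e-6):
    # Central-difference gradient; keeps the sketch short, whereas
    # backpropagation would compute the same quantity analytically.
    g = np.zeros_like(w)
    for i in range(w.size):
        d = np.zeros_like(w); d[i] = eps
        g[i] = (error(w + d) - error(w - d)) / (2 * eps)
    return g

def local_search(w, lr=0.5, steps=2000, tol=1e-8):
    # Plain gradient descent: converges only to a nearby minimizer.
    for _ in range(steps):
        g = numgrad(w)
        if np.linalg.norm(g) < tol:
            break
        w = w - lr * g
    return w

def differential_evolution(dim=9, pop=30, F=0.5, CR=0.9, gens=300, rng=None):
    # Simplified DE/rand/1/bin: population-based global search over the
    # weight vector; pop, F, CR, gens are illustrative choices.
    if rng is None:
        rng = np.random.default_rng(0)
    P = rng.uniform(-2, 2, size=(pop, dim))
    fit = np.array([error(p) for p in P])
    for _ in range(gens):
        for i in range(pop):
            a, b, c = P[rng.choice(pop, 3, replace=False)]
            mutant = a + F * (b - c)
            cross = rng.random(dim) < CR
            trial = np.where(cross, mutant, P[i])
            f = error(trial)
            if f <= fit[i]:            # greedy selection
                P[i], fit[i] = trial, f
    return P[np.argmin(fit)]

w = local_search(np.random.default_rng(1).uniform(-0.5, 0.5, 9))
if error(w) > 1e-3:                    # stagnation: fall back to global search
    w = local_search(differential_evolution())
print("final error:", error(w))
```

The driver at the bottom reflects the hybrid idea in the abstract: run the cheap local method first, and only when it fails to learn the training set invoke the global search to escape the undesired region and hand a better starting point back to the local method.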
© 2006 Springer Science+Business Media, LLC
Cite this chapter
Plagianakos, V.P., Magoulas, G.D., Vrahatis, M.N. (2006). Improved Learning of Neural Nets Through Global Search. In: Pintér, J.D. (ed.) Global Optimization. Nonconvex Optimization and Its Applications, vol. 85. Springer, Boston, MA. https://doi.org/10.1007/0-387-30927-6_15
DOI: https://doi.org/10.1007/0-387-30927-6_15
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30408-3
Online ISBN: 978-0-387-30927-9