
Gradient-only approaches to avoid spurious local minima in unconstrained optimization


Abstract

We reflect on some theoretical aspects of gradient-only optimization for the unconstrained optimization of objective functions containing non-physical step or jump discontinuities. This kind of discontinuity arises when the optimization problem is based on the solutions of systems of partial differential equations, in combination with variable discretization techniques (e.g. remeshing in spatial domains, and/or variable time stepping in temporal domains). These discontinuities, which may cause local minima, are artifacts of the numerical strategies used and should not influence the solution to the optimization problem. Although the discontinuities imply that the gradient field is not defined everywhere, the gradient field associated with the computational scheme can nevertheless be computed everywhere; this field is denoted the associated gradient field.

We demonstrate that it is possible to overcome attraction to the local minima if only associated gradient information is used. Various gradient-only algorithmic options are discussed. A salient feature of our approach is that variable discretization strategies, so important in the numerical solution of partial differential equations, can be combined with efficient local optimization algorithms.
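To make the setting concrete, here is a minimal sketch (our own illustrative construction, not an example from the paper) of a univariate objective with a non-physical step discontinuity, together with its everywhere-defined associated derivative, i.e. the derivative of the smooth piece active at the evaluation point:

```python
def objective(x):
    """Smooth parabola (x - 2)**2 with a non-physical downward jump of 0.4
    for x < 1, mimicking e.g. a remeshing event. Function values rise
    across the jump, so a method that compares function values sees a
    spurious local minimum just left of x = 1 and may stall there."""
    f = (x - 2.0) ** 2
    return f - 0.4 if x < 1.0 else f

def associated_derivative(x):
    """Associated derivative: the derivative of the piece active at x.
    It ignores the jump, is defined everywhere, and stays negative all
    the way to the true minimizer x = 2."""
    return 2.0 * (x - 2.0)
```

Because the associated derivative remains negative across the jump, a method that consults only derivative information steps straight over the discontinuity toward x = 2.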


References

  • Allaire G, Jouve F, Toader A-M (2004) Structural optimization using sensitivity analysis and a level-set method. J Comput Phys 194(1):363–393

  • Bazaraa MS, Sherali HD, Shetty CM (1993) Nonlinear programming—theory and algorithms, 2nd edn. Wiley, New York

  • Berberian SK (1994) A first course in real analysis. Springer, New York

  • Brandstatter BR, Ring W, Magele C, Richter KR (1998) Shape design with great geometrical deformations using continuously moving finite element nodes. IEEE Trans Magn 34(5):2877–2880

  • Conn AR, Mongeau M (1998) Discontinuous piecewise linear optimization. Math Program 80:315–380

  • Cook RD, Malkus DS, Plesha ME, Witt RJ (2002) Concepts and applications of finite element analysis, 4th edn. Wiley, New York

  • Garcia MJ, Gonzalez CA (2004) Shape optimisation of continuum structures via evolution strategies and fixed grid finite element analysis. Struct Multidiscip Optim 26(1):92–98

  • Groenwold AA, Etman LFP, Snyman JA, Rooda JE (2007) Incomplete series expansion for function approximation. Struct Multidiscip Optim 34:21–40

  • Kodiyalam S, Thanedar PB (1993) Some practical aspects of shape optimization and its influence on intermediate mesh refinement. Finite Elem Anal Des 15(2):125–133

  • Kroshko D (2006) OpenOpt—free GNU GPL2 MATLAB/Octave optimization toolbox, version 0.36. http://openopt.org

  • Nocedal J, Wright SJ (2006) Numerical optimization, 2nd edn. Springer series in operations research and financial engineering. Springer, Berlin

  • Olhoff N, Rasmussen J, Lund E (1993) A method of exact numerical differentiation for error elimination in finite element based semi-analytical shape sensitivity analysis. Mech Struct Mach 21:1–66

  • Peressini A, Sullivan F, Uhl J (1988) The mathematics of nonlinear programming. Springer, New York

  • Rardin RL (1998) Optimization in operations research. Prentice Hall, Upper Saddle River

  • Schleupen A, Maute K, Ramm E (2000) Adaptive FE-procedures in shape optimization. Struct Multidiscip Optim 4:282–302

  • Shor NZ, Kiwiel KC, Ruszczyński A (1985) Minimization methods for non-differentiable functions. Springer, New York

  • Snyman JA (2005a) A gradient-only line search method for the conjugate gradient method applied to constrained optimization problems with severe noise in the objective function. Int J Numer Methods Eng 62(1):72–82

  • Snyman JA (2005b) Practical mathematical optimization: an introduction to basic optimization theory and classical and new gradient-based algorithms, 2nd edn. Applied optimization, vol 97. Springer, New York

  • Snyman JA, Hay AM (2001) The spherical quadratic steepest descent (SQSD) method for unconstrained minimization with no explicit line searches. Comput Math Appl 42:169–178

  • Snyman JA, Hay AM (2002) The Dynamic-Q optimization method: an alternative to SQP? Comput Math Appl 44:1589–1598

  • Svanberg K (2002) A class of globally convergent optimization methods based on conservative convex separable approximations. SIAM J Optim 12:555–573

  • Van Miegroet L, Moës N, Fleury C, Duysinx P (2005) Generalized shape optimization based on the level set method. In: 6th Congr Struct Multidiscip Optim, pp 1–10, paper no 711

  • Wilke DN, Kok S, Groenwold AA (2006) A quadratically convergent unstructured remeshing strategy for shape optimization. Int J Numer Methods Eng 65(1):1–17

  • Wilke DN, Kok S, Groenwold AA (2010) The application of gradient-only optimization methods for problems discretized using non-constant methods. Struct Multidiscip Optim 40:433–451

  • Zang I (1981) Discontinuous optimization by smoothing. Math Oper Res 6(1):140–152


Acknowledgement

The first author gratefully acknowledges financial assistance from the National Research Foundation (NRF) of South Africa.

Author information

Correspondence to Daniel Nicolas Wilke.

Appendix A: Proofs of convergence for derivative descent sequences

Before presenting proofs of convergence for (conservative) associated derivative descent sequences, we give gradient-only counterparts of two well-known concepts from classical mathematical programming, which simplify the proofs. First, we present a definition of coercive functions based solely on the associated gradient of a function (Peressini et al. 1988). Although this definition is not strictly analogous to the conventional definition of coercivity, it suffices for our purposes.

Definition A.1

Let \(\boldsymbol{x}^1,\boldsymbol{x}^2\in\mathbb{R}^n\). A real valued function \(f:X\subset\mathbb{R}^n\rightarrow\mathbb{R}\), with associated gradient field \(\nabla_A f(\boldsymbol{x})\) uniquely defined for every \(\boldsymbol{x}\in X\), is associated derivative coercive if there exists a positive number \(R_M\) such that \(\nabla_{A}^{\mathrm{T}}f(\boldsymbol{x}^{2})(\boldsymbol{x}^{2}-\boldsymbol{x}^{1}) > \epsilon\), with \(\epsilon>0\in\mathbb{R}\), for non-perpendicular \(\nabla_A f(\boldsymbol{x}^2)\) and \((\boldsymbol{x}^2-\boldsymbol{x}^1)\), whenever \(\|\boldsymbol{x}^2\|\geq R_M\) and \(\|\boldsymbol{x}^1\| < R_M\).

Secondly, we present definitions for univariate and multivariate associated gradient unimodality based solely on the associated gradient field of a real valued function (Bazaraa et al. 1993).

Definition A.2

A univariate function \(f:X\subset\mathbb{R}\rightarrow\mathbb{R}\), with associated derivative \(f^{\prime_{A}}(\lambda)\) uniquely defined for every \(\lambda\in X\), is (resp., strictly) associated derivative unimodal over X if there exists an \(x^{*}_{g}\in X\) such that

$$f^{\prime_{A}}(\lambda)\,\bigl(\lambda - x^{*}_{g}\bigr) \;\geq\; 0 \ (\text{resp.}\ >0) \quad \text{for all } \lambda\in X,\ \lambda\neq x^{*}_{g}. \qquad(\text{A.1})$$

We now consider (resp., strictly) associated derivative unimodality for multivariate functions (Rardin 1998).

Definition A.3

A multivariate function \(f:X\subset\mathbb{R}^n\rightarrow\mathbb{R}\) is (resp., strictly) associated derivative unimodal over X if, for all \(\boldsymbol{x}^1,\boldsymbol{x}^2\in X\) with \(\boldsymbol{x}^1\neq\boldsymbol{x}^2\), every corresponding univariate function

$$F(\lambda) = f(\boldsymbol{x}^1 +\lambda(\boldsymbol{x}^2-\boldsymbol{x}^1)),\quad \lambda\in[0,1]\subset\mathbb{R}$$

is (resp., strictly) associated derivative unimodal according to Definition A.2.
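Definition A.3 reduces multivariate unimodality to a sign condition on directional associated derivatives along segments, which suggests a simple numerical test. The sketch below is an illustration under the assumption that the associated gradient is available as a callable (the function name and sampling scheme are ours): it samples \(F^{\prime_{A}}(\lambda)=\nabla_{A}^{\mathrm{T}}f(\boldsymbol{x}^1+\lambda(\boldsymbol{x}^2-\boldsymbol{x}^1))(\boldsymbol{x}^2-\boldsymbol{x}^1)\) on [0,1] and checks that its sign never switches from positive back to negative. A finite sample can of course only refute, not certify, unimodality.

```python
import numpy as np

def directionally_unimodal(grad_A, x1, x2, n_samples=1001):
    """Sampled check of Definition A.3 along the segment x1 -> x2:
    the directional associated derivative must be nonpositive up to
    some point and nonnegative thereafter (the sign pattern of (A.1))."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    d = x2 - x1
    lams = np.linspace(0.0, 1.0, n_samples)
    slopes = np.array([grad_A(x1 + lam * d) @ d for lam in lams])
    signs = np.sign(slopes)
    signs = signs[signs != 0]        # exact zeros are allowed anywhere
    return bool(np.all(np.diff(signs) >= 0))
```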

A.1 Univariate functions

Now that we have an associated derivative based definition of unimodality for univariate functions, we present a proof of convergence for strictly associated derivative unimodal univariate functions when associated derivative descent sequences are considered.

Theorem A.4

Let \(f:\varLambda\subseteq\mathbb{R}\rightarrow\,]-\infty,\infty]\) be a univariate function that is strictly associated derivative unimodal as defined in Definition A.2, with first associated derivative \(f^{\prime_{A}}:\varLambda\rightarrow\,]-\infty,\infty[\) uniquely defined everywhere on \(\varLambda\). If \(\lambda^{\{0\}}\in\varLambda\) and \(\{\lambda^{\{k\}}\}\) is an associated derivative descent sequence, as defined in Definition 3.8, for f with initial point \(\lambda^{\{0\}}\), then every subsequence of \(\{\lambda^{\{k\}}\}\) converges. The limit of any convergent subsequence of \(\{\lambda^{\{k\}}\}\) is a strict non-negative associated gradient projection point (S-NN-GPP), as defined in Definition 3.4, of f.

Proof

Our assertion that f is strictly associated derivative unimodal as defined in Definition A.2 implies that f has only one S-NN-GPS \(S_{S\mbox{-}NN}\subset\varLambda\), as defined in Definition 3.7, at \(\lambda^{*}\in\varLambda\). Let \(\lambda^{r}\in S_{S\mbox{-}NN}\) be such that \(|\lambda^{\{k\}}-\lambda^{r}|\) is a maximum. Consider a sequence of 1-balls \(\{B(b_k,\epsilon_k)\}\) defined around \(b_k=\frac{1}{2}(\lambda^{\{k\}}+\lambda^{r})\) with radius \(\epsilon_k=\frac{1}{2}|\lambda^{\{k\}}-\lambda^{r}|\). Then every \(\lambda^{\{k+1\}}\in B(b_k,\epsilon_k)\), since \(\{\lambda^{\{k\}}\}\) is an associated derivative descent sequence as defined in Definition 3.8 and f is strictly associated derivative unimodal as defined in Definition A.2. Therefore, \(k\rightarrow\infty\) implies \(|\lambda^{\{k\}}-\lambda^{r}|\rightarrow 0\). It follows from the Cauchy criterion for sequences that \(\{\lambda^{\{k\}}\}\) is convergent, which completes the proof of our first assertion.

Now let \(\{\lambda^{\{k\}_{m}}\}\) be a convergent subsequence of \(\{\lambda^{\{k\}}\}\) and let \(\lambda^{m*}\) be its limit. Suppose, contrary to the second assertion of the theorem, that \(\lambda^{m*}\) is not an S-NN-GPP, as defined in Definition 3.4, of f. Since we assume that \(\lambda^{m*}\) is not an S-NN-GPP, and by Definition 3.8, there exists a \(\lambda^{m*}+\delta\), with \(\delta\neq 0\in\mathbb{R}\), such that \(f^{\prime_{A}}(\lambda^{m*}+\delta)<0\), which contradicts our assumption that \(\lambda^{m*}\) is the limit of the subsequence \(\{\lambda^{\{k\}_{m}}\}\). Therefore, for \(\lambda^{m*}\) to be the limit of an associated derivative descent subsequence \(\{\lambda^{\{k\}_{m}}\}\), we require \(\lambda^{m*}\in S_{S\mbox{-}NN}\), which completes the proof. □
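Theorem A.4 is easy to observe numerically. Bisection on the sign of the associated derivative generates iterates of the descent type considered here (a simplified stand-in for the sequences of Definition 3.8, which is not restated in this appendix); applied to the jump example sketched after the abstract, it converges to the S-NN-GPP at x = 2, stepping over the discontinuity at x = 1 without noticing it:

```python
def derivative_sign_bisection(df, a, b, tol=1e-10):
    """Locate the sign change of the associated derivative df on [a, b],
    assuming df(a) < 0 < df(b) (associated derivative unimodality).
    Function values are never consulted, so a step discontinuity in f
    cannot create a spurious stopping point."""
    while b - a > tol:
        m = 0.5 * (a + b)
        if df(m) < 0:
            a = m      # minimizer lies to the right of m
        else:
            b = m      # minimizer lies to the left of (or at) m
    return 0.5 * (a + b)

# With the earlier sketch: derivative_sign_bisection(associated_derivative,
# 0.0, 5.0) returns ~2.0, the S-NN-GPP of the discontinuous objective.
```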

We now proceed with a proof of convergence for univariate functions that are associated derivative unimodal in the generalized sense, when associated derivative descent sequences are considered.

Theorem A.5

Let \(f:\varLambda\subseteq\mathbb{R}\rightarrow\,]-\infty,\infty]\) be a univariate function that is associated derivative unimodal, as defined in Definition A.2, with first associated derivative \(f^{\prime_{A}}:\varLambda\rightarrow\,]-\infty,\infty[\) uniquely defined everywhere on \(\varLambda\). If \(\lambda^{\{0\}}\in\varLambda\) and \(\{\lambda^{\{k\}}\}\) is an associated derivative descent sequence, as defined in Definition 3.8, for f with initial point \(\lambda^{\{0\}}\), then every subsequence of \(\{\lambda^{\{k\}}\}\) converges. The limit of any convergent subsequence of \(\{\lambda^{\{k\}}\}\) is a generalized non-negative associated gradient projection point (G-NN-GPP), as defined in Definition 3.2, of f.

Proof

Our assertion that f is associated derivative unimodal as defined in Definition A.2 implies that f has at least one G-NN-GPS \(S_{G\mbox{-}NN}\subset\varLambda\) as defined in Definition 3.7. Let \(S\subseteq\varLambda\) be the union of the G-NN-GPSs \(S_{G\mbox{-}NN}\). Consider the jth sequence of 1-balls \(\{B(b_k,\epsilon_k)\}_j\) defined around \(b_k=\frac{1}{2}(\lambda^{\{k\}}+(\lambda_j^{*}\in S))\) and with radius \(\epsilon_k=\frac{1}{2}|\lambda^{\{k\}}-(\lambda_j^{*}\in S)|\). Then \(\lambda^{\{k+1\}}\in\{B(b_k,\epsilon_k)\}_j\) for every sequence j, since \(\{\lambda^{\{k\}}\}\) is an associated derivative descent sequence as defined in Definition 3.8 and f is associated derivative unimodal as defined in Definition A.2. Therefore \(k\rightarrow\infty\) implies \(|\lambda^{\{k\}}-(\lambda_j^{*}\in S)|\rightarrow a_j\), with \(a_j\) a constant. Since \(|\lambda^{\{k\}}-(\lambda_j^{*}\in S)|-a_j\rightarrow 0\) for every j, it follows from the Cauchy criterion for sequences that \(\{\lambda^{\{k\}}\}\) is convergent, which completes the proof of our first assertion.

Now let \(\{\lambda^{\{k\}_{m}}\}\) be a convergent subsequence of \(\{\lambda^{\{k\}}\}\) and let \(\lambda^{m*}\) be its limit. Suppose, contrary to the second assertion of the theorem, that \(\lambda^{m*}\) is not a G-NN-GPP, as defined in Definition 3.2, of f. Since we assume that \(\lambda^{m*}\) is not a G-NN-GPP, and by Definition 3.8, there exists a \(\lambda^{m*}+\delta\), with \(\delta\neq 0\in\mathbb{R}\), such that \(f^{\prime_{A}}(\lambda^{m*}+\delta)<0\), which contradicts our assumption that \(\lambda^{m*}\) is the limit of the subsequence \(\{\lambda^{\{k\}_{m}}\}\). Therefore, for \(\lambda^{m*}\) to be the limit of an associated derivative descent subsequence (see Definition 3.8) \(\{\lambda^{\{k\}_{m}}\}\), we require \(\lambda^{m*}\in S\), which completes the proof. □

Having concluded our proofs for (strictly) associated derivative unimodal univariate functions, we present a proof of convergence for univariate associated derivative coercive functions that have at least one S-NN-GPS.

Theorem A.6

Let \(f:\varLambda\subseteq\mathbb{R}\rightarrow\,]-\infty,\infty]\) be a univariate associated derivative coercive function, as defined in Definition A.1, with first associated derivative \(f^{\prime_{A}}:\varLambda\rightarrow\,]-\infty,\infty[\) uniquely defined everywhere on \(\varLambda\). If \(\lambda^{\{0\}}\in\varLambda\) and \(\{\lambda^{\{k\}}\}\) is an associated derivative descent sequence, as defined in Definition 3.8, for f with initial point \(\lambda^{\{0\}}\), then there exists at least one convergent subsequence of \(\{\lambda^{\{k\}}\}\). The limit of any convergent subsequence of \(\{\lambda^{\{k\}}\}\) is an S-NN-GPP of f.

Proof

Since we only consider associated derivative descent sequences \(\{\lambda^{\{k\}}\}\), our assertion that f is associated derivative coercive implies that the sequence is confined to a closed interval \([a,b]\subset\varLambda\); in particular, \(\{\lambda^{\{k\}}\}\) is bounded. It then follows from the Bolzano–Weierstrass theorem that every sequence in a closed interval [a,b] has a subsequence that converges to a point in the interval (Berberian 1994).

Now let \(\{\lambda^{\{k\}_{m}}\}\) be a convergent subsequence of \(\{\lambda^{\{k\}}\}\) and let \(\lambda^{m*}\in\varLambda\) be its limit. Suppose, contrary to the second assertion of the theorem, that \(\lambda^{m*}\) is not an S-NN-GPP of f. Since we assume that \(\lambda^{m*}\) is not an S-NN-GPP, and by Definition 3.8, there exists a \(\lambda^{m*}+\delta\), with \(\delta\neq 0\in\mathbb{R}\), such that \(f^{\prime_{A}}(\lambda^{m*}+\delta)<0\), which contradicts our assumption that \(\lambda^{m*}\) is the limit of the subsequence \(\{\lambda^{\{k\}_{m}}\}\). Therefore, for \(\lambda^{m*}\) to be the limit of an associated derivative descent sequence (see Definition 3.8) \(\{\lambda^{\{k\}_{m}}\}\), we require \(\lambda^{m*}\in S_{S\mbox{-}NN}\) with \(S_{S\mbox{-}NN}\subset\varLambda\), which completes the proof. □

A.2 Multivariate functions

We begin our proofs of convergence of associated derivative descent sequences for multivariate functions with \(C^1\) continuous convex functions (Peressini et al. 1988), after which we present proofs of convergence for broader classes of functions.

Theorem A.7

Suppose \(f:X\subseteq\mathbb{R}^n\rightarrow\mathbb{R}\) is a \(C^1\) continuous convex function with \(\boldsymbol{x}\in X\). If \(\boldsymbol{x}^{\{0\}}\in X\) and \(\{\boldsymbol{x}^{\{k\}}\}\) is an associated derivative descent sequence, as defined in Definition 3.8, for f with initial point \(\boldsymbol{x}^{\{0\}}\), then every subsequence of \(\{\boldsymbol{x}^{\{k\}}\}\) converges. The limit of any convergent subsequence of \(\{\boldsymbol{x}^{\{k\}}\}\) is an S-NN-GPP, as defined in Definition 3.4, of f.

Proof

Our assertion that f is convex and \(C^1\) continuous ensures that f has a single global gradient projection point \(\boldsymbol{x}^{*}_{g}\in X\). Also, by Definition 3.8 and the continuity of the first partial derivatives, we see that \(\{f(\boldsymbol{x}^{\{k\}})\}\) is a decreasing sequence that is bounded below by \(f(\boldsymbol{x}^{*}_{g})\). It follows that \(\{\boldsymbol{x}^{\{k\}}\}\) is a bounded sequence, since f is convex. The Bolzano–Weierstrass theorem implies that \(\{\boldsymbol{x}^{\{k\}}\}\) has at least one convergent subsequence, which completes the proof of our first assertion (Peressini et al. 1988).

Now let \(\{\boldsymbol{x}^{\{k\}_{m}}\}\) be a convergent subsequence of \(\{\boldsymbol{x}^{\{k\}}\}\) and let \(\boldsymbol{x}^{m*}\in X\) be its limit. Suppose, contrary to the second assertion of the theorem, that \(\boldsymbol{x}^{m*}\) is not an S-NN-GPP, as defined in Definition 3.4, of f, which by our continuity assumption implies \(\nabla_A f(\boldsymbol{x}^{m*})\neq\boldsymbol{0}\); this in turn implies that there exists a descent direction \(\boldsymbol{u}^{m*}\) at \(\boldsymbol{x}^{m*}\) such that \(\boldsymbol{u}^{m*}\neq\boldsymbol{0}\).

Since \(\{\boldsymbol{x}^{\{k\}_{m}}\}\) is an associated derivative descent sequence as defined in Definition 3.8, the limit \(\boldsymbol{x}^{m*}\) of which is by assumption not an S-NN-GPP, we have

$$-\nabla_A^{\textrm{T}} f(\boldsymbol{x}^{m*})\nabla_Af(\boldsymbol{x}^{m*}) < 0.$$

It follows from the continuity assumptions that there exists a small \(\lambda>0\in\mathbb{R}\) such that \(-\nabla_{A}^{\mathrm{T}}f(\boldsymbol{x}^{m*}+\lambda\boldsymbol{u}^{m*})\nabla_{A}f(\boldsymbol{x}^{m*})<0\), which contradicts our assumption that \(\boldsymbol{x}^{m*}\) is the limit of the sequence \(\{\boldsymbol{x}^{\{k\}_{m}}\}\). Therefore, for \(\boldsymbol{x}^{m*}\) to be the limit of an associated derivative descent sequence \(\{\boldsymbol{x}^{\{k\}_{m}}\}\), we require \(\nabla_A f(\boldsymbol{x}^{m*})=\boldsymbol{0}\), which in turn implies \(\boldsymbol{u}^{m*}=\boldsymbol{0}\). The limit \(\boldsymbol{x}^{m*}\) of an associated derivative descent sequence as defined in Definition 3.8 is therefore an S-NN-GPP as defined in Definition 3.4, which completes the proof. □
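For intuition, an associated gradient descent sequence of the kind Theorem A.7 addresses can be generated by a plain gradient-only steepest descent loop. The following is a minimal sketch under our own fixed-step and stopping choices, not the constructive procedure of Definition 3.8:

```python
import numpy as np

def gradient_only_descent(grad_A, x0, step=0.1, tol=1e-8, max_iter=100_000):
    """Iterate x <- x - step * grad_A(x), consulting only the associated
    gradient. For a C^1 convex f with a sufficiently small step, the
    iterates approach the unique point where grad_A vanishes, i.e. the
    S-NN-GPP of Theorem A.7."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_A(x)
        if np.linalg.norm(g) < tol:   # candidate S-NN-GPP reached
            break
        x = x - step * g
    return x

# Example: f(x) = ||x||^2 has grad_A(x) = 2 x, and
# gradient_only_descent(lambda x: 2.0 * x, [3.0, -4.0]) -> ~[0.0, 0.0]
```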

Before we proceed to present a proof of convergence for \(C^1\) continuous associated derivative coercive functions, we show that if a function is associated derivative coercive and \(C^1\) continuous, it has at least one global gradient projection point.

Proposition A.8

Suppose \(f:X\subseteq\mathbb{R}^n\rightarrow\mathbb{R}\) is a \(C^1\) continuous associated derivative coercive function as defined in Definition A.1, with \(\boldsymbol{x}\in X\); then f has at least one S-NN-GPP as defined in Definition 3.4.

Proof

Let \(\boldsymbol{x}^1,\boldsymbol{x}^2,\boldsymbol{x}^3\in\mathbb{R}^n\). Since f is associated derivative coercive as defined in Definition A.1, there exists by definition a number \(R_M\) such that for every \(\{\boldsymbol{x}^2:\|\boldsymbol{x}^2\|\geq R_M\}\) and every \(\{\boldsymbol{x}^1:\|\boldsymbol{x}^1\|<R_M\}\) the following holds: \(\nabla_{A}^{\mathrm{T}}f(\boldsymbol{x}^{2})(\boldsymbol{x}^{2}-\boldsymbol{x}^{1})>0\), for non-perpendicular \(\nabla_A f(\boldsymbol{x}^2)\) and \((\boldsymbol{x}^2-\boldsymbol{x}^1)\). In addition, there exists \(\{\boldsymbol{x}^3:\|\boldsymbol{x}^3\|<R_M\}\) such that \(\nabla_{A}^{\mathrm{T}}f(\boldsymbol{x}^{3})(\boldsymbol{x}^{3}-\boldsymbol{x}^{1})>0\). The set \(\{\boldsymbol{x}:\|\boldsymbol{x}\|\leq R_M\}\) is closed and bounded, which by the continuity assumption implies that \(f(\boldsymbol{x})\) assumes a minimum value on \(\{\boldsymbol{x}:\|\boldsymbol{x}\|\leq R_M\}\) at a point \(\boldsymbol{x}^{*}_{g}\in X\). From the continuity assumption on the first partial associated derivatives it follows that \(\nabla_A f(\boldsymbol{x}^{*}_{g})=\boldsymbol{0}\) (Peressini et al. 1988). It therefore follows from the continuity assumptions that Definition 3.4 holds at \(\boldsymbol{x}^{*}_{g}\). □

Theorem A.9

Suppose \(f:X\subseteq\mathbb{R}^n\rightarrow\mathbb{R}\) is a \(C^1\) continuous associated derivative coercive function, as defined in Definition A.1, with \(\boldsymbol{x}\in X\). If \(\boldsymbol{x}^{\{0\}}\in X\) and \(\{\boldsymbol{x}^{\{k\}}\}\) is a conservative associated derivative descent sequence, as defined in Definition 3.9, for f with initial point \(\boldsymbol{x}^{\{0\}}\), then some subsequence of \(\{\boldsymbol{x}^{\{k\}}\}\) converges. The limit of any convergent subsequence of \(\{\boldsymbol{x}^{\{k\}}\}\) is a G-NN-GPP, as defined in Definition 3.2, of f.

Proof

Our assertion that f is continuous and associated derivative coercive ensures that f has a global minimizer \(\boldsymbol{x}^{*}_{g}\in X\). Also, by the definition of a conservative associated derivative descent sequence and the continuity of the first partial associated derivatives, we see that \(\{f(\boldsymbol{x}^{\{k\}})\}\) is a decreasing sequence that is bounded below by \(f(\boldsymbol{x}^{*}_{g})\). Note that we require conservative associated derivative descent sequences, since an associated derivative descent sequence alone is not sufficient to guarantee convergence: it may result in oscillatory behavior for \(n>1\). The remainder of the proof is similar to the proof of Theorem A.7. □

We now proceed to functions that are either \(C^0\) continuous or discontinuous, but for which the function values and associated gradient field are uniquely defined everywhere. We restrict ourselves to classes of \(C^0\) continuous or discontinuous functions for which convergence is guaranteed, since associated derivative descent sequences may not converge to an NN-GPP when all \(C^0\) continuous or discontinuous functions are considered, as is evident from the following example.

Consider the linear programming problem of finding the intersection between two intersecting planes. Since the associated gradient on each plane is constant, a steepest descent sequence that terminates at the intersection of the two planes is an example of a sequence that converges to a point that is not an NN-GPP, as the sketch below illustrates.
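This failure mode is easy to reproduce. In the sketch below (our own construction: two planes meeting along a valley line that itself tilts downward), steepest descent with an exact line search terminates on the intersection line \(x_1=0\), where the associated gradient is nonzero and f still decreases along the valley direction \((0,-1)\), so the limit is not an NN-GPP:

```python
import numpy as np

def f(x):
    # Two planes meeting along the valley line x1 = 0; the valley tilts
    # downward in x2, so no point on it is an NN-GPP.
    return abs(x[0]) + 0.5 * x[1]

def grad_A(x):
    # Constant associated gradient on each plane (the x1 >= 0 plane is
    # taken to be active on the intersection itself).
    return np.array([1.0 if x[0] >= 0.0 else -1.0, 0.5])

def steepest_descent_exact(x):
    """Steepest descent with exact line search on this piecewise linear f:
    along d = -grad_A(x), f decreases until the ray crosses x1 = 0 and
    increases afterwards, so the exact step lands on the valley line."""
    x = np.asarray(x, dtype=float)
    while True:
        d = -grad_A(x)
        if x[0] * d[0] >= 0.0:   # ray does not cross x1 = 0: zero step
            return x             # terminates with grad_A(x) != 0
        x = x + (-x[0] / d[0]) * d

# steepest_descent_exact([2.0, 0.0]) returns [0.0, -1.0]: the sequence
# stops on the valley line although f decreases along (0, -1) forever.
```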

Hence, we now present classes of well-posed discontinuous functions for which convergence is guaranteed.

Definition A.10

We consider the (resp. generalized/strict) gradient-only optimization problem to be well-posed associated derivative (resp. convex/unimodal) when

  1. the associated gradient field is everywhere uniquely defined,

  2. the problem is associated derivative coercive as defined in Definition A.1,

  3. there exists one and only one (resp. G/S)-NN-GPS (resp. \(S_{G\mbox{-}NN}/S_{S\mbox{-}NN}\)) as defined in Definition 3.7, and

  4. every associated derivative descent sequence as defined in Definition 3.8 has at least one subsequence converging to a point in (resp. \(S_{G\mbox{-}NN}/S_{S\mbox{-}NN}\)).

We now present a class of well-posed associated derivative coercive functions; this includes multimodal functions.

Definition A.11

We consider the gradient-only optimization problem to be (resp. proper/generalized) well-posed associated derivative coercive when

  1. the associated gradient field is everywhere uniquely defined,

  2. the problem is associated derivative coercive as defined in Definition A.1,

  3. there exists at least one (resp. G/S)-NN-GPS (resp. \(S_{G\mbox{-}NN}/S_{S\mbox{-}NN}\)) as defined in Definition 3.7, and

  4. every conservative associated derivative descent sequence as defined in Definition 3.9 has at least one subsequence converging to a point in (resp. \(S_{G\mbox{-}NN}/S_{S\mbox{-}NN}\)).

We note that the classes of functions defined in Definitions A.10–A.11 still exclude many problems of practical significance, e.g. linear programming problems. Many of these practically significant problems may be accommodated by altering Definitions A.10–A.11 to hold only for specific associated derivative descent sequences.

Cite this article

Wilke, D.N., Kok, S., Snyman, J.A. et al. Gradient-only approaches to avoid spurious local minima in unconstrained optimization. Optim Eng 14, 275–304 (2013). https://doi.org/10.1007/s11081-011-9178-7
