A variation of Broyden class methods using Householder adaptive transforms

Cipolla, S.; Di Fiore, C.; Zellini, P.

doi:10.1007/s10589-020-00209-8

Published: 14 July 2020

Volume 77, pages 433–463, (2020)

Computational Optimization and Applications


Abstract

In this work we introduce and study novel quasi-Newton minimization methods based on a Broyden class-type updating scheme for the Hessian approximation, where a suitable matrix $\tilde{B}_k$ is updated instead of the current Hessian approximation $B_k$. We identify conditions which imply the convergence of the algorithm and, if exact line search is chosen, its quadratic termination. By a remarkable connection between the projection operation and Krylov spaces, such conditions can be ensured using low complexity matrices $\tilde{B}_k$ obtained by projecting $B_k$ onto algebras of matrices diagonalized by products of two or three Householder matrices adaptively chosen step by step. Experimental tests show that the introduction of the adaptive criterion, which theoretically guarantees the convergence, considerably improves the robustness of the minimization schemes when compared with a non-adaptive choice; moreover, they show that the proposed methods could be particularly suitable for solving large scale problems where L-BFGS is not able to deliver satisfactory performance.

References

[1] Al-Baali, M.: Analysis of a family of self-scaling quasi-Newton methods. Department of Mathematics and Computer Science, United Arab Emirates University, Technical Report (1993)
[2] Al-Baali, M.: Global and superlinear convergence of a restricted class of self-scaling methods with inexact line searches, for convex functions. Comput. Optim. Appl. 9(2), 191–203 (1998)
[3] Andrei, N.: A double-parameter scaling Broyden-Fletcher-Goldfarb-Shanno method based on minimizing the measure function of Byrd and Nocedal for unconstrained optimization. J. Optim. Theory Appl. 178(1), 191–218 (2018)
[4] Bortoletti, A., Di Fiore, C., Fanelli, S., Zellini, P.: A new class of quasi-Newtonian methods for optimal learning in MLP-networks. IEEE Trans. Neural Netw. 14(2), 263–273 (2003)
[5] Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
[6] Byrd, R.H., Hansen, S.L., Nocedal, J., Singer, Y.: A stochastic quasi-Newton method for large-scale optimization. SIAM J. Optim. 26(2), 1008–1031 (2016)
[7] Byrd, R.H., Nocedal, J.: A tool for the analysis of quasi-Newton methods with application to unconstrained minimization. SIAM J. Numer. Anal. 26(3), 727–739 (1989)
[8] Byrd, R.H., Nocedal, J., Yuan, Y.X.: Global convergence of a class of Quasi-Newton methods on convex problems. SIAM J. Numer. Anal. 24(5), 1171–1190 (1987)
[9] Cai, J.F., Chan, R.H., Di Fiore, C.: Minimization of a detail-preserving regularization functional for impulse noise removal. J. Math. Imaging Vis. 29(1), 79–91 (2007)
[10] Caliciotti, A., Fasano, G., Roma, M.: Novel preconditioners based on quasi-Newton updates for nonlinear conjugate gradient methods. Optim. Lett. 11(4), 835–853 (2017)
[11] Cipolla, S., Di Fiore, C., Tudisco, F.: Euler-Richardson method preconditioned by weakly stochastic matrix algebras: a potential contribution to Pagerank computation. Electron. J. Linear Algebra 32, 254–272 (2017)
[12] Cipolla, S., Di Fiore, C., Tudisco, F., Zellini, P.: Adaptive matrix algebras in unconstrained minimization. Linear Algebra Appl. 471, 544–568 (2015)
[13] Cipolla, S., Di Fiore, C., Zellini, P.: Low complexity matrix projections preserving actions on vectors. Calcolo 56(2), 8 (2019)
[14] Cipolla, S., Durastante, F.: Fractional PDE constrained optimization: an optimize-then-discretize approach with L-BFGS and approximate inverse preconditioning. Appl. Numer. Math. 123, 43–57 (2018)
[15] Di Fiore, C.: Structured matrices in unconstrained minimization methods. In: Contemporary Mathematics, pp. 205–219 (2003)
[16] Di Fiore, C., Fanelli, S., Lepore, F., Zellini, P.: Matrix algebras in Quasi-Newton methods for unconstrained minimization. Numer. Math. 94(3), 479–500 (2003)
[17] Di Fiore, C., Fanelli, S., Zellini, P.: Low-complexity minimization algorithms. Numer. Linear Algebra Appl. 12(8), 755–768 (2005)
[18] Di Fiore, C., Fanelli, S., Zellini, P.: Low complexity secant quasi-Newton minimization algorithms for nonconvex functions. J. Comput. Appl. Math. 210(1–2), 167–174 (2007)
[19] Di Fiore, C., Lepore, F., Zellini, P.: Hartley-type algebras in displacement and optimization strategies. Linear Algebra Appl. 366, 215–232 (2003)
[20] Di Fiore, C., Zellini, P.: Matrix algebras in optimal preconditioning. Linear Algebra Appl. 335(1–3), 1–54 (2001)
[21] Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)
[22] Dunlavy, D.M., Kolda, T.G., Acar, E.: Poblano v1.0: a Matlab toolbox for gradient-based optimization. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2010-1422 (2010)
[23] Ebrahimi, A., Loghmani, G.: B-spline curve fitting by diagonal approximation BFGS methods. Iran. J. Sci. Technol. Trans. A Sci. 1–12
[24] Eldén, L.: Numerical linear algebra in data mining. Acta Numer. 15, 327–384 (2006)
[25] Horn, R.A., Johnson, C.R.: Matrix Analysis, 2nd edn. Cambridge University Press, Cambridge (2013)
[26] Jiang, L., Byrd, R.H., Eskow, E., Schnabel, R.B.: A preconditioned L-BFGS algorithm with application to molecular energy minimization. Technical Report, Colorado University at Boulder Dept. of Computer Science (2004)
[27] Kolda, T.G., O’Leary, D.P., Nazareth, L.: BFGS with update skipping and varying memory. SIAM J. Optim. 8(4), 1060–1083 (1998)
[28] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
[29] Li, D.H., Fukushima, M.: A modified BFGS method and its global convergence in nonconvex minimization. J. Comput. Appl. Math. 129(1), 15–35 (2001)
[30] Liu, C., Vander Wiel, S.A.: Statistical Quasi-Newton: a new look at least change. SIAM J. Optim. 18(4), 1266–1285 (2007)
[31] Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1–3), 503–528 (1989)
[32] Nazareth, L.: A relationship between the BFGS and conjugate gradient algorithms and its implications for new algorithms. SIAM J. Numer. Anal. 16(5), 794–800 (1979)
[33] Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, Berlin (2006)
[34] Oren, S.S., Luenberger, D.G.: Self-scaling variable metric (SSVM) algorithms: Part I: criteria and sufficient conditions for scaling a class of algorithms. Manag. Sci. 20(5), 845–862 (1974)
[35] Powell, M.J.D.: Some global convergence properties of a variable metric algorithm for minimization without exact line searches. Nonlinear Program SIAM-AMS Proc. 9, 53–72 (1976)
[36] Saad, Y.: Analysis of some Krylov subspace approximations to the matrix exponential operator. SIAM J. Numer. Anal. 29(1), 209–228 (1992)
[37] Saad, Y.: Numerical methods for large eigenvalue problems. SIAM (2011). https://doi.org/10.1137/1.9781611970739

Acknowledgements

The authors thank the anonymous referees for their thorough reading of the manuscript and for the many suggestions they gave to improve its readability. They also thank the Associate Editor for his/her valuable comments and for suggesting the introduction of the scaling factor as in Sect. 4. S.C. and C.D.F. are members of the INdAM Research group GNCS, which partially supported this work. C.D.F. acknowledges the partial support of the Italian mathematics Research Institute INdAM-GNCS and of the MIUR Excellence Department Project awarded to the Dept. of Mathematics, Univ. of Rome “Tor Vergata”, CUP E83C18000100006.

Author information

Authors and Affiliations

University of Padua, Via Trieste 63, 35121, Padua, Italy
S. Cipolla
University of Rome “Tor Vergata”, Via Della Ricerca Scientifica 1, 00133, Rome, Italy
C. Di Fiore & P. Zellini

Corresponding author

Correspondence to S. Cipolla.

Appendices

Appendix 1: Householder matrices

The results contained in this section are borrowed from [13] and we refer the interested reader there for more details.

Definition 1

(Householder Orthogonal Matrix) Given a vector ${\mathbf {p}}\in \mathbb {R}^n$ define

$$\begin{aligned} {\mathcal {H}}({\mathbf {p}}):=I_n-\frac{2}{\Vert {\mathbf {p}}\Vert ^2}{\mathbf {p}}{\mathbf {p}}^T. \end{aligned}$$

Consider two vectors ${\mathbf {v}},\, {\mathbf {z}}\in \mathbb {R}^n$. From direct computation one can check that defining ${\mathbf {p}}= {\mathbf {v}}- \frac{\Vert {\mathbf {v}}\Vert }{\Vert {\mathbf {z}}\Vert }{\mathbf {z}}$ with ${\mathbf {z}}\ne 0,$ we have

$$\begin{aligned} {\mathcal {H}}({\mathbf {p}}){\mathbf {v}}=\frac{\Vert {\mathbf {v}}\Vert }{\Vert {\mathbf {z}}\Vert }{\mathbf {z}}. \end{aligned}$$
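The identity above can be checked numerically. The following is a minimal sketch in Python with NumPy, not taken from the paper: the function names `householder` and `reflection_vector` and the sample vectors are illustrative assumptions, and the sketch additionally assumes ${\mathbf {p}}\ne 0$, since $\mathcal {H}({\mathbf {p}})$ in Definition 1 is not defined for ${\mathbf {p}}=0$.

```python
import numpy as np

def householder(p):
    """Dense H(p) = I - (2 / ||p||^2) p p^T from Definition 1 (requires p != 0)."""
    p = np.asarray(p, dtype=float)
    return np.eye(p.size) - (2.0 / (p @ p)) * np.outer(p, p)

def reflection_vector(v, z):
    """p = v - (||v|| / ||z||) z, so that H(p) v = (||v|| / ||z||) z (requires z != 0)."""
    v = np.asarray(v, dtype=float)
    z = np.asarray(z, dtype=float)
    return v - (np.linalg.norm(v) / np.linalg.norm(z)) * z

# Hypothetical sample data, chosen only for illustration.
v = np.array([3.0, 4.0, 0.0])
z = np.array([0.0, 0.0, 2.0])

p = reflection_vector(v, z)
H = householder(p)
target = (np.linalg.norm(v) / np.linalg.norm(z)) * z

print(np.allclose(H @ v, target))        # H(p) v equals the rescaled z
print(np.allclose(H @ H.T, np.eye(3)))   # H(p) is orthogonal
```

Forming the dense matrix is done here only for readability; in practice one would apply $\mathcal {H}({\mathbf {p}})$ to a vector in $O(n)$ operations without building it.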

Lemma 5

([13]) Consider $W=[{\mathbf {w}}_1|\dots |{\mathbf {w}}_s] \in \mathbb {R}^{n\times s}$, $V=[{\mathbf {v}}_1|\dots |{\mathbf {v}}_s] \in \mathbb {R}^{n\times s}$ of full rank and such that $s \le n$, $W^TW=V^TV$. Then there exist ${\mathbf {h}}_1, \dots ,{\mathbf {h}}_{s} \in \mathbb {R}^n$, $\Vert {\mathbf {h}}_i\Vert =\sqrt{2}$, such that the orthogonal matrix $U=\mathcal {H}({\mathbf {h}}_s)\cdots \mathcal {H}({\mathbf {h}}_1)$, product of s Householder matrices, satisfies the following identities

$$\begin{aligned} U {\mathbf {w}}_{i}= {\mathbf {v}}_i \hbox { for all } i \in \{1, \dots , s\}. \end{aligned}$$

The vectors ${\mathbf {h}}_i$ for $i\in \{1,\dots ,s\}$ can be obtained by setting:

$$\begin{aligned} \begin{aligned}&{ \tilde{{\mathbf {h}}}_i := \mathcal {H}({\mathbf {h}}_{i-1}) \cdots \mathcal {H}({\mathbf {h}}_1) ({\mathbf {w}}_{i}-{\mathbf {w}}_{i-1}) - ({\mathbf {v}}_{i}-{\mathbf {v}}_{i-1}),}\\&{\mathbf {h}}_i:= ( \sqrt{2}/\Vert \tilde{{\mathbf {h}}}_i\Vert )\tilde{{\mathbf {h}}}_i \end{aligned} \end{aligned} \tag{65}$$

(where we set ${\mathbf {h}}_0={\mathbf {w}}_0={\mathbf {v}}_0=\varvec{0}$). If $s=n$ we have ${\mathbf {h}}_n=\mathbf {0}$ or ${\mathbf {h}}_n=\frac{\sqrt{2}}{\Vert {\mathbf {v}}_n\Vert }{\mathbf {v}}_n.$ The cost of the computation of the ${\mathbf {h}}_i$ for $i=1,\dots ,s$ is:

$$\begin{aligned} \begin{aligned}&[s(s-1)n+ s(2n+1)] \hbox { multiplications } \\&\quad +[(s(s+2) - 2)n + s(n-1)] \hbox { additions } \\&\quad + s \hbox { square roots.} \end{aligned} \end{aligned}$$

Observe that when ${\mathbf {w}}_i={\mathbf {e}}_{k_i}$ for $i=1,\dots ,s$, that is, when ${\mathbf {v}}_1, \dots , {\mathbf {v}}_s$ are orthonormal and we are interested in constructing an orthogonal U with s columns fixed as ${\mathbf {v}}_1, \dots , {\mathbf {v}}_s$, it is possible to save $(s-1)n \hbox { mult.}$ and $(3s-2)n \hbox { add.}$

Proof

The explicit expression of the ${\mathbf {h}}_i$ in (65) is obtained by applying the techniques for their construction introduced in [13]. $\square $
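The recursion (65) of Lemma 5 can be sketched as follows. This is an illustrative Python/NumPy sketch under stated assumptions, not the authors' implementation: the names `apply_h`, `householder_vectors` and `apply_U`, the tolerance `tol`, and the random test data are all hypothetical. It uses the fact that $\mathcal {H}({\mathbf {h}})=I-{\mathbf {h}}{\mathbf {h}}^T$ when $\Vert {\mathbf {h}}\Vert =\sqrt{2}$, and it treats ${\mathbf {h}}_i=\mathbf {0}$ as the identity transformation, which is an assumed convention for the degenerate case $\tilde{{\mathbf {h}}}_i=\mathbf {0}$.

```python
import numpy as np

def apply_h(h, x):
    """Apply H(h) = I - h h^T to x (valid for ||h|| = sqrt(2); h = 0 acts as identity)."""
    return x - h * (h @ x)

def householder_vectors(W, V, tol=1e-12):
    """Columns h_1..h_s built as in (65); assumes W^T W = V^T V and full rank."""
    n, s = W.shape
    hs = np.zeros((n, s))
    for i in range(s):
        # Differences w_i - w_{i-1} and v_i - v_{i-1}, with w_0 = v_0 = 0.
        dw = W[:, i] - (W[:, i - 1] if i > 0 else 0.0)
        dv = V[:, i] - (V[:, i - 1] if i > 0 else 0.0)
        x = dw.copy()
        for j in range(i):                 # apply H(h_1) first, H(h_{i-1}) last
            x = apply_h(hs[:, j], x)
        h_tilde = x - dv
        norm = np.linalg.norm(h_tilde)
        if norm > tol * max(1.0, np.linalg.norm(V[:, i])):
            hs[:, i] = (np.sqrt(2.0) / norm) * h_tilde
        # otherwise h_i stays 0 (assumed convention: identity step)
    return hs

def apply_U(hs, X):
    """Compute U X with U = H(h_s) ... H(h_1), without forming any H explicitly."""
    Y = np.array(X, dtype=float)
    for j in range(hs.shape[1]):
        h = hs[:, j]
        Y = Y - np.outer(h, h @ Y)
    return Y

# Hypothetical test data: V = Q W with Q orthogonal guarantees W^T W = V^T V.
rng = np.random.default_rng(0)
n, s = 6, 3
W = rng.standard_normal((n, s))
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
V = Q @ W

hs = householder_vectors(W, V)
U = apply_U(hs, np.eye(n))
print(np.allclose(U @ W, V), np.allclose(U.T @ U, np.eye(n)))
```

The sketch favours readability over the operation counts stated in the lemma; it does not attempt to reproduce them.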

Appendix 2: Details on Theorem 1

In order to prove inequality (34) it is enough to prove the following:

Lemma 6

There exists $c_3$, constant with respect to j and depending only on s and M, such that

$$\begin{aligned} \gamma ((j+1-s)+1)^{n} \le c_3^{j+1-s} \hbox { for all } j \ge s, \hbox { where } \gamma :=\left( \frac{c_1}{n}\right) ^n\frac{1}{\det B_s} \end{aligned}$$

(of course, such $c_3$ turns out to be greater than 1).

In fact, once Lemma 6 is proved, the constant $c_2$ (constant with respect to j) for which (34) is verified will be $c_2=2c_1c_3/(1-\beta )$ (note that $c_2$ depends only on s, M, $\beta $ but not on j).

Proof

Fix $\tilde{c}_3>1$. Note that the sequence of positive numbers

$$\begin{aligned} \frac{\gamma ((j+1-s)+1)^n}{\tilde{c}_3^{j+1-s}} \hbox { for } j= s,s+1,\dots \end{aligned}$$

converges to zero as $j \rightarrow +\infty $; thus there exists $j^{*}\ge s$ (depending on s, M and $\tilde{c}_3$) s.t.

$$\begin{aligned} \gamma ((j+1-s)+1)^n \le {\tilde{c}_3^{j+1-s}} \hbox { for all } j \ge j^{*}. \end{aligned}$$

Note also that for all $j \in \{s, s+1, \dots , j^{*}-1\}$ we have

$$\begin{aligned} \gamma ((j+1-s)+1)^n \le \gamma (j^{*}-s+1)^n \end{aligned} \tag{66}$$

and consider $\hat{j} \ge j^{*}$ s.t. $\gamma (\hat{j}-s+1)^n>1$ ($\hat{j}$ depends on s, M, $\gamma $ and $\tilde{c}_3$). From (66) we have

$$\begin{aligned} \gamma ((j+1-s)+1)^n \le \gamma (\hat{j}-s+1)^n \le (\gamma (\hat{j}-s+1)^n)^{j+1-s} \end{aligned}$$

for all $j \in \{s, s+1, \dots , j^{*}-1\}$.

Collecting the above results, we can conclude that

$$\begin{aligned} \gamma ((j+1-s)+1)^n \le c_3^{j+1-s} \hbox { for all } j \ge s \end{aligned} \tag{67}$$

where $c_3:=\max \{ \tilde{c}_3, \gamma (\hat{j}-s+1)^n\}$ ($c_3>1$ and depends on s, M and $\tilde{c}_3$).

Finally note that, once $\tilde{c}_3$ is fixed, it is clear that $c_3$ depends only on s, M. $\square $
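The construction of $c_3$ in the proof can be traced numerically. The sketch below is an illustration under assumptions, not part of the paper: the function name `lemma6_constant`, the index shift $m=j+1-s$, and the sample values of $\gamma $, $n$ and $\tilde{c}_3$ are all hypothetical. It locates an index from which $\gamma (m+1)^n \le \tilde{c}_3^{\,m}$ holds, picks $\hat{j}$ as in the proof, and forms $c_3=\max \{\tilde{c}_3, \gamma (\hat{j}-s+1)^n\}$.

```python
import math

def lemma6_constant(gamma, n, c3_tilde=2.0):
    """Return (m_star, m_hat, c3) such that gamma*(m+1)**n <= c3**m for all m >= 1.

    Here m = j + 1 - s, m_star = j* + 1 - s and m_hat = j_hat + 1 - s.
    """
    log_c = math.log(c3_tilde)

    def gap(m):
        # log of the ratio gamma*(m+1)^n / c3_tilde^m
        return math.log(gamma) + n * math.log(m + 1) - m * log_c

    # gap(m) is decreasing once m + 1 >= n / log(c3_tilde), so the first
    # non-positive value found from there on stays non-positive afterwards.
    m_star = max(1, math.ceil(n / log_c))
    while gap(m_star) > 0:
        m_star += 1

    # m_hat >= m_star with gamma * m_hat^n > 1, as required for j_hat in the proof.
    m_hat = m_star
    while math.log(gamma) + n * math.log(m_hat) <= 0:
        m_hat += 1

    c3 = max(c3_tilde, gamma * m_hat ** n)
    return m_star, m_hat, c3

# Hypothetical sample values, for illustration only.
gamma, n = 0.5, 3
m_star, m_hat, c3 = lemma6_constant(gamma, n)

# Check (67) in logarithmic form on a finite range of m.
ok = all(
    math.log(gamma) + n * math.log(m + 1) <= m * math.log(c3)
    for m in range(1, 2000)
)
print(m_star, m_hat, c3, ok)
```

The finite-range check is only a sanity test; the inequality for all $j \ge s$ is what the proof establishes.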

In order to prove inequality ($34_{1}$), define $a_k:=(1-\phi -\psi _k\phi )\Vert {\mathbf {g}}_k\Vert ^2/{\mathbf {s}}_k^{T}(-{\mathbf {g}}_k)>0$. We know that $\lim _{k \rightarrow +\infty }a_k=+\infty $ and we have to show that there exists $j^{*}\ge s$ such that

$$\begin{aligned} \prod _{k=s}^{j} a_k > c_2^{j+1-s} \hbox { for all } j \ge j^{*}. \end{aligned} \tag{$34_{1}$bis}$$

If $a_k\ge c_2$ for all $k \ge s$, then the claim is immediate, since $a_k>c_2$ must hold for infinitely many indices k. So assume that there exists some index k such that $a_k < c_2$. Let $r \ge s$ be such that $a_k>c_2$ for all $k>r$ (such an r exists because $a_k \rightarrow +\infty $). Note that $c_2>\min _{k=s, \dots , r}a_k$. Set

$$\begin{aligned} t:=\left( \frac{c_2}{\min _{k=s, \dots , r}a_k} \right) ^{r+1-s} >1. \end{aligned}$$

Let $j^{*}>r+1$ be such that $a_k \ge t c_2$ for all $k\ge j^{*}$. Then we have

$$\begin{aligned} \begin{aligned} \prod _{k=s}^{j^{*}}a_k&=\left( \prod _{k=s}^{r}a_k\right) \left( \prod _{k=r+1}^{j^{*}-1}a_k\right) a_{j^{*}} \\&>\left( \min _{k=s,\dots ,r}a_k\right) ^{r-s+1}c_2^{j^{*}-r-1}tc_2\\&=\left( \min _{k=s,\dots ,r}a_k\right) ^{r-s+1}\left( \frac{c_2}{\min _{k=s, \dots , r}a_k} \right) ^{r-s+1}c_2^{j^{*}-r}=c_2^{j^*-s+1}, \end{aligned} \end{aligned}$$

i.e., $\prod _{k=s}^{j^{*}}a_k>c_2^{j^*-s+1}$. Thus we obtain ($34_{1}\hbox {bis}$) since $a_k\ge tc_2>c_2$ for $k> j^{*}$.
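The choice of $r$, $t$ and $j^{*}$ in this argument can be illustrated on a concrete sequence. The sketch below is a hypothetical example and not data from the paper: the sequence $a_k=2^k/8$, the values $s=1$ and $c_2=3$, and the finite horizon `K` are assumptions. Because the chosen sequence is increasing, checks over a finite horizon are enough to identify $r$ and $j^{*}$.

```python
import math

s, c2 = 1, 3.0          # hypothetical starting index and constant
K = 60                  # finite horizon used for the numerical check

def a(k):
    """Hypothetical increasing sequence with a_k -> +inf."""
    return 2.0 ** k / 8.0

# r: last index with a_k <= c2; since a is increasing, a_k > c2 for all k > r.
r = max(k for k in range(s, K) if a(k) <= c2)

a_min = min(a(k) for k in range(s, r + 1))
t = (c2 / a_min) ** (r + 1 - s)

# j_star: first index > r + 1 with a_k >= t * c2 (then true for all later k).
j_star = next(k for k in range(r + 2, K) if a(k) >= t * c2)

def log_prod(j):
    """log of prod_{k=s}^{j} a_k, computed in logs to avoid overflow."""
    return sum(math.log(a(k)) for k in range(s, j + 1))

# Check (34_1 bis): prod_{k=s}^{j} a_k > c2^(j+1-s) for j_star <= j < K.
holds = all(log_prod(j) > (j + 1 - s) * math.log(c2) for j in range(j_star, K))
print(r, t, j_star, holds)
```

The $j^{*}$ produced this way is typically far from the smallest index for which the product inequality holds; the proof only needs existence.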

About this article

Cite this article

Cipolla, S., Di Fiore, C. & Zellini, P. A variation of Broyden class methods using Householder adaptive transforms. Comput Optim Appl 77, 433–463 (2020). https://doi.org/10.1007/s10589-020-00209-8

Received: 17 August 2017
Published: 14 July 2020
Issue Date: November 2020
DOI: https://doi.org/10.1007/s10589-020-00209-8
