Abstract
We devise a new numerical method for solving minimization problems over the Stiefel manifold, that is, the set of \(n \times p\) matrices (with \(p \le n\)) with orthonormal columns. Our approach consists of a nonmonotone feasible arc search along a sufficient descent direction, which ensures convergence to stationary points regardless of the initial point. Feasibility of the iterates is maintained through a variation of the Cayley transform, so our scheme can be seen as a retraction-based algorithm for minimization with orthogonality constraints. The scheme solves a \(p\times p\) linear system at each iteration and has computational complexity \(O(np^2) + O(p^3)\), which is attractive when \(p \ll n\). We present a general algorithmic framework for minimization on the Stiefel manifold, establish its global convergence properties, and report numerical experiments on selected applications.
Data availability statement
The data sets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Notes
Available at https://github.com/optsuite/OptM.
using the last 10 iterations in the memory.
Certified by the Matlab routine eigs.
References
Abrudan T, Eriksson J, Koivunen V (2008) Steepest descent algorithms for optimization under unitary matrix constraint. IEEE Trans Signal Process 56(3):1134–1147
Abrudan T, Eriksson J, Koivunen V (2009) Conjugate gradient algorithm for optimization under unitary matrix constraint. Signal Process 89:1704–1714
Absil P-A, Malick J (2012) Projection-like retractions on matrix manifolds. SIAM J Optim 22(1):135–158
Absil P-A, Mahony R, Sepulchre R (2004) Riemannian geometry of Grassmann manifolds with a view on algorithmic computation. Acta Appl Math 80:199–220
Absil P-A, Baker CG, Gallivan KA (2007) Trust-region methods on Riemannian manifolds. Found Comput Math 7(3):303–330
Absil P-A, Mahony R, Sepulchre R (2008) Optimization algorithms on matrix manifolds. Princeton University Press, Princeton
Barzilai J, Borwein JM (1988) Two point step size gradient methods. IMA J Numer Anal 8:141–148
Bendokat T, Zimmermann R (2021) Efficient quasi-geodesics on the Stiefel manifold. In: Nielsen F, Barbaresco F (eds) Geometric science of information. Springer International Publishing, New York, pp 763–771
Bertsekas DP (2003) Constrained optimization and Lagrange multiplier methods. Massachusetts Institute of Technology, Cambridge
Boumal N, Mishra B, Absil P-A, Sepulchre R (2014) Manopt, a Matlab toolbox for optimization on manifolds. J Mach Learn Res 15:1455–1459
Cancès E, Chakir R, Maday Y (2010) Numerical analysis of nonlinear eigenvalue problems. J Sci Comput 45:90–117
Davis TA, Hu Y (2011) The University of Florida sparse matrix collection. ACM Trans Math Softw 38(1):1–25
Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Progr 91:201–213
Edelman A, Arias TA, Smith ST (1998) The geometry of algorithms with orthogonality constraints. SIAM J Matrix Anal Appl 20(2):303–353
Francisco JB, Viloche Bazán FS (2012) Nonmonotone algorithm for minimization on closed sets with application to minimization on Stiefel manifolds. J Comput Appl Math 236(10):2717–2727
Francisco JB, Bazán FSV, Weber Mendonça M (2017) Non-monotone algorithm for minimization on arbitrary domains with applications to large-scale orthogonal Procrustes problem. Appl Numer Math 112:51–64
Francisco JB, Gonçalves DS, Bazán FSV, Paredes LLT (2020) Non-monotone inexact restoration method for nonlinear programming. Comput Optim Appl 76:867–888
Francisco JB, Gonçalves DS, Bazán FSV, Paredes LLT (2021) Nonmonotone inexact restoration approach for minimization with orthogonality constraints. Numer Algorithms 86:1651–1684
Gao B, Liu X, Chen X, Yuan Y (2018) A new first-order algorithmic framework for optimization problems with orthogonality constraints. SIAM J Optim 28(1):302–332
Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. The Johns Hopkins University Press, London
Grippo L, Lampariello F, Lucidi S (1986) A nonmonotone line search technique for Newton’s method. SIAM J Numer Anal 23:707–716
Helgaker T, Jørgensen P, Olsen J (2000) Molecular electronic-structure theory. Wiley, Chichester
Hu J, Liu X, Wen Z, Yuan Y (2020) A brief introduction to manifold optimization. J Oper Res Soc China 8:199–248
Huang W, Absil P-A, Gallivan KA (2016) A Riemannian BFGS method for nonconvex optimization problems. Springer International Publishing, Cham, pp 627–634
Iannazzo B, Porcelli M (2018) The Riemannian Barzilai–Borwein method with nonmonotone line search and the matrix geometric mean computation. IMA J Numer Anal 38:495–517
Janin R (1984) Direction derivative of the marginal function in nonlinear programming. Math Progr Study 21:127–138
Jiang B, Dai YH (2015) A framework of constraint preserving update schemes for optimization on Stiefel manifold. Math Progr 153(2):535–575
Journée M, Nesterov Y, Richtárik P, Sepulchre R (2010) Generalized power method for sparse principal component analysis. J Mach Learn Res 11:517–553
Kohn W (1999) Nobel Lecture: Electronic structure of matter-wave functions and density functionals. Rev Mod Phys 71(5):1253–1266
Nishimori Y, Akaho S (2005) Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold. Neurocomputing 67:106–135
Oviedo H, Dalmau O, Lara H (2021) Two adaptive scaled gradient projection methods for Stiefel manifold constrained optimization. Numer Algorithms 87:1107–1127
Raydan M (1997) The Barzilai and Borwein gradient method for large scale unconstrained minimization problem. SIAM J Optim 7:26–33
Shariff M (1995) A constrained conjugate gradient method and the solution of linear equations. Comput Math Appl 30(11):25–37
Trendafilov N, Gallo M (2021) Multivariate data analysis on matrix manifolds. Springer series in the data sciences, Springer, Cham
Turaga P, Veeraraghavan A, Chellappa R (2008) Statistical analysis on Stiefel and Grassmann manifolds with applications in computer vision. In: IEEE conference on computer vision and pattern recognition, pp 1–8
Wen Z, Yin W (2013) A feasible method for optimization with orthogonality constraints. Math Progr 142:397–434
Zhang H, Hager W (2004) A nonmonotone line search technique and its application to unconstrained optimization. SIAM J Optim 14(4):1043–1056
Zhao Z, Bai Z-J, Jin X-Q (2015) A Riemannian Newton algorithm for nonlinear eigenvalue problems. SIAM J Matrix Anal Appl 36(2):752–774
Zhu X (2015) A feasible filter method for the nearest low-rank correlation matrix problem. Numer Algorithms 69:763–784
Funding
This work was partially supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico - CNPq - Brasil, Grant no. 305213/2021-0.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests that are relevant to the content of this article.
Additional information
Communicated by Orizon Pereira Ferreira.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Floating point arithmetic considerations
We remark that in Theorem 1 the quantities \({{\mathcal {R}}}_T\), U and V, as well as the solution of the linear system (A2), are assumed to be computed with high precision, so that \(X^\textrm{T}X = I\) holds throughout the proof. This assumption is indeed observed in our numerical experiments. Occasionally, however, such precision can be lost and the assumption no longer holds. For this case, we provide a result analogous to Theorem 1 that takes the feasibility residual \({{{\mathcal {R}}}}_V = I - X^\textrm{T}X\) into account. Therefore, when numerical precision is lost, we recommend replacing \({{\mathcal {R}}}_1\) and \({{\mathcal {R}}}_T\) in (20) by \({{\mathcal {R}}}_1^\textrm{Full}\) and \({{\mathcal {R}}}_T^\textrm{Full}\), respectively, according to the next theorem.
Theorem 3
Let \(X \in \Gamma \), \(Z \in {\mathbb {R}}^{n\times p}\) and define \(W(t)\in {\mathbb {R}}^{2p \times p}\), \({{\mathcal {R}}}_1\) and \({{\mathcal {R}}}_T\) as in Theorem 1. For all \(t\in {\mathbb {R}}\), if we consider \({{\mathcal {R}}}_V = I - X^\textrm{T}X\),
and \({{\mathcal {R}}}_T^\textrm{Full} = {{\mathcal {R}}}_T X^\textrm{T}X\), we have that
Proof
From Theorem 1, we have that \(X^{+}(t) = X + tUW(t)\), where W(t) is the solution of
Now, using the definitions of \({{\mathcal {R}}}_V\) and \({{\mathcal {R}}}_T\), we have, from the definition of \({{\mathcal {R}}}_1^\textrm{Full}\), that
and
Thus, the linear system (A2) leads to
Now, assuming that \(\Vert \frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z\Vert < 1\), Banach’s Lemma (Golub and Van Loan 1996) gives \((I - \frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z)^{-1} = I + \frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z + O(\Vert {{\mathcal {R}}}_V\Vert ^2)\). Then the previous linear system becomes
from which the result follows. \(\square \)
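The first-order expansion of the inverse used in the proof can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original article: the matrix `E` stands in for \(\frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z\), the random matrix `B` and the function name `first_order_inverse_error` are illustrative assumptions, and `eps` plays the role of \(\Vert {{\mathcal {R}}}_V\Vert \). Since \((I - E)^{-1} - (I + E) = E^2 (I - E)^{-1}\), the error should shrink like `eps**2`.

```python
import numpy as np

def first_order_inverse_error(E):
    """Spectral norm of (I - E)^{-1} - (I + E) for a square matrix E."""
    I = np.eye(E.shape[0])
    exact = np.linalg.inv(I - E)
    return np.linalg.norm(exact - (I + E), 2)

# Hypothetical test matrix, scaled to unit spectral norm.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
B /= np.linalg.norm(B, 2)

# E = eps * B mimics (t/2) R_V X^T Z with ||E|| = eps < 1.
errors = {eps: first_order_inverse_error(eps * B) for eps in (1e-2, 1e-3, 1e-4)}
for eps, err in errors.items():
    print(f"eps = {eps:.0e}   error = {err:.3e}   error / eps^2 = {err / eps**2:.3f}")
```

The ratio `error / eps^2` stays bounded by \(1/(1-\texttt{eps})\) as `eps` decreases, which is the \(O(\Vert {{\mathcal {R}}}_V\Vert ^2)\) behaviour invoked in the proof.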
It is worth observing that when \({{\mathcal {R}}}_V =0\), Theorem 3 reduces to Theorem 1. In addition, since X is practically feasible, \({{\mathcal {R}}}_V \approx 0\) and the term \(O(\Vert {{\mathcal {R}}}_V\Vert ^2)\) can be disregarded in (A1).
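The feasibility residual \({{\mathcal {R}}}_V = I - X^\textrm{T}X\) discussed in this appendix can be monitored along the iterations. The sketch below, in Python with NumPy, is only an illustration under stated assumptions. It uses the standard low-rank Cayley update of Wen and Yin (2013), which solves a \(2p \times 2p\) system, and not the variation of the Cayley transform proposed in this paper. The function names `cayley_step` and `feasibility_residual`, the quadratic test objective, and the step size are all hypothetical choices.

```python
import numpy as np

def cayley_step(X, G, tau):
    """One Cayley-transform step in the low-rank form of Wen and Yin (2013).

    With U = [G, X] and V = [X, -G], the skew-symmetric matrix is
    A = U V^T = G X^T - X G^T, and the update is
    Y = X - tau * U (I + (tau/2) V^T U)^{-1} V^T X.
    """
    p = X.shape[1]
    U = np.hstack([G, X])
    V = np.hstack([X, -G])
    M = np.eye(2 * p) + 0.5 * tau * (V.T @ U)   # 2p x 2p system matrix
    return X - tau * U @ np.linalg.solve(M, V.T @ X)

def feasibility_residual(X):
    """R_V = I - X^T X; it is zero exactly when X has orthonormal columns."""
    return np.eye(X.shape[1]) - X.T @ X

rng = np.random.default_rng(0)
n, p = 200, 5
X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # feasible starting point

# Assumed test objective: f(X) = 0.5 * trace(X^T S X) with S symmetric,
# whose Euclidean gradient is S X.
S = rng.standard_normal((n, n))
S = S + S.T
S /= np.linalg.norm(S, 2)

residuals = []
for _ in range(50):
    G = S @ X
    X = cayley_step(X, G, tau=0.1)
    residuals.append(np.linalg.norm(feasibility_residual(X)))

print(f"largest ||I - X^T X||_F over 50 steps: {max(residuals):.2e}")
```

In exact arithmetic every iterate stays on the manifold, so any nonzero value printed here comes from rounding. A code could track this quantity and switch to the residual-aware formulas of Theorem 3 when it grows beyond a chosen tolerance; that threshold is a design choice not specified in the text.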
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Francisco, J.B., Gonçalves, D.S. Nonmonotone feasible arc search algorithm for minimization on Stiefel manifold. Comp. Appl. Math. 42, 175 (2023). https://doi.org/10.1007/s40314-023-02310-0