Abstract
This study deals with Riemannian optimization on the unit sphere in terms of the p-norm for general \(p> 1\). As a Riemannian submanifold of the Euclidean space, the geometry of the sphere with the p-norm is investigated, and several geometric tools used for Riemannian optimization, such as retractions and vector transports, are proposed and analyzed. Applications to Riemannian optimization on the sphere with nonnegative constraints and \(L_p\)-regularization-related optimization are also discussed. As practical examples, the former includes nonnegative principal component analysis, and the latter is closely related to Lasso regression and box-constrained problems. Numerical experiments verify that Riemannian optimization on the sphere with the p-norm has substantial potential for such applications, and the proposed framework provides a theoretical basis for such optimization.
Data availability
The datasets generated during the current study are available from the corresponding author upon request.
Notes
We write \(p \in [1, \infty ]\) to indicate that \(p \in [1, \infty )\) or \(p = \infty \) holds.
The statement can be rewritten as follows: for any positive integer k, \(S^{n-1}_p\) is a \(C^{2k-1}\) submanifold of \({\mathbb {R}}^n\) if \(2k-1< p < 2k\), \(C^\infty \) submanifold if \(p = 2k\), and \(C^{2k}\) submanifold if \(2k < p \le 2k+1\).
Since the tangent bundle \(T {\mathcal {M}}\) of a \(C^r\) manifold is a \(C^{r-1}\) manifold, we say that a map defined on \(T{\mathcal {M}}\) is smooth if it is of class \(C^{r-1}\).
As discussed before, if (21) holds for some \(\alpha \in {\mathbb {R}}\), then \(\alpha \) should be nonnegative.
The original Problem (47) has the advantage of being a convex optimization problem; Problem (48) does not, since the sphere \(\{w \in {\mathbb {R}}^n \mid \Vert w\Vert _p = C\}\) is not a convex set. Nevertheless, Problem (48) can be solved by unconstrained Riemannian optimization methods. Although the projection of an unconstrained optimal solution, i.e., \(w^{\textrm{unconst}} \in {\mathbb {R}}^n\) that minimizes L without constraints, onto \(S^{n-1}_p\) is a naive feasible solution to (47), it may not be a good one (see also the numerical experiments in Sects. 7.2.2 and 7.2.3). This motivates exploring Problem (48) as an alternative to Problem (47).
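The naive feasible solution mentioned above can be sketched as follows. This is an illustrative baseline only (the function names are mine): it maps an unconstrained minimizer onto \(S^{n-1}_p\) by radial scaling, which is one simple way to restore feasibility, not the Riemannian optimization approach proposed in the paper.

```python
import math

def p_norm(w, p):
    """Compute the p-norm ||w||_p = (sum_i |w_i|^p)^(1/p) for p >= 1."""
    return sum(abs(x) ** p for x in w) ** (1.0 / p)

def scale_to_sphere(w, p, C):
    """Scale w radially so that the result lies on {v : ||v||_p = C}.

    This is a heuristic baseline for producing a feasible point of
    Problem (47) from an unconstrained minimizer w; the paper's numerical
    experiments suggest it may be far from the constrained optimum.
    """
    norm = p_norm(w, p)
    if norm == 0.0:
        raise ValueError("cannot scale the zero vector onto the sphere")
    return [C * x / norm for x in w]

# For p = 2 this recovers the usual Euclidean normalization.
w_feas = scale_to_sphere([3.0, -4.0], p=2, C=1.0)
print(w_feas)  # [0.6, -0.8]
```

For general \(p\), the resulting point satisfies \(\Vert w\Vert _p = C\) by construction but need not minimize the Euclidean distance to the sphere.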
When \(n = 2\), we regard \(\log _2 n / (\log _2 n - 1)\) as \(\infty \). Indeed, \(2^{1-1/p} - 1 \in (0, 1)\) holds for any \(p > 1\).
When \(n = 2\), we interpret this condition as \(p \in (1, \infty )\).
If \(l_i = u_i\) for some i, then \(l_i\) is the only value that the corresponding \(w_i\) can take. By eliminating such a constant variable in advance, if necessary, we can assume \(l < u\) without loss of generality.
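The preprocessing step described above can be sketched as follows (function and variable names are mine): indices with \(l_i = u_i\) are pinned to their constant value and removed, so that the reduced problem satisfies \(l < u\) componentwise.

```python
def eliminate_fixed_variables(l, u):
    """Split indices into free variables (l_i < u_i) and fixed ones (l_i == u_i).

    The fixed variables are constants of the problem and can be restored
    after optimizing over the free variables alone.
    """
    assert all(li <= ui for li, ui in zip(l, u)), "require l <= u componentwise"
    free = [i for i, (li, ui) in enumerate(zip(l, u)) if li < ui]
    fixed = {i: l[i] for i in range(len(l)) if l[i] == u[i]}
    return free, fixed

free, fixed = eliminate_fixed_variables([0.0, 1.0, -2.0], [1.0, 1.0, 3.0])
print(free)   # indices with l_i < u_i: [0, 2]
print(fixed)  # variables pinned to l_i = u_i: {1: 1.0}
```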
References
Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008)
Absil, P.-A., Malick, J.: Projection-like retractions on matrix manifolds. SIAM J. Optim. 22(1), 135–158 (2012)
Adler, R.L., Dedieu, J.-P., Margulies, J.Y., Martens, M., Shub, M.: Newton’s method on Riemannian manifolds and a geometric model for the human spine. IMA J. Numer. Anal. 22(3), 359–390 (2002)
Boumal, N., Mishra, B., Absil, P.-A., Sepulchre, R.: Manopt, a Matlab toolbox for optimization on manifolds. J. Mach. Learn. Res. 15(1), 1455–1459 (2014)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Dong, S., Absil, P.-A., Gallivan, K.A.: Graph learning for regularized low-rank matrix completion. In: 23rd International Symposium on Mathematical Theory of Networks and Systems (MTNS 2018), pp. 460–467 (2018)
Fukuda, E.H., Fukushima, M.: A note on the squared slack variables technique for nonlinear optimization. J. Oper. Res. Soc. Jpn. 60(3), 262–270 (2017)
Hastie, T., Tibshirani, R., Wainwright, M.: Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press, Boca Raton (2015)
Huang, W., Gallivan, K.A., Absil, P.-A.: A Broyden class of quasi-Newton methods for Riemannian optimization. SIAM J. Optim. 25(3), 1660–1685 (2015)
Huang, W., Absil, P.-A., Gallivan, K.A.: A Riemannian BFGS method without differentiated retraction for nonconvex optimization problems. SIAM J. Optim. 28(1), 470–495 (2018)
Hurley, N., Rickard, S.: Comparing measures of sparsity. IEEE Trans. Inf. Theory 55(10), 4723–4741 (2009)
Ivanov, A.: The Theory of Approximate Methods and Their Applications to the Numerical Solution of Singular Integral Equations, vol. 2. Springer, Berlin (1976)
Khuzani, M.B., Li, N.: Stochastic primal-dual method on Riemannian manifolds of bounded sectional curvature. In: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 133–140. IEEE (2017)
Liu, C., Boumal, N.: Simple algorithms for optimization on Riemannian manifolds with constraints. Appl. Math. Optim. 82(3), 949–981 (2020)
Nocedal, J., Wright, S.: Numerical Optimization, 2nd edn. Springer, Berlin (2006)
Oneto, L., Ridella, S., Anguita, D.: Tikhonov, Ivanov and Morozov regularization for support vector machine learning. Mach. Learn. 103(1), 103–136 (2016)
Ring, W., Wirth, B.: Optimization methods on Riemannian manifolds and their application to shape space. SIAM J. Optim. 22(2), 596–627 (2012)
Sakai, H., Iiduka, H.: Sufficient descent Riemannian conjugate gradient methods. J. Optim. Theory Appl. 190(1), 130–150 (2021)
Sakai, H., Sato, H., Iiduka, H.: Global convergence of Hager–Zhang type Riemannian conjugate gradient method. Appl. Math. Comput. 441, 127685 (2023)
Sato, H.: A Dai–Yuan-type Riemannian conjugate gradient method with the weak Wolfe conditions. Comput. Optim. Appl. 64(1), 101–118 (2016)
Sato, H.: Riemannian Optimization and Its Applications. Springer Nature, Berlin (2021)
Sato, H.: Riemannian conjugate gradient methods: general framework and specific algorithms with convergence analyses. SIAM J. Optim. 32(4), 2690–2717 (2022)
Sato, H., Iwai, T.: A new, globally convergent Riemannian conjugate gradient method. Optimization 64(4), 1011–1031 (2015)
Sato, H., Kasai, H., Bamdev, M.: Riemannian stochastic variance reduced gradient algorithm with retraction and vector transport. SIAM J. Optim. 29(2), 1444–1472 (2019)
Shub, M.: Some remarks on dynamical systems and numerical analysis. In: Dynamical Systems and Partial Differential Equations: Proceedings of VII ELAM, pp. 69–92 (1986)
Tikhonov, A.N., Arsenin, V.Y.: Solutions of Ill-posed Problems. V. H. Winston & Sons, Washington, DC (1977)
Tu, L.W.: An Introduction to Manifolds. Springer, New York (2010)
Zass, R., Shashua, A.: Nonnegative sparse PCA. Adv. Neural Inf. Process. Syst. 19, 1561–1568 (2007)
Zhou, P., Yuan, X.-T., Yan, S., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. IEEE Trans. Pattern Anal. Mach. Intell. 43(2), 459–472 (2019)
Zhu, X., Sato, H.: Riemannian conjugate gradient methods with inverse retraction. Comput. Optim. Appl. 77(3), 779–810 (2020)
Acknowledgements
The author would like to thank the editor and anonymous referee for the constructive comments that helped improve the paper.
Funding
This work was supported by JSPS KAKENHI Grant Number JP20K14359.
Ethics declarations
Conflict of interest
The author declares no conflict of interest.
Appendix A: Proof of Proposition 10
In this section, we provide a complete proof of Proposition 10 to keep the paper self-contained.
Proof of Proposition 10
First, we fix \(\lambda \ge 0\) and let \(w_*\) be an optimal solution to Problem (46). Then, for any \(w \in {\mathbb {R}}^n\), we have
\[ L(w_*) + \lambda \Vert w_*\Vert _p \le L(w) + \lambda \Vert w\Vert _p. \tag{A1} \]
We show that \(w_*\) is an optimal solution to Problem (47) with \(C :=\Vert w_*\Vert _p\). For any feasible solution \(w \in {\mathbb {R}}^n\) to Problem (47), we have \(\Vert w\Vert _p \le C = \Vert w_*\Vert _p\). Combining this and (A1), we have \(L(w_*) + \lambda \Vert w_*\Vert _p \le L(w) + \lambda \Vert w_*\Vert _p\), which means \(L(w_*) \le L(w)\). Furthermore, \(w_*\) is clearly a feasible solution to Problem (47) since \(\Vert w_*\Vert _p = C\). Therefore, \(w_*\) is an optimal solution to Problem (47).
Conversely, we fix \(C \ge 0\) and let \(w_*\) be an optimal solution to Problem (47). Here, we additionally consider the Lagrange dual problem of (47):
\[ \max_{\mu \ge 0} \; \min_{w \in {\mathbb {R}}^n} \{ L(w) + \mu (\Vert w\Vert _p - C) \}. \tag{A2} \]
If \(C > 0\), then Slater’s condition for Problem (47), which is that there exists \(w \in {\mathbb {R}}^n\) with \(\Vert w\Vert _p < C\), clearly holds with \(w = 0\). If \(C = 0\), then the constraint \(\Vert w\Vert _p \le C\) in Problem (47) is rewritten as the equality constraint \(w = 0\), and Slater’s condition (which in this case is that a feasible solution exists) holds by taking \(w = 0\). In either case, Slater’s condition for Problem (47) holds. Furthermore, Problem (47) is a convex optimization problem. Therefore, it follows from Slater’s theorem [5, Section 5.2.3] that strong duality holds. Hence, the optimal value \(L(w_*)\) of Problem (47) and the optimal value of the dual problem (A2) coincide. Letting \(\mu _* \ge 0\) be an optimal solution to (A2), we have
\[ L(w_*) = \min_{w \in {\mathbb {R}}^n} \{ L(w) + \mu _* (\Vert w\Vert _p - C) \} \le L(w_*) + \mu _* (\Vert w_*\Vert _p - C). \]
Since \(\Vert w_*\Vert _p \le C\) and \(\mu _* \ge 0\), we obtain \(L(w_*) \le L(w_*) + \mu _*(\Vert w_*\Vert _p - C) \le L(w_*)\). Thus, \(L(w_*) = L(w_*) + \mu _*(\Vert w_*\Vert _p - C)\) holds, and \(w = w_*\) attains the minimum value of \(L(w) + \mu _*(\Vert w\Vert _p - C)\) over all \(w \in {\mathbb {R}}^n\). Since \(\mu _* C\) is a constant, \(w = w_*\) also attains the minimum value of \(L(w) + \mu _*\Vert w\Vert _p\) over \({\mathbb {R}}^n\). This implies that \(w_*\) is an optimal solution to Problem (46) with \(\lambda = \mu _*\), thereby completing the proof. \(\square \)
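To make the correspondence between Problems (46) and (47) concrete, the following toy computation is entirely my own construction, with \(n = 1\) and \(L(w) = (w-1)^2\) (when \(n = 1\), \(\Vert w\Vert _p = |w|\) for every p). It locates the regularized minimizer of Problem (46) for a fixed \(\lambda\) by grid search, sets \(C := \Vert w_*\Vert _p\), and checks that the same point solves the constrained Problem (47):

```python
# Toy 1-D illustration of Proposition 10 with L(w) = (w - 1)^2.
# All numerical choices here are illustrative, not from the paper.
def L(w):
    return (w - 1.0) ** 2

lam = 0.5
grid = [i / 10000.0 for i in range(-20000, 20001)]  # w in [-2, 2]

# Problem (46): minimize L(w) + lam * |w| over the grid.
# Analytically (soft thresholding): w* = 1 - lam/2 = 0.75.
w_reg = min(grid, key=lambda w: L(w) + lam * abs(w))

# Problem (47): minimize L(w) subject to |w| <= C with C := |w_reg|.
C = abs(w_reg)
w_con = min((w for w in grid if abs(w) <= C), key=L)

print(w_reg, w_con)  # 0.75 0.75 -- the two minimizers coincide
```

Proposition 10 asserts exactly this kind of correspondence: each \(\lambda\) yields a \(C\) for which the constrained and regularized problems share an optimal solution, and conversely.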
Remark 8
Although we focus on the p-norm here, Proposition 10 generalizes straightforwardly to an arbitrary norm on \({\mathbb {R}}^n\). Indeed, the proof of Proposition 10 uses no property specific to the p-norm, only properties shared by all norms.
Cite this article
Sato, H. Riemannian optimization on unit sphere with p-norm and its applications. Comput Optim Appl 85, 897–935 (2023). https://doi.org/10.1007/s10589-023-00477-0