Riemannian optimization on unit sphere with p-norm and its applications

Computational Optimization and Applications

Abstract

This study deals with Riemannian optimization on the unit sphere in terms of the p-norm with general \(p > 1\). As a Riemannian submanifold of the Euclidean space, the geometry of the sphere with the p-norm is investigated, and several geometric tools used in Riemannian optimization, such as retractions and vector transports, are proposed and analyzed. Applications to Riemannian optimization on the sphere with nonnegative constraints and to \(L_p\)-regularization-related optimization are also discussed. As practical examples, the former includes nonnegative principal component analysis, and the latter is closely related to Lasso regression and box-constrained problems. Numerical experiments verify that Riemannian optimization on the sphere with the p-norm has substantial potential for such applications, and the proposed framework provides a theoretical basis for such optimization.

Data availability

The datasets generated during the current study are available from the corresponding author upon request.

Notes

  1. We write \(p \in [1, \infty ]\) to indicate that \(p \in [1, \infty )\) or \(p = \infty \) holds.

  2. The statement can be rewritten as follows: for any positive integer k, \(S^{n-1}_p\) is a \(C^{2k-1}\) submanifold of \({\mathbb {R}}^n\) if \(2k-1< p < 2k\), \(C^\infty \) submanifold if \(p = 2k\), and \(C^{2k}\) submanifold if \(2k < p \le 2k+1\).

  3. Since the tangent bundle \(T {\mathcal {M}}\) of a \(C^r\) manifold is a \(C^{r-1}\) manifold, we say that a map defined on \(T{\mathcal {M}}\) is smooth if it is of class \(C^{r-1}\).

  4. As discussed before, if (21) holds for some \(\alpha \in {\mathbb {R}}\), then \(\alpha \) should be nonnegative.

  5. The original Problem (47) has the advantage of being a convex optimization problem; Problem (48) does not, since the sphere \(\{w \in {\mathbb {R}}^n \mid \Vert w\Vert _p = C\}\) is not a convex set. Nevertheless, Problem (48) can be solved by unconstrained Riemannian optimization methods. Although projecting an unconstrained optimal solution, i.e., \(w^{\textrm{unconst}} \in {\mathbb {R}}^n\) that minimizes L without constraint, onto \(S^{n-1}_p\) yields a naive feasible solution to (47), it may not be a good one (see the numerical experiments in Sects. 7.2.2 and 7.2.3, and the sketch following these notes). This motivates exploring Problem (48) as an alternative to Problem (47).

  6. When \(n = 2\), we regard \(\log _2 n / (\log _2 n - 1)\) as \(\infty \). Indeed, \(2^{1-1/p} - 1 \in (0, 1)\) holds for any \(p > 1\).

  7. When \(n = 2\), we interpret this condition as \(p \in (1, \infty )\).

  8. If \(l_i = u_i\) for some i, then \(l_i\) is the only value that the corresponding \(w_i\) can take. By eliminating such a constant variable in advance, if necessary, we can assume \(l < u\) without loss of generality.
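
Below is a minimal sketch, not from the paper, of the naive feasible solution mentioned in Note 5: an unconstrained minimizer is mapped onto the sphere \(\{w \in {\mathbb {R}}^n \mid \Vert w\Vert _p = C\}\) by radial rescaling. The vector w_unconst and the values of p and C are hypothetical placeholders, and the rescaling map is one natural reading of "projection" here (it agrees with the metric projection only when \(p = 2\)).

```python
import numpy as np

def p_norm(w, p):
    """p-norm: ||w||_p = (sum_i |w_i|^p)^(1/p)."""
    return np.sum(np.abs(w) ** p) ** (1.0 / p)

def rescale_to_sphere(w, p, C):
    """Radially rescale a nonzero w onto {v : ||v||_p = C}.

    For p = 2 this is the metric projection onto the sphere; for general
    p it is merely a simple heuristic for producing a feasible point.
    """
    return (C / p_norm(w, p)) * w

# Hypothetical unconstrained minimizer (illustrative values only).
w_unconst = np.array([1.0, -2.0, 0.5])
p, C = 1.5, 1.0

w_feasible = rescale_to_sphere(w_unconst, p, C)
print(p_norm(w_feasible, p))  # 1.0 up to rounding, so ||w||_p <= C holds
```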

References

  1. Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008)

  2. Absil, P.-A., Malick, J.: Projection-like retractions on matrix manifolds. SIAM J. Optim. 22(1), 135–158 (2012)

  3. Adler, R.L., Dedieu, J.-P., Margulies, J.Y., Martens, M., Shub, M.: Newton’s method on Riemannian manifolds and a geometric model for the human spine. IMA J. Numer. Anal. 22(3), 359–390 (2002)

  4. Boumal, N., Mishra, B., Absil, P.-A., Sepulchre, R.: Manopt, a Matlab toolbox for optimization on manifolds. J. Mach. Learn. Res. 15(1), 1455–1459 (2014)

  5. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  6. Dong, S., Absil, P.-A., Gallivan, K.A.: Graph learning for regularized low-rank matrix completion. In: 23rd International Symposium on Mathematical Theory of Networks and Systems (MTNS 2018), pp. 460–467 (2018)

  7. Fukuda, E.H., Fukushima, M.: A note on the squared slack variables technique for nonlinear optimization. J. Oper. Res. Soc. Jpn. 60(3), 262–270 (2017)

  8. Hastie, T., Tibshirani, R., Wainwright, M.: Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press, Boca Raton (2015)

  9. Huang, W., Gallivan, K.A., Absil, P.-A.: A Broyden class of quasi-Newton methods for Riemannian optimization. SIAM J. Optim. 25(3), 1660–1685 (2015)

  10. Huang, W., Absil, P.-A., Gallivan, K.A.: A Riemannian BFGS method without differentiated retraction for nonconvex optimization problems. SIAM J. Optim. 28(1), 470–495 (2018)

  11. Hurley, N., Rickard, S.: Comparing measures of sparsity. IEEE Trans. Inf. Theory 55(10), 4723–4741 (2009)

  12. Ivanov, A.: The Theory of Approximate Methods and Their Applications to the Numerical Solution of Singular Integral Equations, vol. 2. Springer, Berlin (1976)

  13. Khuzani, M.B., Li, N.: Stochastic primal-dual method on Riemannian manifolds of bounded sectional curvature. In: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 133–140. IEEE (2017)

  14. Liu, C., Boumal, N.: Simple algorithms for optimization on Riemannian manifolds with constraints. Appl. Math. Optim. 82(3), 949–981 (2020)

  15. Nocedal, J., Wright, S.: Numerical Optimization, 2nd edn. Springer, Berlin (2006)

  16. Oneto, L., Ridella, S., Anguita, D.: Tikhonov, Ivanov and Morozov regularization for support vector machine learning. Mach. Learn. 103(1), 103–136 (2016)

  17. Ring, W., Wirth, B.: Optimization methods on Riemannian manifolds and their application to shape space. SIAM J. Optim. 22(2), 596–627 (2012)

  18. Sakai, H., Iiduka, H.: Sufficient descent Riemannian conjugate gradient methods. J. Optim. Theory Appl. 190(1), 130–150 (2021)

  19. Sakai, H., Sato, H., Iiduka, H.: Global convergence of Hager–Zhang type Riemannian conjugate gradient method. Appl. Math. Comput. 441, 127685 (2023)

  20. Sato, H.: A Dai–Yuan-type Riemannian conjugate gradient method with the weak Wolfe conditions. Comput. Optim. Appl. 64(1), 101–118 (2016)

  21. Sato, H.: Riemannian Optimization and Its Applications. Springer Nature, Berlin (2021)

  22. Sato, H.: Riemannian conjugate gradient methods: general framework and specific algorithms with convergence analyses. SIAM J. Optim. 32(4), 2690–2717 (2022)

  23. Sato, H., Iwai, T.: A new, globally convergent Riemannian conjugate gradient method. Optimization 64(4), 1011–1031 (2015)

  24. Sato, H., Kasai, H., Mishra, B.: Riemannian stochastic variance reduced gradient algorithm with retraction and vector transport. SIAM J. Optim. 29(2), 1444–1472 (2019)

  25. Shub, M.: Some remarks on dynamical systems and numerical analysis. In: Dynamical Systems and Partial Differential Equations: Proceedings of VII ELAM, pp. 69–92 (1986)

  26. Tikhonov, A.N., Arsenin, V.Y.: Solutions of Ill-posed Problems. V. H. Winston & Sons, Washington, DC (1977)

  27. Tu, L.W.: An Introduction to Manifolds. Springer, New York (2010)

  28. Zass, R., Shashua, A.: Nonnegative sparse PCA. Adv. Neural Inf. Process. Syst. 19, 1561–1568 (2007)

  29. Zhou, P., Yuan, X.-T., Yan, S., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. IEEE Trans. Pattern Anal. Mach. Intell. 43(2), 459–472 (2019)

  30. Zhu, X., Sato, H.: Riemannian conjugate gradient methods with inverse retraction. Comput. Optim. Appl. 77(3), 779–810 (2020)

Acknowledgements

The author would like to thank the editor and anonymous referee for the constructive comments that helped improve the paper.

Funding

This work was supported by JSPS KAKENHI Grant Number JP20K14359.

Author information

Correspondence to Hiroyuki Sato.

Ethics declarations

Conflict of interest

The author declares no conflict of interest.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Proof of Proposition 10

In this section, we provide a complete proof of Proposition 10 to keep the paper self-contained.

Proof of Proposition 10

First, we fix \(\lambda \ge 0\) and let \(w_*\) be an optimal solution to Problem (46). Then, for any \(w \in {\mathbb {R}}^n\), we have

$$\begin{aligned} L(w_*) + \lambda \Vert w_*\Vert _p \le L(w) + \lambda \Vert w\Vert _p. \end{aligned}$$
(A1)

We show that \(w_*\) is an optimal solution to Problem (47) with \(C :=\Vert w_*\Vert _p\). For any feasible solution \(w \in {\mathbb {R}}^n\) to Problem (47), we have \(\Vert w\Vert _p \le C = \Vert w_*\Vert _p\). Combining this and (A1), we have \(L(w_*) + \lambda \Vert w_*\Vert _p \le L(w) + \lambda \Vert w_*\Vert _p\), which means \(L(w_*) \le L(w)\). Furthermore, \(w_*\) is clearly a feasible solution to Problem (47) since \(\Vert w_*\Vert _p = C\). Therefore, \(w_*\) is an optimal solution to Problem (47).

Conversely, we fix \(C \ge 0\) and let \(w_*\) be an optimal solution to Problem (47). Here, we additionally consider the Lagrange dual problem of (47):

$$\begin{aligned}&\text {maximize}\qquad \inf _{w \in {\mathbb {R}}^n} \left( L(w) + \mu (\Vert w\Vert _p - C)\right) \\&\text {subject to}\qquad \mu \ge 0, \; \mu \in {\mathbb {R}}. \end{aligned}$$
(A2)

If \(C > 0\), then Slater’s condition for Problem (47), namely that there exists \(w \in {\mathbb {R}}^n\) with \(\Vert w\Vert _p < C\), clearly holds with \(w = 0\). If \(C = 0\), then the constraint \(\Vert w\Vert _p \le C\) in Problem (47) reduces to the equality constraint \(w = 0\), and Slater’s condition (which in this case requires only that a feasible solution exists) holds by taking \(w = 0\). In either case, Slater’s condition for Problem (47) holds. Furthermore, Problem (47) is a convex optimization problem. Therefore, it follows from Slater’s theorem [5, Section 5.2.3] that strong duality holds, and the optimal value \(L(w_*)\) of Problem (47) coincides with the optimal value of the dual problem (A2). Letting \(\mu _* \ge 0\) be an optimal solution to (A2), we have

$$\begin{aligned} L(w_*) = \inf _{w \in {\mathbb {R}}^n}(L(w) + \mu _*(\Vert w\Vert _p - C)). \end{aligned}$$

Since \(\Vert w_*\Vert _p \le C\) and \(\mu _* \ge 0\), we obtain \(L(w_*) \le L(w_*) + \mu _*(\Vert w_*\Vert _p - C) \le L(w_*)\). Thus, \(L(w_*) = L(w_*) + \mu _*(\Vert w_*\Vert _p - C)\) holds, and \(w = w_*\) attains the minimum value of \(L(w) + \mu _*(\Vert w\Vert _p - C)\) over all \(w \in {\mathbb {R}}^n\). Since \(\mu _* C\) is a constant, \(w = w_*\) also attains the minimum value of \(L(w) + \mu _*\Vert w\Vert _p\) over \({\mathbb {R}}^n\). This implies that \(w_*\) is an optimal solution to Problem (46) with \(\lambda = \mu _*\), thereby completing the proof. \(\square \)
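
As a quick numerical sanity check of Proposition 10 (not part of the paper), the sketch below solves a tiny instance of the regularized problem (46) with a convex least-squares loss, sets \(C := \Vert w_{\textrm{reg}}\Vert _p\), and re-solves the constrained problem (47); the two optimal values should agree up to solver tolerance. The data A, b and the values of p and \(\lambda \) are arbitrary, and SciPy's general-purpose solvers stand in for any particular method.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical tiny instance: convex loss L(w) = ||Aw - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
p, lam = 1.5, 0.7

def L(w):
    r = A @ w - b
    return r @ r

def p_norm(w):
    return np.sum(np.abs(w) ** p) ** (1.0 / p)

# Problem (46): minimize L(w) + lam * ||w||_p (unconstrained).
w_reg = minimize(lambda w: L(w) + lam * p_norm(w), x0=np.ones(3)).x
C = p_norm(w_reg)

# Problem (47) with C := ||w_reg||_p: minimize L(w) s.t. ||w||_p <= C.
constraint = {"type": "ineq", "fun": lambda w: C - p_norm(w)}
w_con = minimize(L, x0=np.ones(3), constraints=[constraint]).x

# Proposition 10 predicts equal optimal values (up to solver tolerance).
print(L(w_reg), L(w_con))
```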

Remark 8

Although we focus on the p-norm here, Proposition 10 generalizes straightforwardly to an arbitrary norm on \({\mathbb {R}}^n\). Indeed, the proof of Proposition 10 uses no property specific to the p-norm; it relies only on properties shared by every norm.

Cite this article

Sato, H. Riemannian optimization on unit sphere with p-norm and its applications. Comput Optim Appl 85, 897–935 (2023). https://doi.org/10.1007/s10589-023-00477-0
