Abstract
This study deals with Riemannian optimization on the unit sphere in terms of the p-norm for general \(p> 1\). As a Riemannian submanifold of the Euclidean space, the geometry of the sphere with the p-norm is investigated, and several geometric tools used for Riemannian optimization, such as retractions and vector transports, are proposed and analyzed. Applications to Riemannian optimization on the sphere with nonnegative constraints and \(L_p\)-regularization-related optimization are also discussed. As practical examples, the former includes nonnegative principal component analysis, and the latter is closely related to Lasso regression and box-constrained problems. Numerical experiments verify that Riemannian optimization on the sphere with the p-norm has substantial potential for such applications, and the proposed framework provides a theoretical basis for such optimization.
Data availability
The datasets generated during the current study are available from the corresponding author upon request.
Notes
We write \(p \in [1, \infty ]\) to indicate that \(p \in [1, \infty )\) or \(p = \infty \) holds.
The statement can be rewritten as follows: for any positive integer k, \(S^{n-1}_p\) is a \(C^{2k-1}\) submanifold of \({\mathbb {R}}^n\) if \(2k-1< p < 2k\), \(C^\infty \) submanifold if \(p = 2k\), and \(C^{2k}\) submanifold if \(2k < p \le 2k+1\).
Since the tangent bundle \(T {\mathcal {M}}\) of a \(C^r\) manifold is a \(C^{r-1}\) manifold, we say that a map defined on \(T{\mathcal {M}}\) is smooth if it is of class \(C^{r-1}\).
As discussed before, if (21) holds for some \(\alpha \in {\mathbb {R}}\), then \(\alpha \) should be nonnegative.
The original Problem (47) has the advantage of being a convex optimization problem; Problem (48) does not, since the sphere \(\{w \in {\mathbb {R}}^n \mid \Vert w\Vert _p = C\}\) is not a convex set. Nevertheless, Problem (48) can be solved by unconstrained Riemannian optimization methods. Although the projection of an unconstrained optimal solution, i.e., \(w^{\textrm{unconst}} \in {\mathbb {R}}^n\) that minimizes L without constraints, onto \(S^{n-1}_p\) is a naive feasible solution to (47), it may not be a good one (see also the numerical experiments in Sects. 7.2.2 and 7.2.3). This motivates exploring Problem (48) as an alternative to Problem (47).
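The naive feasible solution mentioned above can be sketched as follows. This is an illustrative baseline only (the function names are mine): it maps an unconstrained minimizer onto \(S^{n-1}_p\) by radial scaling, which is one simple way to restore feasibility, not the Riemannian optimization approach proposed in the paper.

```python
import math

def p_norm(w, p):
    """Compute the p-norm ||w||_p = (sum_i |w_i|^p)^(1/p) for p >= 1."""
    return sum(abs(x) ** p for x in w) ** (1.0 / p)

def scale_to_sphere(w, p, C):
    """Scale w radially so that the result lies on {v : ||v||_p = C}.

    This is a heuristic baseline for producing a feasible point of
    Problem (47) from an unconstrained minimizer w; the paper's numerical
    experiments suggest it may be far from the constrained optimum.
    """
    norm = p_norm(w, p)
    if norm == 0.0:
        raise ValueError("cannot scale the zero vector onto the sphere")
    return [C * x / norm for x in w]

# For p = 2 this recovers the usual Euclidean normalization.
w_feas = scale_to_sphere([3.0, -4.0], p=2, C=1.0)
print(w_feas)  # [0.6, -0.8]
```

For general \(p\), the resulting point satisfies \(\Vert w\Vert _p = C\) by construction but need not minimize the Euclidean distance to the sphere.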
When \(n = 2\), we regard \(\log _2 n / (\log _2 n - 1)\) as \(\infty \). Indeed, \(2^{1-1/p} - 1 \in (0, 1)\) holds for any \(p > 1\).
When \(n = 2\), we interpret this condition as \(p \in (1, \infty )\).
If \(l_i = u_i\) for some i, then \(l_i\) is the only value that the corresponding \(w_i\) can take. By eliminating such a constant variable in advance, if necessary, we can assume \(l < u\) without loss of generality.
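The preprocessing step described above can be sketched as follows (function and variable names are mine): indices with \(l_i = u_i\) are pinned to their constant value and removed, so that the reduced problem satisfies \(l < u\) componentwise.

```python
def eliminate_fixed_variables(l, u):
    """Split indices into free variables (l_i < u_i) and fixed ones (l_i == u_i).

    The fixed variables are constants of the problem and can be restored
    after optimizing over the free variables alone.
    """
    assert all(li <= ui for li, ui in zip(l, u)), "require l <= u componentwise"
    free = [i for i, (li, ui) in enumerate(zip(l, u)) if li < ui]
    fixed = {i: l[i] for i in range(len(l)) if l[i] == u[i]}
    return free, fixed

free, fixed = eliminate_fixed_variables([0.0, 1.0, -2.0], [1.0, 1.0, 3.0])
print(free)   # indices with l_i < u_i: [0, 2]
print(fixed)  # variables pinned to l_i = u_i: {1: 1.0}
```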
References
Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008)
Absil, P.-A., Malick, J.: Projection-like retractions on matrix manifolds. SIAM J. Optim. 22(1), 135–158 (2012)
Adler, R.L., Dedieu, J.-P., Margulies, J.Y., Martens, M., Shub, M.: Newton’s method on Riemannian manifolds and a geometric model for the human spine. IMA J. Numer. Anal. 22(3), 359–390 (2002)
Boumal, N., Mishra, B., Absil, P.-A., Sepulchre, R.: Manopt, a Matlab toolbox for optimization on manifolds. J. Mach. Learn. Res. 15(1), 1455–1459 (2014)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Dong, S., Absil, P.-A., Gallivan, K.A.: Graph learning for regularized low-rank matrix completion. In: 23rd International Symposium on Mathematical Theory of Networks and Systems (MTNS 2018), pp. 460–467 (2018)
Fukuda, E.H., Fukushima, M.: A note on the squared slack variables technique for nonlinear optimization. J. Oper. Res. Soc. Jpn. 60(3), 262–270 (2017)
Hastie, T., Tibshirani, R., Wainwright, M.: Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press, Boca Raton (2015)
Huang, W., Gallivan, K.A., Absil, P.-A.: A Broyden class of quasi-Newton methods for Riemannian optimization. SIAM J. Optim. 25(3), 1660–1685 (2015)
Huang, W., Absil, P.-A., Gallivan, K.A.: A Riemannian BFGS method without differentiated retraction for nonconvex optimization problems. SIAM J. Optim. 28(1), 470–495 (2018)
Hurley, N., Rickard, S.: Comparing measures of sparsity. IEEE Trans. Inf. Theory 55(10), 4723–4741 (2009)
Ivanov, A.: The Theory of Approximate Methods and Their Applications to the Numerical Solution of Singular Integral Equations, vol. 2. Springer, Berlin (1976)
Khuzani, M.B., Li, N.: Stochastic primal-dual method on Riemannian manifolds of bounded sectional curvature. In: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 133–140. IEEE (2017)
Liu, C., Boumal, N.: Simple algorithms for optimization on Riemannian manifolds with constraints. Appl. Math. Optim. 82(3), 949–981 (2020)
Nocedal, J., Wright, S.: Numerical Optimization, 2nd edn. Springer, Berlin (2006)
Oneto, L., Ridella, S., Anguita, D.: Tikhonov, Ivanov and Morozov regularization for support vector machine learning. Mach. Learn. 103(1), 103–136 (2016)
Ring, W., Wirth, B.: Optimization methods on Riemannian manifolds and their application to shape space. SIAM J. Optim. 22(2), 596–627 (2012)
Sakai, H., Iiduka, H.: Sufficient descent Riemannian conjugate gradient methods. J. Optim. Theory Appl. 190(1), 130–150 (2021)
Sakai, H., Sato, H., Iiduka, H.: Global convergence of Hager–Zhang type Riemannian conjugate gradient method. Appl. Math. Comput. 441, 127685 (2023)
Sato, H.: A Dai–Yuan-type Riemannian conjugate gradient method with the weak Wolfe conditions. Comput. Optim. Appl. 64(1), 101–118 (2016)
Sato, H.: Riemannian Optimization and Its Applications. Springer Nature, Berlin (2021)
Sato, H.: Riemannian conjugate gradient methods: general framework and specific algorithms with convergence analyses. SIAM J. Optim. 32(4), 2690–2717 (2022)
Sato, H., Iwai, T.: A new, globally convergent Riemannian conjugate gradient method. Optimization 64(4), 1011–1031 (2015)
Sato, H., Kasai, H., Bamdev, M.: Riemannian stochastic variance reduced gradient algorithm with retraction and vector transport. SIAM J. Optim. 29(2), 1444–1472 (2019)
Shub, M.: Some remarks on dynamical systems and numerical analysis. In: Dynamical Systems and Partial Differential Equations: Proceedings of VII ELAM, pp. 69–92 (1986)
Tikhonov, A.N., Arsenin, V.Y.: Solutions of Ill-posed Problems. V. H. Winston & Sons, Washington, DC (1977)
Tu, L.W.: An Introduction to Manifolds. Springer, New York (2010)
Zass, R., Shashua, A.: Nonnegative sparse PCA. Adv. Neural Inf. Process. Syst. 19, 1561–1568 (2007)
Zhou, P., Yuan, X.-T., Yan, S., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. IEEE Trans. Pattern Anal. Mach. Intell. 43(2), 459–472 (2019)
Zhu, X., Sato, H.: Riemannian conjugate gradient methods with inverse retraction. Comput. Optim. Appl. 77(3), 779–810 (2020)
Acknowledgements
The author would like to thank the editor and anonymous referee for the constructive comments that helped improve the paper.
Funding
This work was supported by JSPS KAKENHI Grant Number JP20K14359.
Ethics declarations
Conflict of interest
The author declares no conflict of interest.
Appendix A: Proof of Proposition 10
In this section, we provide a complete proof of Proposition 10 to keep the paper self-contained.
Proof of Proposition 10
First, we fix \(\lambda \ge 0\) and let \(w_*\) be an optimal solution to Problem (46). Then, for any \(w \in {\mathbb {R}}^n\), we have
\[ L(w_*) + \lambda \Vert w_*\Vert _p \le L(w) + \lambda \Vert w\Vert _p. \tag{A1} \]
We show that \(w_*\) is an optimal solution to Problem (47) with \(C :=\Vert w_*\Vert _p\). For any feasible solution \(w \in {\mathbb {R}}^n\) to Problem (47), we have \(\Vert w\Vert _p \le C = \Vert w_*\Vert _p\). Combining this and (A1), we have \(L(w_*) + \lambda \Vert w_*\Vert _p \le L(w) + \lambda \Vert w_*\Vert _p\), which means \(L(w_*) \le L(w)\). Furthermore, \(w_*\) is clearly a feasible solution to Problem (47) since \(\Vert w_*\Vert _p = C\). Therefore, \(w_*\) is an optimal solution to Problem (47).
Conversely, we fix \(C \ge 0\) and let \(w_*\) be an optimal solution to Problem (47). Here, we additionally consider the Lagrange dual problem of (47):
\[ \max_{\mu \ge 0} \; \min_{w \in {\mathbb {R}}^n} \{ L(w) + \mu (\Vert w\Vert _p - C) \}. \tag{A2} \]
If \(C > 0\), then Slater’s condition for Problem (47), which is that there exists \(w \in {\mathbb {R}}^n\) with \(\Vert w\Vert _p < C\), clearly holds with \(w = 0\). If \(C = 0\), then the constraint \(\Vert w\Vert _p \le C\) in Problem (47) is rewritten as the equality constraint \(w = 0\), and Slater’s condition (which in this case is that a feasible solution exists) holds by taking \(w = 0\). In either case, Slater’s condition for Problem (47) holds. Furthermore, Problem (47) is a convex optimization problem. Therefore, it follows from Slater’s theorem [5, Section 5.2.3] that strong duality holds. Hence, the optimal value \(L(w_*)\) of Problem (47) and the optimal value of the dual problem (A2) coincide. Letting \(\mu _* \ge 0\) be an optimal solution to (A2), we have
\[ L(w_*) = \min_{w \in {\mathbb {R}}^n} \{ L(w) + \mu _* (\Vert w\Vert _p - C) \} \le L(w_*) + \mu _* (\Vert w_*\Vert _p - C). \]
Since \(\Vert w_*\Vert _p \le C\) and \(\mu _* \ge 0\), we obtain \(L(w_*) \le L(w_*) + \mu _*(\Vert w_*\Vert _p - C) \le L(w_*)\). Thus, \(L(w_*) = L(w_*) + \mu _*(\Vert w_*\Vert _p - C)\) holds, and \(w = w_*\) attains the minimum value of \(L(w) + \mu _*(\Vert w\Vert _p - C)\) over all \(w \in {\mathbb {R}}^n\). Since \(\mu _* C\) is a constant, \(w = w_*\) also attains the minimum value of \(L(w) + \mu _*\Vert w\Vert _p\) over \({\mathbb {R}}^n\). This implies that \(w_*\) is an optimal solution to Problem (46) with \(\lambda = \mu _*\), thereby completing the proof. \(\square \)
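To make the correspondence between Problems (46) and (47) concrete, the following toy computation is entirely my own construction, with \(n = 1\) and \(L(w) = (w-1)^2\) (when \(n = 1\), \(\Vert w\Vert _p = |w|\) for every p). It locates the regularized minimizer of Problem (46) for a fixed \(\lambda\) by grid search, sets \(C := \Vert w_*\Vert _p\), and checks that the same point solves the constrained Problem (47):

```python
# Toy 1-D illustration of Proposition 10 with L(w) = (w - 1)^2.
# All numerical choices here are illustrative, not from the paper.
def L(w):
    return (w - 1.0) ** 2

lam = 0.5
grid = [i / 10000.0 for i in range(-20000, 20001)]  # w in [-2, 2]

# Problem (46): minimize L(w) + lam * |w| over the grid.
# Analytically (soft thresholding): w* = 1 - lam/2 = 0.75.
w_reg = min(grid, key=lambda w: L(w) + lam * abs(w))

# Problem (47): minimize L(w) subject to |w| <= C with C := |w_reg|.
C = abs(w_reg)
w_con = min((w for w in grid if abs(w) <= C), key=L)

print(w_reg, w_con)  # 0.75 0.75 -- the two minimizers coincide
```

Proposition 10 asserts exactly this kind of correspondence: each \(\lambda\) yields a \(C\) for which the constrained and regularized problems share an optimal solution, and conversely.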
Remark 8
Although we focus on the p-norm here, Proposition 10 generalizes straightforwardly to an arbitrary norm on \({\mathbb {R}}^n\). Indeed, the proof of Proposition 10 uses no property specific to the p-norm, only properties shared by all norms.
Cite this article
Sato, H. Riemannian optimization on unit sphere with p-norm and its applications. Comput Optim Appl 85, 897–935 (2023). https://doi.org/10.1007/s10589-023-00477-0