An exact penalty approach for optimization with nonnegative orthogonality constraints

Full Length Paper · Series A · Mathematical Programming

Abstract

Optimization with nonnegative orthogonality constraints has wide applications in machine learning and data science. It is NP-hard due to the combinatorial properties of the constraints. We first propose an equivalent optimization formulation with nonnegative and multiple spherical constraints and an additional single nonlinear constraint. Various constraint qualifications and the first- and second-order optimality conditions of the equivalent formulation are discussed. By establishing a local error bound for the feasible set, we design a class of (smooth) exact penalty models that retain the nonnegative and multiple spherical constraints. The penalty models are exact if the penalty parameter is sufficiently large but finite. A practical penalty algorithm with postprocessing is then developed to approximately solve a series of subproblems with nonnegative and multiple spherical constraints. We study the asymptotic convergence and establish that any limit point is a weakly stationary point of the original problem and becomes a stationary point under additional mild conditions. Extensive numerical results on computing the orthogonal projection onto nonnegative orthogonality constraints, on orthogonal nonnegative matrix factorization, and on the K-indicators model show the effectiveness of our proposed approach.
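For orientation, the problem class and the reformulation described above can be sketched as follows; the notation matches the appendix, but the precise form of the single nonlinear constraint is our assumption consistent with the abstract, not a quotation from the paper. The feasible set is \({\mathcal {S}}^{n,k}_+ = \{X \in {\mathbb {R}}^{n \times k} : X^{\top }X = I_k,\ X \ge 0\}\), and the problem reads

$$\begin{aligned} \min _{X \in {\mathbb {R}}^{n \times k}} \ f(X) \quad \mathrm {s.t.} \quad X^{\top }X = I_k, \quad X \ge 0. \end{aligned}$$

Since \(X \ge 0\) forces \(\mathbf {x}_i^{\top }\mathbf {x}_j \ge 0\) for all \(i \ne j\), orthogonality can be traded for unit-sphere constraints on the columns plus a single scalar constraint, e.g.

$$\begin{aligned} \Vert \mathbf {x}_i\Vert = 1 \ \ \forall i \in [k], \quad X \ge 0, \quad \sum \nolimits _{i < j} \mathbf {x}_i^{\top }\mathbf {x}_j \le 0, \end{aligned}$$

and a corresponding smooth penalty subproblem with parameter \(\sigma > 0\) keeps the first two groups of constraints and penalizes the third, \(\min _{\Vert \mathbf {x}_i\Vert = 1,\, X \ge 0} f(X) + \sigma \sum \nolimits _{i<j} \mathbf {x}_i^{\top }\mathbf {x}_j\).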


Acknowledgements

The authors are grateful to the Co-Editor Dr. Adrian Lewis, the Associate Editor and the anonymous reviewers for their valuable comments and suggestions that helped to improve the quality of our manuscript. The work of B. Jiang was supported by the Young Elite Scientists Sponsorship Program by CAST (2017QNRC001), the NSFC grants 11971239 and 11671036. The work of Z. Wen was supported by the NSFC grant 11831002. The work of X. Chen was supported by the Hong Kong Research Grant Council PolyU153001/18P.

Author information

Correspondence to Zaiwen Wen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Construction of problem (4.1) with unique solution
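Throughout this appendix, (4.1) denotes the orthogonal projection of a given \(C \in {\mathbb {R}}^{n \times k}\) onto \({\mathcal {S}}^{n,k}_+\); writing it as \(\min _{X \in {\mathcal {S}}^{n,k}_+} \frac{1}{2}\Vert X - C\Vert _F^2\) is our reconstruction from the abstract and the proof below. Since \(\Vert X\Vert _F^2 = k\) is constant on \({\mathcal {S}}^{n,k}_+\), this projection is equivalent to \(\max _{X \in {\mathcal {S}}^{n,k}_+} \langle C, X\rangle \), the form used in the proof.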

Proposition 1

Choose \(X^* \in {\mathcal {S}}^{n,k}_+\) and \(L \in {\mathbb {R}}^{k \times k}\) with positive diagonal elements satisfying \(L_{ii}L_{jj} > \max \{L_{ij}, L_{ji}, 0\}^2\) for all \(i, j \in [k]\), \(i \ne j\). Then the optimal solution of (4.1) with \(C = X^*L^{\top }\) is unique and equals \(X^*\).

Proof

For simplicity of notation, we use \(\sum _i\) to denote \(\sum _{i \in [k]}\) throughout the proof. Since problem (4.1) is equivalent to \(\max _{X \in {\mathcal {S}}^{n,k}_+} \langle C, X \rangle \), it suffices to show that \(\langle C,Y\rangle <\langle C,X^*\rangle =\sum _{i}L_{ii}\) for all \(Y \in {\mathcal {S}}^{n,k}_+\) with \(Y \ne X^*\). Let \(Z = \mathsf {sgn}(Y)\) (taken entrywise) and \(P = \varPi _{{\mathbb {R}}_+^{k \times k}}(L)\), the entrywise projection of \(L\) onto the nonnegative orthant, so that \(P_{ji} = \max \{L_{ji}, 0\}\); here \(\mathbf {x}^*_j\), \(\mathbf {y}_i\) and \(\mathbf {z}_i\) denote the columns of \(X^*\), \(Y\) and \(Z\), respectively. We have

$$\begin{aligned} \langle C,Y \rangle = \mathrm {tr}(L(X^*)^{\top }Y) = \sum \nolimits _{i} \sum \nolimits _j L_{ji}\, \mathbf {y}_i^{\top } \mathbf {x}^*_j \le \sum \nolimits _{i} \sum \nolimits _j P_{ji}\,\mathbf {y}_i^{\top }(\mathbf {x}^*_j\circ \mathbf {z}_i), \end{aligned}$$
(A.1)

where the inequality holds since \(L_{ji} \le P_{ji}\) and \(\mathbf {y}_i^{\top }\mathbf {x}^*_j = \mathbf {y}_i^{\top }(\mathbf {x}^*_j \circ \mathbf {z}_i) \ge 0\) by \(Y \ge 0\) and \(\mathbf {z}_i = \mathsf {sgn}(\mathbf {y}_i)\).

Define \(w_{ji} = \Vert \mathbf {x}_j^* \circ \mathbf {z}_i\Vert ^2\). Since \(X^* \in {\mathcal {S}}^{n,k}_+\), the columns of \(X^*\) have pairwise disjoint supports, so the vectors \(\mathbf {x}_j^* \circ \mathbf {z}_i\), \(j \in [k]\), are mutually orthogonal and \(\Vert \sum _{j} P_{ji} (\mathbf {x}_j^* \circ \mathbf {z}_i)\Vert = (\sum _{j} P_{ji}^2 w_{ji})^{1/2}\). Using the Cauchy-Schwarz inequality, \(\Vert \mathbf {y}_i\Vert =1\) and the requirements on \(L\), we have

$$\begin{aligned} \sum \nolimits _{j} P_{ji}\,\mathbf {y}_i^{\top }(\mathbf {x}^*_j\circ \mathbf {z}_i) \le \Big (\sum \nolimits _{j} P^2_{ji}w_{ji}\Big )^{\frac{1}{2}} \le P_{ii} \Big (\sum \nolimits _{j} \frac{P_{jj}}{P_{ii}}w_{ji}\Big )^{\frac{1}{2}}. \end{aligned}$$
(A.2)

Combining (A.1) and (A.2) with \(\langle C, X^*\rangle = \sum _{i}L_{ii} = \sum _{i} P_{ii}\), we further have

$$\begin{aligned} \langle C, Y \rangle \le \sum \nolimits _{i} P_{ii} \Big (\sum \nolimits _{j} \frac{P_{jj}}{P_{ii}}w_{ji}\Big )^{\frac{1}{2}} \le \Big (\sum \nolimits _{i} P_{ii} \Big )^{\frac{1}{2}} \Big ( \sum \nolimits _{i} \sum \nolimits _{j} P_{jj} w_{ji}\Big )^{\frac{1}{2}} \le \langle C, X^* \rangle , \end{aligned}$$
(A.3)

where the second inequality uses the fact that \(\sum _{i} a_i x_i^{1/2} \le (\sum _{i} a_i)^{1/2}(\sum _{i} a_i x_i)^{1/2}\) for \(a_i>0\) and \(x_i\ge 0\), and the third inequality uses \(\sum _{i} w_{ji} \le 1\), which holds because the columns of \(Y\), and hence the supports of \(\mathbf {z}_1, \ldots , \mathbf {z}_k\), are pairwise disjoint while \(\Vert \mathbf {x}_j^*\Vert = 1\). Equality holds throughout (A.2) and (A.3) if and only if \(Y=X^*\), which gives the claimed strict inequality for every feasible \(Y \ne X^*\). The proof is completed. \(\square \)
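Proposition 1 is easy to sanity-check numerically. The following Python sketch is our addition: it builds an \(X^*\) with disjoint, unit-norm column supports (hence \(X^* \in {\mathcal {S}}^{n,k}_+\)), a fixed \(L\) satisfying the hypothesis, and compares \(\langle C, X^*\rangle = \sum _i L_{ii}\) against random feasible points. It does not solve the (NP-hard) projection problem exactly, and all helper names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3

def random_feasible(n, k, rng):
    """Sample a point of S^{n,k}_+: nonnegative unit columns with
    disjoint supports, so X^T X = I_k automatically."""
    X = np.zeros((n, k))
    for j, idx in enumerate(np.array_split(rng.permutation(n), k)):
        col = rng.random(len(idx)) + 0.1      # strictly positive block
        X[idx, j] = col / np.linalg.norm(col)
    return X

X_star = random_feasible(n, k, rng)

# L has positive diagonal and L_ii * L_jj > max{L_ij, L_ji, 0}^2 for i != j:
# here 2.0 * 2.0 = 4 exceeds every squared off-diagonal entry.
L = np.array([[2.0, 0.5, -0.3],
              [0.1, 2.0, 0.4],
              [0.2, -0.5, 2.0]])
C = X_star @ L.T                              # then <C, X*> = trace(L)

obj_star = np.sum(C * X_star)                 # Frobenius inner product = 6.0
best = max(np.sum(C * random_feasible(n, k, rng)) for _ in range(10_000))
print(f"<C, X*> = {obj_star:.4f}, best random feasible value = {best:.4f}")
# Proposition 1 asserts <C, Y> < <C, X*> for every feasible Y != X*.
```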


About this article

Cite this article

Jiang, B., Meng, X., Wen, Z. et al. An exact penalty approach for optimization with nonnegative orthogonality constraints. Math. Program. 198, 855–897 (2023). https://doi.org/10.1007/s10107-022-01794-8

