An exact penalty approach for optimization with nonnegative orthogonality constraints

Full Length Paper · Series A · Mathematical Programming

Abstract

Optimization with nonnegative orthogonality constraints has wide applications in machine learning and data science. It is NP-hard due to the combinatorial properties of the constraints. We first propose an equivalent optimization formulation with nonnegative and multiple spherical constraints and an additional single nonlinear constraint. Various constraint qualifications and the first- and second-order optimality conditions of the equivalent formulation are discussed. By establishing a local error bound for the feasible set, we design a class of (smooth) exact penalty models that retain the nonnegative and multiple spherical constraints. The penalty models are exact if the penalty parameter is sufficiently large but finite. A practical penalty algorithm with postprocessing is then developed to approximately solve a series of subproblems with nonnegative and multiple spherical constraints. We study the asymptotic convergence and establish that any limit point is a weakly stationary point of the original problem and becomes a stationary point under additional mild conditions. Extensive numerical results on computing the orthogonal projection onto nonnegative orthogonality constraints, on orthogonal nonnegative matrix factorization, and on the K-indicators model show the effectiveness of our proposed approach.
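For orientation, the problem class and the reformulation described above can be sketched as follows; the notation matches the appendix, but the precise form of the single nonlinear constraint is our assumption consistent with the abstract, not a quotation from the paper. The feasible set is \({\mathcal {S}}^{n,k}_+ = \{X \in {\mathbb {R}}^{n \times k} : X^{\top }X = I_k,\ X \ge 0\}\), and the problem reads

$$\begin{aligned} \min _{X \in {\mathbb {R}}^{n \times k}} \ f(X) \quad \mathrm {s.t.} \quad X^{\top }X = I_k, \quad X \ge 0. \end{aligned}$$

Since \(X \ge 0\) forces \(\mathbf {x}_i^{\top }\mathbf {x}_j \ge 0\) for all \(i \ne j\), orthogonality can be traded for unit-sphere constraints on the columns plus a single scalar constraint, e.g.

$$\begin{aligned} \Vert \mathbf {x}_i\Vert = 1 \ \ \forall i \in [k], \quad X \ge 0, \quad \sum \nolimits _{i < j} \mathbf {x}_i^{\top }\mathbf {x}_j \le 0, \end{aligned}$$

and a corresponding smooth penalty subproblem with parameter \(\sigma > 0\) keeps the first two groups of constraints and penalizes the third, \(\min _{\Vert \mathbf {x}_i\Vert = 1,\, X \ge 0} f(X) + \sigma \sum \nolimits _{i<j} \mathbf {x}_i^{\top }\mathbf {x}_j\).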


Acknowledgements

The authors are grateful to the Co-Editor Dr. Adrian Lewis, the Associate Editor and the anonymous reviewers for their valuable comments and suggestions that helped to improve the quality of our manuscript. The work of B. Jiang was supported by the Young Elite Scientists Sponsorship Program by CAST (2017QNRC001), the NSFC grants 11971239 and 11671036. The work of Z. Wen was supported by the NSFC grant 11831002. The work of X. Chen was supported by the Hong Kong Research Grant Council PolyU153001/18P.

Author information

Correspondence to Zaiwen Wen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Construction of problem (4.1) with unique solution
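Throughout this appendix, (4.1) denotes the orthogonal projection of a given \(C \in {\mathbb {R}}^{n \times k}\) onto \({\mathcal {S}}^{n,k}_+\); writing it as \(\min _{X \in {\mathcal {S}}^{n,k}_+} \frac{1}{2}\Vert X - C\Vert _F^2\) is our reconstruction from the abstract and the proof below. Since \(\Vert X\Vert _F^2 = k\) is constant on \({\mathcal {S}}^{n,k}_+\), this projection is equivalent to \(\max _{X \in {\mathcal {S}}^{n,k}_+} \langle C, X\rangle \), the form used in the proof.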

Proposition 1

Choose \(X^* \in {\mathcal {S}}^{n,k}_+\) and \(L \in {\mathbb {R}}^{k \times k}\) with positive diagonal elements satisfying \(L_{ii}L_{jj} > \max \{L_{ij}, L_{ji}, 0\}^2\) for all \(i, j \in [k]\), \(i \ne j\). Then the optimal solution of (4.1) with \(C = X^*L^{\top }\) is unique and equals \(X^*\).

Proof

For simplicity of notation, we use \(\sum _i\) to denote \(\sum _{i \in [k]}\) throughout the proof. Since problem (4.1) is equivalent to \(\max _{X \in {\mathcal {S}}^{n,k}_+} \langle C, X \rangle \), it suffices to show that \(\langle C,Y\rangle <\langle C,X^*\rangle =\sum _{i}L_{ii}\) for all \(Y \in {\mathcal {S}}^{n,k}_+\) with \(Y \ne X^*\). Let \(Z = \mathsf {sgn}(Y)\) (taken entrywise) and \(P = \varPi _{{\mathbb {R}}_+^{k \times k}}(L)\), the entrywise projection of \(L\) onto the nonnegative orthant, so that \(P_{ji} = \max \{L_{ji}, 0\}\); here \(\mathbf {x}^*_j\), \(\mathbf {y}_i\) and \(\mathbf {z}_i\) denote the columns of \(X^*\), \(Y\) and \(Z\), respectively. We have

$$\begin{aligned} \langle C,Y \rangle = \mathrm {tr}(L(X^*)^{\top }Y) = \sum \nolimits _{i} \sum \nolimits _j L_{ji}\, \mathbf {y}_i^{\top } \mathbf {x}^*_j \le \sum \nolimits _{i} \sum \nolimits _j P_{ji}\,\mathbf {y}_i^{\top }(\mathbf {x}^*_j\circ \mathbf {z}_i), \end{aligned}$$
(A.1)

where the inequality holds since \(L_{ji} \le P_{ji}\) and \(\mathbf {y}_i^{\top }\mathbf {x}^*_j = \mathbf {y}_i^{\top }(\mathbf {x}^*_j \circ \mathbf {z}_i) \ge 0\) by \(Y \ge 0\) and \(\mathbf {z}_i = \mathsf {sgn}(\mathbf {y}_i)\).

Define \(w_{ji} = \Vert \mathbf {x}_j^* \circ \mathbf {z}_i\Vert ^2\). Since \(X^* \in {\mathcal {S}}^{n,k}_+\), the columns of \(X^*\) have pairwise disjoint supports, so the vectors \(\mathbf {x}_j^* \circ \mathbf {z}_i\), \(j \in [k]\), are mutually orthogonal and \(\Vert \sum _{j} P_{ji} (\mathbf {x}_j^* \circ \mathbf {z}_i)\Vert = (\sum _{j} P_{ji}^2 w_{ji})^{1/2}\). Using the Cauchy-Schwarz inequality, \(\Vert \mathbf {y}_i\Vert =1\) and the requirements on \(L\), we have

$$\begin{aligned} \sum \nolimits _{j} P_{ji}\,\mathbf {y}_i^{\top }(\mathbf {x}^*_j\circ \mathbf {z}_i) \le \Big (\sum \nolimits _{j} P^2_{ji}w_{ji}\Big )^{\frac{1}{2}} \le P_{ii} \Big (\sum \nolimits _{j} \frac{P_{jj}}{P_{ii}}w_{ji}\Big )^{\frac{1}{2}}. \end{aligned}$$
(A.2)

Combining (A.1) and (A.2) with \(\langle C, X^*\rangle = \sum _{i}L_{ii} = \sum _{i} P_{ii}\), we further have

$$\begin{aligned} \langle C, Y \rangle \le \sum \nolimits _{i} P_{ii} \Big (\sum \nolimits _{j} \frac{P_{jj}}{P_{ii}}w_{ji}\Big )^{\frac{1}{2}} \le \Big (\sum \nolimits _{i} P_{ii} \Big )^{\frac{1}{2}} \Big ( \sum \nolimits _{i} \sum \nolimits _{j} P_{jj} w_{ji}\Big )^{\frac{1}{2}} \le \langle C, X^* \rangle , \end{aligned}$$
(A.3)

where the second inequality uses the fact that \(\sum _{i} a_i x_i^{1/2} \le (\sum _{i} a_i)^{1/2}(\sum _{i} a_i x_i)^{1/2}\) for \(a_i>0\) and \(x_i\ge 0\), and the third inequality uses \(\sum _{i} w_{ji} \le 1\), which holds because the columns of \(Y\), and hence the supports of \(\mathbf {z}_1, \ldots , \mathbf {z}_k\), are pairwise disjoint while \(\Vert \mathbf {x}_j^*\Vert = 1\). Equality holds throughout (A.2) and (A.3) if and only if \(Y=X^*\), which gives the claimed strict inequality for every feasible \(Y \ne X^*\). The proof is completed. \(\square \)
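Proposition 1 is easy to sanity-check numerically. The following Python sketch is our addition: it builds an \(X^*\) with disjoint, unit-norm column supports (hence \(X^* \in {\mathcal {S}}^{n,k}_+\)), a fixed \(L\) satisfying the hypothesis, and compares \(\langle C, X^*\rangle = \sum _i L_{ii}\) against random feasible points. It does not solve the (NP-hard) projection problem exactly, and all helper names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3

def random_feasible(n, k, rng):
    """Sample a point of S^{n,k}_+: nonnegative unit columns with
    disjoint supports, so X^T X = I_k automatically."""
    X = np.zeros((n, k))
    for j, idx in enumerate(np.array_split(rng.permutation(n), k)):
        col = rng.random(len(idx)) + 0.1      # strictly positive block
        X[idx, j] = col / np.linalg.norm(col)
    return X

X_star = random_feasible(n, k, rng)

# L has positive diagonal and L_ii * L_jj > max{L_ij, L_ji, 0}^2 for i != j:
# here 2.0 * 2.0 = 4 exceeds every squared off-diagonal entry.
L = np.array([[2.0, 0.5, -0.3],
              [0.1, 2.0, 0.4],
              [0.2, -0.5, 2.0]])
C = X_star @ L.T                              # then <C, X*> = trace(L)

obj_star = np.sum(C * X_star)                 # Frobenius inner product = 6.0
best = max(np.sum(C * random_feasible(n, k, rng)) for _ in range(10_000))
print(f"<C, X*> = {obj_star:.4f}, best random feasible value = {best:.4f}")
# Proposition 1 asserts <C, Y> < <C, X*> for every feasible Y != X*.
```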


About this article

Cite this article

Jiang, B., Meng, X., Wen, Z. et al. An exact penalty approach for optimization with nonnegative orthogonality constraints. Math. Program. 198, 855–897 (2023). https://doi.org/10.1007/s10107-022-01794-8

