Abstract
Optimization with nonnegative orthogonality constraints has wide applications in machine learning and data science. It is NP-hard due to the combinatorial nature of the constraints. We first propose an equivalent optimization formulation with nonnegative and multiple spherical constraints and one additional nonlinear constraint. We discuss various constraint qualifications and the first- and second-order optimality conditions of the equivalent formulation. By establishing a local error bound for the feasible set, we design a class of (smooth) exact penalty models that keep the nonnegative and multiple spherical constraints. The penalty models are exact if the penalty parameter is sufficiently large but finite. A practical penalty algorithm with postprocessing is then developed to approximately solve a series of subproblems with nonnegative and multiple spherical constraints. We study the asymptotic convergence and establish that any limit point is a weakly stationary point of the original problem and becomes a stationary point under some additional mild conditions. Extensive numerical results on the problem of computing the orthogonal projection onto nonnegative orthogonality constraints, the orthogonal nonnegative matrix factorization problems and the K-indicators model show the effectiveness of our proposed approach.
Acknowledgements
The authors are grateful to the Co-Editor Dr. Adrian Lewis, the Associate Editor and the anonymous reviewers for their valuable comments and suggestions that helped to improve the quality of our manuscript. The work of B. Jiang was supported by the Young Elite Scientists Sponsorship Program by CAST (2017QNRC001), the NSFC grants 11971239 and 11671036. The work of Z. Wen was supported by the NSFC grant 11831002. The work of X. Chen was supported by the Hong Kong Research Grant Council PolyU153001/18P.
A Construction of problem (4.1) with unique solution
Proposition 1
Choose \(X^* \in {\mathcal {S}}^{n,k}_+\) and \(L \in {\mathbb {R}}^{k \times k}\) with positive diagonal elements satisfying \(L_{ii}L_{jj} > \max \{L_{ij}, L_{ji}, 0\}^2\ \forall i, j \in [k], i \ne j.\) Then the optimal solution of (4.1) with \(C = X^*L^{\top }\) is unique and equals \(X^*\).
Proof
For simplicity of notation, we use \(\sum _i\) to denote \(\sum _{i \in [k]}\) in the proof. Since problem (4.1) is equivalent to \(\max _{X \in {\mathcal {S}}^{n,k}_+} \, \langle C, X \rangle \), we only need to show that \(\langle C,Y\rangle <\langle C,X^*\rangle =\sum _{i}L_{ii}\) for all \(Y \in {\mathcal {S}}^{n,k}_+\) with \(Y \ne X^*\). Let \(Z = \mathsf {sgn}(Y)\) and \(P = \varPi _{{\mathbb {R}}_+^{k \times k}}(L)\). We have
Define \(w_{ji} = \Vert \mathbf {x}_j^* \circ \mathbf {z}_i\Vert ^2\). Since \(X^* \in {\mathcal {S}}^{n,k}_+\), the columns \(\mathbf {x}_j^*\) have pairwise disjoint supports, so \(\Vert \sum _{j} P_{ji} (\mathbf {x}_j^* \circ \mathbf {z}_i)\Vert = (\sum _{j} P_{ji}^2 w_{ji})^{1/2}\). Using the Cauchy-Schwarz inequality, \(\Vert \mathbf {y}_i\Vert =1\) and the requirements on \(L\), we have
With (A.1) and \(\langle C, X^*\rangle = \sum _{i}L_{ii} = \sum _{i} P_{ii}\), we further have
where the second inequality uses the fact that \(\sum _{i} a_i x_i^{1/2} \le (\sum _{i} a_i)^{1/2}(\sum _{i} a_i x_i)^{1/2}\) for \(a_i>0\) and \(x_i\ge 0\), and the third inequality uses \(\sum _{i} w_{ji} \le 1\). Obviously, the equalities in (A.2) and (A.3) hold if and only if \(Y=X^*\). The proof is completed. \(\square \)
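The statement of Proposition 1 can be illustrated numerically. The sketch below is not part of the proof and is not the authors' code. It assumes that \({\mathcal {S}}^{n,k}_+\) is the set of \(n \times k\) matrices with nonnegative entries and orthonormal columns, so that each row has at most one nonzero entry. The helper names `random_nonneg_orthonormal` and `random_L`, the sizes and the sampling ranges are hypothetical choices made only for this illustration. Random sampling cannot establish uniqueness; it only checks that no sampled feasible \(Y\) attains a larger value of \(\langle C, Y\rangle \) than \(X^*\).

```python
import numpy as np


def random_nonneg_orthonormal(n, k, rng):
    """Sample X >= 0 with X^T X = I (assumed meaning of S^{n,k}_+); needs n >= k."""
    # Each row is owned by exactly one column; every column owns at least one row.
    owner = np.concatenate([np.arange(k), rng.integers(0, k, size=n - k)])
    rng.shuffle(owner)
    X = np.zeros((n, k))
    X[np.arange(n), owner] = rng.uniform(0.1, 1.0, size=n)
    # Disjoint supports give orthogonal columns; normalizing makes them orthonormal.
    return X / np.linalg.norm(X, axis=0)


def random_L(k, rng):
    """Sample L with positive diagonal and L_ii L_jj > max{L_ij, L_ji, 0}^2 for i != j."""
    d = rng.uniform(1.0, 2.0, size=k)
    # Off-diagonal entries lie in [-1, 0.9] * sqrt(L_ii L_jj), so every positive
    # entry satisfies L_ij^2 <= 0.81 * L_ii * L_jj; negative entries are unrestricted.
    L = rng.uniform(-1.0, 0.9, size=(k, k)) * np.sqrt(np.outer(d, d))
    np.fill_diagonal(L, d)
    return L


rng = np.random.default_rng(0)
n, k = 12, 3
X_star = random_nonneg_orthonormal(n, k, rng)
L = random_L(k, rng)
C = X_star @ L.T

# <C, X*> equals the sum of the diagonal entries of L.
best = np.sum(C * X_star)

# Gap <C, X*> - <C, Y> for randomly sampled feasible Y.
gaps = np.array([
    best - np.sum(C * random_nonneg_orthonormal(n, k, rng))
    for _ in range(2000)
])
print("smallest sampled gap:", gaps.min())
```

Because the condition on \(L\) is symmetric in \(L_{ij}\) and \(L_{ji}\), the same check applies if `C` is formed with `L` in place of `L.T`.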
Cite this article
Jiang, B., Meng, X., Wen, Z. et al. An exact penalty approach for optimization with nonnegative orthogonality constraints. Math. Program. 198, 855–897 (2023). https://doi.org/10.1007/s10107-022-01794-8
DOI: https://doi.org/10.1007/s10107-022-01794-8
Keywords
- Exact penalty
- Nonnegative orthogonality constraint
- Second-order method
- Constraint qualification
- Optimality condition