Abstract
The nonnegative rank of an entrywise nonnegative matrix \(A \in \mathbb {R}^{m \times n}_+\) is the smallest integer \(r\) such that \(A\) can be written as \(A=UV\) where \(U \in \mathbb {R}^{m \times r}_+\) and \(V \in \mathbb {R}^{r \times n}_+\) are both nonnegative. The nonnegative rank arises in different areas such as combinatorial optimization and communication complexity. Computing this quantity is NP-hard in general, and it is thus important to find efficient bounding techniques, especially in the context of the aforementioned applications. In this paper we propose a new lower bound on the nonnegative rank which, unlike most existing lower bounds, does not rely solely on the matrix sparsity pattern and applies to nonnegative matrices with arbitrary support. The idea involves computing a certain nuclear norm with nonnegativity constraints, which allows us to lower bound the nonnegative rank in the same way that the standard nuclear norm gives lower bounds on the standard rank. Our lower bound is expressed as the solution of a copositive programming problem and can be relaxed to obtain polynomial-time computable lower bounds using semidefinite programming. We compare our lower bound with existing ones, and we show examples of matrices where our lower bound performs better than currently known ones.
Notes
Throughout the paper, a nonnegative matrix is a matrix whose entries are all nonnegative.
Note that the trivial representation of \(P\) uses \(f\) linear inequalities, where \(f\) is the number of facets of \(P\). However by introducing new variables (i.e., allowing projections) one can sometimes reduce dramatically the number of inequalities needed to represent \(P\). For example the cross-polytope \(P = \{x \in \mathbb {R}^n : w^\mathsf T x \le 1\; \forall w \in \{-1,1\}^n\}\) has \(2^n\) facets but can be represented using only \(2n\) linear inequalities after introducing \(n\) additional variables: \(P = \{x \in \mathbb {R}^n \; : \; \exists y \in \mathbb {R}^n \; y \ge x, \; y \ge -x, \; 1^\mathsf T y = 1\}\).
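The lifted description of the cross-polytope can be checked on small inputs. The following is a minimal sketch, assuming the helper name `lift_cross_polytope` (hypothetical, not from the paper): given \(x\), it either returns a witness \(y\) with \(y \ge x\), \(y \ge -x\), \(1^\mathsf T y = 1\), or reports that no such \(y\) exists because \(\Vert x\Vert _1 > 1\).

```python
def lift_cross_polytope(x, tol=1e-12):
    """Return y with y >= x, y >= -x and sum(y) == 1, or None if x is outside P.

    Hypothetical helper for illustration only. A witness exists exactly when
    the l1 norm of x is at most 1: start from y = |x| and spread the
    remaining slack evenly over the n coordinates.
    """
    n = len(x)
    slack = 1.0 - sum(abs(xi) for xi in x)
    if slack < -tol:
        return None  # ||x||_1 > 1, so x is not in the cross-polytope
    return [abs(xi) + slack / n for xi in x]


if __name__ == "__main__":
    print(lift_cross_polytope([0.5, -0.25]))
    print(lift_cross_polytope([0.8, 0.8]))
```

The witness is not unique; any way of distributing the slack over the coordinates of \(y\) would do.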
Since we are interested in obtaining a lower bound for the specific perturbation \(A_{\varepsilon }\) of \(A\), we can obtain a better bound than the one of Theorem 3 by normalizing by \(\Vert A_{\varepsilon }\Vert _F^2\) directly.
Indeed, observe first that if \(W\) is feasible for (18) then \(W^\mathsf T \) is also feasible and has the same objective value. Thus by averaging one can assume that \(W\) is symmetric. Then note that for any feasible symmetric \(W\) and any permutation \(\sigma \), the new matrix \(W'_{i,j} = W_{\sigma (i),\sigma (j)}\) is also feasible and has the same objective value as \(W\). Hence again by averaging we can assume \(W\) to be constant on the diagonal and constant on the off-diagonal.
Additional information
This research was funded in part by AFOSR FA9550-11-1-0305.
Appendix: Proof of Proposition 3: Slack matrix of hypercube
In this appendix we prove Proposition 3 concerning the nonnegative rank of the slack matrix of the hypercube. We restate the proposition here for convenience:
Proposition
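Let \(C_n = [0,1]^n\) be the hypercube in \(n\) dimensions and let \(S(C_n) \in \mathbb {R}^{2n \times 2^n}\) be its slack matrix. Then
\[ \left( \frac{\nu _+^{[0]}(S(C_n))}{\Vert S(C_n)\Vert _F} \right)^2 = {{\mathrm{rank}}}_+(S(C_n)) = 2n. \]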
Proof
The facets of the hypercube \(C_n = [0,1]^n\) are given by the linear inequalities \(\{x_k \ge 0\}, \; k=1,\ldots ,n\) and \(\{x_k \le 1\}, \; k=1,\ldots ,n\), and the vertices of \(C_n\) are given by the set \(\{0,1\}^n\) of binary words of length \(n\). It is easy to see that the slack matrix of the hypercube is a 0/1 matrix: in fact, for a given facet \(F\) and vertex \(V\), the \((F,V)\)’th entry of \(S(C_n)\) is given by:
\[ S(C_n)_{F,V} = \begin{cases} 1 & \text {if } V \notin F,\\ 0 & \text {if } V \in F. \end{cases} \]
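For concreteness, the slack matrix can be built directly from this description. The sketch below is illustrative only; the function name `hypercube_slack` and the encoding of a facet as a pair `(k, b)` (coordinate index and the value that coordinate takes on the facet) are assumptions made here, not notation from the paper.

```python
from itertools import product


def hypercube_slack(n):
    """Slack matrix of [0,1]^n with rows indexed by facets, columns by vertices.

    Assumed encoding: facet (k, 0) is {x_k >= 0} and facet (k, 1) is
    {x_k <= 1}; a vertex v lies on facet (k, b) exactly when v[k] == b.
    """
    vertices = list(product((0, 1), repeat=n))
    facets = [(k, b) for b in (0, 1) for k in range(n)]
    # The slack is 0 when the vertex lies on the facet and 1 otherwise.
    S = [[0 if v[k] == b else 1 for v in vertices] for (k, b) in facets]
    return facets, vertices, S


if __name__ == "__main__":
    facets, vertices, S = hypercube_slack(2)
    for facet, row in zip(facets, S):
        print(facet, row)
```

Each row contains exactly \(2^{n-1}\) ones, since half of the vertices lie on any given facet.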
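Since the slack matrix of the hypercube \(S(C_n)\) has \(2n\) rows we clearly have
\[ \left( \frac{\nu _+^{[0]}(S(C_n))}{\Vert S(C_n)\Vert _F} \right)^2 \le {{\mathrm{rank}}}_+(S(C_n)) \le 2n. \]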
To show that we indeed have equality, we exhibit a particular feasible point \(W\) of the semidefinite program (3) and we show that this \(W\) satisfies \((\langle S(C_n), W \rangle / \Vert S(C_n)\Vert _F)^2 = 2n\).
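Let
\[ W = \frac{1}{\sqrt{2^{n-1}}} \left( 2 S(C_n) - J \right), \]
where \(J\) denotes the all-ones matrix of the same size as \(S(C_n)\). Observe that \(W\) is the matrix obtained from \(S(C_n)\) by changing the ones into \(\frac{1}{\sqrt{2^{n-1}}}\) and zeros into \(-\frac{1}{\sqrt{2^{n-1}}}\). Since \(S(C_n)\) has exactly \(2n \cdot 2^{n-1}\) entries equal to one, for this \(W\) we verify using a simple calculation that
\[ \left( \frac{\langle S(C_n), W \rangle }{\Vert S(C_n)\Vert _F} \right)^2 = \frac{\left( 2n \cdot 2^{n-1} / \sqrt{2^{n-1}} \right)^2}{2n \cdot 2^{n-1}} = 2n. \]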
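The value of this ratio is easy to confirm numerically for small \(n\). The following is a minimal sketch, assuming the hypothetical function name `certificate_ratio` and the same facet encoding `(k, b)` as a coordinate index together with the value of that coordinate on the facet.

```python
import math
from itertools import product


def certificate_ratio(n):
    """Return (<S, W> / ||S||_F)^2 for the hypercube slack matrix S and the W above."""
    vertices = list(product((0, 1), repeat=n))
    facets = [(k, b) for b in (0, 1) for k in range(n)]
    S = [[0 if v[k] == b else 1 for v in vertices] for (k, b) in facets]
    gamma = 1.0 / math.sqrt(2 ** (n - 1))
    # W replaces ones of S by +gamma and zeros by -gamma.
    W = [[gamma * (2 * s - 1) for s in row] for row in S]
    inner = sum(s * w for row_s, row_w in zip(S, W) for s, w in zip(row_s, row_w))
    frob_sq = sum(s * s for row in S for s in row)
    return inner ** 2 / frob_sq


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, certificate_ratio(n))
```

This only checks the objective value; feasibility of \(W\) is a separate question and is the substance of the rest of the proof.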
It thus remains to prove that \(W\) is indeed a feasible point of the semidefinite program (3), that is, that the matrix
\[ \begin{bmatrix} I&\quad -W\\ -W^\mathsf T&\quad I\end{bmatrix} \]
can be written as the sum of a nonnegative matrix and a positive semidefinite one. This is the main part of the proof and for this we introduce some notation. Let \(\mathcal {F}\) be the set of facets of the hypercube, and \(\mathcal {V}=\{0,1\}^n\) be the set of vertices. If \(F \in \mathcal {F}\) is a facet of the hypercube, we denote by \(\bar{F}\) the opposite facet to \(F\) (namely, if \(F\) is given by \(x_k \ge 0\), then \(\bar{F}\) is the facet \(x_k \le 1\) and vice-versa). Similarly for a vertex \(V \in \mathcal {V}\) we denote by \(\bar{V}\) the opposite vertex obtained by complementing the binary word \(V\). Denote by \(N_{\mathcal {F}}:\mathbb {R}^{\mathcal {F}}\rightarrow \mathbb {R}^{\mathcal {F}}\) and \(N_{\mathcal {V}}:\mathbb {R}^{\mathcal {V}}\rightarrow \mathbb {R}^{\mathcal {V}}\) the “negation” maps, so that we have:
\[ N_{\mathcal {F}}(e_F) = e_{\bar{F}} \quad \forall F \in \mathcal {F}, \qquad N_{\mathcal {V}}(e_V) = e_{\bar{V}} \quad \forall V \in \mathcal {V}, \]
where \(e_F\) and \(e_V\) denote the unit vectors indexed by \(F\) and \(V\) respectively.
Note that in a suitable ordering of the facets and the vertices, the matrix representations of \(N_{\mathcal {F}}\) and \(N_{\mathcal {V}}\) both take the following antidiagonal form (\(N_{\mathcal {F}}\) is of size \(2n\times 2n\) and \(N_{\mathcal {V}}\) is of size \(2^n \times 2^n\)):
\[ \begin{bmatrix} 0 & \cdots & 0 & 1\\ 0 & \cdots & 1 & 0\\ \vdots & & & \vdots \\ 1 & 0 & \cdots & 0 \end{bmatrix}. \]
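Consider now the following decomposition of the matrix \({\begin{bmatrix} I&\quad -W\\ -W^\mathsf T&\quad I\end{bmatrix}}\):
\[ \begin{bmatrix} I&\quad -W\\ -W^\mathsf T&\quad I\end{bmatrix} = \begin{bmatrix} N_{\mathcal {F}}&\quad 0\\ 0&\quad N_{\mathcal {V}}\end{bmatrix} + \begin{bmatrix} I-N_{\mathcal {F}}&\quad -W\\ -W^\mathsf T&\quad I-N_{\mathcal {V}}\end{bmatrix}. \tag{22} \]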
Clearly the first matrix in the decomposition is nonnegative. The next lemma states that the second matrix is actually positive semidefinite:
Lemma 2
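Let \(\mathcal {F}\) and \(\mathcal {V}\) be respectively the sets of facets and vertices of the hypercube \(C_n=[0,1]^n\) and let \(\widehat{W} \in \mathbb {R}^{\mathcal {F}\times \mathcal {V}}\) be the matrix:
\[ \widehat{W}_{F,V} = \begin{cases} 1 & \text {if } V \in F,\\ -1 & \text {if } V \notin F. \end{cases} \]
Then the matrix
\[ \begin{bmatrix} I-N_{\mathcal {F}}&\quad \gamma \widehat{W}\\ \gamma \widehat{W}^\mathsf T&\quad I-N_{\mathcal {V}}\end{bmatrix} \tag{23} \]
is positive semidefinite for \(\gamma = 1/\sqrt{2^{n-1}}\) (where \(N_{\mathcal {F}}\) and \(N_{\mathcal {V}}\) are defined in (20) and (21)).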
Proof
We use the Schur complement to show that the matrix (23) is positive semidefinite. In fact we show that
1. \(I-N_{\mathcal {V}} \succeq 0\),
2. \({{\mathrm{range}}}(\widehat{W}^\mathsf T ) \subseteq {{\mathrm{range}}}(I-N_{\mathcal {V}})\), and
3. \(I-N_{\mathcal {F}} - \gamma ^2 \widehat{W} (I-N_{\mathcal {V}})^{-1} \widehat{W}^\mathsf T \succeq 0\),
where \((I-N_{\mathcal {V}})^{-1}\) denotes the pseudo-inverse of \(I-N_{\mathcal {V}}\).
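Observe that for any \(k \in \mathbb {N}\), the \(2k \times 2k\) matrix given by:
\[ I_{2k} - N_{2k} = \begin{bmatrix} 1 & 0 & \cdots & 0 & -1\\ 0 & 1 & \cdots & -1 & 0\\ \vdots & & & & \vdots \\ 0 & -1 & \cdots & 1 & 0\\ -1 & 0 & \cdots & 0 & 1 \end{bmatrix}, \]
where \(N_{2k}\) is the \(2k \times 2k\) antidiagonal permutation matrix, is positive semidefinite: in fact one can see that \(\frac{1}{2}(I_{2k}-N_{2k})\) is the orthogonal projection onto the subspace spanned by its columns (i.e., the subspace of dimension \(k\) spanned by \(\{e_i - e_{2k+1-i}:\;i=1,\ldots ,k\}\) where \(e_i\) is the \(i\)’th unit vector). Taking \(2k = 2^n\), this shows that \(I-N_{\mathcal {V}}\) is positive semidefinite. Since \(I-N_{\mathcal {V}}\) is twice an orthogonal projection, its pseudo-inverse is half of that projection, which also shows that \((I-N_{\mathcal {V}})^{-1} = \frac{1}{4} (I-N_{\mathcal {V}})\).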
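Both claims (that \(\frac{1}{2}(I_{2k}-N_{2k})\) is an orthogonal projection and that \(\frac{1}{4}(I_{2k}-N_{2k})\) is the pseudo-inverse of \(I_{2k}-N_{2k}\)) can be checked exactly in rational arithmetic. The sketch below is illustrative; the function name `exchange_matrix_checks` is an assumption, and the pseudo-inverse is tested through the Moore-Penrose conditions rather than computed.

```python
from fractions import Fraction


def matmul(A, B):
    """Plain dense matrix product for small list-of-lists matrices."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def exchange_matrix_checks(k):
    """Return (is_projection, is_pseudo_inverse) for D = I - N of size 2k."""
    m = 2 * k
    # N is the antidiagonal permutation matrix: N[i][j] = 1 iff i + j = m - 1.
    D = [[int(i == j) - int(i + j == m - 1) for j in range(m)] for i in range(m)]
    P = [[Fraction(d, 2) for d in row] for row in D]  # candidate projection
    X = [[Fraction(d, 4) for d in row] for row in D]  # candidate pseudo-inverse
    is_projection = P == transpose(P) and matmul(P, P) == P
    DX = matmul(D, X)
    is_pinv = (matmul(DX, D) == D
               and matmul(matmul(X, D), X) == X
               and DX == transpose(DX))
    return is_projection, is_pinv


if __name__ == "__main__":
    print(exchange_matrix_checks(4))
```

Since \(D\) and \(X\) are symmetric and commute, symmetry of \(DX\) also gives symmetry of \(XD\), so the three tested identities cover all four Moore-Penrose conditions.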
Now we show that \({{\mathrm{range}}}(\widehat{W}^\mathsf T ) \subseteq {{\mathrm{range}}}(I-N_{\mathcal {V}})\). Since a vertex \(V\) lies on a facet \(F\) if and only if the opposite vertex \(\bar{V}\) does not, for any \(F \in \mathcal {F}\) the \(F\)’th column of \(\widehat{W}^\mathsf T \) satisfies \((\widehat{W}^\mathsf T )_{V,F} = -(\widehat{W}^\mathsf T )_{\bar{V},F}\) for any \(V \in \mathcal {V}\), and thus \({{\mathrm{range}}}(\widehat{W}^\mathsf T ) \subseteq {{\mathrm{span}}}(e_{V} - e_{\bar{V}} \; : \; V \in \mathcal {V}) = {{\mathrm{range}}}(I-N_{\mathcal {V}})\).
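It thus remains to show that
\[ I-N_{\mathcal {F}} - \gamma ^2 \widehat{W} (I-N_{\mathcal {V}})^{-1} \widehat{W}^\mathsf T \succeq 0. \]
First note that since \(\frac{1}{2} (I-N_{\mathcal {V}})\) is an orthogonal projection and \({{\mathrm{range}}}(\widehat{W}^\mathsf T ) \subseteq {{\mathrm{range}}}(I-N_{\mathcal {V}})\), we have \((I-N_{\mathcal {V}})^{-1} \widehat{W}^\mathsf T = \frac{1}{2} \widehat{W}^\mathsf T \). Thus we now have to show that
\[ I-N_{\mathcal {F}} - \frac{\gamma ^2}{2} \widehat{W} \widehat{W}^\mathsf T \succeq 0. \]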
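The main observation here is that the matrix \(\widehat{W}\widehat{W}^\mathsf T \) is actually equal to \(2^{n} (I-N_{\mathcal {F}})\). For any \(F,G \in \mathcal {F}\), we have:
\[ (\widehat{W}\widehat{W}^\mathsf T )_{F,G} = \sum \nolimits _{a \in \mathcal {V}} \widehat{W}_{F,a} \widehat{W}_{G,a}. \]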
First it is clear that if \(F=G\), then \((\widehat{W}\widehat{W}^\mathsf T )_{F,G} = 2^n\). Also if \(F=\bar{G}\) then \((\widehat{W}\widehat{W}^\mathsf T )_{F,G} = -2^n\), since in that case \(a \in F \Leftrightarrow a \notin G\) and hence \(\widehat{W}_{F,a} \widehat{W}_{G,a} = -1\) for all \(a \in \mathcal {V}\). In the case that \(F \ne G\) and \(F \ne \bar{G}\), the facets \(F\) and \(G\) constrain two different coordinates \(k \ne l\); each of the four possible values of \((a_k,a_l)\) is taken by exactly \(2^{n-2}\) vertices \(a\), and the product \(\widehat{W}_{F,a} \widehat{W}_{G,a}\) equals \(+1\) for two of these values and \(-1\) for the other two, so that \(\sum \nolimits _{a \in \mathcal {V}}\widehat{W}_{F,a} \widehat{W}_{G,a} = 0\).
Hence we have \(I-N_{\mathcal {F}} - \gamma ^2 \widehat{W} (I-N_{\mathcal {V}})^{-1} \widehat{W}^\mathsf T = I-N_{\mathcal {F}} - \frac{\gamma ^2}{2} \widehat{W}\widehat{W}^\mathsf T = (1 - \frac{\gamma ^2}{2}2^n) (I-N_{\mathcal {F}})\). For \(\gamma = 1/\sqrt{2^{n-1}}\) the scalar factor \(1 - \frac{\gamma ^2}{2}2^n\) is equal to zero, so this matrix vanishes and is in particular positive semidefinite.
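The identity \(\widehat{W}\widehat{W}^\mathsf T = 2^n (I - N)\), with \(N\) the facet negation map, involves only integers and can be confirmed by brute force for small \(n\). The following is a minimal sketch under stated assumptions: the function name `gram_identity_holds` is hypothetical, facets are encoded as pairs `(k, b)`, and the sign pattern used is \(+1\) when the vertex lies on the facet and \(-1\) otherwise (flipping the global sign leaves the product unchanged).

```python
from itertools import product


def gram_identity_holds(n):
    """Check entrywise that What * What^T equals 2^n * (I - N_F)."""
    vertices = list(product((0, 1), repeat=n))
    # Facet (k, b) is {x_k >= 0} for b = 0 and {x_k <= 1} for b = 1.
    facets = [(k, b) for b in (0, 1) for k in range(n)]
    # Assumed sign convention: +1 if the vertex lies on the facet, -1 otherwise.
    What = [[1 if v[k] == b else -1 for v in vertices] for (k, b) in facets]
    for i, (k, b) in enumerate(facets):
        for j, (l, c) in enumerate(facets):
            entry = sum(x * y for x, y in zip(What[i], What[j]))
            if (k, b) == (l, c):
                expected = 2 ** n        # same facet
            elif k == l:
                expected = -(2 ** n)     # opposite facets
            else:
                expected = 0             # facets on different coordinates
            if entry != expected:
                return False
    return True


if __name__ == "__main__":
    print(all(gram_identity_holds(n) for n in range(1, 6)))
```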
Using this lemma, Eq. (22) shows that the matrix \(W\) is feasible for the semidefinite program (3), and thus that \(\nu _+^{[0]}(S(C_n)) \ge \langle S(C_n), W \rangle = \sqrt{2^{n-1}} \cdot 2n\). Hence since \(\Vert S(C_n)\Vert _F = \sqrt{2^{n-1} \cdot 2n}\) we get that
\[ \left( \frac{\nu _+^{[0]}(S(C_n))}{\Vert S(C_n)\Vert _F} \right)^2 \ge \frac{2^{n-1} \cdot (2n)^2}{2^{n-1} \cdot 2n} = 2n, \]
which completes the proof.
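As an end-to-end sanity check, the certificate can be verified numerically for small \(n\). The sketch below is not the Schur complement argument used above: it relies on an alternative observation, stated here as an assumption of the sketch, that the matrix \(M\) with diagonal blocks \(I-N_{\mathcal F}\), \(I-N_{\mathcal V}\) and off-diagonal blocks \(-W\), \(-W^\mathsf T\) satisfies \(M = \frac{1}{2} K K^\mathsf T\), where \(K\) stacks \(-W\) on top of \(I-N_{\mathcal V}\). Such a Gram factorization implies that \(M\) is positive semidefinite. The function name `check_certificate` and the facet encoding are hypothetical.

```python
import math
from itertools import product


def check_certificate(n, tol=1e-9):
    """Check [[I, -W], [-W^T, I]] = (nonnegative) + M and that M = K K^T / 2."""
    vertices = list(product((0, 1), repeat=n))
    facets = [(k, b) for b in (0, 1) for k in range(n)]
    nf, nv = len(facets), len(vertices)
    size = nf + nv
    gamma = 1.0 / math.sqrt(2 ** (n - 1))
    # W is +gamma where the slack is 1 (vertex off the facet), -gamma where it is 0.
    W = [[-gamma if v[k] == b else gamma for v in vertices] for (k, b) in facets]
    # A = I - N_F and C = I - N_V, with N swapping opposite facets / vertices.
    A = [[(f == g) - (f[0] == g[0] and f[1] != g[1]) for g in facets] for f in facets]
    C = [[(u == v) - all(a != b for a, b in zip(u, v)) for v in vertices]
         for u in vertices]
    negW = [[-w for w in row] for row in W]
    # M = [[A, -W], [-W^T, C]] is the candidate positive semidefinite part.
    M = ([A[i] + negW[i] for i in range(nf)]
         + [[negW[i][j] for i in range(nf)] + C[j] for j in range(nv)])
    # T = [[I, -W], [-W^T, I]] is the matrix that must be decomposed.
    T = [[float(i == j) for j in range(size)] for i in range(size)]
    for i in range(nf):
        for j in range(nv):
            T[i][nf + j] = T[nf + j][i] = negW[i][j]
    nonneg_ok = all(T[i][j] - M[i][j] >= -tol
                    for i in range(size) for j in range(size))
    # K stacks -W on top of C; M == K K^T / 2 certifies positive semidefiniteness.
    K = negW + C
    gram_ok = all(
        abs(M[i][j] - 0.5 * sum(K[i][a] * K[j][a] for a in range(nv))) < tol
        for i in range(size) for j in range(size))
    return nonneg_ok and gram_ok


if __name__ == "__main__":
    print([check_certificate(n) for n in range(1, 5)])
```

The check uses floating-point arithmetic with a tolerance because \(\gamma \) is irrational for even \(n\); it is a numerical confirmation for small cases, not a replacement for the proof.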
Cite this article
Fawzi, H., Parrilo, P.A. Lower bounds on nonnegative rank via nonnegative nuclear norms. Math. Program. 153, 41–66 (2015). https://doi.org/10.1007/s10107-014-0837-2
DOI: https://doi.org/10.1007/s10107-014-0837-2