Lower bounds on matrix factorization ranks via noncommutative polynomial optimization

We use techniques from (tracial noncommutative) polynomial optimization to formulate hierarchies of semidefinite programming lower bounds on matrix factorization ranks. In particular, we consider the nonnegative rank, the positive semidefinite rank, and their symmetric analogues: the completely positive rank and the completely positive semidefinite rank. We study the convergence properties of our hierarchies, compare them extensively to known lower bounds, and provide some (numerical) examples.


Matrix factorization ranks
A factorization of a matrix A ∈ R^{m×n} over a sequence of cones {K^d}, each equipped with an inner product ⟨·,·⟩, is a decomposition of the form A_ij = ⟨X_i, Y_j⟩ with X_i, Y_j ∈ K^d for all i, j. The smallest d for which such a factorization exists is called the cone factorization rank of A over {K^d} [GPT13]. An important example is the nonnegative rank, denoted rank_+(A), where K^d is the nonnegative orthant R^d_+ with the standard inner product. Another important example is the positive semidefinite rank, denoted psd-rank_R(A), where K^d is the cone S^d_+ of real symmetric positive semidefinite d × d matrices with the trace inner product ⟨X, Y⟩ = Tr(X^T Y). Both the nonnegative rank and the positive semidefinite rank are defined whenever A is entrywise nonnegative.
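As an illustration of the definition above, the following minimal sketch checks numerically whether given factors form a cone factorization of a matrix. The helper names (`is_cone_factorization`, `in_psd_cone`, and so on) and the small example matrices are hypothetical and chosen only for illustration; they are not taken from the implementations accompanying the paper.

```python
import numpy as np

def in_nonneg_orthant(x, tol=1e-9):
    # Membership in the nonnegative orthant R^d_+ (up to a tolerance).
    return bool(np.all(np.asarray(x) >= -tol))

def in_psd_cone(M, tol=1e-9):
    # Membership in the cone S^d_+ of real symmetric psd matrices.
    M = np.asarray(M, dtype=float)
    return bool(np.allclose(M, M.T) and np.linalg.eigvalsh(M).min() >= -tol)

def factorization_matrix(X, Y, inner):
    # The matrix with entries <X_i, Y_j>.
    return np.array([[inner(x, y) for y in Y] for x in X])

def is_cone_factorization(A, X, Y, inner, in_cone, tol=1e-9):
    # All factors must lie in the cone, and their inner products must give A.
    if not all(in_cone(f) for f in list(X) + list(Y)):
        return False
    return bool(np.allclose(factorization_matrix(X, Y, inner), A, atol=tol))

def trace_inner(P, Q):
    # Trace inner product <P, Q> = Tr(P^T Q).
    return float(np.trace(P.T @ Q))

# Illustrative nonnegative factorization of size d = 2.
X_nn = [np.array([1.0, 0.0]), np.array([1.0, 1.0])]
Y_nn = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
A_nn = np.array([[1.0, 0.0], [1.0, 1.0]])

# Illustrative positive semidefinite factorization of size d = 2.
X_psd = [np.diag([1.0, 0.0]), np.ones((2, 2))]
Y_psd = [np.eye(2), np.diag([0.0, 1.0])]
A_psd = np.array([[1.0, 0.0], [2.0, 1.0]])
```

Such a check only certifies an upper bound d on the factorization rank; it says nothing about whether a smaller factorization exists.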
The study of the nonnegative rank is largely motivated by the groundbreaking work of Yannakakis [Yan91], who showed that the linear extension complexity of a polytope P is given by the nonnegative rank of its slack matrix. The linear extension complexity of P is the smallest integer d for which P is the linear image of an affine section of the nonnegative orthant R d + , and the slack matrix of P is given by (b i − a T i v) v∈V,i∈I when P = conv(V ) = {x : a T i x ≤ b i (i ∈ I)}. Analogously, the semidefinite extension complexity of P is the smallest d such that P is the linear image of an affine section of the cone S d + and it is given by the positive semidefinite rank of its slack matrix [GPT13]. The motivation to study extension complexity is that polytopes with small extension complexity admit efficient algorithms for linear optimization. Well-known examples include spanning tree polytopes [Mar91] and permutahedra [Goe15], which have polynomial linear extension complexity, and the stable set polytope of perfect graphs, which has polynomial semidefinite extension complexity [MGS81] (see, e.g., the surveys [CCZ10, FGP + 15]). The above connection to matrix factorization ranks can be used to show that a polytope does not admit a small extended formulation. Recently this connection was used to show that the linear extension complexities of the traveling salesman, cut, and stable set polytopes are exponential in the number of nodes [FMP + 15], and this result was extended to their semidefinite extension complexities in [LRS15]. Surprisingly, the linear extension complexity of the matching polytope is also exponential [Rot14], even though linear optimization over this set is polynomial time solvable [Edm65]. It is an open question whether the semidefinite extension complexity of the matching polytope is exponential.
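To make the slack matrix concrete, here is a small sketch that builds it for the unit square from a vertex description and an inequality description. The function name `slack_matrix` and the choice of polytope are assumptions made for illustration only.

```python
import numpy as np

def slack_matrix(vertices, a, b):
    # Rows are indexed by vertices v, columns by inequalities a_i^T x <= b_i;
    # the (v, i) entry is the slack b_i - a_i^T v.
    V = np.asarray(vertices, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return b[None, :] - V @ a.T

# Unit square: conv{(0,0), (1,0), (1,1), (0,1)} = {x : 0 <= x_1 <= 1, 0 <= x_2 <= 1}.
square_vertices = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_a = [(-1, 0), (1, 0), (0, -1), (0, 1)]
square_b = [0, 1, 0, 1]
S_square = slack_matrix(square_vertices, square_a, square_b)
```

The resulting matrix is entrywise nonnegative, so its nonnegative rank and positive semidefinite rank are well defined.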
Besides applications in extension complexity, the nonnegative rank also finds applications in probability and communication complexity, and the positive semidefinite rank has applications in quantum information theory and quantum communication complexity (see, e.g., [MSvS03,FFGT15,JSWZ13,FMP + 15]).
We are also interested in symmetric analogues of the above matrix factorization ranks, where we require the same factors for the rows and columns (m = n and X_i = Y_i for all i). Analogously to the nonnegative rank, we consider the completely positive rank, denoted cp-rank(A), where K^d is the nonnegative orthant R^d_+. We also consider the completely positive semidefinite rank, denoted cpsd-rank_K(A) with K ∈ {R, C}, where the factors are real symmetric (K = R) or complex Hermitian (K = C) positive semidefinite d × d matrices; here H^d_+ is the cone of complex Hermitian positive semidefinite matrices with the trace inner product ⟨X, Y⟩ = Tr(X^* Y). The symmetric matrices for which these parameters are well defined form convex cones known as the completely positive cone, denoted CP^n, and the completely positive semidefinite cone, denoted CS^n_+. We have the inclusions CP^n ⊆ CS^n_+ ⊆ S^n_+ ∩ R^{n×n}_+ (the cone of entrywise nonnegative positive semidefinite matrices), which are strict for n ≥ 5. For details on these cones see [BSM03,LP15].
Motivation for the cones CP n and CS n + comes in particular from their use to model classical and quantum information optimization problems. Graph parameters such as the stability number and the chromatic number can be written as linear optimization problems over the completely positive cone [dKP02] and the same holds, more generally, for any quadratic problem with mixed binary variables [Bur09]. The cp-rank is widely studied in the linear algebra community; see, e.g., [BSM03,SMBJS13,SMBB + 15,BSU14]. The completely positive semidefinite cone was first studied in [LP15] to describe quantum analogues of the stability number and chromatic number of a graph, which was later extended to general graph homomorphisms [SV17] and graph isomorphism [AMR + 16]. An additional connection between the completely positive semidefinite cone and the set of quantum correlations is shown in [MR14,SV17]. This also gives a relation between the completely positive semidefinite rank and the minimal entanglement dimension necessary to realize a quantum correlation. This connection has been used in [PSVW16,GdLL17a,PV17] to construct matrices whose completely positive semidefinite rank is exponentially large in the matrix size. For the special case of synchronous quantum correlations the minimum entanglement dimension is given by the completely positive semidefinite rank of a certain matrix (see [GdLL17b]).
We may also consider the parameter psd-rank_C(A), obtained by using asymmetric factorizations by complex Hermitian positive semidefinite matrices. We have psd-rank_C(A) ≤ psd-rank_R(A) ≤ rank_+(A) ≤ min{m, n} and cp-rank(A) ≤ n(n+1)/2 (the binomial coefficient "n+1 choose 2"), but the situation for the cpsd-rank is very different. Exploiting the connection between the completely positive semidefinite cone and quantum correlations, it follows from results in [Slo17] that the cone CS^n_+ is not closed for n ≥ 1942, so there does not exist an upper bound on the cpsd-rank as a function of the matrix size. For small matrix sizes very little is known. It is an open problem whether CS^5_+ is closed, and we do not know how to construct a 5 × 5 matrix whose cpsd-rank exceeds 5.
The rank_+, cp-rank, and psd-rank are known to be computable; this follows from results in [Ren92], since upper bounds exist on these factorization ranks (see [BR06] for the cp-rank). However, computing the nonnegative rank is NP-hard [Vav09], and computing the psd-rank is complete for the existential theory of the reals [Shi16]. For the cp-rank and cpsd-rank no such results are known, but there is no reason to assume they are any easier. In fact it is not even clear whether the cpsd-rank is computable in general. To obtain upper bounds one can employ heuristics that try to construct small factorizations. Many such heuristics exist for the nonnegative rank (see the overview [Gil17] and references therein), factorization algorithms exist for structured completely positive matrices (see [DD12]), and algorithms to compute positive semidefinite factorizations are presented in the recent work [VGG17]. In this paper we want to compute lower bounds, which we achieve by employing a relaxation approach based on (noncommutative) polynomial optimization.

Contributions and connections to existing bounds
In this work we provide a unified approach to obtain lower bounds on the above mentioned matrix factorization ranks, based on tools from (noncommutative) polynomial optimization.
We introduce in Section 3 our approach for the completely positive semidefinite rank. We start by defining a hierarchy of lower bounds ξ^cpsd_1(A) ≤ ξ^cpsd_2(A) ≤ ... ≤ ξ^cpsd_t(A) ≤ ... ≤ cpsd-rank_C(A), where ξ^cpsd_t(A), for t ∈ N, is given as the optimal value of a semidefinite program whose size increases with t. Not much is known about lower bounds for the cpsd-rank in the literature. The inequality √rank(A) ≤ cpsd-rank_C(A) follows by viewing a Hermitian d × d matrix as a d²-dimensional real vector (so that rank(A) ≤ d² for any factorization of size d), and an analytic lower bound is given in [PSVW16]. We show ξ^cpsd_1(A) is at least as good as this analytic lower bound and we give a small example where a strengthening of ξ^cpsd_2(A) is strictly better than both above mentioned generic lower bounds. Currently we lack evidence that the lower bounds ξ^cpsd_t(A) can be larger than, for example, the matrix size, but this could be because small matrices with large cpsd-rank are hard to construct or might not exist. We also introduce several ideas leading to strengthenings of the basic bounds ξ^cpsd_t(A). We then adapt these ideas to the other three matrix factorization ranks discussed above, where for each of them we obtain analogous hierarchies of bounds.
For the nonnegative rank and completely positive rank much more is known about lower bounds. The best known generic lower bounds are due to Fawzi and Parrilo [FP15,FP16].
In [FP16] the parameters τ + (A) and τ cp (A) are defined, which, respectively, lower bound the nonnegative rank and the cp-rank, along with their computable semidefinite programming relaxations τ sos + (A) and τ sos cp (A). In [FP16] it is also shown that τ + (A) is at least as good as certain norm-based lower bounds. In particular, τ + is at least as good as the ∞ norm-based lower bound, which was used by Rothvoß [Rot14] to show that the matching polytope has exponential linear extension complexity. In [FP15] it is shown that for the Frobenius norm, the square of the norm-based bound is still a lower bound on the nonnegative rank, but it is not known how this lower bound compares to τ + .
Fawzi and Parrilo [FP16] use the atomicity of the nonnegative and completely positive ranks to derive the parameters τ_+(A) and τ_cp(A); i.e., they use the fact that the nonnegative rank (cp-rank) is equal to the smallest d for which A can be written as A = Σ_{k=1}^d R_k, where each R_k is a nonnegative (positive semidefinite) rank one matrix. As the psd-rank and cpsd-rank are not known to admit atomic formulations, the techniques from [FP16] do not extend to these factorization ranks. However, our approach via polynomial optimization makes it possible to capture these factorization ranks as well.
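The atomic formulation of the nonnegative rank can be illustrated as follows: a nonnegative factorization A = UV of inner dimension d splits into d nonnegative rank one terms. This is a minimal sketch; the function name `rank_one_terms` and the example factors are hypothetical.

```python
import numpy as np

def rank_one_terms(U, V):
    # Split A = U @ V into the terms R_k = U[:, k] V[k, :], one per inner index k.
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)
    return [np.outer(U[:, k], V[k, :]) for k in range(U.shape[1])]

# Illustrative nonnegative factors with inner dimension d = 2.
U = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
V = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]])
A = U @ V
terms = rank_one_terms(U, V)
```

Conversely, any decomposition of A into d nonnegative rank one matrices gives a nonnegative factorization of inner dimension d, which is why the two formulations define the same parameter.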
In Sections 4 and 5 we construct semidefinite programming hierarchies of lower bounds ξ cp t (A) and ξ + t (A) on cp-rank(A) and rank + (A). We show ξ + t (A) converges to τ + (A) as t → ∞. The basic hierarchy ξ cp t (A) for the cp-rank does not converge to τ cp (A), but we provide two types of additional constraints that can be added to ξ cp t (A) to ensure convergence to τ cp (A). First, we show how a generalization of the tensor constraints that are used in the definition of the parameter τ sos cp (A) can be used for this, and we also give a more efficient (using smaller matrix blocks) description of these constraints. This strengthening of ξ cp 2 (A) is then at least as strong as τ sos cp (A), but requires matrix variables of roughly half the size. Alternatively, we show that for every ε > 0 there is a finite number of additional linear constraints that can be added to the basic hierarchy ξ cp t (A) so that the limit of the sequence of these new lower bounds is at least τ cp (A) − ε. We give numerical results on small matrices studied in the literature, and we show ξ + 3 (A) can improve over τ sos + (A). Finally, in Section 6 we derive a hierarchy ξ psd t (A) of lower bounds on the psd-rank. We compare the bounds ξ psd t (A) to a bound from [LWdW17] and we provide some numerical examples illustrating the performance of our bounds.
We provide two implementations of all lower bounds introduced in this paper; they are available with the arXiv submission of this paper. One implementation uses Matlab and the CVX package [GB14], and the other uses Julia [BEKS17]. The implementations support various SDP solvers; for our numerical examples we used Mosek [ApS17].

Basic approach and notation
We use (noncommutative) polynomial optimization to define hierarchies of relaxations for the problem of finding smallest possible matrix factorization ranks.
In classical polynomial optimization the problem is to find the global minimum of a polynomial over a semialgebraic set of the form {x ∈ R^n : g_1(x) ≥ 0, ..., g_m(x) ≥ 0}, where g_1, ..., g_m are polynomials. Lasserre [Las01] and Parrilo [Par00] have proposed hierarchies of semidefinite programming relaxations based on the theory of moments and the dual theory of sums of squares polynomials, which can be used to compute successively better lower bounds converging to the global minimum (under an Archimedean condition). This approach has been used in a wide range of applications and there is extensive literature (see, e.g., [AL12,Las09,Lau09]). Most relevant to this work, it is used in [Las14] to design conic approximations of the completely positive cone and in [Nie14] to check membership in the completely positive cone. This approach has also been extended to eigenvalue optimization [PNA10,NPA12], and later to tracial optimization [BCKP13,KP16].
Here we briefly explain the tracial optimization problem, which is most relevant to our work, for which we first need some notation. We denote the set of all words in the symbols x_1, ..., x_n by ⟨x⟩ = ⟨x_1, ..., x_n⟩, where the empty word is denoted by 1. This is a semigroup with involution, where the binary operation is concatenation, and the involution of a word w is the word w* obtained by reversing the order of the symbols in w. The *-algebra of all real linear combinations of these words is denoted by R⟨x⟩, and its elements are called noncommutative polynomials. The involution extends to R⟨x⟩ by linearity. A polynomial p ∈ R⟨x⟩ is called symmetric if p* = p and Sym R⟨x⟩ denotes the set of symmetric polynomials. We can evaluate a noncommutative polynomial at a matrix tuple X = (X_1, ..., X_n) ∈ (H^d)^n, where H^d denotes the set of complex Hermitian d × d matrices. In tracial optimization the problem is to minimize the normalized trace tr(f(X)) over the matrix positivity domain D(S) = ∪_{d≥1} {X = (X_1, ..., X_n) ∈ (H^d)^n : g(X) ⪰ 0 for all g ∈ S}, where S ⊆ Sym R⟨x⟩.
Throughout tr(·) denotes the normalized trace, while Tr(·) denotes the (non-normalized) trace, so that tr(I) = 1 and Tr(I) = d for the identity matrix I ∈ H d . As in the classical (commutative) case, semidefinite programming hierarchies have been constructed to obtain lower bounds on the minimal normalized trace. Here the distinguishing feature is the dimension independence: the optimization is over all possible matrix sizes. Perhaps counterintuitively, in this paper we use these techniques to compute lower bounds on factorization dimensions.
To explain the basic idea of how we obtain lower bounds we consider the cpsd-rank case. Given a minimal factorization A = (Tr(X_i X_j)), with X = (X_1, ..., X_n) ∈ (H^d_+)^n and d = cpsd-rank_C(A), consider the linear form L_X on R⟨x⟩ defined by L_X(p) = Tr(p(X_1, ..., X_n)) for p ∈ R⟨x⟩, and consider its real part L = Re(L_X) ∈ R⟨x⟩*. Then we have A = (L(x_i x_j)) and cpsd-rank_C(A) = d = L(1). To obtain lower bounds on cpsd-rank_C(A) we minimize L(1) over a set of linear functionals L that satisfy certain computationally tractable properties shared by Re(L_X). Note that this idea of minimizing L(1) has been used recently [TS15,Nie16] in the commutative setting to derive a converging hierarchy to the nuclear norm of a symmetric tensor. In order to explain how we give a semidefinite programming hierarchy using this idea we first need more notation.
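The construction above can be checked numerically. The sketch below evaluates words at a tuple of random Hermitian positive semidefinite matrices and recovers A = (L(x_i x_j)) and L(1) = d for L = Re(L_X). Words are encoded as tuples of variable indices; this encoding and the names `trace_eval` and `random_psd` are assumptions made for this example.

```python
import numpy as np

def random_psd(d, rng):
    # A random complex Hermitian positive semidefinite d x d matrix.
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return G @ G.conj().T

def trace_eval(word, X):
    # Re Tr(X_{w_1} ... X_{w_k}) for a word given as a tuple of indices;
    # the empty tuple is the empty word 1 and evaluates to Tr(I) = d.
    P = np.eye(X[0].shape[0], dtype=complex)
    for i in word:
        P = P @ X[i]
    return float(np.trace(P).real)

rng = np.random.default_rng(0)
d, n = 3, 4
X = [random_psd(d, rng) for _ in range(n)]

# The matrix factorized by X, read off from the degree-2 values of L.
A = np.array([[trace_eval((i, j), X) for j in range(n)] for i in range(n)])
L_one = trace_eval((), X)  # the value L(1), equal to the factorization size d
```

Here A is completely positive semidefinite by construction, with cpsd-rank_C(A) ≤ d; the tuple X need not be a minimal factorization.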
Given t ∈ N ∪ {∞}, we let ⟨x⟩_t be the set of words w of degree |w| ≤ t, so that ⟨x⟩_∞ = ⟨x⟩, and R⟨x⟩_t is the real vector space of noncommutative polynomials p of degree deg(p) ≤ t. Given t ∈ N, we let ⟨x⟩_{=t} be the set of words of degree exactly equal to t. Let R⟨x⟩*_t be the space of real-valued linear functionals on R⟨x⟩_t. A linear functional L ∈ R⟨x⟩*_t is symmetric if L(w) = L(w*) for all w ∈ ⟨x⟩_t and tracial if L(ww′) = L(w′w) for all w, w′ ∈ ⟨x⟩_t. A linear functional L ∈ R⟨x⟩*_{2t} is said to be positive if L(p*p) ≥ 0 for all p ∈ R⟨x⟩_t. Given a set S ⊆ Sym R⟨x⟩ and t ∈ N ∪ {∞}, the truncated quadratic module at degree 2t, denoted by M_{2t}(S), is the cone generated by all polynomials p*gp ∈ R⟨x⟩_{2t} with g ∈ S ∪ {1}: M_{2t}(S) = cone{p*gp : p ∈ R⟨x⟩, g ∈ S ∪ {1}, deg(p*gp) ≤ 2t}. (1)
The linear functional L_X as defined above is symmetric and tracial. Moreover it satisfies some positivity conditions, since we have L_X(q) ≥ 0 whenever q(X) is positive semidefinite. It follows that L_X(p*p) ≥ 0 and, as will be explained later, L_X satisfies localizing conditions of the form L_X ≥ 0 on M_{2t}(S) for a suitable set S of symmetric polynomials. The bound ξ^cpsd_t(A) is computationally tractable (for small t). Indeed the localizing constraint (L ≥ 0 on M_{2t}(S)) can be enforced by requiring certain matrices, whose entries are determined by L, to be positive semidefinite. This makes the problem defining ξ^cpsd_t(A) into a semidefinite program. These localizing conditions ensure the Archimedean property for the quadratic module, which allows us to show some convergence properties of the bounds ξ^cpsd_t(A).
The above idea extends naturally to the other matrix factorization ranks, using the following two basic ideas. First, since the cp-rank and the nonnegative rank deal with factorizations by diagonal matrices, we use linear functionals acting on classical commutative polynomials.
Second, the asymmetric factorization ranks (psd-rank and nonnegative rank) can be seen as analogs of the symmetric ranks in the partial matrix setting, where we know only the values of L on the quadratic monomials corresponding to entries in the off-diagonal blocks (this will require scaling of the factors in order to be able to define localizing constraints ensuring the Archimedean property). A main advantage of our approach is that it applies to all four matrix factorization ranks, after easy suitable adaptations.

Organization
We start by providing the necessary background about commutative and tracial polynomial optimization in Section 2, with proofs relegated to the appendix. We then introduce in Section 3 the basic hierarchy of lower bounds on the cpsd-rank, including a discussion of its properties and of some ways to improve the 'basic' bounds. In Sections 4 and 5 we present the adapted hierarchies for the completely positive rank and the nonnegative rank, respectively. As mentioned above, we compare our results to known bounds in [FP16], and we also provide computational examples illustrating the behaviour of the various bounds. In Section 6 we discuss our hierarchy for the positive semidefinite rank, and we also provide some computational examples.

Commutative and tracial polynomial optimization
In this section we discuss known convergence and flatness results for commutative and tracial optimization. We present these results in such a way that we can later use them for our hierarchies to lower bound matrix factorization ranks. Although the commutative case was developed first, here we treat the commutative and tracial cases together. We will provide proofs (in the appendices) working only on the "moment side", that is, relying on properties of linear functionals rather than using real algebraic results on sums of squares. Tracial optimization is an adaptation of eigenvalue optimization as developed in [PNA10], but here we only discuss the commutative and tracial cases, as these are the most relevant to our approach.

Flat extensions and representations of linear forms
We start by defining what it means for a linear functional L ∈ R⟨x⟩*_{2t} to be flat. For this we can use the null space of L, which is the vector space N_t(L) = {p ∈ R⟨x⟩_t : L(qp) = 0 for all q ∈ R⟨x⟩_t}. We also use the notation N(L) = N_∞(L). Given an integer 1 ≤ δ ≤ t, the functional L is said to be δ-flat if every polynomial of degree at most t agrees with a polynomial of degree at most t − δ modulo the null space, that is, if R⟨x⟩_t = R⟨x⟩_{t−δ} + N_t(L).
Many properties satisfied by a linear functional L ∈ R⟨x⟩*_{2t} can be conveniently expressed in terms of its moment matrix (aka Hankel matrix) M_t(L), which is the matrix whose rows and columns are indexed by the words in ⟨x⟩_t, with entries M_t(L)_{w,w′} = L(w*w′) for w, w′ ∈ ⟨x⟩_t. Indeed, the linear form L is symmetric if and only if M_t(L) is symmetric, and L is symmetric and positive if and only if M_t(L) is positive semidefinite. Moreover, the null space of L can be identified with the kernel of M_t(L): a polynomial p = Σ_w c_w w belongs to N_t(L) if and only if its coefficient vector (c_w) belongs to the kernel of M_t(L). Finally, the form L is δ-flat if and only if the rank of M_t(L) is equal to the rank of its principal submatrix indexed by the words in ⟨x⟩_{t−δ}; that is, if rank M_t(L) = rank M_{t−δ}(L).
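The moment matrix and the rank characterization of flatness can be explored numerically. The following sketch builds M_t(L) for the trace evaluation at a pair of 2 × 2 real symmetric matrices (the Pauli matrices Z and X, chosen here purely for illustration) and compares the ranks for t = 1, 2, 3. The word encoding and all function names are assumptions of this example.

```python
import itertools
import numpy as np

def words_up_to(n, t):
    # All words of degree <= t in n symbols, encoded as tuples of indices.
    return [w for k in range(t + 1) for w in itertools.product(range(n), repeat=k)]

def trace_eval(word, X):
    # Re Tr of the product of the matrices X_i along the word.
    P = np.eye(X[0].shape[0], dtype=complex)
    for i in word:
        P = P @ X[i]
    return float(np.trace(P).real)

def moment_matrix(X, t):
    # Entry (u, v) is L(u* v), where u* is the reversed word.
    W = words_up_to(len(X), t)
    return np.array([[trace_eval(u[::-1] + v, X) for v in W] for u in W])

def numerical_rank(M, rel_tol=1e-9):
    # Number of singular values above a relative threshold.
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

pauli_z = np.array([[1.0, 0.0], [0.0, -1.0]])
pauli_x = np.array([[0.0, 1.0], [1.0, 0.0]])
X = [pauli_z, pauli_x]
ranks = [numerical_rank(moment_matrix(X, t)) for t in (1, 2, 3)]
min_eig = np.linalg.eigvalsh(moment_matrix(X, 2)).min()
```

In this example the ranks of M_2(L) and M_3(L) coincide, so the truncation of L to degree 6 is 1-flat, whereas the rank still grows from M_1(L) to M_2(L).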
One can express nonnegativity of a tracial linear form L ∈ R⟨x⟩*_{2t} on a truncated quadratic module M_{2t}(S) as defined in (1) in terms of certain associated positive semidefinite moment matrices. Given a polynomial g ∈ R⟨x⟩, define gL ∈ R⟨x⟩*_{2t−deg(g)} by (gL)(p) = L(gp), so that, using that L is tracial, L is nonnegative on M_{2t}(S) if and only if the moment matrices M_{t−d_g}(gL), with d_g = ⌈deg(g)/2⌉, are positive semidefinite for all g ∈ S ∪ {1}.
Important examples of positive symmetric tracial linear forms on R⟨x⟩ arise from trace evaluation maps. Given a tuple X = (X_1, ..., X_n) ∈ (H^d)^n of Hermitian d × d matrices, we define the trace evaluation map L_X on R⟨x⟩ by setting L_X(p) = Tr(p(X_1, ..., X_n)) for all p ∈ R⟨x⟩.
Recall Tr(·) is the matrix trace that satisfies Tr(I) = d for I ∈ H d . If we instead use the normalized trace tr(·), then L X is called a normalized trace evaluation map and L X (1) = tr(I) = Tr(I)/d = 1 holds. A (normalized) trace evaluation map is symmetric, tracial, and positive, which follows from Tr(X * ) = Tr(X), Tr(XY ) = Tr(Y X), Tr(X * X) ≥ 0 for all X, Y ∈ H d .
As X consists of complex Hermitian matrices, the trace evaluation L X takes complex values and we may consider its real part Re(L X ), which is also symmetric and tracial, and positive whenever L X is positive.
The matrix positivity domain of a subset S ⊆ Sym R⟨x⟩ of symmetric polynomials is defined as D(S) = ∪_{d≥1} {X = (X_1, ..., X_n) ∈ (H^d)^n : g(X) ⪰ 0 for all g ∈ S}. (2)
Hence the trace evaluation map L_X for X ∈ D(S) is nonnegative on the quadratic module M(S).
We will now discuss several key results about extending and representing linear forms that are nonnegative on a (truncated) quadratic module. These results are already (mostly) known, but as they will be used extensively in our paper, we provide all proofs in the appendix for the reader's convenience. For the proofs we will work only on the "moment side", which is the natural setting for the applications treated in this paper, thus without using algebraic results about sums of squares of polynomials. In addition we will formulate our results making the link to C*-algebras explicit whenever relevant, since this will be useful for later results in this paper and also in the work [GdLL17b]. As infinite dimensional analogs of matrix algebras, C*-algebras arise naturally when considering tracial polynomial optimization problems.
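The fact that a trace evaluation at a point of the matrix positivity domain is nonnegative on the quadratic module can be tested through localizing matrices. The sketch below uses the hypothetical polynomial g = c·x_0 − x_0² (chosen only for illustration), encodes polynomials as dictionaries from words to coefficients, and builds the matrix with entries (gL)(u*v) = L(g u*v). It is positive semidefinite when g(X) ⪰ 0 and can fail to be so otherwise.

```python
import itertools
import numpy as np

def words_up_to(n, t):
    # All words of degree <= t in n symbols, encoded as tuples of indices.
    return [w for k in range(t + 1) for w in itertools.product(range(n), repeat=k)]

def trace_eval(word, X):
    # Re Tr of the product of the matrices X_i along the word.
    P = np.eye(X[0].shape[0], dtype=complex)
    for i in word:
        P = P @ X[i]
    return float(np.trace(P).real)

def localizing_matrix(g, X, t):
    # Moment matrix of gL at order t; g is a dict {word: coefficient}.
    W = words_up_to(len(X), t)

    def entry(u, v):
        return sum(c * trace_eval(w + u[::-1] + v, X) for w, c in g.items())

    return np.array([[entry(u, v) for v in W] for u in W])

def g_poly(c):
    # Hypothetical localizing polynomial g = c * x_0 - x_0^2.
    return {(0,): c, (0, 0): -1.0}

# X_0 has eigenvalues 1 and 0.5, so g(X) is psd for c = 1 but not for c = 0.6.
X = [np.diag([1.0, 0.5]), 0.5 * np.ones((2, 2))]
min_eig_ok = np.linalg.eigvalsh(localizing_matrix(g_poly(1.0), X, 1)).min()
min_eig_bad = np.linalg.eigvalsh(localizing_matrix(g_poly(0.6), X, 1)).min()
```

In a semidefinite relaxation the entries of such matrices are linear in the unknown values L(w), so requiring positive semidefiniteness is a linear matrix inequality in L.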
We start with some background on C*-algebras; see, e.g., [Bla06] for details. For our purposes we define a C*-algebra to be a norm closed *-subalgebra over C of the space B(H) of bounded operators on a complex Hilbert space H. Such an algebra A is unital if it contains the identity operator (denoted 1). By a fundamental result of Artin-Wedderburn, a finite dimensional C*-algebra is *-isomorphic to a direct sum ⊕_{m=1}^M C^{d_m×d_m} of full complex matrix algebras [BEK78,Wed64]. In particular, any finite dimensional C*-algebra is unital. The positive elements in a C*-algebra A are those of the form a*a for a ∈ A, and A is equipped with a norm ‖·‖ (induced from the operator norm of B(H)) such that ‖a*a‖ = ‖a‖². A state τ on a unital C*-algebra A is a linear form that is positive, i.e., τ(a*a) ≥ 0 for all a ∈ A, and satisfies τ(1) = 1. Since A is a complex algebra, every state τ is Hermitian: τ(a*) is the complex conjugate of τ(a) for all a ∈ A. The state τ is tracial if τ(ab) = τ(ba) for all a, b ∈ A and faithful if τ(a*a) = 0 implies a = 0. A useful fact is that the normalized matrix trace is the unique tracial state on the full matrix algebra C^{d×d} (see, e.g., [BK12]). Theorems 2.1 and 2.2 use a Gelfand-Naimark-Segal (GNS) construction to represent a positive linear functional as a tracial state on a unital C*-algebra. Given a unital C*-algebra A, we need the following adaptation of the matrix positivity domain: D_A(S) = {X = (X_1, ..., X_n) ∈ A^n : X_i* = X_i for i ∈ [n], g(X) ⪰ 0 for all g ∈ S}, where g(X) ⪰ 0 means that g(X) is positive in A.
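Analogously to the truncated quadratic module and matrix positivity domain, we will also use truncated ideals and matrix varieties. For T ⊆ R⟨x⟩ we define the truncated ideal I_{2t}(T) as the vector space spanned by the polynomials ph with p ∈ R⟨x⟩, h ∈ T, and deg(ph) ≤ 2t (with I(T) = I_∞(T)), the matrix variety V(T) = ∪_{d≥1} {X ∈ (H^d)^n : h(X) = 0 for all h ∈ T}, and, for a unital C*-algebra A, the set V_A(T) = {X ∈ A^n : X_i* = X_i for i ∈ [n], h(X) = 0 for all h ∈ T}. When all polynomials h ∈ T are symmetric one can capture the constraint L = 0 on I_{2t}(T) by requiring L ≥ 0 on M_{2t}(±T). An important advantage of treating the ideal separately, however, is that we can also use nonsymmetric generators in T.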
In the following theorem we assume M(S) + I(T) is Archimedean, by which we mean that there exists a scalar R > 0 such that R − Σ_{i=1}^n x_i² ∈ M(S) + I(T). The following theorem is implicit in several works (see, e.g., [NPA12,BKP16]).
Theorem 2.1. Let S ⊆ Sym R⟨x⟩ and T ⊆ R⟨x⟩ with M(S) + I(T) Archimedean. For L ∈ R⟨x⟩*, the following are equivalent: (1) L is symmetric, tracial, nonnegative on M(S), zero on I(T), and L(1) = 1; (2) there is a unital C*-algebra A with tracial state τ and X ∈ D_A(S) ∩ V_A(T) with L(p) = τ(p(X)) for all p ∈ R⟨x⟩. (3)
The next result can be seen as a finite dimensional analogue of the above result, where we do not need M(S) to be Archimedean, but instead assume the rank of M(L) is finite. In addition to the Gelfand-Naimark-Segal construction, the proof uses Artin-Wedderburn theory. For the unconstrained case the proof can be found in [BK12], and in [BKP16,KP16] this result is extended to the constrained case.
Theorem 2.2. For S ⊆ Sym R x , T ⊆ R x , L ∈ R x * , the following are equivalent: (1) L is a symmetric, tracial, linear form with L(1) = 1 that is nonnegative on M(S), zero on I(T ), and has rank(M (L)) < ∞; (2) there is a finite dimensional C * -algebra A with a tracial state τ , and X ∈ D A (S) ∩ V A (T ) satisfying equation (3); (3) L is a convex combination of normalized trace evaluations at points in D(S) ∩ V(T ).
The following result shows that any flat linear functional on a truncated polynomial space can be extended to a linear functional on the full space of polynomials (with the same positivity properties). It is due to Curto and Fialkow [CF96] in the commutative case and extensions to the noncommutative case can be found in [PNA10] (for eigenvalue optimization) and [BK12] (for trace optimization).
Theorem 2.3. Let 1 ≤ δ ≤ t < ∞, S ⊆ Sym R⟨x⟩_{2δ}, and T ⊆ R⟨x⟩_{2δ}. Suppose L ∈ R⟨x⟩*_{2t} is symmetric, tracial, δ-flat, nonnegative on M_{2t}(S), and zero on I_{2t}(T). Then L extends to a symmetric, tracial, linear form on R⟨x⟩ that is nonnegative on M(S), zero on I(T), and whose moment matrix has finite rank.
Combining Theorems 2.2 and 2.3 gives the following result, which shows that a flat linear form can be extended to a conic combination of trace evaluation maps. It was first proven in [KP16, Proposition 6.1] (and in [BK12] for the unconstrained case).
Corollary 2.4. Let 1 ≤ δ ≤ t < ∞, S ⊆ Sym R⟨x⟩_{2δ}, and T ⊆ R⟨x⟩_{2δ}. If L ∈ R⟨x⟩*_{2t} is symmetric, tracial, δ-flat, nonnegative on M_{2t}(S), and zero on I_{2t}(T), then it extends to a conic combination of trace evaluations at elements of D(S) ∩ V(T).

Specialization to the commutative setting
The material from the previous section can be adapted to the commutative setting. The moment matrix M_t(L) of a linear form L ∈ R[x]*_{2t} is now indexed by the monomials of degree at most t in the commuting variables x_1, ..., x_n. Due to the commutativity of the variables, this matrix is smaller and more entries are now required to be equal. For instance, the (x_2 x_1, x_3 x_4)-entry of M_2(L) is equal to its (x_3 x_1, x_2 x_4)-entry, which does not hold in general in the noncommutative case.
Given a ∈ R n , the evaluation map at a is the linear map L a ∈ R[x] * defined by L a (p) = p(a 1 , . . . , a n ) for all p ∈ R[x].
We can view L_a as a trace evaluation at scalar matrices. Moreover, we can view a trace evaluation map at a tuple of pairwise commuting matrices as a conic combination of evaluation maps by simultaneously diagonalizing the matrices. The quadratic module M(S) and the ideal I(T) have immediate specializations to the commutative setting. We only need the following two new definitions: the (scalar) positivity domain and scalar variety of sets S, T ⊆ R[x] are given by D(S) = {a ∈ R^n : g(a) ≥ 0 for g ∈ S}, V(T) = {a ∈ R^n : h(a) = 0 for h ∈ T}. (4)
We first give the commutative analogue of Theorem 2.1, where we give an additional integral representation in point (3). The equivalence of points (1) and (3) is proved in [Put93] based on Putinar's Positivstellensatz. In the appendix we give a direct proof on the "moment side" using the Gelfand representation.
Theorem 2.5. Let S, T ⊆ R[x] with M(S) + I(T) Archimedean. For L ∈ R[x]*, the following are equivalent: (1) L is nonnegative on M(S), zero on I(T), and L(1) = 1; (2) there exists a unital commutative C*-algebra A with a state τ and X ∈ D_A(S) ∩ V_A(T) such that L(p) = τ(p(X)) for all p ∈ R[x]; (3) there is a probability measure µ on D(S) ∩ V(T) such that L(p) = ∫ p(x) dµ(x) for all p ∈ R[x].
The following is the commutative analogue of Theorem 2.2.
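Two of the observations above lend themselves to a quick numerical check: the moment matrix of an evaluation map L_a has rank one, and a trace evaluation at diagonal (hence commuting) matrices is a sum of evaluation maps at the points read off from the diagonals. The sketch below encodes monomials as exponent vectors; this encoding, the function names, and the example data are assumptions of this illustration.

```python
import itertools
import numpy as np

def monomials_up_to(n, t):
    # Exponent vectors of all monomials in n commuting variables of degree <= t.
    return [e for e in itertools.product(range(t + 1), repeat=n) if sum(e) <= t]

def eval_monomial(e, a):
    # The value a_1^{e_1} ... a_n^{e_n} of the monomial x^e at the point a.
    return float(np.prod(np.asarray(a, dtype=float) ** np.asarray(e)))

def moment_matrix_at_point(a, t):
    # Moment matrix of L_a at order t: entry (e, f) is a^(e+f) = a^e * a^f.
    v = np.array([eval_monomial(e, a) for e in monomials_up_to(len(a), t)])
    return np.outer(v, v)

def trace_eval_diagonal(e, D):
    # Tr(D_1^{e_1} ... D_n^{e_n}) for a tuple D of diagonal matrices.
    P = np.eye(D[0].shape[0])
    for Di, ei in zip(D, e):
        P = P @ np.linalg.matrix_power(Di, ei)
    return float(np.trace(P))

D = [np.diag([1.0, 2.0, 0.5]), np.diag([3.0, 0.0, 1.0])]
# The k-th point collects the k-th diagonal entries of D_1, ..., D_n.
points = [tuple(Di[k, k] for Di in D) for k in range(3)]
rank_at_point = np.linalg.matrix_rank(moment_matrix_at_point((2.0, -1.0), 2))
```

For general pairwise commuting symmetric matrices one would first diagonalize them simultaneously; the diagonal case shown here is the situation after that step.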
Theorem 2.6. For S, T ⊆ R[x] and L ∈ R[x]*, the following are equivalent: (1) L is nonnegative on M(S), zero on I(T), has rank(M(L)) < ∞, and L(1) = 1; (2) there is a finite dimensional commutative C*-algebra A with a state τ, and X ∈ D_A(S) ∩ V_A(T) such that L(p) = τ(p(X)) for all p ∈ R[x]; (3) L is a convex combination of evaluations at points in D(S) ∩ V(T).
The next result, due to Curto and Fialkow [CF96], is the commutative analogue of Corollary 2.4.
Theorem 2.7. Let 1 ≤ δ ≤ t < ∞ and S, T ⊆ R[x]_{2δ}. If L ∈ R[x]*_{2t} is δ-flat, nonnegative on M_{2t}(S), and zero on I_{2t}(T), then L extends to a conic combination of evaluation maps at points in D(S) ∩ V(T).
We will also use the following result, which allows us to express any linear functional L nonnegative on an Archimedean quadratic module as a conic combination of evaluations at points, when restricting L to polynomials of bounded degree.
Theorem 2.8. Let S, T ⊆ R[x] be such that M(S) + I(T) is Archimedean. If L ∈ R[x]* is nonnegative on M(S) and zero on I(T), then for any integer k ∈ N the restriction of L to R[x]_k extends to a conic combination of evaluations at points in D(S) ∩ V(T).

Commutative and tracial polynomial optimization
We briefly recall here the basic polynomial optimization problem in both the commutative and tracial settings. We recall how to design hierarchies of semidefinite programming based bounds and we give their main convergence properties. In the rest of the paper we will see how these results can be adapted to give hierarchies of bounds for matrix factorization ranks.
The classical commutative polynomial optimization problem asks to minimize a polynomial f ∈ R[x] over a feasible region given by the scalar positivity domain D(S) as defined in (4): f_* = inf{f(a) : a ∈ D(S)}. In tracial polynomial optimization, given f ∈ Sym R⟨x⟩, this is modified to minimizing tr(f(X)) over a feasible region given by the matrix positivity domain D(S) as in (2): f^tr_* = inf{tr(f(X)) : d ∈ N, X ∈ (H^d)^n, g(X) ⪰ 0 for all g ∈ S}, where the infimum does not change if we replace H^d by S^d. Commutative optimization is recovered by restricting to 1 × 1 matrices. For the commutative case, Lasserre [Las01] and Parrilo [Par00] have proposed hierarchies of semidefinite programming relaxations based on sums of squares of polynomials and the dual theory of moments. This approach has been extended to eigenvalue optimization [PNA10,NPA12] and later to tracial optimization [BCKP13,KP16]. The starting point in deriving these relaxations is to reformulate the above problems as minimizing L(f) over all normalized trace evaluation maps L at points in the scalar positivity domain (4) or the matrix positivity domain (2), and then to express computationally tractable properties satisfied by such maps L.
We use this to formulate the following semidefinite programming lower bound on f_*: for S ∪ {f} ⊆ R[x] and t such that deg(f)/2 ≤ t ≤ ∞, we set f_t = inf{L(f) : L ∈ R[x]*_{2t}, L(1) = 1, L ≥ 0 on M_{2t}(S)}. In the same way, for S ∪ {f} ⊆ Sym R⟨x⟩ and t such that deg(f)/2 ≤ t ≤ ∞, we have the following semidefinite programming lower bound on f^tr_*: f^tr_t = inf{L(f) : L ∈ R⟨x⟩*_{2t} tracial and symmetric, L(1) = 1, L ≥ 0 on M_{2t}(S)}, where we now use definition (1) for M_{2t}(S). The next theorem from [Las01] gives fundamental convergence properties for the commutative case; see, e.g., [Las09,Lau09] for a detailed exposition.
Theorem 2.9. Let S ∪ {f} ⊆ R[x]. (i) If M(S) is Archimedean, then f_t → f_* as t → ∞. (ii) If f_t admits an optimal solution L that is δ-flat, then L is a convex combination of evaluation maps at global minimizers of f in D(S), and f_t = f_*.
To discuss convergence for the tracial case we need one more optimization problem: f^tr_{II_1} = inf{τ(f(X)) : A is a unital C*-algebra with tracial state τ, X ∈ D_A(S)}. This problem can be seen as an infinite dimensional analogue of f^tr_*: if we restrict to finite dimensional C*-algebras in the definition of f^tr_{II_1}, then we recover f^tr_* (cf. Theorem 2.2). Moreover, under a certain 'flatness' condition, equality f^tr_* = f^tr_{II_1} holds (cf. Theorem 2.11). Whether f^tr_{II_1} = f^tr_* is true in general is related to Connes' embedding conjecture (see [KS08,KP16,BKP16]). Above we defined the parameter f^tr_{II_1} using C*-algebras. However, the following lemma shows that we get the same optimal value if we restrict to A being a von Neumann algebra of type II_1 with separable predual, which is the more common way of defining the parameter f^tr_{II_1}, as is done in [KP16] (and justifies the notation). We omit the proof of this lemma, which relies on a GNS construction and standard algebraic manipulations.
Lemma 2.10. Let A be a C * -algebra with tracial state τ and a 1 , . . . , a n ∈ A. There exists a von Neumann algebra F of type II 1 with separable predual, a faithful normal tracial state φ, and elements b 1 , . . . , b n ∈ F, so that for every p ∈ R x we have τ (p(a 1 , . . . , a n )) = φ(p(b 1 , . . . , b n )) and where the last inequality follows by considering for A the full matrix algebra C d×d . The next theorem from [KP16] summarizes convergence properties.
is Archimedean, then f tr t → f tr ∞ as t → ∞, and the optimal values in f tr ∞ and f tr II 1 are attained and equal.
(ii) If f tr t has an optimal solution L that is δ-flat, then L is a convex combination of normalized trace evaluations at matrix tuples in D(S), and f tr We conclude with the following technical lemma, based on the Banach-Alaoglu theorem. It is a well known crucial tool for proving the asymptotic convergence result from Theorem 2.11(i) and it will be used again later in the paper.

Lower bounds on the completely positive semidefinite rank

Let A be a completely positive semidefinite n × n matrix. For t ∈ N ∪ {∞} we consider the following semidefinite program to lower bound its complex completely positive semidefinite rank: Additionally, define ξ cpsd * (A) by adding the constraint rank(M (L)) < ∞ to the program defining ξ cpsd ∞ (A) and considering the infimum instead of the minimum, since we do not know if the infimum is attained in ξ cpsd * (A). This gives a hierarchy of monotone nondecreasing lower bounds on the completely positive semidefinite rank:

ξ cpsd 1 (A) ≤ . . . ≤ ξ cpsd t (A) ≤ . . . ≤ ξ cpsd ∞ (A) ≤ ξ cpsd * (A) ≤ cpsd-rank C (A).

The inequality ξ cpsd ∞ (A) ≤ ξ cpsd * (A) and monotonicity are clear:

The following notion of localizing polynomials will be useful. A set S ⊆ R⟨x⟩ is said to be localizing at a matrix tuple X if X ∈ D(S), and we say that S is localizing for A if S is localizing at some factorization (5) is localizing for A. In fact, it is localizing at any factorization X of A by Hermitian positive semidefinite matrices. Indeed, since . We can use this to show ξ cpsd * By construction L X is symmetric and tracial, and A = (L(x i x j )). Moreover, since the set of polynomials S cpsd A is localizing for A, the form L X is nonnegative on M(S cpsd A ). Finally, we have rank(M (L X )) < ∞, since the algebra generated by X 1 , . . . , X n is finite dimensional. Hence, L X is feasible for ξ cpsd * which hold in light of the following identities:

In the rest of this section we investigate properties of the above hierarchy as well as variations on it. We discuss convergence properties, asymptotically and under flatness, and we give another formulation for the parameter ξ cpsd * (A). Moreover, as the inequality ξ cpsd * is typically strict, we present an approach to strengthen the bounds in order to go beyond ξ cpsd * (A). We also propose some techniques to simplify the computation of the bounds, and we illustrate the behaviour of the bounds on some examples.

The parameters ξ cpsd
In this section we consider convergence properties of the hierarchy ξ cpsd t (·), both asymptotically and under flatness.
Proof. The sequence (ξ cpsd . Hence, by Lemma 2.12, the feasible region of ξ cpsd t (A) is compact, and thus it has an optimal solution L t . Again by Lemma 2.12, the sequence (L t ) has a pointwise converging subsequence with limit L ∈ R x * . This pointwise limit L is symmetric, tracial, satisfies (L(x i x j )) = A, and is nonnegative on M(S cpsd A ). Hence L is feasible for ξ cpsd ∞ (A). This implies that L is optimal for ξ cpsd ∞ (A) and that lim t→∞ ξ cpsd t (A) = ξ cpsd ∞ (A). The reformulation of ξ cpsd ∞ (A) using C * -algebras follows from Theorem 2.1.
The following proposition is a direct application of Corollary 2.4. In general we do not know whether the infimum in ξ cpsd * (A) is attained, but as the following result shows this infimum is attained if there is a t ∈ N for which ξ cpsd t (A) admits a flat optimal solution. We have the following more explicit formulation for ξ cpsd * (A), which also explains why the inequality ξ cpsd * (A) ≤ cpsd-rank C (A) is typically strict. Here · denotes the operator norm, so that X = λ max (X) for X 0.
Note that using matrices from S dm + instead of H dm + does not change the optimal value.
Proof. The proof uses the formulation of ξ cpsd * (A) in terms of conic combinations of trace evaluations at matrix tuples in D(S cpsd A ) given in Proposition 3.2. We first show the inequality β ≤ ξ cpsd * (A), where β is the optimal value of the program in (10). For this, assume L ∈ R x * is a conic combination of trace evaluations at elements of D(S cpsd A ) such that A = (L(x i x j )). We will construct a feasible solution for (10) with objective value L(1). The linear functional L can be written as (7) and (9). This implies Y m This shows β ≤ L(1), and hence β ≤ ξ cpsd * (A). For the other direction we assume We have L(1) = m λ m d m and A = (L(x i x j )), and thus it suffices to show that each matrix tuple Y m belongs to D(S cpsd We can say a bit more when A lies on an extreme ray of the cone CS n + : In the formulation from Proposition 3.3 it suffices to restrict the minimization over all factorizations of A involving only one block. We know very little about the extreme rays of CS n + , also in view of the recent result that the cone is not closed for large n [Slo17]. Moreover, if ⊕ M m=1 X m 1 , . . . , ⊕ M m=1 X m n is a Gram decomposition of A providing an optimal solution to (10) and some block X m i has rank 1, then ξ cpsd * (A) = cpsd-rank C (A).
Proof. Let β denote the infimum in Proposition 3.4. The inequality ξ cpsd * (A) ≤ β follows from the reformulation of ξ cpsd * (A) in Proposition 3.3. To show the reverse inequality we consider a solution ⊕ M m=1 X m 1 , . . . , ⊕ M m=1 X m n to (10), and we set λ m = max i ‖X m i ‖ 2 /A ii . We will show β ≤ Σ m d m λ m . For this define the matrices A m = Gram(X m 1 , . . . , X m n ), so that A = Σ m A m . As A lies on an extreme ray of CS n + , we must have A m = α m A for some α m > 0 with Σ m α m = 1.

Additional localizing constraints to improve on ξ cpsd * (A)
In order to strengthen the bounds we may require nonnegativity over a (truncated) quadratic module generated by a larger set of localizing polynomials for A. The following lemma gives one such approach.
for A (at any Gram factorization by Hermitian positive semidefinite matrices).
Proof. If X 1 , . . . , X n is a Gram decomposition of A by Hermitian positive semidefinite matrices, then Given a set V ⊆ R n , we consider the larger set By scaling invariance, we can add the above constraints for all v ∈ R n by setting V to be the unit sphere S n−1 . Since S n−1 is a compact metric space, there exists a sequence V 1 ⊆ V 2 ⊆ . . . ⊆ S n−1 of finite subsets such that k≥1 V k is dense in S n−1 .
Propositions 3.1 and 3.2 have analogues for the programs ξ cpsd t,V (A). These show that ξ cpsd . Since ε > 0 was arbitrary, this completes the proof.
Consider the map Let u ∈ S n−1 and let v ∈ V k be such that (11) holds. Using Cauchy-Schwarz we have (6) and (7), which implies X i ≤ √ A ii . By the reverse triangle inequality we then have Combining the above inequalities we obtain that u T Au − f X (u) ≥ 0 for all S n−1 , and hence We now discuss two examples where the bounds ξ cpsd * ,V (A) go beyond ξ cpsd * (A).

Boosting the bounds
In this section we propose some additional constraints that can be added to strengthen the bounds ξ cpsd t,V (A) for finite t. These constraints may shrink the feasible region of ξ cpsd t,V (A) for t ∈ N, but they are redundant for the parameters ξ cpsd ∞,V (A) and ξ cpsd * ,V (A). The latter is shown using the reformulation of ξ cpsd t,V (A) (t ∈ {∞, * }) in terms of C * -algebras.

We first mention how to construct localizing constraints of "bilinear type", inspired by the work of Berta, Fawzi and Scholz [BFS16]. Note that, as for localizing constraints, these bilinear constraints can be modeled as semidefinite constraints.

Lemma 3.9. Let A ∈ CS n + , t ∈ N ∪ {∞, * }, and let g, g′ be localizing for A. If we add the constraints L(p * g p g′) ≥ 0 for all p ∈ R⟨x⟩ with deg(p * g p g′) ≤ 2t (13) to ξ cpsd t,V (A), then it still lower bounds cpsd-rank C (A). Moreover, if g, g′ ∈ M(S cpsd A,V ) then the constraints (13) are redundant for ξ cpsd ∞,V (A) and ξ cpsd * ,V (A).
Proof. Let X ∈ (H d + ) n be a Gram decomposition of A, and let L be the real part of the trace evaluation at X. Then p(X) * g(X)p(X) ⪰ 0 and g′(X) ⪰ 0, hence L(p * g p g′) = Re(Tr(p(X) * g(X)p(X)g′(X))) ≥ 0. Now suppose L is feasible for ξ cpsd t,V (A) with t ∈ {∞, * }. By Theorem 2.1 there exist a unital C * -algebra A with tracial state τ and X ∈ D(S cpsd A,V ) such that L(p) = L(1)τ (p(X)) for all p ∈ R⟨x⟩. Since g, g′ ∈ M(S cpsd A,V ) we know that g(X), g′(X) are positive elements in A, that is, g(X) = a * a and g′(X) = b * b for some a, b ∈ A. Then we have L(p * g p g′) = L(1) τ (p(X) * g(X) p(X) g′(X)) = L(1) τ (p(X) * a * a p(X) b * b) = L(1) τ ((a p(X) b * ) * (a p(X) b * )) ≥ 0, where we use that τ is a positive tracial state on A.
Second, we show how to use zero entries in A and vectors in the kernel of A to enforce new constraints on ξ cpsd t (A).
to ξ cpsd t,V (A), then we still get a lower bound on cpsd-rank C (A). Moreover, these constraints are redundant for ξ cpsd ∞,V (A) and ξ cpsd * ,V (A).
Proof. Let X ∈ (H d + ) n be a Gram decomposition of A, and let L be the real part of the trace evaluation L X . If Av = 0, then 0 = v T Av = Tr((Σ i v i X i ) 2 ), so Σ i v i X i = 0 and hence L((Σ i v i x i )p) = Re(Tr((Σ i v i X i )p(X))) = 0. If A ij = 0, then Tr(X i X j ) = 0, which implies X i X j = 0, since X i and X j are positive semidefinite. It follows that L(x i x j p) = Re(Tr(X i X j p(X))) = 0.
As in the proof of the previous lemma, if L is feasible for ξ cpsd t,V (A) with t ∈ {∞, * } then, by Theorem 2.1, there exist a unital C * -algebra A with tracial state τ and X ∈ D(S cpsd A,V ) such that L(p) = L(1)τ (p(X)) for all p ∈ R⟨x⟩. Moreover, by Lemma 2.10 we may assume τ to be faithful. For a vector v in the kernel of A we have 0 = v T Av = L((Σ i v i x i ) 2 ) = L(1)τ ((Σ i v i X i ) 2 ), and hence, since τ is faithful, Σ i v i X i = 0 in A. It follows that L(p(Σ i v i x i )) = L(1)τ (p(X) · 0) = 0 for all p ∈ R⟨x⟩. Analogously, if A ij = 0, then L(x i x j ) = 0 implies τ (X i X j ) = 0 and, since X i , X j are positive in A and τ is faithful, X i X j = 0, from which it follows that L(p x i x j ) = 0 for all p ∈ R⟨x⟩.

Note that the constraints L(p (Σ n i=1 v i x i )) = 0 for p ∈ R⟨x⟩ t , which are implied by (14), are in fact redundant: if v ∈ ker(A), then the vector obtained by extending v with zeros belongs to ker(M t (L)), since M t (L) ⪰ 0. Also, for an implementation of ξ cpsd t (A) with the additional constraints (14), it is more efficient to index the moment matrices with a basis for

Additional properties of the bounds
Here we list some additional properties of the parameters ξ cpsd  We also have the following direct sum property, where the equality follows using the C *algebra reformulations as given in Proposition 3.1 and Proposition 3.2.
). Let P A be the projection onto the space Σ i Im(X i ) and define L A ∈ R⟨x⟩ * by L A (p) = α · τ (p(X)P A ). It follows that L A is nonnegative on M(S cpsd A ), and In the same way we consider the projection P B onto the space Σ j Im(Y j ) and define a feasible solution L B for ξ cpsd t (B) with L B (1) = ατ (P B ). By Lemma 2.10 we may assume τ to be faithful, so that positivity of X i and Y j together with τ (X i Y j ) = 0 implies X i Y j = 0 for all i and j, and thus Σ i Im(X i ) ⊥ Σ j Im(Y j ). This implies I ⪰ P A + P B and thus τ (P A + P B ) ≤ τ (1) = 1. We have Note that the cpsd-rank of a matrix satisfies the same properties as those mentioned in the above two lemmas, where the inequality in Lemma 3.12 is always an equality: cpsd-rank C (A ⊕ B) = cpsd-rank C (A) + cpsd-rank C (B) [PSVW16, GdLL17a].
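The following lemma shows that the first level of our hierarchy is at least as good as the analytic lower bound (15) on the cpsd-rank derived in [PSVW16, Theorem 10].

Lemma 3.13. For any non-zero matrix A ∈ CS n + we have

$$\xi^{\mathrm{cpsd}}_1(A) \;\ge\; \frac{\big(\sum_{i=1}^n \sqrt{A_{ii}}\big)^2}{\sum_{i,j=1}^n A_{ij}}. \tag{15}$$

Proof. Let L be feasible for ξ cpsd 1 (A). Since L is nonnegative on the localizing polynomials √A ii x i − x i 2 and L(x i 2 ) = A ii , we have L(x i ) ≥ √A ii for all i ∈ [n]. Moreover, the matrix M 1 (L) is positive semidefinite. By taking the Schur complement with respect to its upper left corner (indexed by 1) it follows that the matrix L(1) · A − (L(x i )L(x j )) is positive semidefinite. Hence the sum of its entries is nonnegative, which gives L(1)(Σ i,j A ij ) ≥ (Σ i L(x i )) 2 ≥ (Σ i √A ii ) 2 and shows the desired inequality.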
As an application of Lemma 3.13, the first bound ξ cpsd 1 is exact for the k × k identity matrix: ξ cpsd 1 (I k ) = cpsd-rank C (I k ) = k. Moreover, by combining with Lemma 3.11, it follows that ξ cpsd 1 (A) ≥ k if A contains a diagonal positive definite k × k principal submatrix. A slightly more involved example is given by the 5 × 5 circulant matrix A whose entries are given by A ij = cos((i − j)4π/5) 2 (i, j ∈ [5]); this matrix was used in [FGP + 15] to show a separation between the completely positive semidefinite cone and the completely positive cone, and it was shown that cpsd-rank C (A) = 2. The analytic lower bound of [PSVW16] also evaluates to 2, hence Lemma 3.13 shows that our bound is tight on this example.
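The two evaluations of the analytic bound mentioned above can be checked numerically. The sketch below assumes that the bound (15) of [PSVW16] has the form (Σ i √A ii ) 2 / Σ i,j A ij ; the function name `analytic_bound` is a hypothetical helper, not notation from the paper.

```python
import math

def analytic_bound(A):
    """Assumed form of the analytic bound: (sum_i sqrt(A_ii))^2 / sum_{i,j} A_ij.

    A is a nonzero square matrix (nested lists) with nonnegative diagonal.
    """
    n = len(A)
    numerator = sum(math.sqrt(A[i][i]) for i in range(n)) ** 2
    denominator = sum(A[i][j] for i in range(n) for j in range(n))
    return numerator / denominator

# The k x k identity matrix, here with k = 4.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# The 5 x 5 circulant matrix with entries cos((i - j) * 4*pi/5)^2.
circulant = [[math.cos((i - j) * 4 * math.pi / 5) ** 2 for j in range(5)]
             for i in range(5)]

bound_identity = analytic_bound(identity)
bound_circulant = analytic_bound(circulant)
```

For the identity the numerator is k 2 and the denominator is k. For the circulant matrix the diagonal is all ones and the entries sum to 25/2, because the cosine terms cos((i − j)8π/5) cancel over a full set of fifth roots of unity.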
We now examine further analytic properties of the parameters ξ cpsd t (·). For each r ∈ N, the set of matrices A ∈ CS n + with cpsd-rank C (A) ≤ r is closed, which shows that the function A → cpsd-rank C (A) is lower semicontinuous. We now show that the functions A → ξ cpsd t (A) have the same property. The other bounds defined in this paper are also lower-semicontinuous, with a similar proof. Proof. It suffices to show the result for t ∈ N, because ξ cpsd ∞,V (A) = sup t ξ cpsd t,V (A), and the pointwise supremum of lower semicontinuous functions is lower semicontinuous. We show that the level sets {A ∈ S n : ξ cpsd t,V (A) ≤ r} are closed. For this we consider a sequence (A k ) k∈N of symmetric matrices converging to A ∈ S n such that ξ cpsd t,V (A k ) ≤ r for all k. We show that ξ cpsd t,V (A) ≤ r. Let L k ∈ R x * 2t be an optimal solution to ξ cpsd t,V (A k ). As L k (1) ≤ r for all k, it follows from Lemma 2.12 that there is a pointwise converging subsequence of (L k ), still denoted (L k ) for simplicity, that has a limit L ∈ R x * 2t with L(1) ≤ r. To complete the proof we show that L is feasible for ξ cpsd t,V (A). By the pointwise convergence of L k to L, for every ε > 0, p ∈ R x , and i ∈ [n], there exists a K ∈ N such that for all k ≥ K we have Hence we have where in the second inequality we use that 0 ≤ L k (p * x i p) ≤ L(p * x i p) + 1. Letting ε → 0 gives If we restrict to completely positive semidefinite matrices with an all-ones diagonal, that is, to CS n + ∩ E n , we can show an even stronger property. Here E n is the elliptope, which is the set of n × n positive semidefinite matrices with an all-ones diagonal.
is convex, and hence continuous on the interior of its domain.
Proof. Let A, B ∈ CS n + ∩ E n and 0 < λ < 1. Let L A and L B be optimal solutions for ξ cpsd t (A) and ξ cpsd t (B). Since the diagonals of A and B are the same, we have S cpsd Example 3.16. In this example we show that for t ≥ 1, the function is not continuous. For this we consider the matrices with cpsd-rank C (A k ) = 2 for all k ≥ 1. As A k is diagonal positive definite we have ξ cpsd t (A k ) = 2 for all t, k ≥ 1, while ξ cpsd t (lim k→∞ A k ) = 1. This argument extends to CS n + with n > 2. This example also shows that the first level of the hierarchy ξ cpsd 1 (·) can be strictly better than the analytic lower bound (15) of [PSVW16].

Lower bounds on the completely positive rank
The best current approach for lower bounding the completely positive rank of a matrix is due to Fawzi and Parrilo [FP16]. Their approach relies on the atomicity of the completely positive rank, that is, the fact that cp-rank(A) = r if and only if A has an atomic decomposition A = Σ r k=1 v k v T k for nonnegative vectors v k . In other words, if cp-rank(A) = r, then A/r can be written as a convex combination of r rank one positive semidefinite matrices v k v T k that satisfy 0 ≤ v k v T k ≤ A and v k v T k ⪯ A. Based on this observation Fawzi and Parrilo define the parameter

$$\tau_{\mathrm{cp}}(A) = \min\big\{\alpha : \alpha \ge 0,\ A \in \alpha \cdot \mathrm{conv}\{R \in \mathcal{S}^n : 0 \le R \le A,\ R \preceq A,\ \mathrm{rank}(R) \le 1\}\big\}$$

as lower bound for cp-rank(A). They also define the semidefinite programming parameter τ sos cp (A) = min α : α ∈ R, X ∈ S n 2 , α vec(A) T vec(A) X 0,

Instead of the atomic point of view, here we take the matrix factorization perspective, which allows us to obtain bounds by adapting the techniques from Section 3 to the commutative setting. Indeed, we may view a factorization A = (a T i a j ) by nonnegative vectors as a factorization by diagonal (and thus pairwise commuting) positive semidefinite matrices.
Before presenting the details of our hierarchy of lower bounds, we mention some of our results in order to pinpoint the link to the parameters τ sos cp (A) and τ cp (A). The direct analogue of ξ cpsd t (A) in the commutative setting leads to a hierarchy that does not converge to τ cp (A), but we provide two approaches to strengthen it that do converge to τ cp (A). The first approach is based on a generalization of the tensor constraints in τ sos cp (A). We also provide a computationally more efficient version of these tensor constraints, leading to a hierarchy whose second level is at least as good as τ sos cp (A) while being defined by a smaller semidefinite program. The second approach relies on adding localizing constraints for vectors in the unit sphere as in Section 3.2.
The following hierarchy is a commutative analogue of the hierarchy from Section 3, where we may now add the localizing polynomials A ij − x i x j for all 1 ≤ i < j ≤ n, which was not possible in the noncommutative setting of the completely positive semidefinite rank. For each t ∈ N ∪ {∞} we consider the semidefinite program where we set We additionally define ξ cp * (A) by adding the constraint rank(M (L)) < ∞ to ξ cp ∞ (A). We also consider the strengthening ξ cp t, † (A), where we add to ξ cp t (A) the positivity constraints and the tensor constraints which generalize the case l = 2 used in the relaxation τ sos cp (A). Here, for a word w ∈ x , we denote by w c the corresponding (commutative) monomial in [x]. The tensor constraints (17) involve matrices indexed by the noncommutative words of length exactly l. In Section 4.4 we show a more economical way to rewrite these constraints as (L(mm )) m,m ∈[x] =l Q l A ⊗l Q T l , thus involving smaller matrices indexed by commutative words of degree l.
Note that, as before, we can strengthen the bounds by adding other localizing polynomials to the set S cp A . In particular, we can follow the approach of Section 3.2. Another possibility is to add localizing constraints specific to the commutative setting: we can add each monomial u ∈ [x] to S cp A (see Section 4.5.2 for an example). The bounds ξ cp t and ξ cp t, † are monotonically nondecreasing in t and they are invariant under simultaneously permuting the rows and columns of A and under scaling a row and column of A by a positive number. In

Comparison to τ sos cp (A)
We first show that the semidefinite programs defining ξ cp t, † (A) are valid relaxations for the completely positive rank. More precisely, we show that they lower bound τ cp (A).

Proof. It suffices to show the inequality for t = * . For this consider a decomposition
follow in a similar way.
It remains to be shown that X l A ⊗l for all l, where we set X l = (L(uv)) u,v∈ x =l . Note that X 1 = A. We adapt the argument used in [FP16] to show X l A ⊗l for all l ≥ 2. Suppose k for each k. Scale by factor αλ k and sum over k to get Finally, combining with A ⊗(l−1) − X l−1 0 and A 0, we obtain Now we show that the new parameter ξ cp 2, † (A) is at least as good as τ sos cp (A). Later in Section 4.5.1 we will give an example where the inequality is strict. Proof. Let L be feasible for ξ cp 2, † (A). We will construct a feasible solution to the program defining τ sos cp (A) with objective value L(1), which implies τ sos cp (A) ≤ L(1) and thus the desired inequality.
For this set α = L(1) and define the symmetric n 2 × n 2 matrix X by X (i,j),(k,l) = L(x i x j x k x l ) for i, j, k, l ∈ [n]. Then the matrix is positive semidefinite. This follows because M is obtained from the principal submatrix of M 2 (L) indexed by the monomials 1 and where the rows/columns indexed by x j x i with 1 ≤ i < j ≤ n are duplicates of the rows/columns indexed by (16)), and for i = j this follows from which holds because of (7), the constraint L(p 2 ) ≥ 0 for deg(p) ≤ 2, and the constraint

Convergence of the basic hierarchy
We first summarize convergence properties of the hierarchy ξ cp t (A). Note that unlike in Section 3 where we can only claim the inequality ξ cpsd Proof. We may assume A = 0. Since Next, we give a reformulation for the parameter ξ cp * (A), which is similar to the formulation of τ cp (A), although we miss in it the constraint R A present in τ cp (A).
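Proposition 4.4. We have

$$\xi^{\mathrm{cp}}_*(A) = \min\big\{\alpha : \alpha \ge 0,\ A \in \alpha \cdot \mathrm{conv}\{R \in \mathcal{S}^n : 0 \le R \le A,\ \mathrm{rank}(R) \le 1\}\big\}.$$

Proof. This follows directly from the reformulation of ξ cp * (A) in Proposition 4.3 in terms of conic combinations of evaluations at points in D(S cp A ) after observing that, for v ∈ R n , we have v ∈ D(S cp A ) if and only if the matrix R = vv T satisfies 0 ≤ R ≤ A.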
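The observation used in this proof can be tried out on a small instance. The sketch below assumes that S cp A consists of the polynomials √A ii x i − x i 2 (i ∈ [n]) and A ij − x i x j (1 ≤ i < j ≤ n); the definition of this set is not displayed in the text above, so this is an assumption. The function names are hypothetical.

```python
import math

def in_localizing_set(v, A, tol=1e-12):
    """Check sqrt(A_ii) v_i - v_i^2 >= 0 for all i and A_ij - v_i v_j >= 0 for i < j."""
    n = len(A)
    diag_ok = all(math.sqrt(A[i][i]) * v[i] - v[i] ** 2 >= -tol for i in range(n))
    off_ok = all(A[i][j] - v[i] * v[j] >= -tol
                 for i in range(n) for j in range(i + 1, n))
    return diag_ok and off_ok

def rank_one_between(v, A, tol=1e-12):
    """Check the entrywise inequalities 0 <= v v^T <= A."""
    n = len(A)
    return all(-tol <= v[i] * v[j] <= A[i][j] + tol
               for i in range(n) for j in range(n))

A = [[1.0, 0.5],
     [0.5, 1.0]]

inside = [1.0, 0.5]    # satisfies all localizing constraints
outside = [1.0, 1.0]   # violates A_12 - v_1 v_2 >= 0
```

Under this assumption the diagonal constraints force 0 ≤ v i ≤ √A ii , so every point of the set is a nonnegative vector. The matrix vv T does not change when v is replaced by −v, so the equivalence is to be read up to this sign.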

Additional constraints and convergence to τ cp (A)
The reformulation of the parameter ξ cp * (A) in Proposition 4.4 differs from τ cp (A) in that the constraint R A is missing. In order to have a hierarchy converging to τ cp (A) we need to add constraints to enforce that L can be decomposed as a conic combination of evaluation maps at nonnegative vectors v satisfying vv T A. Here we present two ways to achieve this goal. First we show that the tensor constraints (17) suffice in the sense that ξ cp * , † (A) = τ cp (A) (note that the constraints (16) are not needed for this result). However, because of the special form of the tensor constraints we do not know whether ξ cp t, † (A) admitting a flat optimal solution implies ξ cp t, † (A) = ξ cp * , † (A), and we do not know whether ξ cp ∞, † (A) = ξ cp * , † (A). Second, we adapt the approach of adding additional localizing constraints from Section 3.2 to the commutative setting, where we do show ξ cp ∞,S n−1 (A) = ξ cp * ,S n−1 (A) = τ cp (A). This yields a doubly indexed sequence of semidefinite programs whose optimal values converge to τ cp (A).
. Clearly also R k 0. It remains to show that R k A. For this we use the tensor constraints (17). Using that L is a conic combination of evaluation maps, we may rewrite these constraints as from which it follows that L(1)λ k R ⊗l k A ⊗l for all k ∈ [K]. Therefore, for all k ∈ [K] and all vectors v with v T R k v > 0 we have The second approach for reaching τ cp (A) is based on using the extra localizing constraints from Section 3.2. For V ⊆ S n−1 , define ξ cp t,V (A) by replacing the truncated quadratic module Combined with Proposition 4.6 this shows that we have a doubly indexed sequence ξ cp t,V k (A) of semidefinite programs that converges to τ cp (A) as t → ∞ and k → ∞.
Proof. The proof is the same as the proof of Proposition 4.4, with the following additional observation: Given a vector u ∈ R n , we have u ∈ D(S cp A,S n−1 ) only if uu T A. The latter follows from the additional localizing constraints: for each v ∈ R n we have

More efficient tensor constraints
Here we show that for any integer l ≥ 2 the constraint A ⊗l − (L((ww′) c )) w,w′∈⟨x⟩ =l ⪰ 0, used in the definition of ξ cp t,† (A), can be reformulated in a more economical way using matrices indexed by commutative monomials in [x] =l instead of noncommutative words in ⟨x⟩ =l . For this we exploit the symmetry in the matrices A ⊗l and (L((ww′) c )) w,w′∈⟨x⟩ =l for L ∈ R[x] * 2l . Recall that for a word w ∈ ⟨x⟩, we let w c denote the corresponding (commutative) monomial in [x].
Define the matrix where, for m = x α 1 1 · · · x αn n ∈ [x] =l , we define the multinomial coefficient Proof. For m, m ∈ [x] l , the (m, m )-entry of the left hand side is equal to The symmetric group S l acts on x =l by ( where, for any σ ∈ S l , P σ ∈ R x =l × x =l is the permutation matrix defined by We claim that Q T l DQ l = P , the matrix in (21). Indeed, for any w, w ∈ x =l , we have Suppose Q l M Q T l 0, and let λ be an eigenvalue of M with eigenvector z. Since M P = P M , we may assume P z = z, for otherwise we can replace z by P z, which is still an eigenvector of M with eigenvalue λ. We may also assume z to be a unit vector. Then λ ≥ 0 can be shown using the identity Q T l DQ l = P as follows: We can now derive our symmetry reduction result: This shows that the matrix A ⊗l −(L((ww ) c )) w,w ∈ x =l is S l -invariant. Hence the claimed result follows by using Lemma 4.7 and Lemma 4.8.
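To get a feeling for the size reduction, the following sketch enumerates the noncommutative words of length l in n variables, groups them by their commutative image w c , and compares the group sizes with the standard multinomial coefficient l!/(α 1 ! · · · α n !). The representation of words as index tuples and all function names are assumptions made for this illustration.

```python
from collections import Counter
from itertools import product
from math import comb, factorial

def words(n, l):
    """All noncommutative words of length l in n variables, as index tuples."""
    return list(product(range(n), repeat=l))

def commutative_image(word):
    """Canonical form of the commutative monomial w^c: the sorted index tuple."""
    return tuple(sorted(word))

def multinomial(monomial, n):
    """l! / (alpha_1! ... alpha_n!), where alpha_i is the exponent of x_i."""
    counts = Counter(monomial)
    denominator = 1
    for i in range(n):
        denominator *= factorial(counts.get(i, 0))
    return factorial(len(monomial)) // denominator

n, l = 3, 2
# Number of words mapping to each commutative monomial of degree l.
fibres = Counter(commutative_image(w) for w in words(n, l))
num_words = len(words(n, l))      # n ** l rows in the noncommutative indexing
num_monomials = len(fibres)       # comb(n + l - 1, l) rows after the reduction
```

For n = 3 and l = 2 the matrices shrink from size 9 to size 6, and for n = 4 and l = 3 from size 64 to size 20.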

Bipartite matrices
Consider the (p + q) × (p + q) matrices where J p,q denotes the all-ones matrix of size p × q. We have P (a, b) = P (0, 0) + D for some nonnegative diagonal matrix D. As can be easily verified, P (0, 0) is completely positive with cp-rank(P (0, 0)) = pq, so P (a, b) is completely positive with pq ≤ cp-rank (P (a, b)) ≤ pq + p + q. For p = 2 and q = 3 we have cp-rank(P (a, b)) = 6 for all a, b ≥ 0, which follows from the fact that 5 × 5 completely positive matrices with at least one zero entry have cp-rank at most 6; see [BSM03,Theorem 3.12]. Fawzi and Parrilo [FP16] show τ sos cp (P (0, 0)) = 6, and give a subregion of [0, 1] 2 where 5 < τ sos cp (P (a, b)) < 6. The next lemma shows the bound ξ cp 2, † (P (a, b)) is tight for all a, b ≥ 0 and therefore strictly improves on τ sos cp in this region.
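The upper bound cp-rank(P(0, 0)) ≤ pq can be seen from an explicit decomposition. The sketch below assumes that P(a, b) has the block form with diagonal blocks (a + q)I p and (b + p)I q and off-diagonal blocks J p,q and J q,p ; the displayed definition is not reproduced in the text above, so this form is an assumption, as are the function names.

```python
def bipartite_matrix(p, q, a=0.0, b=0.0):
    """Assumed block form [[(a+q) I_p, J_{p,q}], [J_{q,p}, (b+p) I_q]]."""
    n = p + q
    P = [[0.0] * n for _ in range(n)]
    for i in range(p):
        P[i][i] = a + q
        for j in range(q):
            P[i][p + j] = 1.0
            P[p + j][i] = 1.0
    for j in range(q):
        P[p + j][p + j] = b + p
    return P

def sum_of_rank_one_terms(p, q):
    """Sum of v v^T over the pq nonnegative vectors v = e_i + e_{p+j}."""
    n = p + q
    S = [[0.0] * n for _ in range(n)]
    for i in range(p):
        for j in range(q):
            support = (i, p + j)   # v has ones exactly at these two positions
            for r in support:
                for c in support:
                    S[r][c] += 1.0
    return S

p, q = 2, 3
P00 = bipartite_matrix(p, q)
decomposition = sum_of_rank_one_terms(p, q)
```

With this block form, P(0, 0) is the sum of the pq rank one matrices (e i + e p+j )(e i + e p+j ) T , and P(a, b) − P(0, 0) is the nonnegative diagonal matrix with entries a on the first block and b on the second.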
Proof. Let L be feasible for ξ cp 2, † (P (a, b)) and let B = α c T c X be the principal submatrix of M 2 (L) where the rows and columns are indexed by It follows that c is the all ones vector c = 1. Moreover, if P (a, b) ij = 0 for some i = j, then the constraints L(x i x j u) ≥ 0 and L ((P (a, b) Diag(z 1 , . . . , z pq ) .
Finally, by the constraints . Combined with the above inequality, it follows that and hence ξ cp 2, † (P (a, b)) ≥ pq.

Examples related to the DJL-conjecture
The Drew-Johnson-Loewy conjecture [DJL94] states that the maximal cp-rank of an n × n completely positive matrix is equal to ⌊n 2 /4⌋. Recently this conjecture has been disproven for n = 7, 8, 9, 10, 11 in [BSU14]. Although our bounds are not tight for the cp-rank, they are nontrivial and as such may be of interest for future comparisons. For numerical stability reasons we have evaluated our bounds on scaled versions of the matrices from [BSU14], so that the diagonal entries become equal to 1. The matrices M̃ 7 , M̃ 8 and M̃ 9 correspond to the matrices M̃ in Examples 1, 2, 3 of [BSU14], and M 7 , M 11 correspond to the matrices M in Examples 1 and 4. The column ξ cp 2, † (·) + x i x j corresponds to the bound ξ cp 2, † (·) where we replace S cp A by S cp A ∪ {x i x j : 1 ≤ i < j ≤ n}.

Lower bounds on the nonnegative rank
In this section we adapt the techniques for the cp-rank from Section 4 to the asymmetric setting of the nonnegative rank. We now view a factorization A = (a T i b j ) i∈[m],j∈ [n] by nonnegative vectors as a factorization by positive semidefinite diagonal matrices, by writing A ij = Tr(X i X m+j ), with X i = Diag(a i ) and X m+j = Diag(b j ). Note that we can view this as a "partial matrix" setting, where for the symmetric matrix (Tr(X i X k )) i,k∈ [m+n] of size m + n, only the off-diagonal entries at the positions (i, m + j) for i ∈ [m], j ∈ [n] are specified.
This asymmetry requires rescaling the factors in order to get upper bounds on their maximal eigenvalues, which is needed to ensure the Archimedean property for the selected localizing polynomials. For this we use the well-known fact that for any A ∈ R m×n + there exists a factorization A = (Tr(X i X m+j )) by diagonal nonnegative matrices of size rank + (A), such that λ max (X i ), λ max (X m+j ) ≤ √A max for all i ∈ [m], j ∈ [n], where A max := max i,j A ij . To see this, observe that for any rank one matrix R = uv T with 0 ≤ R ≤ A, one may assume 0 ≤ u i , v j ≤ √ A max for all i, j. Hence, the set is localizing for A; that is, there exists a minimal factorization X of A with X ∈ D(S + A ). Given A ∈ R m×n ≥0 , for each t ∈ N ∪ {∞} we consider the semidefinite program Moreover, define ξ + * (A) by adding the constraint rank(M (L)) < ∞ to the program defining ξ + ∞ (A). It is easy to check that Note that these extra constraints can help for finite t, but are redundant for t ∈ {∞, * }.
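The rescaling of a rank one term R = uv T used above can be sketched as follows. The function name `balance` and the sample vectors are hypothetical; the idea is to replace (u, v) by (su, v/s) with s chosen so that both vectors have the same largest entry, which leaves uv T unchanged.

```python
import math

def balance(u, v):
    """Rescale (u, v) to (s*u, v/s) so that max(s*u) == max(v/s).

    The outer product u v^T is unchanged. Assumes u and v are nonnegative
    and nonzero.
    """
    s = math.sqrt(max(v) / max(u))
    return [s * x for x in u], [y / s for y in v]

u, v = [4.0, 2.0], [0.25, 0.1]
# Any matrix A with 0 <= u v^T <= A has A_max >= max(u) * max(v) = 1.0.
largest_entry_of_R = max(u) * max(v)
u_balanced, v_balanced = balance(u, v)
```

After balancing, the largest entries of both vectors equal the square root of the largest entry of R, which is at most √A max whenever R ≤ A entrywise.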

Comparison to other bounds
As in the previous section, we compare our bounds to the bounds by Fawzi and Parrilo [FP16]. They introduce the following parameter τ + (A) as analogue of the bound τ cp (A) for the nonnegative rank: and the analogue τ sos + (A) of the bound τ sos cp (A) for the nonnegative rank: First we give the analogue of Proposition 4.3, whose proof we omit since it is very similar. Now we observe that the parameters ξ + ∞ (A) and ξ + * (A) coincide with τ + (A), so that we have a sequence of semidefinite programs converging to τ + (A).

Proposition 5.2. For any
Proof. The discussion at the beginning of Section 5 shows that for any rank one matrix R satisfying 0 ≤ R ≤ A we may assume that n]. Hence, τ + (A) can be written as In the remainder of this section we recall how τ + (A) and τ sos + (A) compare to other bounds in the literature. These bounds can be divided into two categories: combinatorial lower bounds and norm-based lower bounds. The following diagram from [FP16] summarizes how τ sos + (A) and τ + (A) relate to the combinatorial lower bounds  , j), (k, l)) : A il A kj = 0}. The coloring number of RG(A) coincides with the well known rectangle covering number (also denoted rank B (A)), which was used, e.g., in [FMP + 15] to show that the extension complexity of the correlation polytope is exponential. The clique number of RG(A) is also known as the fooling set number (see, e.g., [FKPT13]). Observe that the above combinatorial lower bounds only depend on the sparsity pattern of the matrix A, and that they are all equal to one for a strictly positive matrix. Fawzi and Parrilo [FP16] have furthermore shown that the bound τ + (A) is at least as good as norm-based lower bounds: These bounds are called norm-based since norms often provide valid functions N . For example, when N is the ∞ -norm, Rothvoß [Rot14] used the corresponding lower bound to show that the matching polytope has exponential extension complexity.
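For small matrices the fooling set number can be computed by brute force. The sketch below assumes that the rectangle graph RG(A) has the positions of the nonzero entries of A as vertices and an edge between (i, j) and (k, l) whenever A il A kj = 0; the vertex set is not legible in the text above, so this reading is an assumption, and the function names are hypothetical.

```python
from itertools import combinations

def rectangle_graph(A):
    """Vertices: positions of nonzero entries. Edge {(i,j),(k,l)} if A[i][l]*A[k][j] == 0."""
    vertices = [(i, j) for i, row in enumerate(A) for j, x in enumerate(row) if x != 0]
    edges = {frozenset((s, t)) for s, t in combinations(vertices, 2)
             if A[s[0]][t[1]] * A[t[0]][s[1]] == 0}
    return vertices, edges

def fooling_set_number(A):
    """Clique number of the rectangle graph, by exhaustive search (small A only)."""
    vertices, edges = rectangle_graph(A)
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if all(frozenset(pair) in edges for pair in combinations(subset, 2)):
                return size
    return 0

identity3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
all_ones = [[1, 1], [1, 1]]
```

Only the sparsity pattern of A enters the computation, and a strictly positive matrix has an edgeless rectangle graph, so its fooling set number is one.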
When N is the Frobenius norm, N (A) = (Σ i,j A 2 ij ) 1/2 , the parameter N * (A) is known as the nonnegative nuclear norm. In [FP15] it is denoted by ν + (A), shown to satisfy rank + (A) ≥ (ν + (A)/||A|| F ) 2 , and reformulated as where the cone of copositive matrices is the dual of the cone of completely positive matrices. Fawzi and Parrilo [FP15] use the copositive formulation (24) to provide bounds ν [k] + (A) (k ≥ 0), based on inner approximations of the copositive cone from [Par00], which converge to ν + (A) from below. We now observe that by Theorem 2.8 the atomic formulation of ν + (A) from (23) can be seen as a moment optimization problem: Here, the optimization variable µ is required to be a Borel measure on the variety V (S), where (The same observation is made in [TS15] for the real nuclear norm of a symmetric 3-tensor and in [Nie16] for

Computational examples
We illustrate the performance of our approach by comparing our lower bounds ξ + 2, † and ξ + 3, † to the lower bounds τ + and τ sos + on the two examples considered in [FP16].
Since the parameters τ + (A) and τ sos + (A) are invariant under scaling and permuting rows and columns of A, one can use the identity to see this describes the parameters for all nonnegative 2 × 2 matrices. By using a semidefinite programming solver for α = k/100, k ∈ [100], we see that ξ + 2 (A(α)) coincides with τ + (A(α)).

The nested rectangles problem
In this section we consider the nested rectangles problem as described in [FP16, Section 2.7.2] (see also [MSvS03]), which asks for which a, b there exists a triangle T such that R(a, b) The nonnegative rank relates not only to the extension complexity of a polytope [Yan91], but also to extended formulations of nested pairs [BFPS15,GG12]. An extended formulation of a pair of polytopes P 1 ⊆ P 2 ⊆ R d is a (possibly) higher dimensional polytope K whose projection π(K) is nested between P 1 and P 2 . Suppose π(K) = {x ∈ R d : y ∈ R k + , (x, y) ∈ K} and K = {(x, y) : Ex + F y = g, y ∈ R k + }, then k is the size of the extended formulation, and the smallest such k is called the extension complexity of the pair (P 1 , P 2 ). It is known (cf. [BFPS15,Theorem 1]) that the extension complexity of the pair (P 1 , P 2 ), where is equal to the nonnegative rank of the generalized slack matrix S P 1 ,P 2 ∈ R m×n , defined by Any nonnegative matrix is the slack matrix of some nested pair of polytopes [GPT13, Lemma 4.1] (see also [GG12]). Applying this to the pair (R(a, b), P ), one immediately sees that there exists a polytope K with at most three facets whose projection T = π(K) ⊆ R 2 satisfies R(a, b) ⊆ T ⊆ P if and only if the pair (R(a, b), P ) admits an extended formulation of size 3. For a, b > 0, the polytope T has to be 2 dimensional, therefore K has to be at least 2 dimensional as well; it follows that K and T have to be triangles. Hence there exists a triangle T such that R(a, b) ⊆ T ⊆ P if and only if the nonnegative rank of the slack matrix S(a, b) := S R(a,b),P is equal to 3. One can verify that Such a triangle exists if and only if (1 + a)(1 + b) ≤ 2 (see [FP16,Proposition 4] for a proof sketch). To test the quality of their bound, Fawzi and Parrilo [FP16] compute τ sos + (S(a, b)) for different values of a and b. In doing so they determine the region where τ sos + (S(a, b)) > 3. 
We do the same for the bounds $\xi^+_{1,\dagger}(S(a,b))$, $\xi^+_{2,\dagger}(S(a,b))$ and $\xi^+_{3,\dagger}(S(a,b))$; see Figure 1. The results show that $\xi^+_{2,\dagger}(S(a,b))$ strictly improves upon the bound $\tau_+^{\mathrm{sos}}(S(a,b))$, and that $\xi^+_{3,\dagger}(S(a,b))$ is again a strict improvement over $\xi^+_{2,\dagger}(S(a,b))$.
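The slack matrix of the nested rectangles instance is easy to tabulate exactly. The following is a minimal sketch, assuming $R(a,b) = [-a,a] \times [-b,b]$ and $P = [-1,1]^2$ as in [FP16], with rows indexed by the facets of $P$ and columns by the vertices of $R(a,b)$. The function names `slack_matrix`, `rank` and `triangle_exists` are illustrative and not taken from the paper. The sketch only checks the usual rank and the criterion $(1+a)(1+b) \le 2$; it does not compute any of the semidefinite bounds.

```python
from fractions import Fraction


def slack_matrix(a, b):
    """Generalized slack matrix of the pair (R(a,b), P), assuming
    R(a,b) = [-a,a] x [-b,b] and P = [-1,1]^2 (an assumption of this sketch)."""
    a, b = Fraction(a), Fraction(b)
    # Facets of P written as normal . x <= rhs.
    facets = [((1, 0), 1), ((-1, 0), 1), ((0, 1), 1), ((0, -1), 1)]
    vertices = [(a, b), (-a, b), (-a, -b), (a, -b)]
    return [[rhs - (n[0] * v[0] + n[1] * v[1]) for v in vertices]
            for n, rhs in facets]


def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in r] for r in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def triangle_exists(a, b):
    """Criterion (1 + a)(1 + b) <= 2 for a nested triangle, as quoted from [FP16]."""
    return (1 + Fraction(a)) * (1 + Fraction(b)) <= 2


S = slack_matrix(Fraction(1, 2), Fraction(1, 4))
print(rank(S), triangle_exists(Fraction(1, 2), Fraction(1, 4)))
```

Since the usual rank is 3 for $a, b > 0$ and the matrix is $4 \times 4$, the nonnegative rank is either 3 or 4, which is why certifying a bound strictly larger than 3 settles the question.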

Lower bounds on the positive semidefinite rank
The positive semidefinite rank can be seen as an asymmetric version of the completely positive semidefinite rank. Hence, as was the case in the previous section for the nonnegative rank, we need to select suitable factors in a minimal factorization in order to be able to bound their maximum eigenvalues and obtain a localizing set of polynomials leading to an Archimedean quadratic module. For this we can follow, e.g., the approach in [LWdW17, Lemma 5] to rescale a factorization and claim that, for any $A \in \mathbb{R}^{m \times n}_+$ with $\text{psd-rank}_{\mathbb{C}}(A) = d$, there exists a factorization $A = (\langle X_i, X_{m+j} \rangle)$ by matrices $X_1, \ldots, X_{m+n} \in \mathcal{H}^d_+$ such that $\sum_{i=1}^m X_i = I$ and $\mathrm{Tr}(X_{m+j}) = \sum_i A_{ij}$ for all $j \in [n]$. Indeed, starting from any factorization $X_i, X_{m+j} \in \mathcal{H}^d_+$ of $A$, we may replace $X_i$ by $X^{-1/2} X_i X^{-1/2}$ and $X_{m+j}$ by $X^{1/2} X_{m+j} X^{1/2}$, where $X := \sum_{i=1}^m X_i$ is positive definite (by minimality of $d$). This argument shows that the set of polynomials $S^{\mathrm{psd}}_A$ is localizing for $A$; that is, there is at least one minimal factorization $X$ of $A$ such that $g(X) \succeq 0$ for all polynomials $g \in S^{\mathrm{psd}}_A$. Moreover, for the same minimal factorization $X$ of $A$ we have $p(X)(1 - \sum_{i=1}^m X_i) = 0$ for all $p \in \mathbb{R}\langle x \rangle$.

Given $A \in \mathbb{R}^{m \times n}_{\ge 0}$, for each $t \in \mathbb{N} \cup \{\infty\}$ we consider the semidefinite program defining $\xi^{\mathrm{psd}}_t(A)$. We additionally define $\xi^{\mathrm{psd}}_*(A)$ by adding the constraint $\mathrm{rank}(M(L)) < \infty$ to the program defining $\xi^{\mathrm{psd}}_\infty(A)$ (and considering the infimum instead of the minimum, since we do not know if the infimum is attained in $\xi^{\mathrm{psd}}_*(A)$).

Figure 1: The colored region corresponds to $\mathrm{rank}_+(S(a,b)) = 4$. The top right region (black) corresponds to $\xi^+_{1,\dagger}(S(a,b)) > 3$, the two top right regions (black and red) together correspond to $\tau_+^{\mathrm{sos}}(S(a,b)) > 3$, the three top right regions (black, red and yellow) to $\xi^+_{2,\dagger}(S(a,b)) > 3$, and the four top right regions (black, red, yellow, and green) to $\xi^+_{3,\dagger}(S(a,b)) > 3$.
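By the above discussion it follows that the parameter $\xi^{\mathrm{psd}}_*(A)$ is a lower bound on $\text{psd-rank}_{\mathbb{C}}(A)$ and we have
$$\xi^{\mathrm{psd}}_1(A) \le \ldots \le \xi^{\mathrm{psd}}_t(A) \le \ldots \le \xi^{\mathrm{psd}}_\infty(A) \le \xi^{\mathrm{psd}}_*(A) \le \text{psd-rank}_{\mathbb{C}}(A).$$
Note that, in contrast to the previous bounds, the parameter $\xi^{\mathrm{psd}}_t(A)$ is not invariant under rescaling the rows of $A$ or under taking the transpose of $A$ (see Section 6.2.2).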
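The rescaling step behind the normalized factorization can be checked by a short computation. The following worked equations are a sketch under the assumptions stated in the text: $X = \sum_{i=1}^m X_i$ is positive definite and $A_{ij} = \mathrm{Tr}(X_i X_{m+j})$. They use only cyclicity of the trace.

```latex
\begin{align*}
\bigl\langle X^{-1/2} X_i X^{-1/2},\; X^{1/2} X_{m+j} X^{1/2} \bigr\rangle
  &= \operatorname{Tr}\bigl(X^{-1/2} X_i X^{-1/2}\, X^{1/2} X_{m+j} X^{1/2}\bigr)
   = \operatorname{Tr}(X_i X_{m+j}) = A_{ij}, \\
\sum_{i=1}^{m} X^{-1/2} X_i X^{-1/2}
  &= X^{-1/2} X X^{-1/2} = I, \\
\operatorname{Tr}\bigl(X^{1/2} X_{m+j} X^{1/2}\bigr)
  &= \operatorname{Tr}(X X_{m+j})
   = \sum_{i=1}^{m} \operatorname{Tr}(X_i X_{m+j})
   = \sum_{i=1}^{m} A_{ij}.
\end{align*}
```

In particular the rescaled factors satisfy $0 \preceq X_i \preceq I$ for $i \in [m]$, and each rescaled $X_{m+j}$ has largest eigenvalue at most its trace $\sum_i A_{ij}$, which is what makes eigenvalue-bounding localizing polynomials available.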
It follows from the construction of S psd A and Equation (7) that the quadratic module M(S psd A ) is Archimedean, and hence the following analogue of Proposition 3.1 can be shown. Moreover, ξ psd ∞ (A) is equal to the smallest α ≥ 0 for which there exists a unital C * -algebra A with tracial state τ and X ∈ D A (S psd ,j∈ [n] .

Comparison to other bounds
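In [LWdW17] the following bound on the complex positive semidefinite rank was derived:
$$\text{psd-rank}_{\mathbb{C}}(A) \ge \sum_{i=1}^m \max_{j \in [n]} \frac{A_{ij}}{\sum_{k=1}^m A_{kj}}. \tag{25}$$
If a feasible linear form $L$ to $\xi^{\mathrm{psd}}_t(A)$ satisfies the inequalities $L\bigl(x_i(\sum_k A_{kj} - x_{m+j})\bigr) \ge 0$ for all $i \in [m]$, $j \in [n]$, then $L(1)$ is at least the above lower bound. Indeed, the inequalities give
$$L(x_i) \ge \max_{j \in [n]} \frac{L(x_i x_{m+j})}{\sum_k A_{kj}} = \max_{j \in [n]} \frac{A_{ij}}{\sum_k A_{kj}},$$
and hence
$$L(1) = \sum_{i=1}^m L(x_i) \ge \sum_{i=1}^m \max_{j \in [n]} \frac{A_{ij}}{\sum_k A_{kj}}.$$
The inequalities $L\bigl(x_i(\sum_k A_{kj} - x_{m+j})\bigr) \ge 0$ are easily seen to be valid for trace evaluations at points of $D(S^{\mathrm{psd}}_A)$. More importantly, as in Lemma 3.9, these inequalities are satisfied by feasible linear forms to the programs $\xi^{\mathrm{psd}}_\infty(A)$ and $\xi^{\mathrm{psd}}_*(A)$. Hence, $\xi^{\mathrm{psd}}_\infty(A)$ and $\xi^{\mathrm{psd}}_*(A)$ are at least as good as the lower bound (25).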
In [LWdW17] two other fidelity based lower bounds on the psd-rank were defined; we do not know how they compare to ξ psd t (A).
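For small matrices the first bound from [LWdW17] can be evaluated by hand or with a few lines of code. The sketch below assumes that this bound has the form $\sum_i \max_j A_{ij} / \sum_k A_{kj}$, that is, the sum over rows of the largest entry after normalizing every column to sum to one. The function name `column_normalized_bound` is hypothetical, and columns that are identically zero are skipped as a convention of this sketch.

```python
from fractions import Fraction


def column_normalized_bound(A):
    """Sum over rows i of max_j A[i][j] / (sum_k A[k][j]).

    Assumes A is entrywise nonnegative; all-zero columns are ignored.
    """
    col_sums = [sum(Fraction(x) for x in col) for col in zip(*A)]
    total = Fraction(0)
    for row in A:
        ratios = [Fraction(x) / s for x, s in zip(row, col_sums) if s != 0]
        total += max(ratios, default=Fraction(0))
    return total


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
all_ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(column_normalized_bound(identity), column_normalized_bound(all_ones))
```

On the identity matrix the value equals the size of the matrix, and on the all-ones matrix it equals 1, so the bound is tight in both of these extreme cases.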

Computational examples
In this section we apply our bounds to some (small) examples taken from the literature, namely 3 × 3 circulant matrices and slack matrices of small polygons.

Nonnegative circulant matrices of size 3
We consider the nonnegative circulant matrices of size 3 which are, up to scaling, of the form Hence, if b and c do not satisfy the above relation then psd-rank R (M (b, c)) = 3. To see how good our lower bounds are for this example, we use a semidefinite programming solver to compute ξ psd 2 (M (b, c)) for (b, c) ∈ [0, 4] 2 (with stepsize 0.01). In Figure 2 we see that the bound ξ psd 2 (M (b, c)) certifies that psd-rank R (M (b, c)) = psd-rank C (M (b, c)) = 3 for most values (b, c) where psd-rank R (M (b, c)) = 3.

Polygons
Here we consider the slack matrices of two polygons in the plane, where the bounds are sharp (after rounding) and illustrate the dependence on scaling the rows or taking the transpose. We consider the quadrilateral $Q$ with vertices $(0,0)$, $(0,1)$, $(1,0)$, $(2,2)$, and the regular hexagon $H$, whose slack matrices we denote by $S_Q$ and $S_H$, respectively.

First, our lower bounds on $\text{psd-rank}_{\mathbb{C}}$ are not invariant under taking the transpose: numerically we have $\xi^{\mathrm{psd}}_2(S_Q) \approx 2.266$ and $\xi^{\mathrm{psd}}_2(S_Q^T) \approx 2.5$. The slack matrix $S_Q$ has $\text{psd-rank}_{\mathbb{R}}(S_Q) = 3$ (a corollary of [GRT13, Theorem 4.3]) and therefore both bounds certify $\text{psd-rank}_{\mathbb{C}}(S_Q) = 3 = \text{psd-rank}_{\mathbb{R}}(S_Q)$.
Secondly, our bounds are not invariant under rescaling the rows of a nonnegative matrix. Numerically we have ξ psd 2 (S H ) ≈ 1.99 while ξ psd 2 (DS H ) ≈ 2.12, where D = Diag(2, 2, 1, 1, 1, 1). The bound ξ psd 2 (DS H ) is in fact tight (after rounding) for the complex positive semidefinite rank of DS H and hence of S H : in [GGS16] it is shown that psd-rank C (S H ) = 3.

Discussion and future work
In this work we provide a unified approach for the four matrix factorizations obtained by considering (a)symmetric factorizations by nonnegative vectors and positive semidefinite matrices.

Our methods can be extended to the nonnegative tensor rank: the smallest integer $d$ for which a $k$-tensor $A \in \mathbb{R}^{n_1 \times \cdots \times n_k}_+$ can be written as $A = \sum_{l=1}^d u_{1,l} \otimes \cdots \otimes u_{k,l}$ for nonnegative vectors $u_{j,l} \in \mathbb{R}^{n_j}_+$. The approach from Section 5 for $\mathrm{rank}_+$ can be extended to obtain a hierarchy of lower bounds on the nonnegative tensor rank. For instance, if $A$ is a 3-tensor, the analogous bound $\xi^+_t(A)$ is obtained by minimizing $L(1)$ over $L \in \mathbb{R}[x_1, \ldots, x_{n_1+n_2+n_3}]^*$ subject to the analogous constraints, using as localizing polynomials in $S^+_A$ the polynomials $\sqrt[3]{A_{\max}}\, x_i - x_i^2$ and $A_{i_1 i_2 i_3} - x_{i_1} x_{n_1+i_2} x_{n_1+n_2+i_3}$. As in the matrix case one can compare to the bounds $\tau_+(A)$ and $\tau_+^{\mathrm{sos}}(A)$ from [FP16]. One can show $\xi^+_*(A) = \tau_+(A)$, and, by adding the conditions $L\bigl(x_{i_1} x_{n_1+i_2} x_{n_1+n_2+i_3} (A_{i_1 i_2 i_3} - x_{i_1} x_{n_1+i_2} x_{n_1+n_2+i_3})\bigr) \ge 0$ to $\xi^+_3(A)$, one can show $\xi^+_{3,\dagger}(A) \ge \tau_+^{\mathrm{sos}}(A)$.

Testing membership in the completely positive cone and the completely positive semidefinite cone is another important problem, to which our hierarchies can also be applied. It follows from the proof of Proposition 4.3 that if $A$ is not completely positive then, for some order $t$, the program $\xi^{\mathrm{cp}}_t(A)$ is infeasible or its optimum value is larger than the Carathéodory bound on the cp-rank (which is similar to an earlier result in [Nie14]). In the noncommutative setting the situation is more complicated: if $\xi^{\mathrm{cpsd}}_*(A)$ is feasible, then $A \in \mathrm{CS}_+$, and if $A \notin \mathrm{CS}^n_{+,\mathrm{vN}}$, then $\xi^{\mathrm{cpsd}}_\infty(A)$ is infeasible (Propositions 3.1 and 3.2). Here $\mathrm{CS}^n_{+,\mathrm{vN}}$ is the cone defined in [BLP17] consisting of the matrices admitting a factorization in a von Neumann algebra with a trace.
By Lemma 2.10, CS n +,vN can equivalently be characterized as the set of matrices of the form α (τ (a i a j )) for some C * -algebra A with tracial state τ , positive elements a 1 , . . . , a n ∈ A and α ∈ R + .
Our lower bounds are on the complex version of the (completely) positive semidefinite rank. As far as we are aware, the existing lower bounds (except for the dimension counting rank lower bound) are also on the complex (completely) positive semidefinite rank. It would be interesting to find a lower bound on the real (completely) positive semidefinite rank that can go beyond the complex (completely) positive semidefinite rank.
We conclude with some open questions regarding applications of lower bounds on matrix factorization ranks. First, as was shown in [PSVW16, GdLL17a, PV17], completely positive semidefinite matrices whose $\text{cpsd-rank}_{\mathbb{C}}$ is larger than their size do exist, but currently we do not know how to construct small examples for which this holds. Hence, a concrete question: Does there exist a $5 \times 5$ completely positive semidefinite matrix whose $\text{cpsd-rank}_{\mathbb{C}}$ is at least 6? Second, as we mentioned before, the asymmetric setting corresponds to (semidefinite) extension complexity of polytopes. Rothvoß' result [Rot14] (indirectly) shows that the parameter $\xi^+_\infty$ is exponential (in the number of nodes of the graph) for the slack matrix of the matching polytope. Can this result also be shown directly using the dual formulation of $\xi^+_\infty$, that is, by a sum-of-squares certificate? If so, could one extend the argument to the noncommutative setting (which would show a lower bound on the semidefinite extension complexity)?
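We show (1) ⇒ (2) by applying a GNS construction. Consider the quotient vector space $\mathbb{R}\langle x \rangle / N(L)$, and denote the class of $p$ in $\mathbb{R}\langle x \rangle / N(L)$ by $\overline{p}$. We can equip this quotient with the inner product $\langle \overline{p}, \overline{q} \rangle = L(p^* q)$ for $p, q \in \mathbb{R}\langle x \rangle$, so that the completion $H$ of $\mathbb{R}\langle x \rangle / N(L)$ is a separable Hilbert space. As $N(L)$ is a left ideal in $\mathbb{R}\langle x \rangle$, the operator
$$X_i : \mathbb{R}\langle x \rangle / N(L) \to \mathbb{R}\langle x \rangle / N(L), \quad \overline{p} \mapsto \overline{x_i p} \tag{26}$$
is well defined. We have
$$\langle X_i \overline{p}, \overline{q} \rangle = L((x_i p)^* q) = L(p^* x_i q) = \langle \overline{p}, X_i \overline{q} \rangle \quad\text{for all } p, q \in \mathbb{R}\langle x \rangle,$$
so the $X_i$ are self-adjoint. Since $g \in S \cup \{1\}$ is symmetric and $\langle \overline{p}, g(X) \overline{p} \rangle = \langle \overline{p}, \overline{gp} \rangle = L(p^* g p) \ge 0$ for all $p$ we have $g(X) \succeq 0$. By the Archimedean condition, there exists an $R > 0$ such that $R - \sum_{i=1}^n x_i^2 \in M(S) + I(T)$. Since $L$ is tracial, nonnegative on $M(S)$, and zero on $I(T)$, this gives
$$\langle X_i \overline{p}, X_i \overline{p} \rangle = L(p^* x_i^2 p) \le R \cdot L(p^* p) = R\, \langle \overline{p}, \overline{p} \rangle \quad\text{for all } p \in \mathbb{R}\langle x \rangle.$$
So each $X_i$ extends to a bounded self-adjoint operator, also denoted $X_i$, on the Hilbert space $H$ such that $g(X)$ is positive for all $g \in S \cup \{1\}$. Moreover, we have $\langle \overline{f}, h(X) \overline{1} \rangle = L(f^* h) = 0$ for all $f \in \mathbb{R}\langle x \rangle$, $h \in T$.

The operators $X_i \in B(H)$ extend to self-adjoint operators in $B(\mathbb{C} \otimes_{\mathbb{R}} H)$, where $\mathbb{C} \otimes_{\mathbb{R}} H$ is the complexification of $H$. Let $\mathcal{A}$ be the unital $C^*$-algebra obtained by taking the operator norm closure of $\mathbb{R}\langle X \rangle \subseteq B(\mathbb{C} \otimes_{\mathbb{R}} H)$. It follows that $X \in D_{\mathcal{A}}(S) \cap V_{\mathcal{A}}(T)$.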
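Define the state $\tau$ on $\mathcal{A}$ by $\tau(a) = \langle \overline{1}, a \overline{1} \rangle$ for $a \in \mathcal{A}$. For all $p, q \in \mathbb{R}\langle x \rangle$ we have
$$\tau(p(X) q(X)) = \langle \overline{1}, p(X) q(X) \overline{1} \rangle = \langle \overline{1}, \overline{pq} \rangle = L(pq), \tag{27}$$
so that, since $L$ is tracial, the restriction of $\tau$ to $\mathbb{R}\langle X \rangle$ is tracial. Since $\mathbb{R}\langle X \rangle$ is dense in $\mathcal{A}$ in the operator norm, this implies $\tau$ is tracial.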
To conclude the proof, observe that (3) follows from (27) by taking q = 1.
Theorem 2.2. For $S \subseteq \mathrm{Sym}\, \mathbb{R}\langle x \rangle$, $T \subseteq \mathbb{R}\langle x \rangle$, $L \in \mathbb{R}\langle x \rangle^*$, the following are equivalent:
(1) $L$ is a symmetric, tracial, linear form with $L(1) = 1$ that is nonnegative on $M(S)$, zero on $I(T)$, and has $\mathrm{rank}(M(L)) < \infty$;
(2) there is a finite dimensional $C^*$-algebra $\mathcal{A}$ with a tracial state $\tau$, and $X \in D_{\mathcal{A}}(S) \cap V_{\mathcal{A}}(T)$ satisfying equation (3);
(3) $L$ is a convex combination of normalized trace evaluations at points in $D(S) \cap V(T)$.
Proof. ((1) ⇒ (2)) Here we can follow the proof of Theorem 2.1, with the extra observation that the condition rank(M (L)) < ∞ implies that the quotient space R x /N (L) is finite dimensional. Since R x /N (L) is finite dimensional the multiplication operators are bounded, and the constructed C * -algebra A is finite dimensional.
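((2) ⇒ (3)) By Artin-Wedderburn theory we have the $*$-isomorphism
$$\varphi : \mathcal{A} \to \bigoplus_{m=1}^M \mathbb{C}^{d_m \times d_m}.$$
Define the $*$-homomorphisms $\varphi_m : \mathcal{A} \to \mathbb{C}^{d_m \times d_m}$ by $\varphi = \oplus_m \varphi_m$. The map $\mathbb{C}^{d_m \times d_m} \to \mathbb{C}$ defined by $X \mapsto \tau(\varphi_m^{-1}(X))$ is a positive tracial linear form, and hence a nonnegative multiple $\lambda_m \mathrm{tr}(\cdot)$ of the normalized matrix trace (since, for a full matrix algebra, the normalized trace is the unique tracial state). Then $\tau(a) = \sum_m \lambda_m \mathrm{tr}(\varphi_m(a))$ for nonnegative scalars $\lambda_m$ with $\sum_m \lambda_m = L(1) = 1$. By defining the matrices $X^m_i = \varphi_m(X_i)$ for $m \in [M]$, we get
$$L(p) = \tau(p(X)) = \sum_{m=1}^M \lambda_m\, \mathrm{tr}\bigl(p(X^m_1, \ldots, X^m_n)\bigr) \quad\text{for all } p \in \mathbb{R}\langle x \rangle.$$
Since $\varphi_m$ is a $*$-homomorphism we have $g(X^m_1, \ldots, X^m_n) \succeq 0$ for all $g \in S \cup \{1\}$ and also $h(X^m_1, \ldots, X^m_n) = 0$ for all $h \in T$, which shows $(X^m_1, \ldots, X^m_n) \in D(S) \cap V(T)$.

((3) ⇒ (1)) If $L$ is a convex combination of normalized trace evaluations at elements from $D(S) \cap V(T)$, then $L$ is symmetric, tracial, nonnegative on $M(S)$, zero on $I(T)$, and satisfies $L(1) = 1$ and $\mathrm{rank}(M(L)) < \infty$, because the moment matrix of any trace evaluation has finite rank.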
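The properties used in the implication (3) ⇒ (1) can be checked on a concrete trace evaluation. The following is a minimal sketch in exact arithmetic for a single normalized trace evaluation $L(w) = \mathrm{tr}(w(X))$ at a pair of real symmetric $2 \times 2$ matrices; the matrices and the helper names (`trace_eval`, `words_up_to`, `moment_matrix`, `rank`) are illustrative choices, not objects from the paper. For symmetric $X_i$ the involution acts on words by reversal, and the truncated moment matrix has entries $L(u^* v)$.

```python
from fractions import Fraction
from itertools import product


def matmul(P, Q):
    """Product of two square matrices given as lists of rows."""
    d = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]


def trace_eval(word, mats):
    """Normalized trace of X_{w_1} ... X_{w_k}; the empty word gives the identity."""
    d = len(mats[0])
    value = [[Fraction(int(i == j)) for j in range(d)] for i in range(d)]
    for letter in word:
        value = matmul(value, mats[letter])
    return sum(value[i][i] for i in range(d)) / d


def words_up_to(n, t):
    """All words of length at most t in n letters, as tuples of indices."""
    return [w for k in range(t + 1) for w in product(range(n), repeat=k)]


def moment_matrix(mats, t):
    """Truncated moment matrix with entries L(u^* v); u^* is the reversed word."""
    words = words_up_to(len(mats), t)
    return [[trace_eval(u[::-1] + v, mats) for v in words] for u in words]


def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in r] for r in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


X = [[[1, 2], [2, 0]], [[0, 1], [1, 3]]]  # two symmetric 2x2 matrices (arbitrary choice)
M = moment_matrix(X, 2)
print(len(M), rank(M))
```

The rank of the moment matrix is bounded by the dimension of the algebra generated by the $X_i$, here at most $4$, independently of the truncation order; this is the finite rank property invoked in the proof.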
is symmetric, tracial, δ-flat, nonnegative on M 2t (S), and zero on I 2t (T ). Then L extends to a symmetric, tracial, linear form on R x that is nonnegative on M(S), zero on I(T ), and whose moment matrix has finite rank.
Proof. Let W ⊆ x t−δ index a maximum nonsingular submatrix of M t−δ (L), and let span(W ) be the linear space spanned by W . We have the vector space direct sum That is, for each u ∈ x t there exists a unique r u ∈ span(W ) such that u − r u ∈ N t (L). We first construct the (unique) symmetric flat extensionL ∈ R x 2t+2 of L. For this we set L(p) = L(p) for deg(p) ≤ 2t, and we set for all i, j ∈ [n] and u, v ∈ x with |u| = |v| = t. One can verify thatL is symmetric and satisfies x i (u − r u ) ∈ N t+1 (L) for all i ∈ [n] and u ∈ R x t , from which it follows thatL is 2-flat.
We also have $(u - r_u) x_i \in N_{t+1}(\hat{L})$ for all $i \in [n]$ and $u \in \mathbb{R}\langle x \rangle_t$: Since $\hat{L}$ is 2-flat, we have $(u - r_u) x_i \in N_{t+1}(\hat{L})$ if and only if $\hat{L}(p (u - r_u) x_i) = 0$ for all $p \in \mathbb{R}\langle x \rangle_{t-1}$. By using $\deg(x_i p) \le t$, that $L$ is tracial, and $u - r_u \in N_t(L)$, we get $\hat{L}(p (u - r_u) x_i) = L(p (u - r_u) x_i) = L(x_i p (u - r_u)) = 0$.
By consecutively using (v − r v )x j ∈ N t+1 (L), symmetry ofL, x i (u − r u ) ∈ N t+1 (L), and again symmetry ofL, we see that and in an analogous way one can shoŵ We can now show thatL is tracial. We do this by showing thatL(wx j ) =L(x j w) for all w with deg(w) ≤ 2t + 1. Notice that when deg(w) ≤ 2t − 1 the statement follows from the fact thatL is an extension of L. Suppose w = u * v with deg(u) = t + 1 and deg(v) ≤ t. We write u = x i u , and we let r u , r v ∈ R x t−1 be such that u − r u , v − r v ∈ N t (L). We then havê L(wx j ) =L(u * vx j ) =L((x i u ) * vx j ) =L((x i r u ) * r v x j ) by (29) (30) =L(x j w).
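It follows that $\hat{L}$ is a symmetric tracial flat extension of $L$, and $\mathrm{rank}(M(\hat{L})) = \mathrm{rank}(M(L))$. Next, we iterate the above procedure to extend $L$ to a symmetric tracial linear functional $\hat{L} \in \mathbb{R}\langle x \rangle^*$. It remains to show that $\hat{L}$ is nonnegative on $M(S)$ and zero on $I(T)$. For this we make two observations:
(i) $I(N_t(L)) \subseteq N(\hat{L})$;
(ii) $\mathbb{R}\langle x \rangle = \mathrm{span}(W) \oplus I(N_t(L))$.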
For (i) we use the (easy to check) fact that $N_t(L) = \mathrm{span}(\{u - r_u : u \in \langle x \rangle_t\})$. Then it suffices to show that $w(u - r_u) \in N(\hat{L})$ for all $w \in \langle x \rangle$, which can be done using induction on $|w|$. From (i) one easily deduces that $\mathrm{span}(W) \cap N(\hat{L}) = \{0\}$, so we have the direct sum $\mathrm{span}(W) \oplus I(N_t(L))$. The claim (ii) follows using induction on the length of $w \in \langle x \rangle$: The base case $w \in \langle x \rangle_t$ follows from (28). Let $w = x_i v \in \langle x \rangle$ and assume $v \in \mathrm{span}(W) \oplus I(N_t(L))$, that is, $v = r_v + q_v$ where $r_v \in \mathrm{span}(W)$ and $q_v \in I(N_t(L))$. We have $x_i v = x_i r_v + x_i q_v$, so it suffices to show $x_i r_v, x_i q_v \in \mathrm{span}(W) \oplus I(N_t(L))$. Clearly $x_i q_v \in I(N_t(L))$, since $q_v \in I(N_t(L))$. Also, observe that $x_i r_v \in \mathbb{R}\langle x \rangle_t$ and therefore $x_i r_v \in \mathrm{span}(W) \oplus I(N_t(L))$ by (28).
We conclude the proof by showing that $\hat{L}$ is nonnegative on $M(S)$ and zero on $I(T)$. Let $g \in M(S)$, $h \in I(T)$, and $p \in \mathbb{R}\langle x \rangle$. For $p \in \mathbb{R}\langle x \rangle$ we extend the definition of $r_p$ so that $r_p \in \mathrm{span}(W)$ and $p - r_p \in I(N_t(L))$, which is possible by (ii). Then, $\hat{L}(p^* g p)$

(2) there exists a unital commutative $C^*$-algebra $\mathcal{A}$ with a state $\tau$ and $X \in D_{\mathcal{A}}(S) \cap V_{\mathcal{A}}(T)$ such that $L(p) = \tau(p(X))$ for all $p \in \mathbb{R}[x]$;

Proof. ((1) ⇒ (2)) This is the commutative analogue of the implication (1) ⇒ (2) in Theorem 2.1 (observing in addition that the operators $X_i$ in (26) pairwise commute so that the constructed $C^*$-algebra $\mathcal{A}$ is commutative).

((2) ⇒ (3)) Let $\hat{\mathcal{A}}$ denote the set of unital $*$-homomorphisms $\mathcal{A} \to \mathbb{C}$, known as the spectrum of $\mathcal{A}$. We equip $\hat{\mathcal{A}}$ with the weak-$*$ topology, so that it is compact as a result of $\mathcal{A}$ being unital (see, e.g., [Bla06, II.2.1.4]). The Gelfand representation is the $*$-isomorphism
$$\Gamma : \mathcal{A} \to C(\hat{\mathcal{A}}), \quad \Gamma(a)(\phi) = \phi(a) \quad\text{for } a \in \mathcal{A},\ \phi \in \hat{\mathcal{A}},$$
where $C(\hat{\mathcal{A}})$ is the set of complex-valued continuous functions on $\hat{\mathcal{A}}$. Since $\Gamma$ is an isomorphism, the state $\tau$ on $\mathcal{A}$ induces a state $\tau'$ on $C(\hat{\mathcal{A}})$ defined by $\tau'(\Gamma(a)) = \tau(a)$ for $a \in \mathcal{A}$. By the Riesz representation theorem (see, e.g., [Rud87, Theorem 2.14]) there is a Radon measure $\nu$ on $\hat{\mathcal{A}}$ such that $\tau'(\Gamma(a)) = \int_{\hat{\mathcal{A}}} \Gamma(a)(\phi)\, d\nu(\phi)$ for all $a \in \mathcal{A}$.
Proof. ((1) ⇒ (2)) We indicate how to derive this claim from its noncommutative analogue. For this denote the commutative version of $p \in \mathbb{R}\langle x \rangle$ by $p_c \in \mathbb{R}[x]$. For any $g \in S$ and $h \in T$, select symmetric polynomials $g', h' \in \mathbb{R}\langle x \rangle$ with $(g')_c = g$ and $(h')_c = h$, and set $S' = \{g' : g \in S\}$ and $T' = \{h' : h \in T\} \cup \{x_i x_j - x_j x_i \in \mathbb{R}\langle x \rangle : i, j \in [n],\ i \ne j\}$.
Define the linear form $L' \in \mathbb{R}\langle x \rangle^*$ by $L'(p) = L(p_c)$ for $p \in \mathbb{R}\langle x \rangle$. Then $L'$ is symmetric, tracial, nonnegative on $M(S')$, zero on $I(T')$, and satisfies $\mathrm{rank}\, M(L') = \mathrm{rank}\, M(L) < \infty$. Following

For the inequality we use the fact that $L_t(w^* (x_i^2 - R) w) \le 0$, since $w^* (R - x_i^2) w$ can be written as the sum of a polynomial in $M_{2t}(S) + I_{2t}(T)$ and a sum of commutators of degree at most $2t$, which follows using the identity $w^* q h w = w w^* q h + [w^* q h, w]$. Next we write any $w \in \langle x \rangle_{2(t-d+1)}$ as $w = w_1^* w_2$ with $w_1, w_2 \in \langle x \rangle_{t-d+1}$ and use the positive semidefiniteness of the principal submatrix of $M_t(L_t)$ indexed by $\{w_1, w_2\}$ to get
$$L_t(w)^2 = L_t(w_1^* w_2)^2 \le L_t(w_1^* w_1)\, L_t(w_2^* w_2) \le R^{|w_1| + |w_2|} L_t(1)^2 = R^{|w|} L_t(1)^2.$$
This shows the first claim. Suppose $c := \sup_t L_t(1) < \infty$. For each $t \in \mathbb{N}$, consider the linear functional $\hat{L}_t \in \mathbb{R}\langle x \rangle^*$ defined by $\hat{L}_t(w) = L_t(w)$ if $|w| \le 2t - 2d + 2$ and $\hat{L}_t(w) = 0$ otherwise. Then the vector $\bigl(\hat{L}_t(w) / (c R^{|w|/2})\bigr)_{w \in \langle x \rangle}$ lies in the supremum norm unit ball of $\mathbb{R}^{\langle x \rangle}$, which is compact in the weak-$*$ topology by the Banach-Alaoglu theorem. It follows that the sequence $(\hat{L}_t)_t$ has a pointwise converging subsequence and thus the same holds for the sequence $(L_t)_t$.
(i) If M(S) is Archimedean, then f tr t → f tr ∞ as t → ∞, and the optimal values in f tr ∞ and f tr II 1 are attained and equal.
(ii) If $f^{\mathrm{tr}}_t$ has an optimal solution $L$ that is $\delta$-flat, then $L$ is a convex combination of normalized trace evaluations at matrix tuples in $D(S)$, and $f^{\mathrm{tr}}_t = f^{\mathrm{tr}}_\infty = f^{\mathrm{tr}}_{\mathrm{II}_1} = f^{\mathrm{tr}}_*$.

Proof. We first show (i). As $M(S)$ is Archimedean, $R - \sum_{i=1}^n x_i^2 \in M_{2d}(S)$ for some $R > 0$ and $d \in \mathbb{N}$. Since the bounds $f^{\mathrm{tr}}_t$ are monotone nondecreasing in $t$ and upper bounded by $f^{\mathrm{tr}}_\infty$, the limit $\lim_{t \to \infty} f^{\mathrm{tr}}_t$ exists and it is at most $f^{\mathrm{tr}}_\infty$. Fix $\varepsilon > 0$. For each $t \in \mathbb{N}$ let $L_t$ be a feasible solution to the program defining $f^{\mathrm{tr}}_t$ with value $L_t(f) \le f^{\mathrm{tr}}_t + \varepsilon$. As $L_t(1) = 1$ for all $t$ we can apply Lemma 2.12 and conclude that the sequence $(L_t)_t$ has a convergent subsequence. Let $L \in \mathbb{R}\langle x \rangle^*$ be the pointwise limit. One can easily check that $L$ is feasible for $f^{\mathrm{tr}}_\infty$. Hence we have $f^{\mathrm{tr}}_\infty \le L(f) \le \lim_{t \to \infty} f^{\mathrm{tr}}_t + \varepsilon \le f^{\mathrm{tr}}_\infty + \varepsilon$. Letting $\varepsilon \to 0$ we obtain that $f^{\mathrm{tr}}_\infty = \lim_{t \to \infty} f^{\mathrm{tr}}_t$ and $L$ is optimal for $f^{\mathrm{tr}}_\infty$. Next, since $L$ is symmetric, tracial, and nonnegative on $M(S)$, we can apply Theorem 2.1 to obtain a feasible solution $(\mathcal{A}, \tau, X)$ to $f^{\mathrm{tr}}_{\mathrm{II}_1}$ satisfying (3) with objective value $L(f)$. This shows $f^{\mathrm{tr}}_\infty = f^{\mathrm{tr}}_{\mathrm{II}_1}$ and that the optima are attained in $f^{\mathrm{tr}}_\infty$ and $f^{\mathrm{tr}}_{\mathrm{II}_1}$.

Finally, part (ii) is derived as follows. If $L$ is an optimal solution of $f^{\mathrm{tr}}_t$ that is $\delta$-flat, then, by Corollary 2.4, it has an extension $\hat{L} \in \mathbb{R}\langle x \rangle^*$ that is a conic combination of trace evaluations at elements of $D(S)$. This shows $f^{\mathrm{tr}}_* \le \hat{L}(f) = L(f)$, and thus the chain of equalities $f^{\mathrm{tr}}_t = f^{\mathrm{tr}}_\infty = f^{\mathrm{tr}}_* = f^{\mathrm{tr}}_{\mathrm{II}_1}$ holds.
We now derive the commutative analogue of the above theorem.
(i) If M(S) is Archimedean, then f t → f ∞ as t → ∞, the optimal values in f ∞ and f * are attained, and f ∞ = f * .
(ii) If f t admits an optimal solution L that is δ-flat, then L is a convex combination of evaluation maps at global minimizers of f in D(S), and f t = f ∞ = f * .
Proof. (i) By repeating the first part of the proof of Theorem 2.11 in the commutative setting we see that $f_t \to f_\infty$ and that the optimum is attained in $f_\infty$. Let $L$ be optimal for $f_\infty$ and let $k$ be greater than $\deg(f)$ and $\deg(g)$ for $g \in S$. By Theorem 2.8, the restriction of $L$ to $\mathbb{R}[x]_k$ extends to a conic combination of evaluations at points in $D(S)$. It follows that this extension is feasible for $f_*$ with the same objective value, which shows $f_\infty = f_*$.
(ii) This follows in the same way as the proof of Theorem 2.11(ii), where, instead of using Corollary 2.4, we now use its commutative analogue, Theorem 2.7.