Facial reduction for symmetry reduced semidefinite and doubly nonnegative programs

We consider both facial reduction, FR, and symmetry reduction, SR, techniques for semidefinite programming, SDP. We show that the two together fit surprisingly well in an alternating direction method of multipliers, ADMM, approach. In fact, this approach allows for simply adding on nonnegativity constraints, and solving the doubly nonnegative, DNN, relaxation of many classes of hard combinatorial problems. We also show that the singularity degree remains the same after SR, and that the DNN relaxations considered here have singularity degree one, which is reduced to zero after FR. The combination of FR and SR leads to a significant improvement in both numerical stability and running time for both the ADMM and interior point approaches. We test our method on various DNN relaxations of hard combinatorial problems including quadratic assignment problems with sizes of more than n = 500. This translates to a semidefinite constraint of order 250,000 and 625 × 10^8 nonnegative constrained variables, before applying the reduction techniques.

1 Introduction

We consider two reduction techniques, facial and symmetry reduction, for semidefinite programming, SDP. We see that the exposing vector approach for facial reduction, FR, carries over naturally to symmetry reduction, SR. We show that the combination of the two reductions fits surprisingly well in an alternating direction method of multipliers, ADMM, approach. In fact, the combination of FR and SR makes possible solving the doubly nonnegative, DNN, relaxations of many classes of hard combinatorial problems by using ADMM. The combination of facial and symmetry reduction also leads to a significant improvement in both numerical stability and running time for both the ADMM and interior point approaches. We test our method on various DNN relaxations of hard combinatorial problems including quadratic assignment problems (QAP) with sizes of more than n = 500, see Table 5.1. Note that the order of the symmetric matrix variable in the SDP relaxation of the QAP with n = 500, before applying the reduction techniques, is 250,000. This yields approximately 625 × 10^8 nonnegatively constrained variables in the semidefinite constrained matrix of the original, not reduced, problem formulation.

Semidefinite programming can be viewed as an extension of linear programming where the nonnegative orthant is replaced by the cone of positive semidefinite matrices. Although there are many algorithms for solving semidefinite programs, they currently do not scale well and often do not provide high accuracy solutions. An early method for exploiting sparsity and reducing problem size was based on recognizing a chordal pattern in the matrices forming the SDP, see e.g., [21,32], and the survey [64]. Another technique is that of symmetry reduction, a methodology pioneered by Schrijver [52] that exploits symmetries in the data matrices, allowing the problem size to be reduced, often significantly. More details and surveys for SR are available in [2,13].

Without loss of generality, we
consider the case where the primal problem has a finite optimal value. Then for linear programming, strong duality holds for both the primal and the dual problems. But this is not the case for SDP, where the primal and/or the dual can be unattained, and one can even have a positive duality gap between the primal and dual optimal values. The usual constraint qualification to guarantee strong duality is the Slater condition, strict feasibility. Failure of the Slater condition may lead to theoretical and numerical problems when solving the SDP. Facial reduction, FR, introduced by Borwein and Wolkowicz [5][6][7], addresses this issue by projecting the minimal face of the SDP into a lower dimensional space. The literature for the theory and applications of FR is large. For a recent survey and theses, see [18,46,55].
An earlier work [39] combines partial FR and SR for solving sum of squares (SOS) programs. In particular, Löfberg [39] applies a partial FR via monomial basis selection and shows how to perform a partial SR via identification of sign-symmetries to obtain block-diagonal SOS programs. Examples in [39] verify the efficiency of the combined approach for SOS programs. For the connection between FR and monomial basis selection, see [18,65].
In our paper, we assume that we know how to do FR and SR separately for the input SDP instance. Under this assumption, we show that it is possible to apply FR to the symmetry reduced SDP. The resulting reduced SDP is both facially and symmetry reduced, and it can be solved in a numerically stable manner by interior point methods. Moreover, the nonnegativity constraints can be added to the original SDP, and the resulting DNN relaxation can be solved efficiently, as the nonnegativity constraints carry over to, and are in fact simplified in, the reduced SDP. Thus, in fact, we solve the facially and symmetry reduced DNN relaxation using an alternating direction method of multipliers approach, ADMM. As a consequence, we are able to solve some huge DNN relaxations for highly symmetric instances of certain hard combinatorial problems, and we do so in a reasonable amount of time.
We include theoretical results on facial reduction, as well as on the singularity degree of both SDP and DNN relaxations. We present a view of FR for DNN from the ground set of the original hard combinatorial problem. The singularity degree indicates the importance of FR for splitting type methods. In particular, we show that the singularity degree remains the same after SR, and that our applications all have singularity degree one, which is reduced to zero after FR.

Outline
In Section 2 we provide the background on using substitutions to first obtain FR and then symmetry and block diagonal SR. In Section 3 we show how to apply FR to the symmetry reduced SDP, and we also provide conditions such that the obtained SDP is strictly feasible. In fact, we show that the nonnegativity constraints are essentially unchanged and that we have strict feasibility for the reduced DNN relaxation. The results that the singularity degree does not change after SR are included as well, see Section 3.3. In Section 4 we show that the reduced DNN relaxation can be solved efficiently using an ADMM approach. In Section 5 we apply our results to two classes of problems: the quadratic assignment and graph partitioning problems. Concluding comments are in Section 6.

Semidefinite programming
The semidefinite program, SDP, in standard form is

p*_SDP = min { ⟨C, X⟩ : A(X) = b, X ⪰ 0 },  (2.1)

where the linear transformation A : S^n → R^m maps real n × n symmetric matrices to R^m, and X ⪰ 0 denotes positive semidefiniteness, i.e., X ∈ S^n_+. The set S^n_+ is the positive semidefinite, PSD, cone. In the case of a doubly nonnegative, DNN, relaxation, nonnegativity constraints, X ≥ 0, are added to (2.1), i.e., we use the DNN cone denoted DNN ≅ DNN^n = S^n_+ ∩ R^{n×n}_+. Without loss of generality, we assume that A is onto. We let (P_F) denote the feasibility problem for this formulation with data A, b, S^n_+ from (2.1). Note that the linear equality constraint is equivalent to

⟨A_i, X⟩ = b_i, i = 1, …, m,

for some A_i ∈ S^n, i = 1, …, m. The adjoint transformation A* : R^m → S^n is A*(y) = Σ_{i=1}^m y_i A_i.

Strict feasibility and facial reduction
The standard constraint qualification to guarantee strong duality for the primal SDP is the Slater constraint qualification (strict feasibility):

∃ X̂ : A(X̂) = b, X̂ ≻ 0,

where X̂ ≻ 0 denotes positive definiteness, i.e., X̂ ∈ S^n_++. (Strong duality for the primal means a zero duality gap, p*_SDP = d*_SDP, and dual attainment.) For many problems where strict feasibility fails, one can exploit structure and facially reduce the problem to obtain strict feasibility, see e.g., [5,6] for the theory and [7] for the facial reduction algorithm. A survey with various views of FR is given in [18]. Facial reduction means that there exists a full column rank matrix V ∈ R^{n×r}, r < n, and the corresponding adjoint of the linear transformation V : S^n → S^r given by V*(R) = V R V^T, R ∈ S^r, such that the substitution X = V*(R) results in the equivalent, regularized, smaller dimensional problem

min { ⟨V^T C V, R⟩ : ⟨V^T A_i V, R⟩ = b_i, i = 1, …, m, R ∈ S^r_+ }.  (2.3)

(FR generally results in the constraints becoming linearly dependent; therefore, only a linearly independent subset need be used [55].)
Strict feasibility holds for (2.3). The cone V S^r_+ V^T is the minimal face of the SDP, i.e., the smallest face of S^n_+ that contains the feasible set F_X. And range(V) = range(X), ∀X ∈ relint(F_X).
If U ∈ R^{n×(n−r)} with range(U) = null(V^T), then W := U U^T is an exposing vector for the minimal face, i.e., X feasible ⟹ W X = 0.
Let F_R denote the feasible set for (2.3). We emphasize that the FR substitution preserves ranks: since V has full column rank, rank(V R V^T) = rank(R).

Remark 2.1. For a typical FR algorithm for finding the minimal face, at each iteration the dimension is strictly reduced, and at least one redundant linear constraint can be discarded, i.e., we need at most min{m, n − 1} iterations, e.g., [18,56], [55, Theorem 3.5.4].
Note that FR can also be considered in the original space using rotations. Each step of FR involves finding an exposing vector W = U U^T to the minimal face. Without loss of generality, we can assume that the matrix Q = [V U] is orthogonal. Then the FR that reduces the size of the problem, X = V*(R) = V R V^T, can equivalently be considered as a rotation (orthogonal congruence):

Q^T X Q = Blkdiag(R, 0),

i.e., after this rotation, we can discard zero blocks and reduce the size of the problem. We note that this can then be compared to the Constrained Set Invariance Conditions approach in [46], where a special projection is used to obtain the reduced problem. In addition, the approach in [46] performs the projections on the primal-dual problem, thus maintaining the original optimal values of both. In contrast, we emphasize the importance of the primal problem as being the problem of interest. After FR we have a regularized primal problem (2.3) with optimal value the same as that of the original primal problem. In addition, the reduced program has the important property that the dual of the dual is the primal.
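A minimal numerical sketch of this exposing-vector/rotation view (the dimensions and matrices below are illustrative, not taken from any particular application):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 2

# Build an orthogonal Q = [V U]; then range(U) = null(V^T).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, U = Q[:, :r], Q[:, r:]
W = U @ U.T                          # exposing vector W = U U^T

# FR substitution: X = V R V^T with R positive semidefinite.
S = rng.standard_normal((r, r))
R = S @ S.T
X = V @ R @ V.T

# W exposes the face containing X, and the rotation reveals the zero blocks.
print(np.linalg.norm(W @ X) < 1e-10)                              # True
rot = Q.T @ X @ Q                                                 # = Blkdiag(R, 0)
print(np.allclose(rot[:r, :r], R), np.allclose(rot[r:, r:], 0))   # True True
```

After the rotation, only the leading r × r block carries information, which is exactly why the substitution X = V R V^T reduces the problem size.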

Group invariance and symmetry reduction, SR
We now find a substitution, using the adjoint linear transformation B*(x) from (2.11) below, that provides the SR in block diagonal form. We first look at the procedure for simplifying an SDP that is invariant under the action of a symmetry group. This approach was introduced by Schrijver [52]; see also the survey [2]. The appropriate algebra isomorphism follows from the Artin-Wedderburn theory [66]. A more general framework is given in the thesis [46]. More details can be found in e.g., [12,22,24,62].
Let G be a nontrivial group of permutation matrices of size n. The commutant, A_G, (or centralizer ring) of G is defined as the subspace

A_G = { X ∈ R^{n×n} : P X P^T = X, ∀P ∈ G }.  (2.4)

Thus, A_G is the set of matrices that are self-permutation-congruent for all P ∈ G. An equivalent definition of the commutant is A_G = { R_G(X) : X ∈ R^{n×n} }, where

R_G(X) := (1/|G|) Σ_{P∈G} P X P^T  (2.5)

is called the Reynolds operator (or group average) of G. The operator R_G is the orthogonal projection onto the commutant. The commutant A_G is a matrix *-algebra, i.e., it is a set of matrices that is closed under addition, scalar multiplication, matrix multiplication, and taking transposes.
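A small numerical sketch of the Reynolds operator; the cyclic group generated by an n-cycle is an illustrative choice of G:

```python
import numpy as np

n = 4
# Cyclic group generated by the n-cycle (a small illustrative choice of G).
P = np.roll(np.eye(n), 1, axis=0)
group = [np.linalg.matrix_power(P, k) for k in range(n)]

def reynolds(X, group):
    """Group average R_G(X) = (1/|G|) * sum_P P X P^T."""
    return sum(Pk @ X @ Pk.T for Pk in group) / len(group)

rng = np.random.default_rng(1)
X = rng.standard_normal((n, n))
RX = reynolds(X, group)

# R_G(X) lies in the commutant: P R_G(X) = R_G(X) P for all P in G.
print(max(np.linalg.norm(Pk @ RX - RX @ Pk) for Pk in group))   # ~0
# R_G is a projection: applying it twice changes nothing.
print(np.linalg.norm(reynolds(RX, group) - RX))                  # ~0
```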
One may obtain a basis for A_G from the orbits of the action of G on ordered pairs of vertices, where the orbit of (u_i, u_j) ∈ {0,1}^n × {0,1}^n under the action of G is the set {(P u_i, P u_j) | P ∈ G}, and u_i ∈ R^n is the i-th unit vector. In what follows, we denote the resulting basis of zero-one matrices for A_G by

B_1, …, B_d.  (2.6)

We obtain that the Reynolds operator maps the feasible set F_X of (2.1) into itself and keeps the objective value the same, i.e., ⟨C, R_G(X)⟩ = ⟨C, X⟩ for all feasible X. One can restrict optimization of an SDP problem to feasible points in a matrix *-algebra that contains the data matrices of that problem, see e.g., [14,23]. In particular, the following result is known.
Theorem 2.3 ([14], Theorem 4). Let A_G denote a matrix *-algebra that contains the data matrices of an SDP problem as well as the identity matrix. If the SDP problem has an optimal solution, then it has an optimal solution in A_G.
Remark 2.4. In [14], the authors consider complex matrix *-algebras. However, in most applications, including the applications in this paper, the data matrices are real symmetric matrices and A_G has a real basis, see Definition 2.2. Thus, we consider here the special real case. The authors in [14] also prove that if A_G has a real basis, and the SDP has an optimal solution, then it has a real optimal solution in A_G. Real matrix *-algebras are also considered in [15,16,43].
In addition, Theorem 2.3 holds for DNN, i.e., we can move any nonnegativity constraints on X added to the SDP in Theorem 2.3 to simple nonnegativity constraints on x in (2.7), see e.g., [12, Pg. 5].
Therefore, we may restrict the feasible set of the optimization problem to its intersection with A_G. In particular, we can use the basis matrices and assume that

X = B*(x) := Σ_{i=1}^d x_i B_i.  (2.7)

From now on we assume that G is such that A_G contains the data matrices of (2.1).
Example 2.5 (Hamming Graphs). We now present an example of an algebra that we use later in our numerics.
The Hamming graph H(d, q) is the Cartesian product of d copies of the complete graph K q , with vertices represented by d-tuples of letters from an alphabet of size q.The Hamming distance between vertices u and v, denoted by |(u, v)|, is the number of positions in which d-tuples u and v differ.
The matrices B_i, i = 0, …, d, defined by (B_i)_{u,v} = 1 if |(u, v)| = i and 0 otherwise, form a basis of the Bose-Mesner algebra of the Hamming scheme, see [17]. In particular, B_0 = I is the identity matrix and B_1 is the adjacency matrix of the Hamming graph H(d, q) of size q^d × q^d. In cases, like for the Bose-Mesner algebra, when one of the basis elements equals the identity matrix, it is common to set the index of the corresponding basis element to zero. The basis matrices B_i can be simultaneously diagonalized by a real, orthogonal matrix Q. The distinct elements of the resulting diagonal matrices are the Krawtchouk polynomials

K_i(j) := Σ_{h=0}^{i} (−1)^h (q − 1)^{i−h} C(j, h) C(d − j, i − h), i, j = 0, …, d,

where C(·,·) denotes the binomial coefficient. We denote by µ_j := C(d, j)(q − 1)^j the multiplicity of the j-th eigenvalue K_i(j). The elements of the character table P ∈ R^{(d+1)×(d+1)} of the Hamming scheme H(d, q), given in terms of the Krawtchouk polynomials, are p_{i,j} := K_i(j), i, j = 0, …, d.
In the later sections, we use the following well-known orthogonality relations for the Krawtchouk polynomials, see e.g., [17]:

Σ_{j=0}^{d} µ_j K_r(j) K_s(j) = q^d C(d, r)(q − 1)^r δ_{r,s}, r, s = 0, …, d,

where δ_{r,s} is the Kronecker delta function.
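The definition and the orthogonality relation above can be checked numerically. The following sketch uses the standard explicit formula for K_i(j); the parameters d = 4, q = 3 are an arbitrary test choice:

```python
from math import comb

def krawtchouk(i, j, d, q):
    """K_i(j) = sum_h (-1)^h (q-1)^(i-h) C(j,h) C(d-j, i-h)."""
    return sum((-1)**h * (q - 1)**(i - h) * comb(j, h) * comb(d - j, i - h)
               for h in range(i + 1))

d, q = 4, 3
mu = [comb(d, j) * (q - 1)**j for j in range(d + 1)]   # multiplicities mu_j

# Orthogonality: sum_j mu_j K_r(j) K_s(j) = q^d C(d,r) (q-1)^r delta_{r,s}
for r in range(d + 1):
    for s in range(d + 1):
        lhs = sum(mu[j] * krawtchouk(r, j, d, q) * krawtchouk(s, j, d, q)
                  for j in range(d + 1))
        rhs = q**d * comb(d, r) * (q - 1)**r if r == s else 0
        assert lhs == rhs
print("orthogonality verified for d=4, q=3")
```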

First symmetry reduction using X = B*(x)
We now obtain our first reduced program using the substitution X = B*(x). Note that the program is reduced in the sense that the feasible set can be smaller though the optimal value remains the same.
Here, B is the adjoint of B*. In the case of a DNN relaxation, the structure of the basis in (2.6) allows us to equate X = B*(x) ≥ 0 with the simpler x ≥ 0. This changes the standard doubly nonnegative cone into a splitting, the Cartesian product of the cones determined by x ≥ 0 and B*(x) ⪰ 0, see Remarks 2.4 and 5.1.
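The reason the entrywise constraint transfers from X to x is that the orbit basis matrices are zero-one with disjoint supports. A toy sketch, with a two-matrix illustrative basis:

```python
import numpy as np

# Toy 0/1 orbit basis with disjoint supports (as for a commutant basis, (2.6)):
B = [np.eye(3), np.ones((3, 3)) - np.eye(3)]   # identity + off-diagonal orbit

def B_star(x):
    """Lift x to X = sum_i x_i B_i."""
    return sum(xi * Bi for xi, Bi in zip(x, B))

x = np.array([1.0, 0.5])
X = B_star(x)
# Since the B_i are 0/1 with disjoint supports, X >= 0 entrywise iff x >= 0.
print((X >= 0).all(), (x >= 0).all())   # True True
```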
Here a basic matrix *-algebra is one of the form { ⊕_{i=1}^{m} M : M ∈ C^{k×k} }, i.e., direct sums of m copies of a full matrix algebra, where ⊕ denotes the direct sum of matrices. A very important decomposition result for matrix *-algebras is the following result due to Wedderburn.

Theorem 2.6 ([66]). Let M be a matrix *-algebra containing the identity matrix. Then there exists a unitary matrix Q such that Q* M Q is a direct sum of basic matrix *-algebras.
The above result is derived for complex matrix *-algebras. In [43], the authors study numerical algorithms for block-diagonalization of matrix *-algebras over R. Unfortunately, the Wedderburn decomposition described in the above theorem does not directly apply to *-algebras over the reals. To demonstrate our approach in the section on numerical results, we use known orthogonal matrices or simple heuristics to obtain them.
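One such simple heuristic, sketched here for the special case of a commutative algebra (such as the Bose-Mesner algebra above): diagonalize a generic random linear combination of the basis matrices and reuse its eigenvectors. The circulant basis below is illustrative; for noncommutative algebras this generic-element idea only yields blocks, not a full diagonalization:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# Illustrative commutative *-algebra: symmetric circulants of order 4.
P = np.roll(np.eye(n), 1, axis=0)
basis = [np.eye(n), P + P.T, np.linalg.matrix_power(P, 2)]

# Heuristic: a generic combination M shares its eigenspaces with the whole
# (commutative) algebra, so its eigenvectors diagonalize every basis matrix.
M = sum(rng.standard_normal() * Bi for Bi in basis)
_, Q = np.linalg.eigh(M)

for Bi in basis:
    D = Q.T @ Bi @ Q
    off = D - np.diag(np.diag(D))
    assert np.linalg.norm(off) < 1e-8
print("basis simultaneously diagonalized")
```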
To simplify our presentation, the matrix Q in Theorem 2.6 is assumed to be real orthogonal. (The case when Q is complex can be derived analogously.) Then the matrices in the basis B_j, j = 1, …, d, can be mutually block-diagonalized by some orthogonal matrix Q. More precisely, there exists an orthogonal matrix Q such that we get the following block-diagonal transformation on B_j:

Q^T B_j Q = B̃_j = Blkdiag(B̃_j^1, …, B̃_j^t), j = 1, …, d.  (2.10)

For Q^T X Q = Σ_{j=1}^d x_j B̃_j, we now define the linear transformation for obtaining the block matrix diagonal form:

B̃*(x) := Σ_{j=1}^d x_j B̃_j = Blkdiag(B̃*_1(x), …, B̃*_t(x)),  (2.11)

where B̃*_k(x) := Σ_{j=1}^d x_j B̃_j^k is the k-th diagonal block of B̃*(x), and the sum of the t block sizes satisfies n_1 + … + n_t = n. Thus, for any feasible X we get Q^T X Q = B̃*(x).

Second symmetry reduction to block diagonal form using X = Q B̃*(x) Q^T

We now derive the second reduced program using the substitution X = Q B̃*(x) Q^T. The program is further reduced since we obtain the block diagonal problem

min { ⟨C̃, B̃*(x)⟩ : Ã(x) = b, B̃*(x) ⪰ 0 },  (2.12)

where C̃ = Q^T C Q and Ã is the linear transformation obtained from A as follows: Ã_j = Q^T A_j Q, ∀j. We denote the corresponding blocks as Ã_j^k, ∀j = 1, …, d, ∀k = 1, …, t. We see that the objective in (2.12) satisfies ⟨C, X⟩ = ⟨Q C̃ Q^T, Q B̃*(x) Q^T⟩ = ⟨C̃, B̃*(x)⟩, while the i-th row of the linear equality constraint in (2.12), Ã(x) = b, is ⟨Ã_i, B̃*(x)⟩ = b_i. Without loss of generality, we can now write c and A for the resulting reduced objective vector and constraint transformation, respectively.
Moreover, just as for FR, the SR step can result in A not being full row rank (onto).We then have to choose a nice (well conditioned) submatrix that is full row rank and use the resulting subsystem of Ax = b.We see below how to do this and simultaneously obtain strict feasibility.
We can now rewrite the SDP (2.1) as

min { ⟨c, x⟩ : Ax = b, B*(x) ⪰ 0 }.  (2.14)

For many applications, there are repeated blocks. We then take advantage of this to reduce the size of the problem and maintain stability. The program (2.14) is a symmetry reduced formulation of (2.1). We denote its feasible set and feasible slacks as

F_x = { x : Ax = b, B*(x) ⪰ 0 },  S_x = { B*(x) : x ∈ F_x }.  (2.15)

We denote the feasibility problem for this formulation, with data B*, A, b, S^n_+, of the feasible set F_x as (P_Fx). We bear in mind that B*(x) is a block-diagonal matrix, but it is written as a single matrix for convenience in order to describe FR for the symmetry reduced program below.
Since B̃_1, …, B̃_d are block diagonal, symmetric matrices, the symmetry reduced formulation is typically much smaller than the original problem, i.e., the number of scalar variables drops from t(n) to at most Σ_{k=1}^{t} t(n_k), where t(k) = k(k + 1)/2 is the triangular number.
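For a feel for the scale, a back-of-the-envelope computation: the order n = 250,000 matches the QAP instances mentioned in the introduction, while the basis size and block sizes below are hypothetical, chosen only for illustration:

```python
def t(k):
    """Triangular number: dimension of the space S^k of symmetric matrices."""
    return k * (k + 1) // 2

n = 250_000                  # order of the original matrix variable
d = 1_000                    # hypothetical size of the commutant basis
blocks = [500] * 10          # hypothetical block sizes n_1, ..., n_t

print(t(n))                              # scalar variables before any reduction
print(sum(t(nk) for nk in blocks), d)    # after block-diagonal SR; basis size
```

Even with made-up reduced dimensions of this rough magnitude, the drop from t(n) ≈ 3.1 × 10^10 scalar variables to a few million (or to d after the basis substitution) is what makes these instances tractable at all.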

Facial reduction for the symmetry reduced program
In this section, we show how to apply FR to the symmetry reduced SDP (2.14). The key is using the exposing vector view of facial reduction [18]. Formally speaking, if an exposing vector of the minimal face of the SDP (2.1) is given, then we are able to construct a corresponding exposing vector of the minimal face of the symmetry reduced program (2.14). In fact, we show that all the exposing vectors of the symmetry reduced program can be obtained from the exposing vectors of the original program. In general, one can find exposing vectors for the original program by exploiting its structure. However, this structure is lost after SR, making it more difficult to find an exposing vector directly for the reduced program.
In addition, we follow the above theme of simply adding on the nonnegativity constraints and extend many of the results to the DNN program. We include results on the singularity degree to emphasize the importance of FR for stability, and to show that SR does not increase the singularity degree.

Rank preserving
We begin by showing the maximum rank preserving properties of SR. Note that max{rank(X) : X ∈ F_X} is attained by any X ∈ relint(face(F_X)), where face(F_X) is the minimal face of S^n_+ containing the feasible set. We let F ⊴ K denote that F is a face of the cone K.

Theorem 3.1. max{ rank(X) : X ∈ F_X } = max{ rank(B*(x)) : B*(x) ∈ S_x }.
Proof. Let X ∈ F_X be a matrix with maximum rank r. Then X is in the relative interior of the minimal face f ⊴ S^n_+ containing F_X. The nonsingular congruence P^T X P is feasible for each P ∈ G, and also has rank r. Note that A, B ∈ S^n_+ ⟹ rank(A + B) ≥ max{rank(A), rank(B)}. Therefore, applying the Reynolds operator, we have R_G(X) ∈ F_X ∩ A_G, and it has rank r; hence Q^T R_G(X) Q ∈ S_x has rank r, where Q is the orthogonal matrix given above in (2.10). Conversely, if B*(x) ∈ S_x has rank r, then X = Q B*(x) Q^T ∈ F_X has rank r as well.

Note that in the proof of Theorem 3.1 we exploit the following known properties of the Reynolds operator: rank(R_G(X)) ≥ rank(X), which is valid for all X that are positive semidefinite, and R_G(F_X) = F_X ∩ A_G.
Corollary 3.2. The program (2.1) is strictly feasible if, and only if, its symmetry reduced program (2.14) is strictly feasible.

Remark 3.3. From the proof of Theorem 3.1, if there is a linear transformation X = L(x) with a full rank feasible X̄ = L(x̄) ∈ range(L), then in general we can conclude that the substitution X = L(x) results in a smaller SDP with strict feasibility holding at x̄, i.e., X̄ = L(x̄) ≻ 0.

Exposing vectors
For many given combinatorial problems, the semidefinite relaxation is not strictly feasible, i.e., it is degenerate, ill-posed, and we can apply FR [18,46,55]. From Section 3.1 above, we see that this implies that the symmetry reduced problem is degenerate as well. Although both SR and FR can be performed separately to obtain two independent problems, there has not been any study that implements these techniques simultaneously and efficiently for SDPs, i.e., to obtain a symmetry reduced problem that also guarantees strict feasibility. Recall that [39] combines partial FR and SR for solving SOS programs.
In what follows, we show that the exposing vectors of the symmetry reduced program (2.14) can be obtained from the exposing vectors of the original program (2.1). This enables us to facially reduce the symmetry reduced program (2.14) using the structure from the original problem.
Let W = U U^T, with U ∈ R^{n×(n−r)} full column rank, and let W be a nonzero exposing vector of a face of S^n_+ containing the feasible region F_X of (2.1). Let V ∈ R^{n×r} be such that range(V) = null(U^T).
Then FR means that we can use the substitution X = V*(R) = V R V^T and obtain the following equivalent, smaller formulation of (2.1):

min { ⟨V^T C V, R⟩ : ⟨V^T A_i V, R⟩ = b_i, i = 1, …, m, R ∈ S^r_+ }.  (3.1)

If W exposes the minimal face containing F_X, then strict feasibility holds. In fact, R̂ strictly feasible corresponds to X̂ = V*(R̂) ∈ relint(F_X).
The following results show how to find an exposing vector that is in the commutant A_G.
Lemma 3.4. Let W be an exposing vector of rank d of a face F ⊴ S^n_+, F_X ⊆ F. Then there exists an exposing vector W_G ∈ A_G of F with rank(W_G) ≥ d.

Proof. Let W be the exposing vector of rank d, i.e., W ⪰ 0 and ⟨W, X⟩ = 0 for all X ∈ F_X. Since (2.1) is G-invariant, P X P^T ∈ F_X for every P ∈ G, and we conclude that ⟨W, P X P^T⟩ = ⟨P^T W P, X⟩ = 0. Therefore, P^T W P ⪰ 0 is an exposing vector of rank d. Thus W_G = (1/|G|) Σ_{P∈G} P^T W P ∈ A_G is an exposing vector of F.
That the rank is at least d follows from taking the sum of nonsingular congruences of W ⪰ 0.
Lemma 3.4 shows that A_G contains exposing vectors. This result is a valuable addition to the list of objects that exhibit symmetry, see for example: dual solutions and the central path in [31]; solutions on the central path and some search directions of primal-dual interior-point methods in [29]; and infeasibility certificates in [45].
Note that one can obtain an exposing vector W_G ∈ A_G from an exposing vector W by using the Reynolds operator, as done in Lemma 3.4. However, in some cases W_G can be derived more easily, as our examples in the later numerical sections show. We now continue and show that Q^T W_G Q is also an exposing vector.
Then, by construction, Z is a block-diagonal matrix, say Z = Blkdiag(Z_1, …, Z_t). Now, since W is an exposing vector of the face of S^n_+ containing F_X, we have ⟨W̃, Z⟩ = ⟨Q^T W Q, Q^T X Q⟩ = ⟨W, X⟩ = 0, where W̃ = Q^T W Q ⪰ 0. Thus, W̃ is an exposing vector of a proper face of S^n_+ containing S_x.

Since, by Lemma 3.4, we may assume W ∈ A_G, we obtain the following.
Theorem 3.6. Let W ∈ A_G be an exposing vector of face(F_X), the minimal face of S^n_+ containing F_X. Then the block-diagonal matrix W̃ = Q^T W Q exposes face(S_x), the minimal face of S^n_+ containing S_x.
Let W̃ = Ũ Ũ^T, with Ũ = Blkdiag(Ũ_1, …, Ũ_t), expose face(S_x), the minimal face of S^n_+ containing S_x, and let Ṽ_i be a full rank matrix whose columns form a basis for the orthogonal complement of the columns of Ũ_i, i = 1, …, t. Take Ṽ = Blkdiag(Ṽ_1, …, Ṽ_t). Then the facially reduced formulation (3.2) of (2.14) is obtained by the substitution B*_k(x) = Ṽ_k R̃_k Ṽ_k^T, where Ṽ_k R̃_k Ṽ_k^T is the corresponding k-th block of B*(x), and R̃ = Blkdiag(R̃_1, …, R̃_t). Note that some of the blocks B*_k(x), and corresponding R̃_k, might be the same and thus can be removed in the computation, see Theorem 2.6.
Remark 3.7. We have assumed that an exposing vector of the minimal face of the original SDP (2.1) is available. If this is not the case, then we can find a strictly feasible formulation of (2.1), and an exposing vector of the minimal face for the original problem, by using a finite number (at most min{m, n − 1}) of facial reduction steps, e.g., [18,55,56].
We note here that reduction techniques based on the Constrained Set Invariance Conditions, such as * -algebra techniques, can obtain strict feasibility by removing zero blocks after the appropriate projection, see [46].

Order of reductions
To obtain the combined symmetry and facially reduced semidefinite program (3.2), we first apply SR to (2.1) to obtain the form in (2.14), and then follow this with FR. A natural question is whether the order of the reductions matters.
Note that the objective ⟨V^T C V, R⟩ and the constraints in (2.3), see also (3.1), depend on the choice of V. We now show that the choice of this V is crucial when reversing the order of the reductions FR and SR. For a naive choice of V, we can lose all the symmetry structure needed for the SR in Section 2.2. For example, assume the data matrices C, A_1, …, A_m of the original problem are invariant under a non-trivial permutation group G, i.e., they are in the commutant A_G, see (2.4). However, the data matrices V^T C V, V^T A_1 V, …, V^T A_m V of the facially reduced problem may not be invariant under any nontrivial group of permutation matrices for the given V. Note that we can always replace V ← V S using any invertible S; an arbitrary invertible congruence S^T V^T A_i V S will then destroy the symmetry structure in the constraint matrices.

Lemma 3.8. Let V, Ṽ, Q be given as in the above paragraph and in Theorem 3.6. Then Q Ṽ can be used as the facial reduction matrix V in (3.1).
From Lemma 3.8, we can set V = Q Ṽ for FR. The objective and constraints then involve the data matrices Ṽ^T C̃ Ṽ and Ṽ^T Ã_i Ṽ. As C̃, Ã_i and Ṽ are block-diagonal matrices of appropriate sizes, the data matrices Ṽ^T C̃ Ṽ and Ṽ^T Ã_i Ṽ are block-diagonal as well. Since R is also a block-diagonal matrix, this choice of V implicitly exploits the symmetry of the original problem data. The reduction in this case is a special case of a general reduction technique known as a projection-based method, see [46] and Remark 2.1 above.
We conclude that if FR is implemented first, then for SR to follow, it is crucial to find a suitable matrix V that retains the symmetry structure in the facially reduced problem. Therefore, it is more convenient for our approach to apply symmetry reduction before facial reduction and exploit the simple relation with the exposing vectors. However, for some other symmetry reduction methods it might be more appropriate to apply first FR and then SR, assuming that some symmetry is preserved after FR and that the SR and FR procedures have comparable costs, see e.g., [39].

Doubly nonnegative, DNN, program
In this section, the theory above for SDP is extended to doubly nonnegative programs. We show that if a maximal exposing vector W (see Definition 3.11) of the DNN program (3.3) is given, then we can construct an exposing vector for the minimal face of the symmetry reduced DNN program (3.4). This results in a strictly feasible facially and symmetry reduced DNN program (3.5).
Note that in addition to positive definiteness, we need X > 0, elementwise positivity, for strict feasibility to hold for the DNN relaxation. We denote the cone of nonnegative symmetric matrices of order n by N^n. The following extends [27, Proposition 2.3] for the intersection of faces to include exposing vectors.

Theorem 3.9. Let F_S ⊴ S^n_+ be exposed by W_S ∈ S^n_+, and let F_N ⊴ N^n be exposed by W_N ∈ N^n. Then W_S + W_N is an exposing vector of the face F_S ∩ F_N ⊴ DNN.

Proof. Note that since N^n is a polyhedral cone, and both S^n_+, N^n are self-dual, we get that the dual cone (nonnegative polar) satisfies DNN* = (S^n_+ ∩ N^n)* = S^n_+ + N^n. We can now take the sum of the exposing vectors, and it is clearly an exposing vector of the intersection of the faces.
For our application, we note that the intersection F_S ∩ F_N is characterized by the facial representation X ∈ V S^r_+ V^T and X_ij = 0 for appropriate indices i, j. FR on the PSD cone allows one to obtain a new SDP problem of lower dimension, since any face of the PSD cone is isomorphic to a smaller PSD cone. However, the DNN cone does not possess this property; namely, a face of a DNN cone is not necessarily isomorphic to a smaller DNN cone. However, our splitting still allows for a simplification based on the Cartesian product of an SDP cone and a nonnegative cone N^n, see (2.9), the paragraph after (2.9), and Remark 5.1, below.
The DNN program is defined as

min { ⟨C, X⟩ : A(X) = b, X ∈ DNN }.  (3.3)

The symmetry reduced formulation of the DNN program (3.3) is

min { ⟨c, x⟩ : Ax = b, x ≥ 0, B*_k(x) ⪰ 0, k = 1, …, t },  (3.4)

see (2.11) for the definition of B*_k(x). Recall that the symmetry reduced formulation of the SDP program (2.1) is (2.14). The ambient cone of the symmetry reduced program (3.4) is the Cartesian product of cones R^d_+ × S^{n_1}_+ × … × S^{n_t}_+. Let W ∈ DNN* be an exposing vector of (3.3). Then W = W_S + W_N for some W_S ∈ S^n_+ and W_N ∈ N^n. The exposing vector W ∈ DNN* satisfies ⟨W, X⟩ = 0 for every feasible X of (3.3). Since it also holds that ⟨W_S, X⟩ ≥ 0 and ⟨W_N, X⟩ ≥ 0, we have

⟨W_S, X⟩ = ⟨W_N, X⟩ = 0

for every feasible X of (3.3).
We are going to construct an exposing vector for the symmetry reduced program (3.4) by using W. Here the exposing vectors (W̃_1, …, W̃_t) for the semidefinite cones (S^{n_1}_+, …, S^{n_t}_+) can be derived in the same way as before. Therefore we only have to find an exposing vector for the nonnegative cone R^d_+. Let x be feasible for (3.4). Then X = B*(x) is feasible for (3.3). We have

0 = ⟨W_N, X⟩ = ⟨W_N, B*(x)⟩ = ⟨B(W_N), x⟩.

Define w := B(W_N). Since W_N is nonnegative and (B(W_N))_i = ⟨B_i, W_N⟩ for some zero-one matrix B_i, the vector w is nonnegative. Then ⟨w, x⟩ = 0 implies that w is an exposing vector for the cone R^d_+ of (3.4). Thus facial reduction for the nonnegative cone R^d_+ simply removes from the program the entries x_i associated with positive entries w_i > 0. Let x̄ be the vector obtained by removing these entries from x. Define the new data c̄, Ā correspondingly. The facial reduction for the semidefinite cones is handled in the same way as before. This yields the facially reduced formulation (3.5) of (3.4). We show below that if W is a maximal exposing vector of (3.3), then (3.5) is strictly feasible.

Definition 3.11 (maximal exposing vector). An exposing vector W = W_S + W_N, W_S ∈ S^n_+, W_N ∈ N^n, of the DNN program (3.3) is maximal if: (i) rank(W_S) is maximal; (ii) the number of positive entries in W_N is maximal, i.e., supp(W̄_N) ⊆ supp(W_N) for any other exposing vector W̄_N.
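A toy sketch of this removal step (the 3 × 3 basis and the choice of W_N are illustrative only): compute w = B(W_N) and mark which coordinates of x survive.

```python
import numpy as np

# Illustrative 0/1 orbit basis (as in (2.6)) and a nonnegative exposing part W_N.
B = [np.eye(3), np.ones((3, 3)) - np.eye(3)]
W_N = np.ones((3, 3)) - np.eye(3)    # hypothetical: supported on the second orbit

# w_i = <B_i, W_N>; nonnegative since the B_i are 0/1 and W_N >= 0.
w = np.array([np.tensordot(Bi, W_N) for Bi in B])

# FR on R^d_+: coordinates x_i with w_i > 0 are fixed to zero and removed.
keep = w == 0
print(w, keep)    # here the second coordinate is removed
```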
Note that if W_N, W̄_N ∈ N^n are exposing vectors for a DNN program, then W_N + W̄_N ∈ N^n is also an exposing vector. Therefore the support of W_N in the decomposition of a maximal exposing vector W in Definition 3.11 is unique.

Theorem 3.12. Assume that W = W_S + W_N, where W_S ∈ S^n_+, W_N ∈ N^n, is a maximal exposing vector of the DNN program (3.3). Then W_S and W_N yield exposing vectors for the minimal face of the symmetry reduced program (3.4), or equivalently, the facially and symmetry reduced program (3.5) is strictly feasible.
Proof. Assume, for the sake of contradiction, that (3.5) is not strictly feasible. The existence of a feasible solution for (3.5) such that R̃_k ≻ 0 for k = 1, …, t can be derived in the same way as before, see Theorem 3.6. Therefore, we consider here only the case that there does not exist a feasible x̄ for (3.5) that is strictly positive. Then there exists an exposing vector w′ ∈ R^d_+ for (3.4) such that supp(w) ⊊ supp(w′), where w = B(W_N) as above. Let W′_N := B*(w′) ∈ N^n; then supp(W_N) ⊊ supp(W′_N). Let X ∈ DNN be feasible for (3.3). Then R_G(X) = B*(x) ∈ DNN for some x feasible for (3.4), and thus ⟨w′, x⟩ = 0; this means that ⟨W′_N, X⟩ = 0. Thus W′_N is an exposing vector for (3.3) such that supp(W_N) ⊊ supp(W′_N). This contradicts the maximality of W. Thus the program (3.5) is strictly feasible. Note that the fact that we could move the nonnegativity to the reduced variable x was essential for obtaining the Slater condition.
Remark 3.13. One can also prove Theorem 3.12 by exploiting the following properties of R_G(X), the Reynolds operator of G, see (2.5). For all feasible X, it holds that rank(R_G(X)) ≥ rank(X) and supp(R_G(X)) ⊇ supp(X).

Facial reduction from the ground set for DNN
Our applications involve quadratic models of hard combinatorial problems. We now see that the view of strict feasibility and FR in [61, Theorems 3.1, 3.2] can be easily extended from SDP to DNN.
We follow the notation in [61] and define the feasible set, or ground set, Q of a quadratically constrained program, where A is a linear transformation. The relaxation, lifting, is then given accordingly. Let the gangster set for Q, and analogously for its lifting, be defined as the index set of the entries that are identically zero. Note that here the gangster sets are equal, G_Q = G_Q̂, with appropriate indices. However, for a general DNN, we are not given the ground set, and the gangster set is defined for the lifted problem only.
In what follows we use e_k, or e when the meaning is clear, to denote the vector of all ones of order k. Theorem 3.14 (Slater). Suppose that conv(Q) is full dimensional and that G_Q = ∅. Then the Slater condition holds for the lifting of Q.
Proof. By the assumption, we can choose a finite set of vectors in Q. As in [61], we choose an affinely independent set {x_i}_{i=1}^{n+1} ⊆ Q, and form the matrix by adding ones and the v_{i,j} defined in (3.6). We lift and get the Slater point. We now extend this to obtain FR. We use our exposing vector viewpoint rather than the internal view in [61, Theorem 3.2]. We note that we cannot move the nonnegativity constraints here onto R, as is done for our applications after SR. Moreover, though the Slater condition holds for the FR feasible set in (3.7), it is not necessarily true that the Mangasarian-Fromovitz constraint qualification holds, since some of the linear equality constraints typically become redundant after FR. We can, however, discard redundant equality constraints.
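For intuition, the Slater-point construction of Theorem 3.14 can be checked numerically on a toy ground set (our own choice of Q = {0,1}^2, for which conv(Q) is full dimensional and the gangster set is empty; averaging the lifted rank-one matrices of the points of Q then gives a positive definite lifted point):

```python
import numpy as np

# ground set Q = {0,1}^2; lift each point x to [1; x][1; x]^T and average
points = [np.array([a, b], dtype=float) for a in (0, 1) for b in (0, 1)]
X_hat = np.mean([np.outer(np.r_[1.0, x], np.r_[1.0, x]) for x in points],
                axis=0)

# positive definiteness = Slater point for the lifted feasible set
min_eig = np.linalg.eigvalsh(X_hat).min()
```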
Theorem 3.15. Let U be obtained from the data matrix A^T, and let V be of full column rank with range(V) = null(U). Then there exists a Slater point R̂ for the FR, DNN feasible set, where (V R V^T)_S is the vector with indices chosen from the set S. Proof. The proof is as for Theorem 3.14 after observing that U U^T is an exposing vector. More precisely, from the lifted linear constraint we see that Y U U^T = 0 for all lifted Y, and therefore also for the minimal face. Thus U U^T is an exposing vector. The result follows after restricting the selection in (3.6) to the complement G^c_Q.

Singularity degree
The singularity degree of the semidefinite feasibility problem P_F (2.2), denoted sd(P_F), is the minimum number of steps, each with a nonzero exposing vector, for the FR algorithm to terminate with the minimal face. For P_SDP this means we terminate with a strictly feasible problem. Singularity degree was introduced for P_SDP in [58] to show that SDP feasibility problems always admit a Hölder error bound. More precisely, let d = sd(P_F), L = {X | A(X) = b}, and let U ⊂ S^n be compact. Let dist denote the norm-distance to a set. Then it is shown in [58] that there exists c > 0 such that dist(X, F_X) ≤ c (dist(X, L) + dist(X, S^n_+))^{2^{−d}} for all X ∈ U. Remarkably, the exponent 2^{−d} is independent of m, n and the rank of the matrices in F_X. It strongly indicates the importance of FR for SDP, especially when obtaining approximate solutions with splitting type methods. This is illustrated by our numerical results, see Table 5.7 below, where lower bounds obtained by ADMM are dramatically better than those from IPM.
In this section, we show that the singularity degree of a symmetry reduced program is equal to the singularity degree of the original problem, see Theorem 3.17. Thus, we provide a heuristic indication that this error measure does not grow when applying SR. Of course, after completing FR, the singularity degree is the optimal value, zero.
The facial reduction algorithm applied to the semidefinite program (2.1) is described as follows. At the k-th step, FR finds an exposing vector of the feasible region (3.10) of the reduced SDP of (2.1). Here V is a given matrix updated after each FR step. In the first step, V is the identity matrix and (3.10) is the feasible region F_X of the original problem (2.1). An exposing vector is then obtained by solving the auxiliary system (3.11) for y. If y exists, then W = A*_V(y) ∈ R^{r_k×r_k} is an exposing vector of the feasible region (3.10). We then update as in (3.12). At the k-th step, we have computed a vector y and a matrix V that determine the facially reduced formulation at the next step. Choosing exposing vectors with maximum possible rank leads to the fewest iterations, see e.g., [40,56]. For completeness, we now show that the number of iterations in the facial reduction algorithm depends only on the choice of y and not on the choice of V. Lemma 3.16. The total number of facial reduction steps does not depend on the choice of V and V̄ in (3.12).
Proof. Assume that y satisfies the auxiliary system (3.11) for the feasible region (3.10). If we replace V ∈ R^{n×r_k} in (3.10) with V S ∈ R^{n×r_k}, for some invertible matrix S ∈ R^{r_k×r_k}, then the same vector y satisfies the new auxiliary system, as b^T y ≤ 0 and W_S := S^T W S ⪰ 0. Since S is invertible, it holds that W_S ≠ 0 and rank(W_S) = rank(W). Thus, we obtain the same reduction in the problem size at the k-th step.
As null(W_S) = S^{−1} null(W), the columns of S^{−1} V̄ lie in null(W_S), where V̄ satisfies range(V̄) = null(W) as in (3.12). For any invertible matrix T ∈ R^{r_{k+1}×r_{k+1}}, we have that V̄_S := S^{−1} V̄ T satisfies range(V̄_S) = null(W_S). Thus, in the second step of (3.12), we obtain the same reduced region. This means we can repeat our argument to show that the reduction at each subsequent step is the same. Now we describe the facial reduction algorithm applied to the symmetry reduced program (2.14). The facial reduction algorithm at the k-th step considers the feasible region (3.13) in the variables (x, R_1, ..., R_t), determined by some Ṽ = Blkdiag(Ṽ_1, ..., Ṽ_t) with Ṽ_i ∈ R^{n_i×r_k}, see also (3.2). Here blkdiag = Blkdiag*. In the first step, Ṽ is the identity matrix and we obtain the feasible region F_x of the symmetry reduced program (2.14).
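A tiny numerical illustration of one FR step (our own hypothetical data; the auxiliary system (3.11) is solved by inspection here rather than by an SDP solver):

```python
import numpy as np

# feasible region F = {X psd : <E11, X> = 0}; by inspection, y = 1 solves
# the auxiliary system: W = A*(y) = E11 >= 0 and b^T y = 0
n = 3
W = np.zeros((n, n))
W[0, 0] = 1.0                                 # exposing vector

# update as in (3.12): V spans null(W), reduced variable R with X = V R V^T
evals, evecs = np.linalg.eigh(W)
V = evecs[:, evals < 1e-10]                   # here: a basis of span{e_2, e_3}

# any psd R now gives a feasible X; R = I yields a Slater point
# of the reduced problem
R = np.eye(V.shape[1])
X = V @ R @ V.T
```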
The auxiliary system (3.14) for (3.13) is to find (y, W_1, ..., W_t) satisfying the corresponding conditions. Then Blkdiag(Ṽ_1^T W_1 Ṽ_1, ..., Ṽ_t^T W_t Ṽ_t) is an exposing vector of the symmetry reduced problem. Let V̄_i be the matrix whose independent columns span null(Ṽ_i^T W_i Ṽ_i). In the facial reduction algorithm, we replace the matrix Ṽ_i by Ṽ_i V̄_i. Then we repeat the algorithm until the auxiliary system (3.14) has no solution.
Our main result in this section is that the singularity degree of the symmetry reduced SDP (2.14) is equal to the singularity degree of the original SDP (2.1). Theorem 3.17. sd(P_{F_x}) = sd(P_F).
Proof. We first show the inequality sd(P_{F_x}) ≤ sd(P_F). In particular, we show that if we apply the facial reduction algorithm to the original SDP (2.1), then the solution of the auxiliary system (3.11) can be used to construct a solution to the auxiliary system (3.14) of the symmetry reduced problem (2.14).
Let y be a solution to the auxiliary system (3.11) in the k-th facial reduction step. Let W = A*_V(y) ∈ A_G (see Lemma 3.4) and W̄ = Q^T W Q, where Q is as specified in Theorem 2.6. Further, let W̄_j ∈ S^{n_j}_+ be the j-th block of W̄ (j = 1, ..., t).
Let k > 1 and V = Q Ṽ, where V and Ṽ are derived in the previous iterate of the FR algorithm. Then, since W̄ = Q^T W Q ⪰ 0, we have that each block Ṽ_j^T W̄_j Ṽ_j (j = 1, ..., t) is positive semidefinite. It also holds that b^T y ≤ 0 and B(W̄) = A^T y. Thus (y, W̄_1, ..., W̄_t) satisfies the auxiliary system (3.14). Further, we have that rank(A*_V(y)) = Σ_{j=1}^t rank(Ṽ_j^T W̄_j Ṽ_j). Let V̄ and V̄_j (j = 1, ..., t) be matrices whose independent columns span null(A*_V(y)) and null(Ṽ_j^T W̄_j Ṽ_j), respectively. As A*_V(y) = Ṽ^T W̄ Ṽ is block diagonal, we can simply take V̄ = Blkdiag(V̄_1, ..., V̄_t). Thus, after updating V ← V V̄ and Ṽ_j ← Ṽ_j V̄_j, we have V = Q Ṽ in the next step. We can repeat the same argument until the facial reduction algorithm terminates.
Next, we show that sd(P_{F_x}) ≥ sd(P_F). Assume that (y, W_1, ..., W_t) satisfies the auxiliary system (3.14) in the first facial reduction step. Recall that in the first step, Ṽ is the identity matrix. For Q defined as in Theorem 2.6, we construct the corresponding matrix W. To show that y satisfies the auxiliary system (3.11), we have to prove that A*(y) ⪰ 0. This follows from the construction, see also (3.15); the second equality there uses the feasibility of (3.14). In addition, it follows from the construction of W and Lemma 3.16 that we can take V and Ṽ such that V = Q Ṽ in the next FR step. The following Corollary 3.18 follows from [61, Theorem 3.2] in that the linear manifold is represented by a concrete constraint and is applied to finding an exposing vector. More precisely, if we can find the affine span of our original feasible set in the ground space, then we can always find the representation using a linear mapping as in (3.9). This means we can always find the appropriate exposing vector and obtain singularity degree one, see also [18]. Note that this includes the hard combinatorial problems we work with below.
Corollary 3.18. Consider the quadratic model as given in Theorem 3.15, and suppose that the matrix A is part of the given data of the problem. Then the singularity degree is one.
Proof. The proof uses A to construct the exposing vector. Therefore, one step of the FR algorithm is sufficient, see (3.9). More precisely, the linear constraint in the ground set is lifted into the SDP as in (3.9).
Remark 3.19. The definition of singularity degree can be extended to DNN, and to a general cone optimization problem, as the minimum number of steps in the FR [7, Algor. B]. Here this means we continue to count the minimum number of steps with nonzero exposing vectors. An interesting question is to find the relationship between the singularity degrees of the SDP and the DNN. It appears that the singularity degree of the DNN is at most one more than that of the SDP. Moreover, it is known that the singularity degree of DNN^n is at most n, see [40, Corollary 20].

Simplifications
After FR, some of the constraints become redundant in the facially reduced program (3.1). We show here that the same constraints are also redundant in the facially and symmetry reduced program (3.2). Lemma 3.20. For any subset I ⊆ [m] := {1, ..., m}, we define the corresponding spectrahedron. The proof of Lemma 3.20 is clear. Although a proof of Corollary 3.21 follows directly from Lemma 3.20, we provide it due to our intricate notation.
Then the constraints indexed by I are redundant in (3.2), i.e., the facially and symmetry reduced program (3.2) is equivalent to (3.18). Using the feasibility and (2.13), it follows that all the linear constraints in (3.18) are satisfied, and R is feasible for (3.18). By assumption, R is also feasible for (3.1). Thus the constraints ⟨A_i, V R V^T⟩ = b_i, ∀i ∉ I, are satisfied as well. This shows that (x, R) is feasible for (3.2).
To obtain a formulation of the facially and symmetry reduced program (3.2) in the variable R only, we can replace x in terms of R using the constraint B*(x) = Ṽ R Ṽ^T. This substitution can be done easily by rewriting the constraints. The objective can be changed similarly. This method, however, does not work for DNN relaxations. This difficulty can be resolved as follows.
Theorem 3.22. Consider the facially and symmetry reduced relaxation (3.2) with nonnegativity constraints, where R = Blkdiag(R̄_1, ..., R̄_t). Equate x with f(R̄) as in (3.20), where Q is specified in Theorem 2.6. Then (3.20) is equivalent to (3.21), where the diagonal blocks of the block diagonal matrix R are set equal for the corresponding repeating blocks. Proof. Let (x, R̄) be feasible for (3.20). As w > 0 and B B* = Diag(w), we have x = f(R̄), and thus R̄ is feasible for (3.21).
Conversely, let R̄ be feasible for (3.21). Since Ṽ R̄ Ṽ^T is a block-diagonal matrix in the algebra, it follows from Theorem 2.6 that there exists a unique x such that Ṽ R̄ Ṽ^T = B*(x). Then we must have x = f(R̄), and thus (x, R̄) is feasible for (3.20).
For the Hamming scheme, we have an explicit expression for the orthogonal matrix Q used in Theorem 3.22, see Example 2.5 and Section 5.1. In general, we do not know the corresponding orthogonal matrix explicitly. In Section 5.2, we use the heuristics from [15] to compute a block diagonalization of A_G. In this case, the equivalence in Theorem 3.22 may not hold, and (3.21) may be weaker than (3.20). However, our computational results indicate that all the bounds remain the same, see Table 5.7 below.

The alternating direction method of multipliers, ADMM
It is well known that interior-point methods do not scale well for SDP. Moreover, they have great difficulty handling additional cutting planes such as nonnegativity constraints. In particular, solving the doubly nonnegative relaxation, DNN, using interior-point methods is extremely difficult. The alternating direction method of multipliers is a first-order method for convex problems developed in the 1970s and rediscovered recently. This method decomposes an optimization problem into subproblems that may be easier to solve. In particular, it is extremely successful for splittings with two cones. This feature makes the ADMM well suited for our large-scale DNN problems. For the state of the art in theory and applications of the ADMM, we refer the interested reader to [8].
Oliveira, Wolkowicz and Xu [44] propose a version of the ADMM for solving an SDP relaxation of the quadratic assignment problem (QAP). Their computational experiments show that the proposed variant of the ADMM exhibits remarkable robustness and efficiency, and even provides improved bounds.

Augmented Lagrangian
We modify the approach from [44] for solving our SR and FR reduced DNN relaxation (3.2). We have a greatly simplified structure, as we applied SR to the SDP relaxation and were then able to move the nonnegativity constraints to a simple vector constraint x ≥ 0. In particular, we obtain a more efficient approach for solving the x-subproblem.
The alternating direction method of multipliers, ADMM, uses the augmented Lagrangian, L(x, R, Z), and essentially solves the max-min problem max_Z min_{x∈P, R⪰0} L(x, R, Z), where P is a simple polyhedral set of constraints on x, e.g., linear constraints Ax = b and nonnegativity constraints, see (4.3) below. The advantage of the method is the simplification of the constraints obtained by taking advantage of the splitting of the variables. We then find the updates (x+, R+, Z+) in turn. Here, γ ∈ (0, (1+√5)/2) is the step size for updating the dual variable Z. In the following sections we explain in detail how to solve each subproblem.
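The three updates can be illustrated on a toy instance (a schematic with hypothetical data, not the paper's reduced QAP model; here B*(x) = Diag(x), Ṽ = I, and P is the standard simplex, so the optimum is the cheapest vertex):

```python
import numpy as np

def proj_psd(M):
    # projection onto the psd cone by clipping negative eigenvalues
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.maximum(w, 0.0)) @ V.T

def proj_simplex(v):
    # Euclidean projection onto {x >= 0, sum(x) = 1}
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    tau = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - tau, 0.0)

# toy splitting: min c^T x  s.t.  Diag(x) = R, R psd, x in simplex;
# optimum is x = e_i with i = argmin c_i
c = np.array([3.0, 1.0, 2.0])
beta, gamma = 1.0, 1.0
x, Z = np.full(3, 1 / 3), np.zeros((3, 3))
for _ in range(50):
    R = proj_psd(np.diag(x) + Z / beta)                      # R-subproblem
    x = proj_simplex(np.diag(R) - (c + np.diag(Z)) / beta)   # x-subproblem
    Z = Z + gamma * beta * (np.diag(x) - R)                  # dual update
```

On this instance the iterates reach the vertex x = (0, 1, 0) with zero primal residual after a handful of iterations.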

On solving the R-subproblem
The R-subproblem can be solved explicitly. We complete the square and get an equivalent projection problem for R+. Here, we normalize each block Ṽ_k such that Ṽ_k^T Ṽ_k = I, and thus Ṽ^T (B*(x) + (1/β) Z) Ṽ is block diagonal, with the k-th block corresponding to R_k. So we only need to solve t small problems, whose optimal solutions are the projections of these blocks, where P_{S_+}(M) denotes the projection of M onto the cone of positive semidefinite matrices.
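In code, the blockwise R-update is an eigenvalue clipping per block (a sketch; the argument S stands for the matrix being projected, built from B*(x) and the scaled dual variable Z, and the block data here are hypothetical):

```python
import numpy as np

def proj_psd(M):
    # P_{S+}(M): zero out the negative eigenvalues of the symmetric part
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.maximum(w, 0.0)) @ V.T

def update_R(V_blocks, S):
    """R_k^+ = P_{S+}(Vt_k^T S Vt_k) for each block k; the Vt_k have
    orthonormal columns, so the subproblem splits into t small projections."""
    return [proj_psd(Vt.T @ S @ Vt) for Vt in V_blocks]

# toy block: projecting [[0,1],[1,0]] keeps only its positive eigenspace
S = np.array([[0.0, 1.0], [1.0, 0.0]])
R_blocks = update_R([np.eye(2)], S)
```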

On solving the x-subproblem
For the x-subproblem, we have (4.2). For many combinatorial optimization problems, some of the constraints Ax = b in (2.14) become redundant after FR of their semidefinite programming relaxations, see Corollary 3.21. Thus, the set P often collapses to a simple set. This often leads to an analytic solution for the x-subproblem; e.g., this happens for the quadratic assignment, graph partitioning, vertex separator, and shortest path problems.
For some interesting applications, the x-subproblem is equivalent to the following special case of the weighted, relaxed, quadratic knapsack problem (4.3), where Y is a given matrix and T*(x) = Σ_{i=1}^q x_i T_i for some given symmetric matrices T_i. The problem (4.3) is a projection onto the weighted simplex. We consider the following assumption on a linear transformation T : S^n → R^q and its adjoint.
Proof. The Lagrangian function of the problem is formed with the Lagrange multipliers τ ∈ R and λ ∈ R^q_+, yielding the KKT optimality conditions. Note that Diag(w) is the matrix representation of T∘T*. This means that ⟨T_i, T_j⟩ = 0, ∀i ≠ j, and we can simplify the first condition. Define the data vector y := T(Y). The complementary slackness λ^T x = 0 implies that if x_i > 0, then λ_i = 0 and x_i = w_i^{−1} y_i + τ. Thus the zero and positive entries of the optimal solution x correspond to the smaller than −τ and larger than −τ entries of (w_i^{−1} y_i)_{i=1}^q, respectively. Let us assume, without loss of generality, that (w_i^{−1} y_i)_{i=1}^q and x are sorted in non-increasing order. The condition w^T x = c then determines τ, and therefore one can solve the problem by simple inspection once k, the number of positive entries, is known. The following algorithm finds an optimal solution x to the problem (4.3). The correctness of the algorithm follows as for the projection onto the (unweighted) simplex problem, see [10,11].
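The inspection procedure of the proof can be sketched as follows (a sketch under Assumption 4.1 as described above, with x_i = max(0, y_i/w_i + τ) and τ fixed by w^T x = c; the function name and data are our own illustration, not the paper's Algorithm 4.3 verbatim):

```python
import numpy as np

def proj_weighted_simplex(y, w, c):
    """Projection onto the weighted simplex {x >= 0, w^T x = c}:
    x_i = max(0, y_i/w_i + tau), with tau found by scanning the sorted
    ratios u_i = y_i/w_i, as in the KKT-based inspection argument."""
    u = y / w
    order = np.argsort(-u)                     # non-increasing order
    cum_wu = np.cumsum((w * u)[order])
    cum_w = np.cumsum(w[order])
    for k in range(len(u)):                    # k+1 = number of positives
        tau = (c - cum_wu[k]) / cum_w[k]
        last = (k == len(u) - 1)
        if u[order][k] + tau > 0 and (last or u[order][k + 1] + tau <= 0):
            return np.maximum(u + tau, 0.0)

x1 = proj_weighted_simplex(np.array([0.9, 0.5, -0.2]), np.ones(3), 1.0)
x2 = proj_weighted_simplex(np.array([4.0, 1.0]), np.array([2.0, 1.0]), 2.0)
```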
In our examples, see Sections 5.1 and 5.2, the x-subproblem (4.2) satisfies Assumption 4.1. Moreover, we have the following lemma. We remind the reader that J denotes the matrix of all ones.
Here w_i = trace(B_i^T B_i). Furthermore, ⟨J, B*(x)⟩ = w^T x with w = (w_i) ∈ R^q. Thus we set T = B and note that T(T*(x)) = Diag(w) x.

Numerical results
We now demonstrate the efficiency of our new approach on two classes of problems: the quadratic assignment problem, QAP, and several types of graph partitioning problems, GP.
Our tests were run on a Dell PowerEdge M630 computer with two Intel Xeon E5-2637v3 4-core 3.5 GHz (Haswell) CPUs, 64GB of memory, and Linux. The interior point solver was Mosek, see [1]. We had to use a different computer to accommodate some of the larger problems when using an interior point approach, see the description of Table 5.4.
We report only the ADMM solver time, since the preprocessing time is very small in comparison with the ADMM part. In particular, we find exposing vectors explicitly in all our examples. Further, for most of the test instances we know generators of the automorphism groups as well as the orthogonal matrix Q. For instances for which we need to find generators, e.g., the instances in Table 5.3, the preprocessing time is less than one second.
The stopping conditions and tolerances are outlined at the start of Section 5.1.3, in Definition 5.10. Our results include huge problems of sizes up to n = 512 for the QAP, yielding of the order n² SDP constraints and n⁴ nonnegativity constraints.

The quadratic assignment problem, QAP

Background for the QAP
The quadratic assignment problem was introduced in 1957 by Koopmans and Beckmann as a model for location problems that take into account the linear cost of placing a new facility on a certain site, plus the quadratic cost arising from the product of the flow between facilities and the distances between sites. The QAP contains the traveling salesman problem as a special case and is therefore NP-hard in the strong sense. It is generally considered to be one of the hardest of the NP-hard problems.
Let A, B ∈ S^n, and let Π_n be the set of n × n permutation matrices. The QAP (with the linear term, with appropriate C, in brackets) can be stated as min_{X∈Π_n} trace(A X B X^T [+ C X^T]). The QAP is extremely difficult to solve to optimality; e.g., problems with n ≥ 30 are still considered hard. It is well known that SDP relaxations provide strong bounds, see e.g., [15,69]. However, even for sizes n ≥ 15, it is difficult to solve the resulting SDP relaxation by interior point methods if one cannot exploit special structure such as symmetry. Solving the DNN relaxation is significantly more difficult.
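For tiny n, the trace formulation can be evaluated by brute force, which is useful for sanity-checking relaxation bounds (a sketch; the enumeration is our own illustration and is viable only for very small n):

```python
import numpy as np
from itertools import permutations

def qap_brute_force(A, B, C=None):
    """min over permutation matrices X of trace(A X B X^T) (+ trace(C X^T))."""
    n = A.shape[0]
    best = np.inf
    for p in permutations(range(n)):
        X = np.eye(n)[list(p)]             # permutation matrix for p
        val = np.trace(A @ X @ B @ X.T)
        if C is not None:
            val += np.trace(C @ X.T)
        best = min(best, val)
    return best
```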
Here, we first consider the DNN relaxation (5.1) for the QAP from Povh and Rendl [50], where E_ii = u_i u_i^T and u_i ∈ R^n is the i-th unit vector. The authors in [15, Theorem 7.1] show that one can take the group built from aut(A) := {P ∈ Π_n : AP = PA}, the automorphism group of A.
Remark 5.1. The DNN relaxation (5.1) is known to be theoretically equivalent, yielding the same optimal value, to the DNN relaxation denoted (QAP_{R3}) in Zhao et al. [69]. Recall that svec is the linear transformation that vectorizes symmetric matrices, [69]. We define gsvec to do this vectorization of symmetric matrices while ignoring the elements set to zero by the gangster constraints. Then we can eliminate the gangster constraints completely and replace the DNN constraints to get a problem equivalent to (5.1). This form is now similar to our final SR reduced form before FR, see (2.14); and this emphasizes that the DNN can be represented in a split form.
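A minimal sketch of gsvec (our own illustration, assuming the gangster set is given as a boolean mask of forced-zero positions; the √2 scaling of off-diagonal entries follows the usual svec convention):

```python
import numpy as np

def svec(Y):
    # vectorize the upper triangle, scaling off-diagonal entries by sqrt(2)
    n = Y.shape[0]
    scale = np.sqrt(2) * np.triu(np.ones((n, n)), 1) + np.eye(n)
    i, j = np.triu_indices(n)
    return (scale * Y)[i, j]

def gsvec(Y, gangster_mask):
    # like svec, but drop the entries forced to zero by the gangster constraints
    i, j = np.triu_indices(Y.shape[0])
    keep = ~gangster_mask[i, j]
    return svec(Y)[keep]

Y = np.array([[1.0, 2.0], [2.0, 3.0]])
mask = np.zeros((2, 2), dtype=bool)
v_full = gsvec(Y, mask)            # nothing dropped: same as svec
mask[0, 1] = True
v_gang = gsvec(Y, mask)            # off-diagonal entry removed
```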
In the following lemma we derive the null space of the feasible solutions of (5.1), see also Corollary 2.2.7 in [60].
Lemma 5.2. Let U ∈ R^{n×(n−1)} satisfy U U^T = nI − J, and let Y be in the relative interior of the feasible set of (5.1). Then null(Y) = range([U ⊗ e_n, e_n ⊗ U]).
Proof. Let X ∈ Π_n. Then X e_n = X^T e_n = e_n, and thus range([U ⊗ e_n, e_n ⊗ U]) ⊆ null(Ŷ), where Ŷ is the corresponding lifted point. It is proven in [60] that Ŷ is in the relative interior of the feasible set of (5.1). The relevant square submatrix clearly has rank 2(n − 1), and thus the statement follows.
Let us now derive an exposing vector of the SDP relaxation (5.1), ignoring the nonnegativity, as we have shown we can add the nonnegativity on after the reductions. Lemma 5.3. Consider (5.1) without the nonnegativity constraints. Then W := U U^T ⊗ e_n e_n^T + e_n e_n^T ⊗ U U^T = (nI − J) ⊗ J + J ⊗ (nI − J) is an exposing vector of rank 2(n − 1) in A_G.
Proof. Let U be defined as in Lemma 5.2. Using the properties of the Kronecker product, we have W ⪰ 0, as U U^T = nI − J ⪰ 0. From Lemma 5.2, we have that W is an exposing vector of rank 2(n − 1). Let P be any permutation matrix of order n. Then P^T (U U^T) P = U U^T by construction. We now have (P_1 ⊗ P_2)^T W (P_1 ⊗ P_2) = W for any P_1, P_2 ∈ Π_n, and thus W ∈ A_G.
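The claims of Lemma 5.3 can be sanity-checked numerically for small n, using the explicit form W = (nI − J) ⊗ J + J ⊗ (nI − J):

```python
import numpy as np
from itertools import permutations

n = 4
I, J = np.eye(n), np.ones((n, n))
W = np.kron(n * I - J, J) + np.kron(J, n * I - J)

psd_ok = np.linalg.eigvalsh(W).min() >= -1e-9    # W is psd
rank_W = np.linalg.matrix_rank(W)                # should be 2(n-1)

# vec(X) lies in null(W) for every permutation matrix X, hence
# <W, vec(X) vec(X)^T> = 0 for all lifted feasible points
max_viol = max(abs(X.flatten() @ W @ X.flatten())
               for X in (np.eye(n)[list(p)] for p in permutations(range(n))))
```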
In the rest of this section we show how to do FR for the symmetry reduced program of (5.1). We continue to add on nonnegativity constraints to the SDP relaxations as discussed above. The facially reduced formulation of (5.1) is also presented in [60]. We state it here for later use.

Lemma 5.4 ([60]). The facially reduced program of the doubly nonnegative, DNN (5.1) is given by (5.6), where, by abuse of notation, G : S^{n²} → S^{n²} is the linear gangster operator enforcing the gangster constraints, and the columns of V ∈ R^{n²×((n−1)²+1)} form a basis of the null space of W, see Lemma 5.3.
Note that the constraints ⟨I ⊗ E_ii, Y⟩ = 1 and ⟨E_ii ⊗ I, Y⟩ = 1 have become redundant after FR in (5.6).
We now discuss the symmetry reduced program. The symmetry reduced formulation of (5.1) is studied in [16]. We assume that the automorphism group of the matrix A is non-trivial. To simplify the presentation, we assume that A lies in span{A_0, ..., A_d}, where {A_0, ..., A_d} is a basis of the commutant of the automorphism group of A. For instance, the matrices A_i (i = 0, 1, ..., d) may form a basis of the Bose-Mesner algebra of the Hamming scheme, see Example 2.5. Further, we assume from now on that A_0 is a diagonal matrix, which is the case for the Bose-Mesner algebra of the Hamming scheme. Here, we do not assume any structure in B. However, the theory also applies when B has some symmetry structure and/or A_0 is not diagonal; see our numerical tests for the minimum cut problem in Section 5.2 below.
If the SDP (5.1) has an optimal solution Y ∈ S^{n²}_+, then it has an optimal solution of the form (5.7), see Section 2.2. We write these matrix variables in a more compact way as a vector y, where Q is the orthogonal matrix block-diagonalizing the A_i (i = 0, ..., d).
Lemma 5.5. The symmetry reduced program of the DNN relaxation (5.1) is given by (5.8), where B*_k(y) is the k-th block from (5.7), and offDiag(Y_0) = 0 is the linear constraint that the off-diagonal elements are zero.
It remains to facially reduce the symmetry reduced program (5.8). Note that W ∈ A_G can be written as W = Σ_{i=0}^d A_i ⊗ W_i, for some matrices W_0, ..., W_d ∈ R^{n×n}. Theorem 3.6 shows that the block-diagonal matrix (5.9) is an exposing vector of the symmetry reduced program (5.8). Further, we denote by W̄_k (k = 1, ..., t) the k-th block of (5.9). Let us illustrate this with Example 5.6.
Example 5.6. Consider Example 2.5, where the A_i (i = 0, ..., d) form a basis of the Bose-Mesner algebra of the Hamming scheme. Then the exposing vector W ∈ S^{n²}_+ defined in Lemma 5.3 can be written as W = Σ_{i=0}^d A_i ⊗ W_i. Let W̄_k ∈ S^n be the k-th block of W, see (5.9). Then there are d + 1 distinct blocks given by W̄_k = Σ_{i=0}^d p_{i,k} W_i ∈ S^n for k = 0, ..., d, where the p_{i,k} are elements of the character table P of the Hamming scheme, see Example 2.5. Using the fact that P e = (n, 0, ..., 0)^T and p_{1,k} = 1 for every k = 0, ..., d, we obtain the matrices Ṽ_k whose columns form a basis of the null space of W̄_k ∈ S^n, with Ṽ_0 = e_n. Similar results can be derived when one uses different groups. Now we are ready to present an SDP relaxation for the QAP that is both facially and symmetry reduced.
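For the binary Hamming scheme, the character table entries are Krawtchouk polynomial values; a small sketch (with the indexing convention P[k, i] = eigenvalue of A_i on the k-th eigenspace, an assumption of this illustration) verifying the property P e = (n, 0, ..., 0)^T used above:

```python
import numpy as np
from math import comb

def hamming_character_table(d):
    """Character table of H(d,2): P[k, i] = K_i(k), the Krawtchouk value
    K_i(k) = sum_j (-1)^j C(k,j) C(d-k, i-j)."""
    P = np.zeros((d + 1, d + 1))
    for k in range(d + 1):
        for i in range(d + 1):
            P[k, i] = sum((-1) ** j * comb(k, j) * comb(d - k, i - j)
                          for j in range(min(i, k) + 1))
    return P

P = hamming_character_table(3)      # n = 2^3 = 8
row_sums = P @ np.ones(4)           # sum_i A_i = J has spectrum (n, 0, ..., 0)
```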
Proposition 5.7. The facially reduced program of the symmetry reduced DNN relaxation (5.8) is given by (5.13). Here, the columns of Ṽ_k form a basis of the null space of W̄_k ∈ S^n. Proof. We apply Theorem 3.6 to the block-diagonal exposing vector (5.9), for every k = 1, ..., t. Finally, we apply Corollary 3.21 to remove redundant constraints, see also Lemma 5.4. This yields the formulation (5.13).
Note that in the case that the basis elements A_i (i = 0, ..., d) belong to the Hamming scheme, see Example 5.6, it follows that t = d + 1 in Proposition 5.7.

On solving QAP with ADMM
Now we discuss how to use the ADMM to solve the DNN relaxation (5.13), for the particular case when the A_i (i = 0, 1, ..., d) form a basis of the Bose-Mesner algebra of the Hamming scheme. We proceed as in Section 4 and exploit properties of the known algebra, see Example 2.5. Clearly, for any other algebra we can proceed in a similar way. We assume without loss of generality that all the matrices Ṽ_j in this section have orthonormal columns.
First, we derive the equivalent reformulation of the DNN relaxation (5.13), by exploiting the following.
(1) Since we remove the repeating blocks of positive semidefinite constraints, to apply the ADMM we have to reformulate the DNN in such a way that Assumption 4.1 is satisfied. Let us first derive an expression for the objective function. Then, we multiply the coupling constraints B*_i(y) = Ṽ_i R̄_i Ṽ_i^T by the square roots of their multiplicities. Thus, for the Bose-Mesner algebra, we end up with the reformulated coupling constraints. (2) In many applications, it is not necessary to compute high-precision solutions, and the ADMM can be terminated at any iteration. Then, one can use the dual variable Z̄_j from the current iteration to compute a valid lower bound, see Lemma 5.9. By adding redundant constraints, this lower bound is improved significantly when the ADMM is terminated with low precision. Therefore we add the redundant constraints Y_0 = (1/n) I and trace(R̄_j) = √μ_j p_{0,j} for j = 0, ..., d. To see the redundancy of the last d + 1 constraints, we use the fact that the columns of Ṽ_j are orthonormal, and that diag(Y_i) = 0, i = 1, ..., d. This technique can also be found in [26,35,44,48].
We would like to emphasize that the techniques above are not restricted to the Bose-Mesner algebra of the Hamming scheme. Let us present our reformulated DNN relaxation for the ADMM. Define the polyhedral set P and R := {(R̄_0, ..., R̄_d) | trace(R̄_j) = √μ_j p_{0,j}, R̄_j ∈ S^n_+, j = 0, ..., d}, see (5.16). We obtain the DNN relaxation (5.17) for our ADMM, with the corresponding augmented Lagrangian. The Y-subproblem, the R-subproblem and the dual update are presented below.
1. The Y-subproblem, see (5.18). 2. The R-subproblems, for j = 0, ..., d, see (5.19). 3. The update of the dual variable. Clearly, the R-subproblems can be solved in the same way as (4.1). To see that the Y-subproblem can also be solved efficiently, we show that it is a problem of the form (4.3), and thus satisfies Assumption 4.1.
To efficiently solve the Y-subproblem for the QAP, we use Algorithm 4.3. Finally, we describe how to obtain a valid lower bound when the ADMM model is solved approximately. The important problem of obtaining valid lower bounds from inaccurate solvers is discussed recently in [19]. Lemma 5.9. Let P be the feasible set defined in (5.15), and consider the problem in (5.17). For any Z̄ = (Z̄_0, ..., Z̄_d), the objective value g(Z̄) satisfies g(Z̄) ≤ p*, i.e., it provides a lower bound on the optimal value p* of (5.17).
Proof. Consider the dual of (5.17). It follows from the Rayleigh principle that the optimal value of the second minimization problem is −Σ_{j=0}^d μ_j p_{0,j} λ_max(Ṽ_j^T Z̄_j Ṽ_j). Using strong duality, we have g(Z̄) ≤ d* = p*.

Numerical results for the QAP
In this section we provide numerical results on solving the facially and symmetry reduced DNN relaxation (5.13). We first present our general stopping conditions and tolerances in Definition 5.10.
Definition 5.10 (tolerances, stopping conditions). Given a tolerance parameter ε, we terminate the ADMM when one of the following conditions is satisfied.
• The primal and dual residuals are smaller than ε, i.e., pres ≤ ε and dres ≤ ε. • Let p_k be the ADMM objective value, and d_k := g(Z̄) the dual objective value at some dual feasible point at the k-th iteration, see (5.22). If the duality gap is not improving significantly for 20 consecutive checks k, then we conclude that there is stagnation in the objective value. (We measure the gap only every 100-th iteration due to the expense of computing the dual objective value d_k.) In our QAP experiments, we use ε = 10^{−12} if n ≤ 128, and ε = 10^{−5} when n = 256, 512. The objective value from the ADMM is denoted by OBJ, and the valid lower bound obtained from the dual feasible solution is denoted by LB, see Lemma 5.9. The running times in all tables are reported in seconds. We also list the maximum of the primal and dual residuals, i.e., res := max{pres, dres}. If a result is not available, we put "-" in the corresponding entry. For details see [42]. In the rand 256 and rand 512 instances, A is the Hamming distance matrix of appropriate size and B is a random matrix.
In the first column of Table 5.1 we list the instance names, where the sizes of the QAP matrices are indicated after the underscore. Upper bounds are given in column two. For instances with up to 128 nodes we list the upper bounds computed in [42], and for the remaining instances we use our heuristics. Since the data matrices for the Harper instances are integer, we round lower bounds up to the closest integer. In column three (resp. four) we list SDP-based lower bounds (resp. computation times in seconds) from [42]. The bounds from [42] are obtained by solving an SDP relaxation having several matrix variables of order n. The bounds in [42] were computed on a 2.67GHz Intel Core 2 computer with 4GB memory. In columns five to seven, we present the results obtained by using our ADMM algorithm. 2. The second set of test instances are the Eschermann-Wunderlich, esc, instances from the QAPLIB library [9]. In an esc nx instance, the distance matrix A is the Hamming distance matrix of order n = 2^d, whose automorphism group is the automorphism group of the Hamming graph H(d, 2). In [15] the authors exploit symmetry in esc instances to solve the DNN relaxation (5.8) by an interior point method. That was the first time that SDP bounds for large QAP instances were computed by exploiting symmetry. In particular, the authors of [15] needed 13 seconds to compute the SDP bound for esc64a, and 140 seconds for the esc128 SDP bound, see also Table 5.2. The bounds in [15] were computed by the interior point solver SeDuMi [57] using the Yalmip interface [38] and Matlab 6.5, implemented on a PC with a Pentium IV 3.4 GHz dual-core processor and 3GB of memory. Computational times in [15] include only solver time, not the time needed for Yalmip to construct the problem.
In [44] the authors approximately solve the DNN relaxation (5.8) using the ADMM algorithm, but do not exploit symmetry. Here, we compare computational results from [44] with the approach we present in this paper. All the instances from [44] were tested on an Intel Xeon Gold 6130 2.10 GHz PC with 32 cores and 64 GB of memory, running a 64-bit Ubuntu system.
An efficient solver for large scale SDPs, called SDPNAL+, is presented in [59,68]. SDPNAL+ implements an augmented Lagrangian based method. In particular, the implementation in [68] is based on a majorized semismooth Newton-CG augmented Lagrangian method, and the implementation in [59] is based on an inexact symmetric Gauss-Seidel based semi-proximal ADMM. In [59,68], the authors present extensive numerical results that also include solving (5.1) for various instances from the QAPLIB library [9]. However, they do not perform FR and SR. In Table 5.2 we include results from [59] for solving esc nx, with n = 16, 32. There are no results for n = 64, 128 presented in their paper. Moreover, the authors emphasize that SDPNAL+ is for solving SDPs where the maximum matrix dimension is assumed to be less than 5000. Due to the use of different computers, the times in Table 5.2 are not comparable. For example, the authors of [59] use an Intel Xeon CPU E5-2680v3, 2.50 GHz with 12 cores and 128 GB of memory.
In Table 5.2 we present the numerical results for the esc instances. In particular, we compare the bounds and computational times for the relaxation (5.1) (no reductions, solved in [59]), the facially reduced relaxation (5.6) (solved in [44]), the symmetry reduced relaxation (5.8) (solved in [15]), and the facially and symmetry reduced relaxation (5.13) (solved by our ADMM).
We conclude that: 1. There are notably large differences in computation times between the ADMM algorithm presented here and the one from [44], since the latter does not exploit symmetry.
2. Even if the use of different computers is taken into account, this would likely not be enough to account for the time differences observed between our ADMM and SDPNAL+ [59]. Moreover, SDPNAL+ was not able to solve several instances.
3. In [15], the authors use SeDuMi to solve a relaxation equivalent to the symmetry reduced program (5.8), and they obtain a lower bound of 53.0844 for esc128. However, the bound for this instance from the facially and symmetry reduced program (5.13) computed by the Mosek interior point solver is 51.7516, and our ADMM algorithm reports 51.7518. This illustrates our improved numerical accuracy using FR and SR, and validates the statements about singularity degree, see Section 3.4. We note in addition that we provide a theoretically guaranteed lower bound, as well as solve huge instances that are intractable for the approach in [15].

The graph partition problem (GP)
The graph partition problem is the problem of partitioning the vertex set of a graph into a fixed number of sets of given sizes such that the sum of edges joining different sets is minimized. The problem is known to be NP-hard. The GP has many applications such as VLSI design, parallel computing, network partitioning, and floor planning. Graph partitioning also plays a role in machine learning (see e.g., [34]) and data analysis (see e.g., [47]). There exist several SDP relaxations for the GP of different complexity and strength, see e.g., [30,53,54,63,67].

The general GP
Let G = (V, E) be an undirected graph with vertex set V, |V| = n, and edge set E, and let k ≥ 2 be a given integer. We denote by A the adjacency matrix of G. The goal is to find a partition of the vertex set into k disjoint subsets S_1, . . ., S_k of specified sizes m_1 ≥ . . . ≥ m_k, where ∑_{j=1}^k m_j = n, such that the sum of weights of edges joining different sets S_j is minimized. Let P_m denote the set of all partitions of V for a given m = (m_1, . . ., m_k). In order to model the GP in binary variables we represent the partition S ∈ P_m by the partition matrix X ∈ {0,1}^{n×k}, whose column j is the incidence vector of the set S_j.
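The construction of a partition matrix and the feasibility conditions X e_k = e_n and X^T e_n = m can be sketched as follows (an illustrative implementation, not code from the paper):

```python
import numpy as np

def partition_matrix(parts, n):
    """Incidence matrix X in {0,1}^{n x k}: column j is the 0-1
    indicator vector of part S_j."""
    k = len(parts)
    X = np.zeros((n, k), dtype=int)
    for j, S in enumerate(parts):
        X[list(S), j] = 1
    return X

def is_partition_matrix(X, m):
    """Check X e_k = e_n (every vertex lies in exactly one part)
    and X^T e_n = m (prescribed part sizes)."""
    return (X.sum(axis=1) == 1).all() and (X.sum(axis=0) == np.asarray(m)).all()
```

For example, the partition {0,1,2}, {3,4}, {5} of six vertices yields a 6 × 3 partition matrix with column sums (3, 2, 1).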
The GP can be stated as follows: min { (1/2) tr(A X (J_k − I_k) X^T) : X ∈ M_m }, where M_m, defined in (5.26), is the set of partition matrices.
The matrix variable of the DNN relaxation (5.27) is partitioned into blocks, with each Y^{(ij)} being a k × k matrix, and G(·) is the gangster operator for the GP. To compute DNN bounds for the GP, we apply FR to the symmetry reduced relaxation (5.27). The details are similar to the QAP case, and are thus omitted.
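The trace form of the GP objective, (1/2) tr(A X (J_k − I_k) X^T), counts exactly the weight of edges joining different parts; a small numerical check of this identity (a sketch under the standard formulation, not code from the paper):

```python
import numpy as np

def cut_value_trace(A, X):
    """Weight of edges joining different parts via the trace identity
    (1/2) tr(A X (J_k - I_k) X^T)."""
    k = X.shape[1]
    Jk = np.ones((k, k))
    return 0.5 * np.trace(A @ X @ (Jk - np.eye(k)) @ X.T)

def cut_value_direct(A, labels):
    """Direct count: sum of A[u, v] over pairs u < v in different parts."""
    n = len(labels)
    return sum(A[u, v] for u in range(n) for v in range(u + 1, n)
               if labels[u] != labels[v])
```

The identity follows since X e_k = e_n gives X J_k X^T = J_n, so X (J_k − I_k) X^T = J_n − X X^T has entry one exactly when two vertices lie in different parts.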
We present numerical results for different graphs from the literature. The matrix can161 is from the Matrix Market library [4], grid3dt5 is a 3D cubical mesh, and the gridtxx matrices are 2D triangular meshes. Myciel7 is a graph based on the Mycielski transformation, and the 1 FullIns 4 graph is a generalization of the Mycielski graph. Both graphs were used in the COLOR02 symposium [28].
1. In Table 5.3 we provide information on the graphs and the considered 3-partition problems. In particular, the first column specifies the graphs, the second column gives the number of vertices in the graph, the third column the number of orbits after symmetrization, and the fourth column the number of blocks in Q^T A Q. Here, the orthogonal matrix Q is computed using the heuristic from [15]. The last column specifies the sizes of the partitions.
2. In Table 5.4 we list lower bounds obtained by using Mosek and our ADMM algorithm. The table also presents the computational times required to compute the bounds by both methods, as well as the number of interior point method iterations. The results show that the ADMM with precision ε = 10^{-3} provides competitive bounds in much shorter time than the interior point solver. In Table 5.4, some instances are marked by *. This means that our 64GB machine did not have enough memory to solve these instances by the interior point solver, and they were therefore solved on a machine with an Intel(R) Xeon(R) Gold 6126, 2.6 GHz quad-core processor and 192GB of memory. The ADMM algorithm has much lower memory requirements, and is thus able to solve all instances from Table 5.4 on the smaller machine.

Then m_3 + 1 is a lower bound for the vertex separator problem with respect to the choice of m. One might be tempted to solve (5.27) for every m_3 ∈ {0, 1, . . ., |V| − 1} to find the largest m_3 for which the DNN bound is positive. However, the optimal value of (5.27) is monotone in m_3, and thus we find the appropriate m_3 by binary search, starting with m_3 = n/2. We present the lower bound on the vertex separator, i.e., m_3 + 1, in the third column of Table 5.6. The total number of problems solved is listed in the fourth column of the same table. The running times given in the last two columns are the total times used to find a positive lower bound for (5.27) for some m_3 by using Mosek and our ADMM algorithm, respectively. This task is particularly suitable for the ADMM, as we can terminate the ADMM once the lower bound in an iterate is positive. For example, it takes 786 seconds to solve the min-cut relaxation on queen12 12 by Mosek, see Table 5.7.

We conclude from Tables 5.6 and 5.7 that:
• For small instances, the interior point algorithm is faster than the ADMM, as shown in Table 5.7. For larger instances, the interior point algorithm has memory issues. However, the ADMM algorithm can still handle large instances due to its low memory demand.
• To obtain bounds on the vertex separator of a graph, one does not need to solve the DNN relaxation to high precision. The ADMM is able to exploit this fact, and finds a lower bound on the size of the vertex separator in significantly less time than the interior point algorithm, see Table 5.6.
• The symmetry reduced program without FR is heavily ill-conditioned, and the interior point method is not able to solve it correctly for any of the instances. The running time is also significantly longer than for the symmetry and facially reduced program, see Table 5.7.
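The binary search over m_3 described above can be sketched as follows; here `dnn_bound` is a hypothetical stand-in for solving the DNN relaxation (5.27) with third part size m_3:

```python
def largest_positive_m3(dnn_bound, n):
    """Binary search for the largest m3 in {0, ..., n-1} with
    dnn_bound(m3) > 0, using monotonicity of the bound in m3.
    Then m3 + 1 lower-bounds the vertex separator size.
    `dnn_bound` is a placeholder for a DNN relaxation solve."""
    lo, hi = 0, n - 1
    if dnn_bound(lo) <= 0:
        return None                 # no m3 yields a positive bound
    while lo < hi:                  # invariant: dnn_bound(lo) > 0
        mid = (lo + hi + 1) // 2    # first probe is near n/2, as in the text
        if dnn_bound(mid) > 0:
            lo = mid
        else:
            hi = mid - 1
    return lo
```

With a monotone bound, this needs only O(log n) relaxation solves instead of up to n, which is why early termination of the ADMM (as soon as an iterate certifies positivity) pays off here.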
Note that we solved the queen10 10 problem to high accuracy with FR. The distance in norm between this optimal solution and the one obtained without FR was very large, with no decimals of accuracy. This emphasizes the importance of FR for obtaining accurate solutions, see e.g., [56].

Conclusion
In this paper we propose a method to efficiently apply facial reduction to the symmetry reduced SDP relaxation, and we demonstrate its efficiency by solving large scale NP-hard problems. More specifically, if an exposing vector of the minimal face for the input SDP is given, then we are able to construct an exposing vector of the minimal face for the symmetry reduced SDP. The resulting relaxation is symmetry reduced, satisfies the Slater condition, and thus can be solved with improved numerical stability.
We then extend our reduction technique to doubly nonnegative, DNN, programs. In fact, our approach allows the nonnegativity constraints on the matrix variable of the original SDP to be carried over to simple nonnegativity constraints on a vector variable in the reduced DNN program. Again we exploit exposing vectors, here decomposing the exposing vector of the DNN cone into a sum of a semidefinite and a nonnegative exposing vector. Further, we discuss the importance of the order of the reductions in our theory. We also show that the singularity degree of the symmetry reduced program is equal to the singularity degree of the original program.
We apply our technique to many combinatorial problems and their DNN relaxations, i.e., we facially and symmetry reduce them.The obtained relaxations can be solved extremely efficiently using the alternating direction method of multipliers.We also show that interior point methods are more efficient on a symmetry and facially reduced relaxation.As a result, we are able to compute improved lower bounds for some QAP instances in significantly less time.

Theorem 3.15 (facial reduction). Suppose that the affine hull aff(conv(Q)) = L and dim(L) = d. Then there exist A and b, with A full row rank, such that

Lemma 4.4. The x-subproblem (4.2) satisfies Assumption 4.1, if

The constraints fixing certain entries to 0 are generally called the gangster constraints, see Lemma 5.4. The third and fourth lines of constraints in (5.1) arise from the row and column sum constraints.
the matrices W_k are the exposing vectors of the symmetry reduced program (5.8), and thus W_k B_k^*(y) = 0 for every k = 1, . . ., t. This means that there exists a full column rank matrix Ṽ_k ∈ R^{n_k × n_k} such that B_k^*(y) = Ṽ_k R̃_k Ṽ_k^T, where R̃_k ∈ S^{n_k}
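One standard way to obtain such a full column rank factor of a positive semidefinite matrix is via its eigendecomposition; a minimal sketch (an illustration, not code from the paper):

```python
import numpy as np

def psd_low_rank_factor(S, tol=1e-9):
    """Factor a positive semidefinite S as V @ R @ V.T, where V has
    full column rank (eigenvectors of the eigenvalues above tol) and
    R is the diagonal matrix of those eigenvalues."""
    w, U = np.linalg.eigh(S)
    keep = w > tol            # drop (numerically) zero eigenvalues
    return U[:, keep], np.diag(w[keep])
```

For a rank-r matrix of order n this yields V of size n × r, which is exactly the kind of factor used in facial reduction to restrict the problem to the range of V.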

1. The first set of test instances are from Mittelmann and Peng [42], where the authors compute SDP bounds for the QAP with A being the Hamming distance matrix. The choice of the matrix B differs for the different types of instances. In particular, in the Harper instance Harper n with n = 2^d we have B_ij = |i − j| for all i, j = 1, . . ., 2^d. Further, eng1 n and eng9 n with n = 2^d, d = 4, . . ., 9, refer to engineering problems, and the VQ n instances have random matrices B.
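The two structured data matrices described above are easy to generate; a sketch (assumed constructions, following the descriptions in the text):

```python
def hamming_distance_matrix(d):
    """Distance matrix of the Hamming graph H(d, 2): vertices are the
    2^d binary words of length d, and the distance between words i and j
    is the number of differing bits, i.e., the popcount of i XOR j."""
    n = 2 ** d
    return [[bin(i ^ j).count("1") for j in range(n)] for i in range(n)]

def harper_matrix(d):
    """Harper-instance matrix with B_ij = |i - j| (the text uses 1-based
    indices; 0-based indexing here gives the same matrix)."""
    n = 2 ** d
    return [[abs(i - j) for j in range(n)] for i in range(n)]
```

The rich automorphism group of the Hamming distance matrix (that of H(d, 2)) is what makes the symmetry reduction so effective on these instances.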
(J_k − I_k) X^T), where M_m = {X ∈ {0, 1}^{n×k} | X e_k = e_n, X^T e_n = m} (5.26) and J (resp. I) denotes the matrix of all ones (resp. the identity matrix) of appropriate size. The basis (2.6) forms a so-called coherent configuration.
Definition 2.2 (coherent configuration). A set of zero-one n × n matrices {B_1, . . ., B_d} is called a coherent configuration of rank d if it satisfies the following properties: 1. ∑_{i∈I} B_i = I for some I ⊂ {1, . . ., d}, and ∑_{i=1}^d B_i = J; 2. B_i^T ∈ {B_1, . . ., B_d} for all i; 3. B_i B_j ∈ span{B_1, . . ., B_d} for all i, j.

Recall that we assumed that the data matrices of the SDP problem (2.1) are contained in the matrix *-algebra A_G, see Section 2.2. Let k > 1. Using V = QṼ, where V and Ṽ are derived in the previous iterate of the FR algorithm, (3.14) and (3.16), we have that Since A^*(y) ∈ A_G and W ∈ A_G, it follows from (3.17) that W = A^*(y), and from (3.16) and (3.14) that A^*(y) ⪰ 0.
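The axioms of Definition 2.2 can be verified numerically for a candidate basis; the following sketch checks the textbook properties (it is an illustration and not code from the paper):

```python
import numpy as np

def is_coherent_configuration(mats, tol=1e-8):
    """Check the coherent-configuration axioms for a list of 0-1
    matrices: a subset sums to I, the full sum is J, the set is closed
    under transposition, and pairwise products lie in the linear span."""
    n = mats[0].shape[0]
    if not all(np.isin(B, (0, 1)).all() for B in mats):
        return False
    # the diagonally supported basis matrices must sum to the identity
    diag = sum(B for B in mats if np.all(B[~np.eye(n, dtype=bool)] == 0))
    if not np.array_equal(diag, np.eye(n)):
        return False
    # every entry is covered exactly once: the full sum is all-ones
    if not np.array_equal(sum(mats), np.ones((n, n))):
        return False
    # closed under transposition
    if not all(any(np.array_equal(B.T, C) for C in mats) for B in mats):
        return False
    # products lie in the linear span of the basis (least-squares check)
    basis = np.stack([B.ravel() for B in mats], axis=1)
    for B in mats:
        for C in mats:
            p = (B @ C).ravel()
            coef, *_ = np.linalg.lstsq(basis, p, rcond=None)
            if np.linalg.norm(basis @ coef - p) > tol:
                return False
    return True
```

For example, {I_n, J_n − I_n} (the adjacency basis of the complete graph) is a coherent configuration, since (J − I)^2 = (n − 1)I + (n − 2)(J − I) stays in the span.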

Table 5.1 shows that we significantly improve the bounds for all eng1 n and eng9 n instances. Moreover, we are able to compute bounds for huge QAP instances with n = 256 and n = 512 in a reasonable amount of time. Recall that for given n, the order of the matrix variable in the DNN relaxation of the QAP (5.1) is n^2. However, for each instance xx n of Table 5.1 we have n = 2^d, and the DNN relaxation (5.1) boils down to d + 1 positive semidefinite blocks of order n. In particular, we obtain the bound for each instance in

Table 5.1: Lower and upper bounds for different QAP instances.

Table 5.5: The Queen graphs and partitions.

2. In Table 5.6 we provide the numerical results for the vertex separator problem. More specifically, we compute the largest integer m_3 such that the solution value of the DNN relaxation (5.27) is positive with the given partition. However, though not shown in the table, it takes the ADMM only 120 seconds to conclude that the optimal value is positive.

Table 5.6: The vertex separator problem on the Queen graphs.

3. In Table 5.7 we compare the bounds and computational times required to solve, for fixed m, the symmetry reduced DNN relaxation (5.27) by the interior point algorithm, as well as the symmetry and facially reduced relaxation (5.27), using Mosek and our ADMM algorithm.

Table 5.7: The min-cut problem on the Queen graphs.