1 Introduction

Computing the crossing number \(\mathop {\textrm{cr}}\limits (K_{m,n})\) of the complete bipartite graph \(K_{m,n}\) is a long-standing open problem, which goes back to Turán in the 1940s. In 1956, Zarankiewicz [28] conjectured that \(\mathop {\textrm{cr}}\limits (K_{m,n})= Z(m,n)\), where \(Z(m,n)\) is the Zarankiewicz number

$$\begin{aligned} Z(m,n):=\left\lfloor \tfrac{m-1}{2}\right\rfloor \left\lfloor \tfrac{m}{2}\right\rfloor \left\lfloor \tfrac{n-1}{2}\right\rfloor \left\lfloor \tfrac{n}{2}\right\rfloor =\left\lfloor \tfrac{1}{4}(m-1)^2\right\rfloor \left\lfloor \tfrac{1}{4}(n-1)^2\right\rfloor . \end{aligned}$$

Zarankiewicz claimed to have a proof of his conjecture, but the proof turned out to be flawed. The conjecture thus remains a notorious open problem. As Erdős and Guy [9] wrote in 1973: ‘Almost all questions that one can ask about crossing numbers remain unsolved’, which is still true today. The upper bound \(\mathop {\textrm{cr}}\limits (K_{m,n}) \le Z(m,n)\) is known: it follows from an explicit drawing of \(K_{m,n}\) in the plane with \(Z(m,n)\) crossings; see Fig. 1 for an example.
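The two expressions for the Zarankiewicz number can be checked against each other numerically. The following is a minimal sketch; the function names are our own and serve only as an illustration.

```python
def zarankiewicz(m: int, n: int) -> int:
    """Z(m, n) as the product of four floors."""
    return ((m - 1) // 2) * (m // 2) * ((n - 1) // 2) * (n // 2)


def zarankiewicz_squares(m: int, n: int) -> int:
    """Z(m, n) via the form floor((m-1)^2/4) * floor((n-1)^2/4)."""
    return ((m - 1) ** 2 // 4) * ((n - 1) ** 2 // 4)


# Both forms agree for all small positive m and n.
assert all(
    zarankiewicz(m, n) == zarankiewicz_squares(m, n)
    for m in range(1, 30)
    for n in range(1, 30)
)

# Z(7, 5) = 3 * 3 * 2 * 2 = 36
print(zarankiewicz(7, 5))
```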

In this paper, we use semidefinite programming and representation theory to prove the following lower bounds.

Theorem 1

For all integers n,

$$\begin{aligned} \mathop {\textrm{cr}}\limits (K_{10,n})&\ge 4.87057 n^2 - 10n, \\ \mathop {\textrm{cr}}\limits (K_{11,n})&\ge 5.99939 n^2-12.5n, \\ \mathop {\textrm{cr}}\limits (K_{12,n})&\ge 7.25579 n^2 - 15n, \\ \mathop {\textrm{cr}}\limits (K_{13,n})&\ge 8.65675 n^2-18n. \end{aligned}$$

This theorem and Corollary 1 below yield the best known lower bounds on all fixed \(\mathop {\textrm{cr}}\limits (K_{m,n})\) with \(m,n \ge 10\). The best previously known lower bounds for \(m,n\ge 10\) are \(\mathop {\textrm{cr}}\limits (K_{m,n}) \ge \tfrac{(m-1)m}{72} (3.86760n^2-8n)\), cf. [6]. For an overview of known results regarding Zarankiewicz’s conjecture, see the survey by Székely [26], or the survey about crossing numbers by Schaefer [23].

Fig. 1 Optimal drawing of \(K_{7,5}\)

We now sketch how these lower bounds are derived. For \(m \in {\mathbb {N}}\), let \(Z_m\) be the set of permutations of \([m]:=\{1,\ldots ,m\}\) consisting of a single orbit, i.e., \(Z_m\) is the set of all m-cycles from \(S_m\) and \(|Z_m|=(m-1)!\). Let \(K_{m,n}\) have colour classes \(\{1,\ldots ,m\}\) and \(\{b_1,\ldots ,b_n\}\). For any given drawing of \(K_{m,n}\) in the plane, define \(\gamma (b_i)\) to be the cyclic permutation \((1,i_2,\ldots ,i_m) \in Z_m\) with the property that the edges leaving \(b_i\) in clockwise order go to \(1,i_2,\ldots ,i_m\) (Table 1).
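As a small illustration of the set \(Z_m\), the sketch below lists all m-cycles in the normalised form \((1,i_2,\ldots ,i_m)\) and confirms \(|Z_m|=(m-1)!\) for small m. The helper names `m_cycles` and `as_map` are hypothetical and not part of the paper.

```python
from itertools import permutations
from math import factorial


def m_cycles(m: int):
    """All m-cycles of [m], each written in cycle notation as (1, i_2, ..., i_m)."""
    return [(1,) + rest for rest in permutations(range(2, m + 1))]


def as_map(cycle):
    """Turn cycle notation into a dict sending each point to its successor."""
    return {cycle[i]: cycle[(i + 1) % len(cycle)] for i in range(len(cycle))}


for m in range(1, 8):
    cycles = m_cycles(m)
    # Different normalised tuples give different permutations.
    distinct = {tuple(sorted(as_map(c).items())) for c in cycles}
    assert len(cycles) == len(distinct) == factorial(m - 1)
```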

Table 1 Some of our new lower bounds on \(\mathop {\textrm{cr}}\limits (K_{n,n})\)

Let Q be the \(Z_m \times Z_m\) matrix whose entry \(Q_{\sigma ,\tau }\), for \(\sigma ,\tau \in Z_m\), is equal to the minimum number of crossings in any drawing of \(K_{m,2}\) with \(\gamma (b_1) =\sigma \) and \(\gamma (b_2)=\tau \). This matrix was defined in [5] and later also used in [6]. An algorithm to compute \(Q_{\sigma ,\tau }\) was used by Kleitman [15] and more details were described by Woodall [27]. For example, \(Q_{\sigma ,\sigma }=\left\lfloor \tfrac{1}{4} (m-1)^2\right\rfloor \) for all \(\sigma \in Z_m\). Let \({\varvec{1}} \in {\mathbb {R}}^{Z_m}\) denote the all-ones vector. Consider the following quadratic program.

$$\begin{aligned} q_m := \textrm{min}\left\{ x^\textsf{T}Q x \, | \, x \in {\mathbb {R}}^{Z_m}_{\ge 0}, \, x^\textsf{T}{\varvec{1}} =1 \right\} . \end{aligned}$$
(1)

Theorem 2

(De Klerk et al. [5]) \(\mathop {\textrm{cr}}\limits (K_{m,n})\ge \tfrac{1}{2}n^2q_m - \tfrac{1}{2} n \left\lfloor \tfrac{1}{4} (m-1)^2\right\rfloor \) for all \(m,n\).

Proof

Suppose a drawing of \(K_{m,n}\) with \(\mathop {\textrm{cr}}\limits (K_{m,n})\) crossings is given. For each \(\sigma \in Z_m\), let \(c_{\sigma }\) be the number of vertices \(b_i\) with \(\gamma (b_i)=\sigma \). We view c as a vector in \({\mathbb {R}}^{Z_m}\) and define \(x:= n^{-1} c\). Then x satisfies the conditions in (1), so \(q_m \le x^\textsf{T}Qx\). For \(i,j \in [n]\) let \(d_{i,j}\) be the number of crossings of edges leaving \(b_i\) with edges leaving \(b_j\). By definition of Q, if \(i\ne j\), then \(d_{i,j} \ge Q_{\gamma (b_i),\gamma (b_j)}\). This implies

$$\begin{aligned} \tfrac{1}{2} n^2 q_m&\le \tfrac{1}{2} n^2 x^\textsf{T}Qx = \tfrac{1}{2}c^\textsf{T}Q c = \tfrac{1}{2}\sum _{i,j=1}^n Q_{\gamma (b_i),\gamma (b_j)} \le \sum _{\begin{array}{c} i< j \end{array}} d_{i,j} + \tfrac{1}{2}\sum _{i=1}^n Q_{\gamma (b_i),\gamma (b_i)} \\&{\le } \mathop {\textrm{cr}}\limits (K_{m,n}) + \tfrac{1}{2}n\left\lfloor \tfrac{1}{4}(m-1)^2\right\rfloor , \end{aligned}$$

where the last inequality follows from \(Q_{\sigma ,\sigma }=\left\lfloor \tfrac{1}{4} (m-1)^2\right\rfloor \) for all \(\sigma \in Z_m\). (In fact, the last inequality is an equality as one may assume that in an optimal drawing edges incident to a common vertex do not cross, cf. [10].) \(\square \)
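The bound of Theorem 2 is easy to evaluate exactly. The sketch below assumes a lower bound on \(q_m\) is supplied by the caller; the function names are hypothetical. It checks that the linear terms \(\tfrac{1}{2}\left\lfloor \tfrac{1}{4}(m-1)^2\right\rfloor \) for \(m=10,\ldots ,13\) are the coefficients 10, 12.5, 15 and 18 appearing in Theorem 1. The value 9.74114 used at the end is simply twice the quadratic coefficient of Theorem 1 for \(m=10\), not a number read from Table 2.

```python
from fractions import Fraction


def theorem2_bound(m: int, n: int, q_lower: Fraction) -> Fraction:
    """(1/2) n^2 q - (1/2) n floor((m-1)^2 / 4) for any lower bound q on q_m."""
    return Fraction(n * n, 2) * q_lower - Fraction(n, 2) * ((m - 1) ** 2 // 4)


def linear_coefficient(m: int) -> Fraction:
    """Coefficient of n in the bound of Theorem 2."""
    return Fraction((m - 1) ** 2 // 4, 2)


assert [linear_coefficient(m) for m in (10, 11, 12, 13)] == [
    Fraction(10), Fraction(25, 2), Fraction(15), Fraction(18)
]

# Illustration only: 2 * 4.87057 is taken from the quadratic term of Theorem 1.
print(theorem2_bound(10, 10, Fraction("9.74114")))
```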

The following semidefinite programming parameter \(\alpha _m\) is a lower bound on \(q_m\).

$$\begin{aligned} \alpha _m := \textrm{min}\left\{ \langle Q, X\rangle \, | \, X \in {\mathbb {R}}^{Z_m \times Z_m}_{\ge 0}, \, \langle J,X \rangle =1, \, X \succeq 0 \right\} . \end{aligned}$$
(2)

Here \(X\succeq 0\) means ‘X symmetric and positive semidefinite’. It is clear that \(q_m \ge \alpha _m\), as any feasible x for \(q_m\) gives a feasible \(X=xx^\textsf{T}\) for \(\alpha _m\) with the same objective value. The values \(\alpha _m\) for \(m\le 7\) were computed by De Klerk, Maharry, Pasechnik, Richter, and Salazar [5]. Dobre and Vera [7] computed a better lower bound on \(q_7\) using semidefinite approximations of the copositive cone. The values \(\alpha _8\) and \(\alpha _9\) were computed by De Klerk, Pasechnik and Schrijver [6], who used the regular \(*\)-representation to reduce the semidefinite programs in size. The regular \(*\)-representation found several applications (see, e.g., [17] for an application in coding theory). In this paper, we show how a full block-diagonalization can be obtained, where we exploit properties of the representation theory of the symmetric group for computational efficiency. This allows us to compute the value \(\alpha _{10}\).
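The inequality \(q_m \ge \alpha _m\) rests on the observation that \(X=xx^\textsf{T}\) is feasible for (2) with the same objective value. This can be checked in exact arithmetic on a toy instance. In the sketch below the matrix `Q` is a randomly generated symmetric stand-in, not the crossing matrix from the text, and all names are our own.

```python
from fractions import Fraction
import random


def rank_one_lift(x):
    """X = x x^T as a list of rows."""
    return [[xi * xj for xj in x] for xi in x]


def inner(A, B):
    """Trace inner product <A, B> = sum_ij A_ij B_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def quadratic_form(Q, x):
    """x^T Q x."""
    size = len(x)
    return sum(x[i] * Q[i][j] * x[j] for i in range(size) for j in range(size))


random.seed(0)
N = 5
# Hypothetical stand-in for Q: any symmetric matrix works for this check.
Q = [[0] * N for _ in range(N)]
for i in range(N):
    for j in range(i, N):
        Q[i][j] = Q[j][i] = random.randint(0, 9)

# A point x of the simplex with rational coordinates.
weights = [random.randint(0, 9) for _ in range(N)]
weights[0] += 1  # guarantees a positive sum
x = [Fraction(w, sum(weights)) for w in weights]

X = rank_one_lift(x)
J = [[1] * N for _ in range(N)]
assert inner(J, X) == 1                              # <J, X> = (1^T x)^2 = 1
assert all(entry >= 0 for row in X for entry in row)  # entrywise nonnegative
assert inner(Q, X) == quadratic_form(Q, x)           # same objective value
# Positive semidefinite: y^T X y = (x . y)^2 >= 0 for every y.
y = [random.randint(-5, 5) for _ in range(N)]
assert quadratic_form(X, y) == sum(a * b for a, b in zip(x, y)) ** 2
```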

A full symmetry reduction for computing \(\alpha _m\) has been developed before by Hymabaccus and Pasechnik [13]. Their method can be used to decompose representations of finite groups exactly. Due to the generic nature of their algorithm, they work with representation matrices instead of vectors in the representative sets. This costs a lot of memory (and time), so they only reach \(\alpha _7\) with their method. In the crossing number case, the coefficients in their block-diagonalization contain irrational numbers, potentially leading to rounding issues in floating-point computations. An advantage of our approach is, apart from being more memory and time efficient, that it results in an exact block-diagonalization with integer coefficients.

Our symmetry reduction consists of three steps. First, we use classical representation theory of the symmetric group \(S_m\) to decompose a well-known permutation module. Secondly, we use an elementary but crucial observation given in Proposition 1, to transform this decomposition into a decomposition of \({\mathbb {R}}^{Z_m}\) as \(S_m\)-module. Proposition 1 has potential for a wide array of applications. For example, it can also be directly applied to a problem in coding theory, which we describe in Remark 1 below. The third and final step in our block-diagonalization takes into account a separate \(\{\pm 1 \}\)-action, in Proposition 3.

Inspired by our symmetry reduction, we also formulate a new relaxation of \(\alpha _m\), which we call \(\beta _m\). The value \(\beta _m\) is obtained from (2) by only requiring that one specified block, which is described in Sect. 4 below, in the block-diagonalization is positive semidefinite instead of the full matrix X. So \(\beta _m \le \alpha _m\), and our experiments show that the new bound \(\beta _m\) is remarkably close to \(\alpha _m\). We give a combinatorial description of the vectors which underlie the block-diagonalization of \(\beta _m\) in Proposition 4. Also, we compute the value \(\beta _m\) for \(m\le 13\). The values are provided in Table 2. Inserting our newly computed values \(\alpha _{10}\), \(\beta _{11}\), \(\beta _{12}\), \(\beta _{13}\) in Theorem 2 instead of \(q_k\) (using the fact that \(\beta _k \le \alpha _k \le q_k\)), we directly obtain our new bounds in Theorem 1.

Table 2 The full semidefinite bound \(\alpha _m\) from (2) and our relaxation \(\beta _m\) which is described in Sect. 4. We solved the SDPs with multiple precision versions of SDPA [19], and then rounded the dual solutions to rational feasible dual solutions, see Sect. 5.5

1.1 Outline of the paper

In Sect. 2 we explain the consequences of Theorem 1: we investigate to which bounds it leads and relate these bounds to the literature. In Sect. 3 we explain how the symmetry can be used to significantly reduce the problem: we develop a full block-diagonalization. To do this, we use representation theory of the symmetric group and linear algebra. After that, we explain in Sect. 4 how our new relaxation \(\beta _m\) of \(\alpha _m\) is defined, which is inspired by the symmetry reduction. We give a combinatorial description of the vectors which underlie the block-diagonalization of \(\beta _m\). Finally, in Sect. 5 we give details about our computations. Here we explain how \(\beta _m\) can be computed in practice: using the dual description in combination with an iterative procedure, we are able to compute \(\beta _m\) for \(m \le 13\) up to high precision on a desktop computer.

2 Derived lower bounds

Suppose that \(2 \le k \le m\) and that \(n \in {\mathbb {N}}\). There are \(\left( {\begin{array}{c}m\\ k\end{array}}\right) \) distinct copies of \(K_{k,n}\) in \(K_{m,n}\), and in any drawing of \(K_{m,n}\), each crossing appears in \(\left( {\begin{array}{c}m-2\\ k-2\end{array}}\right) \) distinct copies of \(K_{k,n}\). This implies that

$$\begin{aligned} \mathop {\textrm{cr}}\limits (K_{m,n}) \ge \frac{\mathop {\textrm{cr}}\limits (K_{k,n}) \left( {\begin{array}{c}m\\ k\end{array}}\right) }{\left( {\begin{array}{c}m-2\\ k-2\end{array}}\right) } = \frac{\mathop {\textrm{cr}}\limits (K_{k,n}) \cdot m(m-1)}{k(k-1)}. \end{aligned}$$
(3)
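The simplification of the binomial quotient in (3) can be confirmed with exact rational arithmetic. The following sketch uses a hypothetical helper `averaging_factor` for the quotient.

```python
from fractions import Fraction
from math import comb


def averaging_factor(m: int, k: int) -> Fraction:
    """Number of copies of K_{k,n} divided by the multiplicity of each crossing."""
    return Fraction(comb(m, k), comb(m - 2, k - 2))


# binom(m, k) / binom(m-2, k-2) = m(m-1) / (k(k-1)) for all 2 <= k <= m.
assert all(
    averaging_factor(m, k) == Fraction(m * (m - 1), k * (k - 1))
    for k in range(2, 15)
    for m in range(k, 30)
)
```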

So any lower bound on \(q_k\) gives lower bounds on \(\mathop {\textrm{cr}}\limits (K_{m,n}) \) for all \(m \ge k\) and all n. Combining (3) with our new lower bounds \(\alpha _{10}\), \(\beta _{11}\), \(\beta _{12}\), \(\beta _{13}\) presented in Table 2 gives the following.

Corollary 1

For all integers n we have

$$\begin{aligned} \hbox { for all~}\ m \ge 10,\,\, \mathop {\textrm{cr}}\limits (K_{m,n})&\ge 0.0541m(m-1)n^2 - \tfrac{1}{9}m(m-1)n, \\ \hbox { for all~}\ m \ge 11,\,\, \mathop {\textrm{cr}}\limits (K_{m,n})&\ge 0.0545m(m-1)n^2 - \tfrac{5}{44}m(m-1)n, \\ \hbox { for all~}\ m \ge 12,\,\, \mathop {\textrm{cr}}\limits (K_{m,n})&\ge 0.0549m(m-1)n^2 - \tfrac{5}{44}m(m-1)n, \\ \hbox { for all~}\ m \ge 13,\,\, \mathop {\textrm{cr}}\limits (K_{m,n})&\ge 0.0554m(m-1)n^2 - \tfrac{3}{26}m(m-1)n. \end{aligned}$$

Proof

By Theorem 2, we have \(\mathop {\textrm{cr}}\limits (K_{k,n})\ge \tfrac{1}{2}n^2q_k - \tfrac{1}{2} n \left\lfloor \tfrac{1}{4} (k-1)^2\right\rfloor \) for all \(k,n\). We also have \(q_k \ge \alpha _k \ge \beta _k\) for all k, hence the inequality holds upon replacing \(q_k\) by \(\alpha _k\) or \(\beta _k\). Combining this inequality with our computed values \(\alpha _{10}\), \(\beta _{11}\), \(\beta _{12}\), \(\beta _{13}\) results in lower bounds on \(\mathop {\textrm{cr}}\limits (K_{10,n})\), \(\mathop {\textrm{cr}}\limits (K_{11,n})\), \(\mathop {\textrm{cr}}\limits (K_{12,n})\) and \(\mathop {\textrm{cr}}\limits (K_{13,n})\), respectively. Inserting these lower bounds in equation (3) for \(\mathop {\textrm{cr}}\limits (K_{k,n})\) yields the corollary.
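The coefficients in Corollary 1 can be recomputed from those of Theorem 1 by dividing by \(k(k-1)\) and rounding the quadratic coefficient down to four decimals. The sketch below does this with exact fractions; the dictionary `THEOREM_1` merely transcribes Theorem 1, and the function name is our own.

```python
from fractions import Fraction
from math import floor

# k: (quadratic coefficient, linear coefficient), transcribed from Theorem 1.
THEOREM_1 = {
    10: (Fraction("4.87057"), Fraction(10)),
    11: (Fraction("5.99939"), Fraction("12.5")),
    12: (Fraction("7.25579"), Fraction(15)),
    13: (Fraction("8.65675"), Fraction(18)),
}


def corollary_coefficients(k: int):
    """Coefficients of m(m-1)n^2 and m(m-1)n obtained from (3)."""
    quad, lin = THEOREM_1[k]
    scale = Fraction(1, k * (k - 1))
    # Round the quadratic coefficient down so that the bound stays valid.
    quad_down = Fraction(floor(quad * scale * 10**4), 10**4)
    return quad_down, lin * scale


for k in sorted(THEOREM_1):
    quad_down, lin = corollary_coefficients(k)
    print(k, float(quad_down), lin)
```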

The lower bounds also yield statements about limits, using the following lemma.

Lemma 1

(De Klerk et al. [5]) \(\displaystyle {\lim _{n \rightarrow \infty } \frac{\mathop {\textrm{cr}}\limits (K_{m,n})}{Z(m,n)} \ge \frac{8 q_k}{k(k-1)} \frac{m}{m-1}}\) for all \(k \le m\).

Proof

First, note that the limit exists: the sequence \((\text {cr}(K_{m,n})/\genfrac(){0.0pt}1{n}{2})_{n \in {\mathbb {N}}}\) for fixed m is nondecreasing (by the same calculation as in (3) but now applied to n instead of m) and bounded (using \(\text {cr}(K_{m,n}) \le Z(m,n)\)), hence has a limit. For fixed m, both \(Z(m,n)\) and \(\genfrac(){0.0pt}1{n}{2}\) grow quadratically in n, so the limit of the sequence \(\left( \frac{\mathop {\textrm{cr}}\limits (K_{m,n})}{Z(m,n)}\right) _{n \in {\mathbb {N}}}\) exists too. The lemma now follows from an elementary calculation using the bounds previously given. By Theorem 2, we have \(\mathop {\textrm{cr}}\limits (K_{k,n})\ge \tfrac{1}{2}n^2q_k - \tfrac{1}{2} n \left\lfloor \tfrac{1}{4} (k-1)^2\right\rfloor \) for all \(k,n\). Now, we use (3) and find, for \(m\ge k\):

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{\mathop {\textrm{cr}}\limits (K_{m,n})}{Z(m,n)}&\ge \lim _{n \rightarrow \infty } \frac{m(m-1)(\tfrac{1}{2}n^2q_k - \tfrac{1}{2} n \left\lfloor \tfrac{1}{4} (k-1)^2\right\rfloor )}{k(k-1)Z(m,n)} \\ {}&= \lim _{n \rightarrow \infty } \frac{m(m-1)(\tfrac{1}{2}n^2q_k - \tfrac{1}{2} n \left\lfloor \tfrac{1}{4} (k-1)^2\right\rfloor )}{k(k-1)\left\lfloor \tfrac{1}{4}(m-1)^2\right\rfloor \left\lfloor \tfrac{1}{4}(n-1)^2\right\rfloor } \\ {}&= \frac{2 q_k}{k(k-1)} \frac{m(m-1)}{\left\lfloor \tfrac{1}{4}(m-1)^2\right\rfloor } \ge \frac{8 q_k}{k(k-1)} \frac{m}{m-1}. \end{aligned}$$

\(\square \)

As \(q_k \ge \alpha _k \ge \beta _k\), the lemma also holds upon replacing \(q_k\) by \(\alpha _k\) or \(\beta _k\). So our computed values \(\alpha _{10}\), \(\beta _{11}\), \(\beta _{12}\), \(\beta _{13}\) give asymptotic lower bounds on \(\lim _{n \rightarrow \infty } \frac{\mathop {\textrm{cr}}\limits (K_{m,n})}{Z(m,n)} \) for \(m \ge k\). In the following corollary, we provide the lower bound for \(m \ge 13\), using our computed value \(\beta _{13}\). The lower bounds for \(m=10,11,12\) are displayed in Table 2.

Corollary 2

For all \(m \ge 13\), \( \displaystyle {\lim _{n \rightarrow \infty } \frac{\mathop {\textrm{cr}}\limits (K_{m,n})}{Z(m,n)} \ge 0.8878 \tfrac{m}{m-1}.} \)

A direct result of this corollary is

$$\begin{aligned} \displaystyle {\lim _{n \rightarrow \infty } \frac{\mathop {\textrm{cr}}\limits (K_{n,n})}{Z(n,n)} \ge 0.8878.} \end{aligned}$$
(4)
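The constant in Corollary 2 and in (4) can be recovered from the quadratic coefficient of Theorem 1. The sketch below assumes, as in the proof of Corollary 1, that each quadratic coefficient in Theorem 1 equals half of the value substituted for \(q_k\); the names `HALF_BOUNDS` and `limit_ratio` are hypothetical.

```python
from fractions import Fraction

# Quadratic coefficients of Theorem 1, i.e. half of the value used for q_k.
HALF_BOUNDS = {10: "4.87057", 11: "5.99939", 12: "7.25579", 13: "8.65675"}


def limit_ratio(k: int) -> Fraction:
    """The factor 8 q / (k (k - 1)) from Lemma 1, with q read off from Theorem 1."""
    q_lower = 2 * Fraction(HALF_BOUNDS[k])
    return 8 * q_lower / (k * (k - 1))


# The factor for k = 13 lies between 0.8878 and 0.8879.
assert Fraction("0.8878") <= limit_ratio(13) < Fraction("0.8879")

for k in sorted(HALF_BOUNDS):
    print(k, float(limit_ratio(k)))
```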

The previously best known published lower bound on \( \lim _{n \rightarrow \infty } \frac{\mathop {\textrm{cr}}\limits (K_{n,n})}{Z(n,n)} \) is 0.8594 (which follows using \(\alpha _9\)), cf. De Klerk et al. [6]. Norin and Zwols obtained a lower bound of 0.905 using flag algebras which they presented at a workshop [20]. This bound is stronger than our bound in (4) but remains unpublished. In [1], Balogh, Lidický and Salazar prove very strong asymptotic lower bounds on the crossing number of the complete graph using flag algebras. It might be possible to improve upon (4) and Norin and Zwols’ bound by using a similar approach considering high levels in the flag algebra hierarchy.

However, in order to prove asymptotic bounds it is also worthwhile to further investigate the quadratic programming hierarchy from De Klerk et al. [5] which we consider in this paper. One might hope to prove lower bounds \(t_k\) on \(\alpha _k\) such that \(8t_k /(k(k-1)) \rightarrow 1\) as \(k \rightarrow \infty \), thereby proving \(\lim _{n \rightarrow \infty } \frac{\mathop {\textrm{cr}}\limits (K_{n,n})}{Z(n,n)}=1\), i.e., asymptotically proving Zarankiewicz’s conjecture. Figure 2 gives rise to the question whether \(8\beta _k /(k(k-1)) \rightarrow 1\) as \(k \rightarrow \infty \).

Fig. 2 We have the lower bound \(\lim _{n \rightarrow \infty } \text {cr}(K_{m,n})/Z(m,n) \ge (8\gamma _k/(k(k-1))) m/(m-1)\) for each \(m\ge k\) and \(\gamma _k \in \{\alpha _k,\beta _k\}\). The values \(8\alpha _k/(k(k-1))\) are plotted in green (connected by the green dashed line) and the values \(8\beta _k/(k(k-1))\) are plotted in blue (colour figure online).

In Fig. 2, the increases are larger for odd k than for even k, a trend which was already noted in [6]. We now see that this trend continues for some larger k. As noted in [6], this is reminiscent of the fact that Zarankiewicz’s conjecture holds for \(K_{2m,n}\) if it holds for \(K_{2m-1,n}\).

3 Exploiting the symmetry of the problem

Recall that \(Z_m\) is the set of permutations of [m] consisting of a single orbit, i.e., \(Z_m\) is the set of all m-cycles from \(S_m\) and \(|Z_m|=(m-1)!\). The group \(G_m:=S_m \times \{ \pm 1\}\) acts on \(Z_m\) via

$$\begin{aligned} (\pi ,\varepsilon ) \cdot \sigma = \pi \sigma ^{\varepsilon } \pi ^{-1}, \end{aligned}$$

for \(\sigma \in Z_m\), \((\pi ,\varepsilon ) \in G_m= S_m \times \{ \pm 1 \}\). If X is any optimum solution for the program (2) defining \(\alpha _m\), then \(g \cdot X\), defined by \((g \cdot X)_{\sigma , \tau } = X_{g \cdot \sigma , g \cdot \tau }\), is also feasible for all \(g \in G_m\): the matrix \(g \cdot X\) is obtained from X by simultaneously permuting rows and columns, which preserves positive semidefiniteness, entrywise nonnegativity and the total sum of the entries. Moreover, the objective values corresponding to X and \(g \cdot X\) are the same: indeed, as \(g \cdot Q = Q\) for all \(g \in G_m\), one has \(\langle Q, X \rangle = \langle g \cdot Q, g\cdot X \rangle =\langle Q, g\cdot X \rangle \). As \(G_m\) is a finite group and the feasible region in (2) is convex, we can replace any optimum solution X by the group average \((1/|G_m|)\sum _{g \in G_m} g \cdot X\) to obtain a \(G_m\)-invariant optimum solution. So we may assume our optimum solution is \(G_m\)-invariant, i.e., its entries are constant on \(G_m\)-orbits of \(Z_m \times Z_m\). Hence the number of variables is the cardinality of \(\Omega _m:= (Z_m \times Z_m)/G_m\) (where \(G_m\) acts on both copies of \(Z_m\) simultaneously). The set \(\Omega _m\) is also known as the set of orbitals of \(G_m\) acting on \(Z_m\), and \(|\Omega _m|\) as the rank of the action of \(G_m\), see, e.g., [4]. The number of variables can be reduced further since X is symmetric, so the value of X on the orbit of \((\sigma ,\tau )\) is the same as the value of X on the orbit of \((\tau ,\sigma )\). We write \(\Omega _m'\) for the collection of these ‘symmetric’ \(G_m\)-orbits on \(Z_m \times Z_m\), in which orbits of \((\sigma ,\tau )\) and \((\tau ,\sigma )\) are identified. This gives a significant reduction in the number of variables which was already used in [5].
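For very small m the orbitals can be enumerated by brute force, which makes the reduction in the number of variables concrete. The following sketch is illustrative only: it stores permutations as image tuples on \(\{0,\ldots ,m-1\}\) rather than on [m], all function names are our own, and the approach does not scale to the values of m treated in this paper.

```python
from itertools import permutations


def compose(p, q):
    """(p o q)(i) = p[q[i]] for permutations stored as image tuples."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def m_cycles(m):
    """All m-cycles on {0, ..., m-1} as image tuples."""
    cycles = []
    for rest in permutations(range(1, m)):
        order = (0,) + rest
        sigma = [0] * m
        for i in range(m):
            sigma[order[i]] = order[(i + 1) % m]
        cycles.append(tuple(sigma))
    return cycles


def act(pi, eps, sigma):
    """(pi, eps) . sigma = pi sigma^eps pi^{-1}."""
    base = sigma if eps == 1 else inverse(sigma)
    return compose(compose(pi, base), inverse(pi))


def orbitals(m):
    """Orbits of G_m = S_m x {+1, -1} acting simultaneously on Z_m x Z_m."""
    group = [(pi, eps) for pi in permutations(range(m)) for eps in (1, -1)]
    cycles = m_cycles(m)
    unseen = {(s, t) for s in cycles for t in cycles}
    orbits = []
    while unseen:
        s, t = next(iter(unseen))
        orbit = {(act(pi, eps, s), act(pi, eps, t)) for pi, eps in group}
        unseen -= orbit
        orbits.append(orbit)
    return orbits


orbits4 = orbitals(4)
# The orbits partition all |Z_4|^2 = 36 pairs.
assert sum(len(o) for o in orbits4) == 36
print(len(orbits4), sorted(len(o) for o in orbits4))
```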

It is also possible to reduce the size of the matrix X in the semidefinite programming formulation. In [6], the regular \(*\)-representation was used, which reduces checking whether a \(G_m\)-invariant matrix X is positive semidefinite to checking whether a matrix of order \(|\Omega _m| \times |\Omega _m|\) is positive semidefinite. In this paper, we will reduce the matrix X further, by developing a full block-diagonalization. For any finite group G acting on a vector space V, we write \(V^G\) for the subspace of V of G-invariant elements. The block-diagonalization is a bijective linear map

$$\begin{aligned} \Phi \, : \, \left( {\mathbb {C}}^{Z_m \times Z_m}\right) ^{G_m} \rightarrow \bigoplus _{i=1}^k {\mathbb {C}}^{m_i \times m_i}, \end{aligned}$$
(5)

for some integer k and integers \(m_i\) for \(i \in [k]\), such that \(X \in \left( {\mathbb {C}}^{Z_m \times Z_m}\right) ^{G_m} \) is positive semidefinite if and only if \(\Phi (X)\) is positive semidefinite. It has the property that \(\sum _{i=1}^k m_i^2 = |(Z_m \times Z_m)/G_m|= |\Omega _m|\), which is considerably smaller than \(|Z_m|^2\).

3.1 Preliminaries on representation theory

We here describe the preliminaries on representation theory which we will use throughout the paper, based on a combination of the notation and definitions used in references [3, 8, 18, 22]. If G is a finite group acting on a complex vector space V of finite dimension, V is called a G-module. Any G-invariant subspace of V is called a submodule. If V and W are G-modules, a G-homomorphism is a linear map \(\psi \,: \, V \rightarrow W\) with \(g \cdot \psi (v) = \psi (g \cdot v)\) for all \(g \in G\) and \(v \in V\). The modules V and W are equivalent (or G-isomorphic) if there is a bijective G-homomorphism (called a G-isomorphism) from V to W. A G-module V is irreducible if \(V \ne 0\) and its only nonzero submodule is V. The centralizer algebra of the action of G on V, denoted by \(\text {End}_G(V)\), is the algebra of G-homomorphisms \(V \rightarrow V\).

Let again G be a finite group acting on a complex finite dimensional vector space V. Then one can decompose \( V = \bigoplus _{i=1}^k \bigoplus _{j=1}^{m_i} V_{i,j}, \) for some unique number k and numbers \(m_1,\ldots ,m_k\) (which are unique up to permutation), such that the \(V_{i,j}\) are irreducible submodules of V with the property that \(V_{i,j}\) is isomorphic to \(V_{i',j'}\) if and only if \(i=i'\).

Definition 1

(Representative set) For each \(i \le k\) and \(j \le m_i\) let \(u_{i,j} \in V_{i,j}\) be a nonzero vector, such that for each \(i \le k\) and \(j,j' \le m_i\) there exists a G-isomorphism from \(V_{i,j}\) to \(V_{i,j'}\) which maps \(u_{i,j}\) to \(u_{i,j'}\). Define, for each \(i \le k\), the tuple \(U_i:= (u_{i,1},\ldots , u_{i,m_i})\). Call any set \(\{U_1,\ldots ,U_k\}\) obtained in this way a representative set for the action of G on V.

We can view the \(U_i\) as matrices by seeing the vectors \(u_{i,j}\) (for \(j =1,\ldots ,m_i\)) as its columns, and we will do so depending on the context.

The space V has a G-invariant inner product \(\langle , \rangle \). Let \(\{U_1,\ldots ,U_k\}\) be any representative set for the action of G on V, and define the map \( \Phi :\text {End}_G(V) \rightarrow \bigoplus _{i=1}^k {\mathbb {C}}^{m_i \times m_i}\) which maps \( A \mapsto \bigoplus _{i=1}^k \left( \langle A u_{i,j'}, u_{i,j} \rangle \right) _{j,j' =1}^{m_i}\). This map is linear and bijective, and it has the property that \(A \succeq 0\) if and only if \(\Phi (A) \succeq 0\). This follows from classical representation theory. For a proof, see e.g., [21, Proposition 2.4.4]. We apply it to the following. Suppose that G is a finite group acting on a finite set Z, hence on the vector space \(V:={\mathbb {C}}^Z\). Then \(\text {End}_G(V)\) can be naturally identified with \(\left( {\mathbb {C}}^{Z \times Z}\right) ^G\), and the map \(\Phi \) becomes

$$\begin{aligned} \Phi :({\mathbb {C}}^{Z \times Z})^{G} \rightarrow \bigoplus _{i=1}^k {\mathbb {C}}^{m_i \times m_i} \,\, \text { with } \,\, A \mapsto \bigoplus _{i=1}^k U_i^* A U_i. \end{aligned}$$
(6)

It will turn out that all representative sets in this paper consist of real matrices. So we can replace \({\mathbb {C}}\) by \({\mathbb {R}}\) in the above equation: \(\Phi \) is a linear bijective map \(({\mathbb {R}}^{Z \times Z})^{G} \rightarrow \bigoplus _{i=1}^k {\mathbb {R}}^{m_i \times m_i}\) such that \(A \succeq 0\) if and only if \(\Phi (A)\succeq 0\) for all \(A \in ({\mathbb {R}}^{Z \times Z})^{G}\).
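A toy instance may help to fix ideas. Take \(G=S_2\) acting on \(Z=\{1,2\}\) by swapping the two points, so the invariant symmetric matrices have the form \(\left( {\begin{matrix} a &{} b \\ b &{} a \end{matrix}}\right) \). The trivial and the sign module each occur once, spanned by \((1,1)\) and \((1,-1)\), and the map in (6) yields the two \(1\times 1\) blocks \(2(a+b)\) and \(2(a-b)\). The sketch below, with hypothetical helper names, checks on a grid of integer values that positive semidefiniteness of A and of \(\Phi (A)\) coincide.

```python
def phi(A, representative_set):
    """Blocks U_i^T A U_i, where each U_i is a tuple of real column vectors."""
    size = len(A)
    blocks = []
    for U in representative_set:
        block = [
            [sum(u[r] * A[r][c] * v[c] for r in range(size) for c in range(size))
             for v in U]
            for u in U
        ]
        blocks.append(block)
    return blocks


def is_psd_2x2(A):
    """A symmetric 2x2 matrix is PSD iff its diagonal and determinant are >= 0."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return A[0][0] >= 0 and A[1][1] >= 0 and det >= 0


# Representative set for S_2 swapping two points: trivial and sign module.
REPRESENTATIVE_SET = [[(1, 1)], [(1, -1)]]

for a in range(-3, 4):
    for b in range(-3, 4):
        A = [[a, b], [b, a]]
        blocks = phi(A, REPRESENTATIVE_SET)
        assert blocks == [[[2 * (a + b)]], [[2 * (a - b)]]]
        assert is_psd_2x2(A) == all(block[0][0] >= 0 for block in blocks)
```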

Representation theory of the symmetric group. A partition \(\lambda \) of n is a sequence of integers \(\lambda _1 \ge \cdots \ge \lambda _h >0\) with \(\lambda _1 + \cdots +\lambda _h = n\) for some \(h \in {\mathbb {N}}\) which is called the height of \(\lambda \). We write \(\lambda \vdash n\) to denote that \(\lambda \) is a partition of n. The (Young) shape of \(\lambda \vdash n\) is an array consisting of n boxes divided into h rows where for each \(1 \le i \le h\), the i-th row contains \(\lambda _i\) boxes. As an example, the shape corresponding to \((4,1,1) \vdash 6\) consists of a first row of four boxes followed by two rows of one box each.

A Young tableau of shape \(\lambda \) is a filling t of the boxes of the Young shape \(\lambda \) with the integers \(1,\ldots ,n\), where each number appears once. Two Young tableaux t, \(t'\) of shape \(\lambda \vdash n\) are (row) equivalent, written as \(t \sim t'\), if corresponding rows of the two tableaux contain the same elements. A tabloid of shape \(\lambda \) is an equivalence class of tableaux: \(\{t\} = \{ t ' \,: \, t ' \sim t\}\). We denote a tabloid by an array with lines between the rows, e.g.,

Any permutation \(\pi \in S_n\) acts on a tableau \(t=(t_{i,j})\) by acting on its content, i.e., \(\pi t = (\pi (t_{i,j}))\). The column stabilizer \(C_{t}\) of a tableau t is the subgroup of \(S_n\) which leaves the columns of t invariant. The action of \(\pi \in S_n\) on a tableau t extends to a well-defined action on tabloids via \(\pi \{t\} = \{ \pi t\}\). For each \(\lambda \vdash n\) the permutation module \(M^{\lambda }\) corresponding to \(\lambda \) is defined as

$$\begin{aligned} M^{\lambda } = {\mathbb {C}}\{ \{t_1\},\ldots ,\{t_k\}\}, \end{aligned}$$

where \(\{t_1\}, \ldots , \{t_k\}\) is a complete set of \(\lambda \)-tabloids. For any tableau t, the associated polytabloid is \( e_{t}:= \sum _{c \in C_{t}} \text {sgn}(c) c \{t\}. \) The Specht module \(S^{\lambda }\) corresponding to \(\lambda \) is the submodule of \(M^{\lambda }\) spanned by the polytabloids \(e_{t}\), where t is a tableau of shape \(\lambda \). The module \(S^{\lambda }\) is irreducible, and it is generated by any given polytabloid: \(S^{\lambda } = {\mathbb {C}}S_n\cdot e_{t}\) for any fixed \(\lambda \)-tableau t.
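The definition of a polytabloid can be made concrete for small tableaux. In the sketch below a tableau is a tuple of rows, a tabloid is a tuple of frozensets (one per row), and an element of \(M^{\lambda }\) is a dictionary from tabloids to integer coefficients. These data structures and the function names are our own choices for illustration.

```python
from itertools import permutations, product


def columns(t):
    """Columns of a tableau given as a tuple of rows of nonincreasing length."""
    return [[row[j] for row in t if j < len(row)] for j in range(len(t[0]))]


def perm_sign(p):
    """Sign of a permutation of range(len(p)), via its inversion count."""
    inversions = sum(
        1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b]
    )
    return -1 if inversions % 2 else 1


def polytabloid(t):
    """e_t = sum over c in the column stabilizer C_t of sgn(c) c{t}."""
    cols = columns(t)
    result = {}
    # C_t is the product of the symmetric groups of the individual columns.
    for choice in product(*[list(permutations(range(len(c)))) for c in cols]):
        mapping, sign = {}, 1
        for col, p in zip(cols, choice):
            sign *= perm_sign(p)
            for i, entry in enumerate(col):
                mapping[entry] = col[p[i]]
        tabloid = tuple(frozenset(mapping[x] for x in row) for row in t)
        result[tabloid] = result.get(tabloid, 0) + sign
    return {tab: coeff for tab, coeff in result.items() if coeff != 0}


# Tableau with rows (1, 2) and (3): C_t = {id, (1 3)}, so e_t has two terms.
e_t = polytabloid(((1, 2), (3,)))
assert e_t == {
    (frozenset({1, 2}), frozenset({3})): 1,
    (frozenset({2, 3}), frozenset({1})): -1,
}
```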

A generalized Young tableau of shape \(\lambda \vdash n\) is a (Young) shape filled with integers, where we allow repeated entries. Depending on the context, we often omit the word ‘generalized’. A generalized Young tableau is standard if its rows and columns are strictly increasing, and semistandard if its rows are nondecreasing and its columns are strictly increasing. We say that a generalized tableau of shape \(\lambda \vdash n\) has content \(\mu =(\mu _1,\ldots ,\mu _h) \vdash n\) if it contains \(\mu _i\) times the integer i, for all \(1 \le i \le h\). If T is any tableau of shape \(\lambda \) and content \(\mu \), the map

$$\begin{aligned} \vartheta _T: M^{\lambda }&\rightarrow M^{\mu },\\ \{t\}&\mapsto \sum _{T' \sim T}t[T'] \quad {\text {(extended linearly\ to}}\ M^\lambda ), \end{aligned}$$

where \(\{t\}\) is any tabloid in \(M^\lambda \), and where

$$\begin{aligned} t[T']:= \{\text {tableau with entry } t_{i,j} \text { in its}\; T_{i,j}' \text { -th row}\}, \end{aligned}$$

is an \(S_n\)-homomorphism. Moreover, a basis of \(\text {Hom}(S^{\lambda }, M^{\mu })\) is given by (cf. Sagan [22])

$$\begin{aligned} \{\vartheta _T \, | \, T \text { semistandard of shape}~\lambda \text { and content}~\mu \}. \end{aligned}$$

Unless specified otherwise, we from now on assume that t is the \(\lambda \)-tableau containing the integers \(1,\ldots ,n\) in this order from left to right, from top to bottom. Sometimes we write \(t_{\lambda }\) instead of t. It follows that a representative set for the action of \(S_n\) on \(M^{\mu }\) is given by

$$\begin{aligned} \{(\vartheta _T(e_{t_{\lambda }}) \,\,| \,\, T \text { semistandard of shape}~\lambda \text { and content}~\mu ) \,\, | \,\, \lambda \vdash n\}. \end{aligned}$$
(7)
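For the content \(\mu =(1^n)\), which is the case used below, a semistandard tableau is the same as a standard one, so the tuple belonging to \(\lambda \) in (7) has as many entries as there are standard Young tableaux of shape \(\lambda \). The sketch below counts them with the hook length formula, a classical fact that is not stated in the text, and checks that the squared block sizes add up to \(n!\), the dimension of the centralizer algebra of \(M^{(1^n)}\). The function names are our own.

```python
from math import factorial


def partitions(n, largest=None):
    """Yield all partitions of n as nonincreasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def num_standard_tableaux(shape):
    """Hook length formula: n! divided by the product of all hook lengths."""
    hooks = 1
    for i, row_len in enumerate(shape):
        for j in range(row_len):
            arm = row_len - j - 1                          # boxes to the right
            leg = sum(1 for r in shape[i + 1:] if r > j)   # boxes below
            hooks *= arm + leg + 1
    return factorial(sum(shape)) // hooks


for n in range(1, 9):
    total = sum(num_standard_tableaux(lam) ** 2 for lam in partitions(n))
    assert total == factorial(n)

print(num_standard_tableaux((4, 1, 1)))  # 10
```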

Induced representations. Let G be a finite group, and H a subgroup of G. Let \(R=\{r_1,\ldots ,r_t\}\) be a full set of representatives for the left cosets of H in G, so \(|R|=[G\,: \, H]\). If V is an H-module, the induced module \(\mathop {\textrm{Ind}}\limits _H^G(V)\) is defined as follows. The elements of \(\mathop {\textrm{Ind}}\limits _H^G(V)\) are (formal) sums of the form

$$\begin{aligned} \lambda _1 (r_1,v_1) + \cdots + \lambda _t(r_t,v_t)\quad \text {for } v_1,\ldots ,v_t \in V, \, \lambda _1,\ldots ,\lambda _t \in {\mathbb {C}}. \end{aligned}$$

(So as vector space \(\mathop {\textrm{Ind}}\limits _H^G(V) = \oplus _{r \in R} V\).) The action of an element \(g \in G\) on \((r_i,v)\) is defined via \(g \cdot (r_i,v) = (r_j, h\cdot v)\), where \(r_j \in R\) and \(h \in H\) are uniquely determined by the equation \(g r_i = r_j h\).
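The rule \(g r_i = r_j h\) that defines the action on an induced module can be traced on a small example. The sketch below takes \(G=S_3\) (on the points 0, 1, 2) and H the subgroup generated by one transposition; this choice of groups and all function names are assumptions made purely for illustration.

```python
from itertools import permutations


def compose(p, q):
    """(p o q)(i) = p[q[i]] for permutations stored as image tuples."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def left_coset_representatives(G, H):
    """One representative r for each left coset rH."""
    reps, covered = [], set()
    for g in G:
        if g not in covered:
            reps.append(g)
            covered |= {compose(g, h) for h in H}
    return reps


def coset_action(g, r_i, reps, H):
    """Return the unique pair (r_j, h) with g r_i = r_j h, r_j in reps, h in H."""
    target = compose(g, r_i)
    for r_j in reps:
        h = compose(inverse(r_j), target)
        if h in H:
            return r_j, h
    raise ValueError("reps is not a full set of left coset representatives")


G = list(permutations(range(3)))
H = [(0, 1, 2), (1, 0, 2)]          # identity and the transposition (0 1)
R = left_coset_representatives(G, H)
assert len(R) == len(G) // len(H)   # |R| = [G : H] = 3

for g in G:
    images = []
    for r_i in R:
        r_j, h = coset_action(g, r_i, R, H)
        assert compose(r_j, h) == compose(g, r_i)
        images.append(r_j)
    # Each g permutes the cosets, hence the summands of the induced module.
    assert sorted(images) == sorted(R)
```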

3.2 The block-diagonalization for computing \(\alpha _k\)

We aim to decompose the space \({\mathbb {C}}^{Z_m}\) as a \({G_m}\)-module. The derivation will consist of three steps.

  1. Derive a representative set of matrices for the action of \(S_m\) on \(M^{(1^m)}\) from the elementary representation theory of the symmetric group.

  2. There is a natural surjective \(S_m\)-homomorphism \(f: M^{(1^m)} \rightarrow {\mathbb {C}}^{Z_m}\). For each matrix in the representative set for the action of \(S_m\) on \(M^{(1^m)}\), construct a new matrix consisting of a minimal linearly independent set of columns of the original matrix after applying the map f. The new matrices together form a representative set for the action of \(S_m\) on \({\mathbb {C}}^{Z_m}\), as we will show. In general: suppose G is a finite group acting on finite dimensional vector spaces V and W, and \(f:V\rightarrow W\) is a surjective G-homomorphism. We show how to derive a representative set for the action of G on W from a representative set for the action of G on V.

  3. Use the additional \(S_2 \cong \{\pm 1\}\)-action to finally obtain a representative set for the action of \(S_m \times S_2\) on \({\mathbb {C}}^{Z_m}\). In general: suppose that H is a finite group acting on a complex finite dimensional vector space V, and that also \(S_2\) acts on V. We show how to derive a representative set for the action of \(H \times S_2\) on V from a representative set for the action of H on V, provided that the H- and \(S_2\)-actions on V commute.

So we first consider the action of the subgroup \({S_m \cong S_m\times \{+1\} < S_m \times \{\pm 1\}}\) acting on \(Z_m\) by conjugation, and give an algorithm to determine a representative set for this action. Afterwards, we consider the additional \(S_2 \cong \{\pm 1\}\)-action to reduce the representative set further.

3.2.1 The \(S_m\)-action on \(Z_m\)

The starting point to find a representative set for the action of \(S_m\) on \({\mathbb {C}}^{Z_m}\) is a representative set for the action of \(S_m\) on \(M^{(1^m)}\) given in (7). We consider the natural projection

$$\begin{aligned} f: M^{(1^m)} \rightarrow {\mathbb {C}}^{Z_m}, \end{aligned}$$
(8)

mapping a tabloid which is filled row-wise with \(i_1\) up to \(i_m\) to the indicator vector in \({\mathbb {C}}^{Z_m}\) corresponding to \((i_1 i_2 \ldots i_m)\).
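The projection f can be explored on basis vectors for small m. In the sketch below a tabloid of shape \((1^m)\) is stored as the tuple \((i_1,\ldots ,i_m)\) of its row entries, points are numbered from 0, and an m-cycle is stored as an image tuple; these conventions and the function names are ours. The checks confirm that f commutes with the \(S_m\)-actions and that every m-cycle has exactly m tabloids in its preimage, one for each rotation of the cycle notation.

```python
from collections import Counter
from itertools import permutations


def compose(p, q):
    """(p o q)(i) = p[q[i]] for permutations stored as image tuples."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def f(tabloid):
    """Send the tabloid with rows i_1, ..., i_m to the m-cycle (i_1 i_2 ... i_m)."""
    m = len(tabloid)
    sigma = [0] * m
    for pos, entry in enumerate(tabloid):
        sigma[entry] = tabloid[(pos + 1) % m]
    return tuple(sigma)


def act_on_tabloid(pi, tabloid):
    """pi acts on a tabloid by acting on its entries."""
    return tuple(pi[entry] for entry in tabloid)


m = 4
tabloids = list(permutations(range(m)))

# f respects the S_m-action: f(pi . {t}) = pi f({t}) pi^{-1}.
assert all(
    f(act_on_tabloid(pi, t)) == compose(compose(pi, f(t)), inverse(pi))
    for pi in permutations(range(m))
    for t in tabloids
)

# f hits all (m-1)! cycles, each from exactly m tabloids.
fibres = Counter(f(t) for t in tabloids)
assert len(fibres) == 6 and set(fibres.values()) == {4}
```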

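The map f is linear and surjective, and it respects the \(S_m\)-action, as for each \(\pi \in S_m\) and each tabloid \(\{t\}\) filled row-wise with \(i_1,\ldots ,i_m\) we have

$$\begin{aligned} f(\pi \cdot \{t\}) = \delta _{(\pi (i_1)\, \pi (i_2) \ldots \pi (i_m))} = \delta _{\pi (i_1 i_2 \ldots i_m)\pi ^{-1}} = \pi \cdot f(\{t\}), \end{aligned}$$

where \(\delta _{\sigma } \in {\mathbb {C}}^{Z_m}\) denotes the indicator vector of \(\sigma \in Z_m\).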

We now use the following fact (which follows from elementary representation theory, see, e.g., [14, 25]) to derive a representative set for the action of \(S_m\) on \({\mathbb {C}}^{Z_m}\).

Proposition 1

Suppose that a finite group G acts on two finite-dimensional complex vector spaces V and W, and suppose that \(f: V \rightarrow W\) is a surjective G-homomorphism. Let \(\{U_1,\ldots ,U_k\}\) be a representative set for the action of G on V, with \(U_i = (u_{i,j} \,|\, j=1,\ldots ,m_i)\). Then the set \(\{U_1', \ldots ,U_k'\}\) is representative for the action of G on W, where \(U_i'\) (for \(i \in [k]\)) is a tuple consisting of a minimal spanning set among the \(f(u_{i,j})\), with \(j=1,\ldots ,m_i\).

Proof

For each \(i \in [k]\), let \(s_i \in {\mathbb {N}}\) and \(\ell _{1}^{(i)},\ldots ,\ell _{s_i}^{(i)} \in [m_i]\) be such that

$$\begin{aligned} U_i'=(f(u_{i,\ell _{1}^{(i)}}), \ldots , f(u_{i,\ell _{s_i}^{(i)}})) \end{aligned}$$

is the chosen tuple consisting of a minimal spanning set among the \(f(u_{i,j})\) for \(j=1,\ldots ,m_i\). Define

$$\begin{aligned} V':= \bigoplus _{i=1}^k \bigoplus _{j=1}^{s_i} {\mathbb {C}}G u_{i,\ell _{j}^{(i)}} \subseteq V, \end{aligned}$$

i.e., \(V'\) is the restriction of the direct sum decomposition of V to the components corresponding to the chosen minimal spanning sets.

The restriction \(f'\,: \, V' \rightarrow W\) of f to \(V'\) is a bijection. Surjectivity of \(f'\) is clear, as the image of f is W and is spanned by the \(f(u_{i,\ell _{j}^{(i)}})\) \((i \in [k], j \in [s_i])\). If \(f'\) is not injective, then \(\text {Ker}(f')\) contains an irreducible submodule M of \(V'\). Schur’s lemma implies that the projection of M onto the components \(\oplus _{j=1}^{s_i} {\mathbb {C}}G u_{i,\ell _{j}^{(i)}}\) is zero for all but one \(i \in [k]\). Any nonzero element of M now gives rise to a nontrivial linear combination of the \(u_{i,\ell _{j}^{(i)}}\) that is in the kernel of f (for the i for which the projection of M onto \(\oplus _{j=1}^{s_i} {\mathbb {C}}G u_{i,\ell _{j}^{(i)}}\) is nonzero) contradicting the fact that the \(f(u_{i,\ell _{j}^{(i)}})\) (\(j =1,\ldots ,s_i\)) are linearly independent. So \(f'\) is indeed a bijection.

Since by definition the set \(\{(u_{i,\ell _{j}^{(i)}}\, | \, j=1,\ldots , s_i)\,| \, i=1,\ldots , k \} \) is representative for the action of G on \(V'\), the set

$$\begin{aligned} \{U_1',\ldots ,U_k'\} = \left\{ \left( f'\left( u_{i,\ell _{j}^{(i)}}\right) \, \big | \, j=1,\ldots , s_i\right) \,\big | \, i=1,\ldots , k \right\} \end{aligned}$$

is representative for the action of G on W, as was needed to prove. \(\square \)

Recall that a representative set for the action of \(S_m\) on \(M^{(1^m)}\) is given by

$$\begin{aligned} \{\vartheta _T(e_t) \, | \, T \text { semistandard of shape}~\lambda \text { and content}\,(1^m)\}. \end{aligned}$$

Note that any semistandard tableau of shape \(\lambda \vdash m\) and content \((1^m)\) is standard. Consider for each \(\lambda \vdash m\) a tuple \(U_{\lambda }\) consisting of a minimal spanning set among the vectors

$$\begin{aligned} \{f (\vartheta _T(e_t)) \, | \, T \text { standard of shape}~ \lambda \text { and content}~(1^m)\} \subseteq {\mathbb {C}}^{Z_m}. \end{aligned}$$
(9)

Corollary 3

The set \(\{U_{\lambda } \, | \, \lambda \vdash m\}\) is representative for the action of \(S_m\) on \({\mathbb {C}}^{Z_m}\).

Proof

Apply Proposition 1 with \(V=M^{(1^m)}\), \(W={\mathbb {C}}^{Z_m}\), and f from (8). \(\square \)

We note that it is useful to maintain, for each \(\lambda \), a list of the Young tableaux that give rise to the minimal spanning set among the vectors in (9). This list helps to compute the coefficients in the block-diagonalizations more efficiently (although the running time remains exponential in m), see Sect. 5.2.
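Selecting a minimal spanning set among the vectors in (9), while remembering which tableaux produced it, can be sketched as follows. This is only an illustration under assumptions: the vectors are given as lists of integers or rationals, the function name `minimal_spanning_indices` is hypothetical, and a simple greedy exact elimination is used rather than whatever routine the actual implementation employs.

```python
from fractions import Fraction

def minimal_spanning_indices(vectors):
    """Greedily return the indices of vectors that form a basis of their span.

    A vector is kept exactly when it is not in the span of the vectors kept
    so far, so the returned indices describe a minimal spanning set.
    """
    basis = []   # pairs (pivot column, row normalised to have pivot entry 1)
    chosen = []  # indices of the kept vectors (e.g. positions of tableaux)
    for idx, v in enumerate(vectors):
        row = [Fraction(x) for x in v]
        # Eliminate the pivot entries of all basis rows found so far.
        for pivot, b in basis:
            if row[pivot] != 0:
                factor = row[pivot]
                row = [r - factor * bb for r, bb in zip(row, b)]
        pivot = next((j for j, x in enumerate(row) if x != 0), None)
        if pivot is not None:
            lead = row[pivot]
            basis.append((pivot, [x / lead for x in row]))
            chosen.append(idx)
    return chosen

# Toy input: the second and fourth vectors depend on the first and third.
example = [(1, 0, 0), (2, 0, 0), (0, 1, 0), (1, 1, 0)]
kept = minimal_spanning_indices(example)
```

If the input vectors are listed in the same order as the standard tableaux T, the returned indices directly give the list of tableaux mentioned above.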

Remark 1

Proposition 1 has a wide potential for application. For instance, for computing bounds on the cardinality of error-correcting codes, a block-diagonalization for matrices indexed by ordered k-tuples of codewords can be obtained using existing tools [11, 21]. With Proposition 1, one may further reduce this into a block-diagonalization for matrices indexed by unordered sets of codewords of size \(\le k\).

Discussion about finding the minimal spanning set faster. It is also natural to identify \({\mathbb {C}}^{Z_m}\) with \(M^{(1^m)}/({\mathbb {Z}}/m{\mathbb {Z}})\), where \({\mathbb {Z}}/m{\mathbb {Z}}\) permutes the rows of a tabloid in \(M^{(1^m)}\) cyclically. Brosch [3] developed a fast method in the context of flag algebras to decompose any module \(M^{\mu }/F\), where F is a group acting on the rows of \(\mu \) via permutations. However, the computational results presented in this paper can be obtained without this speed-up: we can compute the representative set for \(\alpha _{k}\) for \(k \le 10\) using the method from Proposition 1, and the representative set for our new relaxation \(\beta _k\) is described explicitly in Sect. 4.

The method of Brosch [3] allows one to avoid working with the vectors \(\vartheta _T(e_t)\) explicitly, which is desirable given the high dimension of \(M^{(1^m)}\). The key observation is

$$\begin{aligned} \textrm{Hom}(S^\lambda ,M^{(1^m)}/({\mathbb {Z}}/m{\mathbb {Z}})) = \mathcal {R}_{{\mathbb {Z}}/m{\mathbb {Z}}}(\textrm{Hom}(S^\lambda ,M^{(1^m)})), \end{aligned}$$

by identifying the quotient \(M^{(1^m)}/({\mathbb {Z}}/m{\mathbb {Z}})\) with the elements v in \(M^{(1^m)}\) with \(\sigma (v)=v\) for all \(\sigma \in {\mathbb {Z}}/m{\mathbb {Z}}\). Here \(\mathcal {R}_{{\mathbb {Z}}/m{\mathbb {Z}}}\) denotes the Reynolds operator of \({\mathbb {Z}}/m{\mathbb {Z}}\) on \(\textrm{Hom}(S^\lambda ,M^{(1^m)})\), which averages over the group

$$\begin{aligned} \mathcal {R}_{{\mathbb {Z}}/m{\mathbb {Z}}} (\vartheta _T) :=\frac{1}{m}\sum _{\sigma \in {\mathbb {Z}}/m{\mathbb {Z}}}\sigma (\vartheta _T). \end{aligned}$$

The action of \({\mathbb {Z}}/m{\mathbb {Z}}\) on homomorphisms \(\vartheta _T\) is given by \(\sigma (\vartheta _T) = \vartheta _{\sigma (T)},\) where \(\sigma \) is applied to T entrywise. The method of [3] results in a matrix representation of \(\mathcal {R}_{{\mathbb {Z}}/m{\mathbb {Z}}}\) in the semistandard basis, so that one can choose the homomorphisms corresponding to a spanning set of rows to find a basis of \(\textrm{Hom}(S^\lambda ,M^{(1^m)}/({\mathbb {Z}}/m{\mathbb {Z}}))\). The advantage is that one works in a space of dimension \(\textrm{dim}(\textrm{Hom}(S^\lambda , M^{(1^m)}))\) instead of \(\textrm{dim}(M^{(1^m)}) = m!\).

As mentioned before, knowing the description of the columns \(\vartheta _T(e_t)\) of the representative set in terms of tableaux is useful for the computations, see Sect. 5.2.

The multiplicities of the irreducible representations. It can be shown that the module \({\mathbb {C}}^{Z_m}\) is \(S_m\)-isomorphic to a module which has been described in the literature. This allows us to immediately obtain the multiplicities of the irreducible representations of \({\mathbb {C}}^{Z_m}\) as an \(S_m\)-module.

Proposition 2

As \(S_m\)-modules, we have \({\mathbb {C}}^{Z_m} \cong \mathop {\textrm{Ind}}\limits _{{\mathbb {Z}}/m {\mathbb {Z}}}^{S_m} 1\).

Proof

Define the map \(\phi :{\mathbb {C}}^{Z_m} \rightarrow \mathop {\textrm{Ind}}\limits _{{\mathbb {Z}}/m {\mathbb {Z}}}^{S_m} 1\) by mapping the standard basis vector \(e_\sigma \) corresponding to \(\sigma =(\sigma _1 \,\sigma _2 \ldots \sigma _m) \in Z_m\) with \(\sigma _1=1\) to the basis element (r, 1) in \(\mathop {\textrm{Ind}}\limits _{{\mathbb {Z}}/m {\mathbb {Z}}}^{S_m} 1\), where r is the permutation which maps i to \(\sigma _i\) for each \(i \in [m]\). Then

$$\begin{aligned} \phi (\pi \cdot e_{\sigma }) = \phi (e_{\pi \sigma \pi ^{-1}}) = \phi (e_{(\pi \sigma _1 \,\pi \sigma _2 \,\ldots \,\pi \sigma _m)}) = (\overline{\pi r},1) = \pi \cdot \phi (e_{\sigma }), \end{aligned}$$
(10)

for each \(\pi \in S_m\), where \(\overline{\pi r}\) is the representative of the class of the permutation \(\pi r\) with \(\overline{\pi r}(1) = 1\). So \(\phi \) respects the \(S_m\)-action. As \(\phi \) is also a bijection between the bases of \({\mathbb {C}}^{Z_m}\) and \(\mathop {\textrm{Ind}}\limits _{{\mathbb {Z}}/m {\mathbb {Z}}}^{S_m} 1\), its linear extension is an \(S_m\)-isomorphism. \(\square \)

It is known [16] that

$$\begin{aligned} \text {Ind}_{{\mathbb {Z}}/m {\mathbb {Z}}}^{S_m} 1 \cong \bigoplus _{\lambda \vdash m} a_{\lambda } S^{\lambda }, \end{aligned}$$
(11)

where \(a_{\lambda }\) is the number of standard tableaux T of shape \(\lambda \) with \(c(T)=0 \pmod {m}\), where 

$$\begin{aligned}&c(T) \text { is }\text {the sum of all}~ a \text { in}~T \text { for which}~a+1 \text { appears in a row} \nonumber \\&\text {strictly below}~ a\text { 's row}. \end{aligned}$$
(12)

So it is not hard to determine the multiplicities of the irreducible representations of \({\mathbb {C}}^{Z_m}\) as \(S_m\)-module. We however need the decomposition explicitly, to obtain an explicit representative set.
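The multiplicities \(a_{\lambda }\) from (11) and (12) can be computed by brute force for small m. The following is a minimal sketch, assuming a standard tableau is encoded by the row index of each entry; the function names are hypothetical. It also checks that \(\sum _{\lambda \vdash m} a_{\lambda } f^{\lambda } = (m-1)! = |Z_m|\), where \(f^{\lambda }\) is the number of standard tableaux of shape \(\lambda \), as the dimension count requires.

```python
from math import factorial

def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def standard_tableaux(shape):
    """Yield each standard tableau as a tuple rows with rows[a-1] = row of entry a."""
    m = sum(shape)
    counts = [0] * len(shape)
    rows = []

    def place():
        if len(rows) == m:
            yield tuple(rows)
            return
        for r in range(len(shape)):
            # The next entry may go to row r if the result is still a Young diagram.
            if counts[r] < shape[r] and (r == 0 or counts[r - 1] > counts[r]):
                counts[r] += 1
                rows.append(r)
                yield from place()
                rows.pop()
                counts[r] -= 1

    yield from place()

def c(rows):
    """Sum of all a such that a+1 lies in a row strictly below the row of a, as in (12)."""
    return sum(a for a in range(1, len(rows)) if rows[a] > rows[a - 1])

def multiplicity(shape):
    """Number of standard tableaux T of the given shape with c(T) = 0 mod m."""
    m = sum(shape)
    return sum(1 for rows in standard_tableaux(shape) if c(rows) % m == 0)

def dimension_check(m):
    """Check that sum over lambda of a_lambda * f^lambda equals (m-1)!."""
    total = 0
    for shape in partitions(m):
        total += multiplicity(shape) * sum(1 for _ in standard_tableaux(shape))
    return total == factorial(m - 1)
```

For example, `multiplicity((3, 1, 1))` counts the two standard tableaux of shape \((3,1,1)\) whose second and third rows contain \(\{2,5\}\) or \(\{3,4\}\).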

3.2.2 The \(S_2\cong \{\pm 1\}\)-action on \(Z_m\)

The \(S_m\)-action and the \(S_2 \cong \{ \pm 1 \}\)-action on \({\mathbb {C}}^{Z_m}\) commute. This enables us to compute a representative set for the action of \(S_m \times S_2\) on \({\mathbb {C}}^{Z_m}\), starting with a given representative set for the action of \(S_m\) on \({\mathbb {C}}^{Z_m}\). We first state the setting in a general form, and then prove a proposition which allows us to derive the full symmetry reduction.

3.2.3 Representative set of \(H \times S_2\)-action

Let H be a finite group acting on a finite-dimensional complex vector space V and suppose a representative set \(\{U_1,\ldots ,U_k\}\) where \(U_i=(u_{i,1},\ldots ,u_{i,m_i})\) (for \(i \le k\)) for the action of H on V is given. Suppose that also \(S_2=\{1,\eta \}\) acts on V, and that the actions of H and \(S_2\) on V commute. Let \(L_{\pm }:= \{ x \in V \, | \, x= \pm \eta x\}\), so that \(L_+\) and \(L_-\) are the eigenspaces of \(\eta \). We show how to obtain a representative set for the action of \(H \times S_2\) on V, generalizing [12, Section 3.4] (which considers \(S_2\)-actions on a finite set Z).

Proposition 3

A representative set for the action of \(H\times S_2\) on V is the set \(\{U_1^+, U_1^-, \ldots , U_k^+, U_k^-\}\), where \(U_i^+\) is a tuple consisting of a minimal spanning set among the vectors \(u_{i,j}^+:= u_{i,j} + \eta \cdot u_{i,j}\) (for \(j=1,\ldots ,m_i\)), and \(U_i^-\) is a tuple consisting of a minimal spanning set among the vectors \(u_{i,j}^-:= u_{i,j} - \eta \cdot u_{i,j}\) (for \(j=1,\ldots ,m_i\)).

Proof

Since the actions of H and \(S_2\) on V commute, both \(L_+\) and \(L_-\) are \(H \times S_2\)-invariant subspaces of V. The maps \(f^+: V \rightarrow L_+\) and \(f^-: V \rightarrow L_-\) given by \(f^+(v)=(I+\eta )v\) and \(f^-(v)=(I-\eta )v\) are surjective \(H \times S_2\)-homomorphisms. From Proposition 1 it now follows that \(\{U_1^+,\ldots ,U_k^+\}\) and \(\{U_1^-,\ldots ,U_k^-\}\) are representative sets for the actions of \(H\times S_2\) on \(L_+\) and \(L_-\), respectively.

Note that \(V=L_+ \oplus L_-\). Also, if \(W_1 \subseteq L_+\) and \(W_2 \subseteq L_-\), are irreducible \(H\times S_2\)-modules, then they are non-isomorphic: indeed, if \(\psi :W_1\rightarrow W_2\) were an \(H\times S_2\)-isomorphism, then for each \(x\in W_1\) we have \(\psi (x)=\psi (\eta x)=\eta \psi (x)\), as \(x \in L_+\), but also \(\psi (x)=-\eta \psi (x)\), as \(\psi (x)\in L_-\), so \(\psi (x)=0\). So the union \(\{U_1^+,\ldots ,U_k^+\} \cup \{U_1^-,\ldots ,U_k^-\}\) of representative sets for the actions of \(H \times S_2\) on \(L_+\) and \(L_-\) is a representative set for the action of \(H\times S_2\) on V. \(\square \)

For our semidefinite program this means that, in the block-diagonalization for the action of \(S_m\) on \({\mathbb {C}}^{Z_m}\), the block corresponding to the matrix \(U_\lambda \) will split into two blocks in the block-diagonalization for the action of \(S_m \times S_2\) on \({\mathbb {C}}^{Z_m}\): one corresponding to \(U_{\lambda }^+\) and one corresponding to \( U_{\lambda }^-\).

4 The relaxation \(\beta _m\)

When computing \(\alpha _m\), we use the symmetry reduction from the previous section and require that all blocks in the block-diagonalization of X are positive semidefinite. As \(\alpha _m\) is a minimization problem, only requiring one block to be positive semidefinite will yield a lower bound on \(\alpha _m\). From our computer experiments it follows that one small block seems ‘special’: only requiring this block to be positive semidefinite yields a remarkably good lower bound on \(\alpha _m\). It is the block corresponding to \(U_{\lambda }^-\), where \(\lambda =(m-2,1,1) \vdash m\). This observation gives rise to a new relaxation \(\beta _m\) of \(\alpha _m\), in which we only require the mentioned block to be positive semidefinite. The primal of the program \(\beta _m\) is

$$\begin{aligned} \beta _m = \textrm{min}\left\{ \langle Q, X\rangle \, | \, X \in {\mathbb {R}}^{Z_m \times Z_m}_{\ge 0}, \, \langle J,X \rangle =1, \, (U_{\lambda }^{-})^\textsf{T}X U_{\lambda }^{-} \succeq 0 \right\} , \end{aligned}$$
(13)

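where \(\lambda =(m-2,1,1)\). It turns out that we can explicitly describe the columns of the matrix \(U_{\lambda }^{-}\) using Young tableaux. We first describe the matrix \(U_{\lambda }\). Define, for \(i=3,\ldots ,m\), the tableau \(M_i\) of shape \(\lambda \) and content \((1^m)\) with entry 2 in its second row, entry i in its third row, and the remaining entries in increasing order in its first row.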

Proposition 4

The matrix \(U_{\lambda }\) can be taken to consist of the columns \(f(\vartheta _{M_i}(e_t))\) for \(i=3,\ldots , \left\lfloor \tfrac{m+1}{2}\right\rfloor +1\).

Proof

First, we calculate \(a_{\lambda }\) from (11) for the partition \(\lambda =(m-2,1,1) \vdash m\). Recall that \(a_{\lambda }\) is the number of standard tableaux T of shape \(\lambda \) with \(c(T)= 0 \pmod {m}\). Suppose that a standard tableau T has a and b as the entries in its second and third row, so \(1<a<b\le m\). Then c(T) is zero modulo m if and only if \((a-1)+(b-1)=0 \pmod {m}\). There are exactly \(\left\lfloor \tfrac{m-1}{2}\right\rfloor \) pairs (a, b) satisfying \(1<a<b\le m \) and \(a+b=m+2\), so \(a_{\lambda } = \left\lfloor \tfrac{m-1}{2}\right\rfloor \). So the number of columns of the matrix \(U_{\lambda }\) is \( \left\lfloor \tfrac{m-1}{2}\right\rfloor \), which is exactly the number of vectors \(f(\vartheta _{M_i}(e_t))\) given in this proposition.

We now show that if T is any standard tableau of shape \(\lambda \), then \( f(\vartheta _{T}(e_t)) = f(\vartheta _{M_i}(e_t))\) for some \(i= 3,\ldots , \left\lfloor \tfrac{m+1}{2}\right\rfloor +1\). It then follows that the given set of columns is a spanning set for the column space of \(U_{\lambda }\), and by the previous paragraph it has the correct size, so it is minimal and we are done. Note that if \(T_1\) and \(T_2\), with entries \(a_1, b_1\) and \(a_2, b_2\) in their second and third rows, respectively, are standard of shape \(({m-2},1,1)\) and content \((1^m)\), with \(b_1-a_1 = b_2-a_2\), then \(f(\vartheta _{{T_1}}(e_t)) = f(\vartheta _{T_2}(e_t))\). To see this, note that

(14)

where each sum is over all tabloids of shape and content \((1^m)\) with the given fixed entries in rows \(a_1\) and \(b_1\). Thus, each sum is over \((m-2)!\) tabloids. The vector \(\vartheta _{T_2}(e_t)\) is obtained from (14) upon replacing \(a_1\) and \(b_1\) by \(a_2\) and \(b_2\), respectively. As \(b_1-a_1=b_2-a_2\), each term in the sum expansions of \(\vartheta _{{T_1}}(e_t)\) and \(\vartheta _{{T_2}}(e_t)\) represents, after projection, the same element of \(Z_m\). So \(f(\vartheta _{{T_1}}(e_t)) = f(\vartheta _{T_2}(e_t))\). Hence the vector \(f(\vartheta _{T_1}(e_t))\) is the same as one of the \(f(\vartheta _{M_i}(e_t))\) with \( 3 \le i \le m\), namely the one with \(i-2=b_1-a_1\).

The proof is completed by observing that \(f(\vartheta _{M_{m-i}}(e_t))= f(\vartheta _{M_{i+4}}(e_t))\) for all \(i=0,\ldots ,m-4\), as the projection of any \(\vartheta _{M_j}(e_t)\) onto \(Z_m\) only depends on the distance between j and 2 mod m. The distinct nonzero distances mod m between i and 2 are \(1,\ldots ,\left\lfloor \tfrac{m-1}{2}\right\rfloor \), which corresponds to \(i= 3,\ldots , \left\lfloor \tfrac{m+1}{2}\right\rfloor +1\). So if T is any standard tableau of shape \(\lambda \), then \( f(\vartheta _{T}(e_t)) = f(\vartheta _{M_i}(e_t))\) for some \(i= 3,\ldots , \left\lfloor \tfrac{m+1}{2}\right\rfloor +1\). \(\square \)

It is not hard to verify using (14) that \(\eta \cdot f(\vartheta _{T_i}(e_t)) = -f(\vartheta _{T_i}(e_t))\), where \(\eta \) is the inversion action on \(Z_m\). From this it follows that the columns of \(U_{\lambda }^{-}\) can be taken to be the same columns as the columns of \(U_{\lambda }\), and that the matrix \(U_{\lambda }^{+}\) is the zero matrix. In Sect. 5 we will therefore only work with the matrix \(U_{\lambda }\) and not with the matrix \(U_{\lambda }^-\).

5 Computation

In this section, we comment on the computation. First we explain how we compute the entries of Q, taking into account its symmetries. After that, we describe how to compute the entries in the block-diagonalizations more efficiently. Then we give the dual semidefinite program of \(\beta _m\), which has nice features: a small matrix block which is required to be positive semidefinite, and few variables. However, it has \(|\Omega _m'|\) linear constraints, which is a very large number (see Footnote 1). In the final section we explain how we computed \(\beta _m\) using this dual description in practice.

5.1 Computing the matrix Q with Dijkstra’s algorithm

To compute the entries of the matrix Q, we follow Woodall [27]. Construct a graph \(\Gamma _m\) with vertex set \(Z_m\), in which \(\{\sigma ,\gamma \}\) is an edge if \(\gamma \) can be obtained from \(\sigma \) by one transposition of adjacent elements of \(\sigma \). Then the entry \(Q_{\sigma ,\tau }\) is equal to the length of a shortest path from \(\sigma \) to \(\tau ^{-1}\) in \(\Gamma _m\), which can be computed with Dijkstra’s shortest path algorithm. We only apply Dijkstra with the source node \(\sigma =(12\ldots m)\), as we only want the value of \(Q_{\sigma ,\tau }\) on \(G_m\)-orbits of \(Z_m \times Z_m\).

A speed-up inside Dijkstra’s algorithm that takes symmetry into account is based on the observation that \(\sigma = (12\ldots m)\) is fixed by the elements \((\sigma ,1)\) and \((\rho ,-1)\) of \(G_m\), where \(\rho \) is such that \(\rho \sigma ^{-1} \rho ^{-1} = \sigma \). So the subgroup \(H_m\) of \(G_m\) generated by these two elements fixes \(\sigma \), and hence has the property that \(Q_{\sigma , h \cdot \tau } = Q_{h \cdot \sigma , h \cdot \tau } = Q_{\sigma , \tau }\) for any \(h \in H_m\) and \(\tau \in Z_m\). We represent each \(H_m\)-orbit of \(Z_m\) by its lexicographically smallest element. We maintain a priority queue S of elements with their distances, a set L of visited orbit representatives of \(Z_m\) under \(H_m\), and a distance \(d:=0\). The priority queue S initially consists of \((12\ldots m)\) with distance 0, and L consists of \(\sigma =(12\ldots m)\).

As long as S is nonempty, we pop the element \(\tau \) with the smallest distance from S, set d to the distance of \(\tau \) plus 1, and examine all cycles in \( Z_m\) reachable from \(\tau \) by one swap of adjacent elements of \(\tau \). Each such cycle is replaced by the unique representative of its \(H_m\)-orbit; every representative not yet in L is added to L, and to the queue S with distance d. This is repeated until S is empty.
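The shortest-path computation can be sketched in a few lines. This is a simplified illustration, not the implementation used for the results: it omits the \(H_m\)-orbit reduction, it uses breadth-first search (which coincides with Dijkstra’s algorithm here because every edge of \(\Gamma _m\) has length 1), and it assumes that the last and first element of a cycle also count as adjacent. All function names are hypothetical.

```python
from collections import deque

def canonical(seq):
    """Write a cyclic sequence with the element 1 in first position."""
    k = seq.index(1)
    return tuple(seq[k:] + seq[:k])

def neighbours(cycle):
    """All cycles obtained by swapping two cyclically adjacent elements."""
    m = len(cycle)
    for j in range(m):
        s = list(cycle)
        k = (j + 1) % m  # assumption: positions m and 1 are adjacent as well
        s[j], s[k] = s[k], s[j]
        yield canonical(tuple(s))

def inverse(cycle):
    """The inverse of an m-cycle is the reversed cyclic sequence."""
    return canonical(tuple(reversed(cycle)))

def distances_from_identity(m):
    """Breadth-first search in Gamma_m from the source (1 2 ... m)."""
    source = tuple(range(1, m + 1))
    dist = {source: 0}
    queue = deque([source])
    while queue:
        tau = queue.popleft()
        for nu in neighbours(tau):
            if nu not in dist:
                dist[nu] = dist[tau] + 1
                queue.append(nu)
    return dist

def q_entry(dist, tau):
    """Q_{sigma,tau} for sigma = (1 2 ... m): the distance from sigma to tau^{-1}."""
    return dist[inverse(tau)]
```

For instance, `q_entry(distances_from_identity(4), (1, 2, 3, 4))` is the distance from \((1234)\) to its inverse \((1432)\). Adding the orbit reduction would amount to replacing each neighbour by the representative of its \(H_m\)-orbit before the membership test.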

5.2 Computing the inner products

Let \(\lambda \vdash m\) and \(u_{T_1}=f(\vartheta _{T_1}(e_{t_{\lambda }}))\), \(u_{T_2}=f(\vartheta _{T_2}(e_{t_{\lambda }}))\) be columns of \(U_{\lambda }\). Let \(X \in ({\mathbb {C}}^{Z_m \times Z_m})^{G_m}\). The inner products are of the form

$$\begin{aligned} ((1 + \eta ) \cdot u_{T_1})^\textsf{T}X ((1 + \eta )\cdot u_{T_2}) \,\,\,\,\, \text { or }\,\,\,\,\, ((1 - \eta )\cdot u_{T_1})^\textsf{T}X ((1 - \eta )\cdot u_{T_2}). \end{aligned}$$

By symmetry one has \((\eta \cdot u_{T_1})^\textsf{T}X (\eta \cdot u_{T_2}) = u_{T_1}^\textsf{T}X u_{T_2}\) and \((\eta \cdot u_{T_1})^\textsf{T}X u_{T_2} = u_{T_1}^\textsf{T}X (\eta \cdot u_{T_2})\). So to compute the inner products, we must compute expressions of the form \( u_{T_1}^\textsf{T}X u_{T_2}\) and \((\eta \cdot u_{T_1})^\textsf{T}X u_{T_2}\). Note that

$$\begin{aligned} u_{T_1}^\textsf{T}X u_{T_2} =\sum _{\begin{array}{c} T_1' \sim T_1, T_2' \sim T_2 \end{array}} \sum _{c,c' \in C_{t}} \text {sgn}(cc') x_{\omega (f(t[c T_1']), f(t[c' T_2']))}, \end{aligned}$$
(15)

where f from (8) maps a tabloid to the corresponding m-cycle in \(Z_m\), and \(\omega (\sigma ,\tau ) \in \Omega _m'\) denotes the orbit of \((\sigma ,\tau )\in Z_m \times Z_m\). If we have (15), then one can also obtain \((\eta \cdot u_{T_1})^\textsf{T}X u_{T_2}\) from it by replacing each variable \(x_{\omega (f(t[c T_1']), f(t[c' T_2']))}\) by \(x_{\omega (\eta \cdot f(t[c T_1']), f(t[c' T_2']))}\). So we now focus on computing (15). One can compute the inner products by using (15) directly (and we succeeded in computing \(\alpha _{10}\) in that way). We now describe a method which is faster in practice and which we used in our implementation. Since \(|\Omega _m'|\) is exponential in m, one cannot hope for a running time polynomial in m. Let \(Y(\lambda )\) be the set of (row,column)-coordinates indicating the boxes of \(\lambda \). Define the polynomial

$$\begin{aligned} p_{T_1,T_2}(Z):=\sum _{\begin{array}{c} T_1' \sim T_1, T_2' \sim T_2 \end{array}} \sum _{c,c' \in C_{t}} \text {sgn}(cc')\prod _{y \in Y(\lambda )} z_{c T_1'(y),c'T_2'(y)}, \end{aligned}$$
(16)

for \(Z=(z_{j,h})_{j,h=1}^m \in {\mathbb {R}}^{m \times m}\). One can express \(p_{T_1,T_2}\) as a linear combination of monomials with the algorithms of [11] or [18]. This allows one to compute the inner products quickly in many instances for error-correcting codes (see e.g., [11, 21]). The method was generalized to be applicable to arbitrary permutation modules in the setting of flag algebras (cf. [3]).

There is a one-to-one correspondence between \(S_m\)-orbits of pairs of tabloids \((t[cT_1], t[c' T_2])\) and monomials \( \prod _{y \in Y(\lambda )} z_{cT_1'(y),c'T_2'(y)}\) via their overlap, i.e., the numbers of elements of each row of the first tabloid which appear in each row of the second. The overlap of two tabloids \(\{t_1\}\) and \(\{t_2\}\) can be described by a monomial \( \prod _{i,j=1}^m z_{i,j}^{ (| \{t_1\}_i \cap \{t_2\}_j|)}, \) where m is the number of parts of \(\lambda \) and \(\{t\}_i\) denotes the set of elements in the i-th row of a tabloid \(\{t\}\). So to compute (15), we can compute (16), and then replace each monomial of degree m in the variables \(z_{i,j}\) by the variable \(x_{\omega (t[cT_1], t[c' T_2])}\), where \((t[cT_1], t[c' T_2])\) is any element in the \(S_m\)-orbit of pairs of tabloids corresponding to the monomial in \(z_{i,j}\).
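The overlap of two tabloids, and the exponents of the corresponding monomial in the variables \(z_{i,j}\), can be computed as in the following minimal sketch. It assumes that a tabloid is stored as a list of sets (one set per row); the function names are hypothetical. The example also illustrates that the overlap does not change when the same permutation is applied to both tabloids, which is why it labels orbits of pairs.

```python
def overlap(t1, t2):
    """Matrix whose (i, j) entry is the size of (row i of t1) ∩ (row j of t2)."""
    return [[len(set(r1) & set(r2)) for r2 in t2] for r1 in t1]

def overlap_monomial(t1, t2):
    """Exponents {(i, j): e} of the monomial prod z_{i,j}^e, with 1-based indices."""
    mat = overlap(t1, t2)
    return {
        (i + 1, j + 1): e
        for i, row in enumerate(mat)
        for j, e in enumerate(row)
        if e
    }

def relabel(pi, t):
    """Apply a permutation pi (dict i -> pi(i)) to the entries of a tabloid."""
    return [{pi[x] for x in row} for row in t]

# Two tabloids of shape (2, 1) on the entries 1, 2, 3.
t1 = [{1, 2}, {3}]
t2 = [{1, 3}, {2}]
pi = {1: 2, 2: 3, 3: 1}

exponents = overlap_monomial(t1, t2)
same_orbit = overlap(relabel(pi, t1), relabel(pi, t2)) == overlap(t1, t2)
```

Here the monomial is \(z_{1,1} z_{1,2} z_{2,1}\), of degree 3, the number of entries.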

Computing (16). We here state the method from [11], which is easy to implement and uses only methods for addition, multiplication, and differentiation of polynomials. Given two generalized Young tableaux \(T_1,T_2\), define

$$\begin{aligned}&r(s,j) := \hbox {number of}\,\, s \hbox {'s in row} \,\,j \,\,\hbox {of}\,\, T_1, \quad{} & {} u(s,j) := \hbox {number of}\,\, s \hbox {'s in row}\,\, j\,\, \hbox {of} \,\,T_2,\\&d_{s \rightarrow j}:= \sum _{i=1}^m z_{s,i} \frac{\partial }{\partial z_{j,i}}, \,\, \text { and }{} & {} d_{j \rightarrow s}^*:= \sum _{i=1}^m z_{i,s} \frac{\partial }{\partial z_{i,j}}. \end{aligned}$$

Also, define the polynomial \( P_{\lambda }(Z):= \prod _{k=1}^m \left( k! \, \text {det} \left( (z_{i,j})_{i,j=1}^{k} \right) \right) ^{\lambda _{k}-\lambda _{k+1}}\) in variables \(z_{i,j}\), where \(i,j \in [m]\) and \(\lambda _{m+1}:=0\). Then it holds [11, Theorem 7] that

$$\begin{aligned} p_{T_1, T_2 }(Z) = \left( \prod _{j=1}^{m-1} \prod _{s=j+1}^m \frac{1}{r(s,j)!\,u(s,j)!} (d_{s \rightarrow j})^{r(s,j)} (d_{j \rightarrow s}^*)^{u(s,j)}\right) \cdot P_{\lambda }(Z). \end{aligned}$$

5.3 The dual semidefinite program

First, note that the dual of the original semidefinite program \(\alpha _m\) is

$$\begin{aligned} \alpha _m= \textrm{max}\left\{ t \, | \, Q - tJ - Y \succeq 0, Y\in {\mathbb {R}}^{Z_m \times Z_m}_{\ge 0} \right\} .\end{aligned}$$
(17)

To show that this is indeed an equality, one needs to show that strong duality holds. This is indeed the case, as the primal (2) is strictly feasible (set \(X=aJ + bI\), where \(a=\tfrac{1}{2((m-1)!)^2}\) and \(b=\tfrac{1}{2(m-1)!}\)), while the dual is feasible with \(t=0\) and \(Y=Q-\Delta (Q)\), where \(\Delta (Q)\) is a matrix which is zero outside the diagonal and which has the same diagonal entries as Q.
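The strict feasibility of \(X=aJ+bI\) can be checked directly. As a worked computation, using only that \(|Z_m|=(m-1)!\), so that J has \(((m-1)!)^2\) entries and I has \((m-1)!\) nonzero entries:

$$\begin{aligned} \langle J, aJ+bI\rangle = a\,((m-1)!)^2 + b\,(m-1)! = \frac{((m-1)!)^2}{2((m-1)!)^2} + \frac{(m-1)!}{2(m-1)!} = \tfrac{1}{2}+\tfrac{1}{2} = 1. \end{aligned}$$

Moreover, every entry of X is at least \(a>0\), and X is positive definite because \(aJ \succeq 0\) and \(bI \succ 0\).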

We now describe the dual of \(\beta _m\). The primal of the program \(\beta _m\) is

$$\begin{aligned} \beta _m = \textrm{min}\left\{ \langle Q, X\rangle \, | \, X \in {\mathbb {R}}^{Z_m \times Z_m}_{\ge 0}, \, \langle J,X \rangle =1, \, U_{\lambda }^\textsf{T}X U_{\lambda } \succeq 0 \right\} , \end{aligned}$$
(18)

where \(\lambda =(m-2,1,1)\). For each \(\omega \in \Omega _m'\), let \(K_{\omega }\) be the indicator matrix of \(\omega \), i.e., the \((Z_m \times Z_m)\)-matrix with \((K_{\omega })_{\sigma ,\tau }=1\) if \((\sigma ,\tau ) \in \omega \) and \((K_{\omega })_{\sigma ,\tau }=0\) otherwise. As X is \(G_m\)-invariant, we may write \(X= \sum _{\omega \in \Omega _m'} K_{\omega } x_{\omega }\). We define for each \(\omega \in \Omega _m'\) the constant matrix \(A_{\omega }:= U_{\lambda }^\textsf{T}K_{\omega } U_{\lambda }\). Let \(q_{\omega }\) denote the common value of \(Q_{(\sigma ,\tau )}\) for \((\sigma ,\tau ) \in \omega \). So we may rewrite (18) as

$$\begin{aligned} \beta _m&= \textrm{min}\left\{ \sum _{\omega \in \Omega _m'} |\omega | x_{\omega }q_{\omega } \, : \, x_{\omega } \ge 0 \, \forall \omega \in \Omega _m', \,\sum _{\omega \in \Omega _m'} |\omega | x_{\omega }= 1, \, \sum _{\omega \in \Omega _m'} x_{\omega } A_{\omega } \succeq 0 \right\} . \end{aligned}$$

The dual of this semidefinite program is (again strong duality holds)

$$\begin{aligned} \beta _m = \textrm{max}\left\{ t \, : \, Y \in {\mathbb {R}}^{\left\lfloor \tfrac{m-1}{2}\right\rfloor \times \left\lfloor \tfrac{m-1}{2}\right\rfloor }, \, Y \succeq 0, \, \forall \omega \in \Omega _m'\, : \, \langle Y, A_{\omega }\rangle + |\omega | t \le |\omega |q_{\omega } \right\} . \end{aligned}$$
(19)

This dual has few variables and only a very small matrix block which is required to be positive semidefinite. The main difficulty is that there are many linear constraints, as can be seen in Table 3.

Table 3 The number of variables in our SDP is \(|\Omega _m'| = \sum m_i(m_i+1)/2\), and for the block sizes \(m_i\) for computing \(\alpha _m\) we have \(\sum m_i^2 = |\Omega _m|=|(Z_m \times Z_m)/G_m|\). The block sizes are given in the format \((\text {block size})^{\text {multiplicity}}\)

Remark 2

We observed some structure in the optimal solutions Y of the dual (19) of \(\beta _m\) computationally. Up to \(m=13\), the rank of the optimal Y is 1 if m is odd, and 2 if m is even (and \(m>4\)). Furthermore, for odd m the eigenvector behaves similarly for each m, as can be seen in Fig. 3. This gives us some hope that the optimal solutions can be constructed analytically, potentially leading to improved bounds for bigger m in the future.

Fig. 3 The vectors \(v_m \in {\mathbb {R}}^{\left\lfloor \tfrac{m-1}{2}\right\rfloor }\) such that the optimal solution of the dual (19) of \(\beta _m\) is given by \(Y = \frac{1}{(m-1)!}v_mv_m^\textsf{T}\). Note that \(v_m\) can be indexed by \(M_i\) (\(i=3,\ldots , \lfloor \tfrac{m+1}{2}\rfloor +1\)) as in Proposition 4. Each plotted function corresponds to the coefficients of one \(v_m\), where a point at position \((M_i, x)\) signifies that the coordinate of \(v_m\) corresponding to \(M_i\) is x

5.4 Iterative procedure to obtain the bounds \(\beta _m\)

To solve (19) on the computer, we follow a cut generation method. First, the semidefinite program is solved without the linear constraints. Then:

  • All of the constraints are evaluated. (As m grows, this takes up most of the runtime.)

  • We add the most violated constraint as a new constraint to the semidefinite program. When there are ties, we choose the most violated constraint that was evaluated first.

  • The semidefinite program is solved again.

These steps are repeated until no constraints are violated anymore. In theory this procedure could take \(|\Omega _m'|\) iterations. In practice, however, the number of iterations is much smaller, and we are able to compute \(\beta _m\) for \(m \le 13\) up to high precision on a desktop computer (see Table 2 and Footnote 2).
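The control flow of this cut generation loop can be sketched generically. The sketch below is an illustration under assumptions: `solve` and `violation` are hypothetical callbacks standing in for the semidefinite solver and for the evaluation of one linear constraint, and the toy instance at the end is a one-variable problem (maximize t subject to \(t \le b_i\), with an artificial initial bound \(t \le 100\)), not the program (19).

```python
def cut_generation(solve, violation, num_constraints, tol=0.0):
    """Generic cut generation loop.

    solve(active)     -> solution of the relaxation using only constraints in active
    violation(sol, i) -> amount by which constraint i is violated (<= 0 if satisfied)
    """
    active = []
    while True:
        sol = solve(active)
        worst, worst_val = None, tol
        # Evaluate all constraints; the strict comparison keeps, among ties,
        # the most violated constraint that was evaluated first.
        for i in range(num_constraints):
            v = violation(sol, i)
            if v > worst_val:
                worst, worst_val = i, v
        if worst is None:
            return sol, active  # no violated constraint is left
        active.append(worst)

# Toy instance: maximize t subject to t <= b[i] for all i.
b = [7, 3, 5, 3]

def toy_solve(active):
    return min([100] + [b[i] for i in active])

def toy_violation(t, i):
    return t - b[i]

solution, used = cut_generation(toy_solve, toy_violation, len(b))
```

In the toy instance a single cut suffices, which mirrors the observation above that far fewer than \(|\Omega _m'|\) iterations are needed in practice.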

5.5 Verifying the bounds

We explain the procedure used to verify our lower bounds. For the bound \(\beta _m\), the starting point is formulation (19). For the bound \(\alpha _m\), one can derive the following analogous formulation. For \(\lambda \vdash m\) and \(\varepsilon \in \{\pm 1\}\), let \(m_{\lambda }^{\varepsilon }\) denote the number of columns of \(U_{\lambda }^{\varepsilon }\) in the representative set for the action of \(S_m \times S_2\) on \({\mathbb {C}}^{Z_m}\) we derived in Sect. 3.2. Also, for \(\omega \in \Omega _m'\), define the matrix \(C_{\omega }:= \oplus _{\lambda \vdash m, \, \varepsilon \in \{\pm 1\}} (U_{\lambda }^{\varepsilon })^\textsf{T}K_{\omega } U_{\lambda }^{\varepsilon }\). Then

$$\begin{aligned} \alpha _m = \textrm{max}\left\{ t \, :\, Y \in \,\bigoplus _{\begin{array}{c} \lambda \vdash m\\ \varepsilon \in \{\pm 1 \} \end{array}} {\mathbb {R}}^{m_{\lambda }^{\varepsilon } \times m_{\lambda }^{\varepsilon }}, \, Y \succeq 0, \, \forall \omega \in \Omega _m'\, :\, \langle Y, C_{\omega }\rangle + |\omega | t \le |\omega |q_{\omega } \right\} . \end{aligned}$$
(20)

Note that all our SDPs contain integer data after block-diagonalization, so in the SDP-input there is no rounding. However, the high-precision interior-point solution \((t, Y)\) to (19) or (20) obtained from the solver may exhibit tiny infeasibilities. To obtain a rational feasible solution, we do the following:

  • Round t to a rational number \(t'\), and round the eigenvalues \(\lambda _i\) and eigenvectors \(v_i\) of Y to rationals \(\hat{\lambda }_i\) and rational vectors \({\hat{v}}_i\). Construct a new matrix \(Y':=\sum _{i'} \hat{\lambda }_{i'} {\hat{v}}_{i'} {\hat{v}}_{i'}^\textsf{T}\) from the nonnegative rounded eigenvalues and the corresponding rounded eigenvectors. Then \(Y' \succeq 0\).

  • Check each of the inequalities (involving only rational numbers) in (19) or (20) using the rational matrix \(Y'\). If the inequality corresponding to \(\omega \) is violated, replace \(t'\) by \((|\omega |q_{\omega }-\langle Y', C_{\omega }\rangle )/|\omega |\) (with \(A_{\omega }\) in place of \(C_{\omega }\) in the case of (19)) so that the inequality is not violated anymore.

In this way, we obtain rational feasible solutions \((t', Y')\) to (19) or (20) and thus guaranteed lower bounds on \(\alpha _m\) and \(\beta _m\). The obtained lower bounds coincide with the approximations of \(\alpha _m\) and \(\beta _m\) computed by the solver for all decimals given in Table 2. (At least 40 decimals are correct for all computed bounds except \(\alpha _{10}\) using SDPA-GMP [19], and at least 13 decimals are correct for \(\alpha _{10}\) using SDPA-DD.)
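The two verification steps can be sketched in exact rational arithmetic. This is a minimal illustration with made-up toy data, not the data of (19) or (20): the function names are hypothetical, the rounded eigenpairs are assumed to be given as rationals, and each constraint is passed as a triple \((|\omega |, q_{\omega }, A_{\omega })\).

```python
from fractions import Fraction

def rational_psd(eigvals, eigvecs):
    """Build Y' = sum of lam * v v^T over the nonnegative rounded eigenvalues."""
    n = len(eigvecs[0])
    Y = [[Fraction(0)] * n for _ in range(n)]
    for lam, v in zip(eigvals, eigvecs):
        if lam < 0:
            continue  # dropping negative eigenvalues keeps Y' positive semidefinite
        for i in range(n):
            for j in range(n):
                Y[i][j] += Fraction(lam) * v[i] * v[j]
    return Y

def inner(Y, A):
    """Trace inner product <Y, A> of two square matrices of the same size."""
    n = len(Y)
    return sum(Y[i][j] * A[i][j] for i in range(n) for j in range(n))

def repair_t(t, Y, constraints):
    """Lower t until <Y, A> + size * t <= size * q holds for every constraint."""
    for size, q, A in constraints:
        bound = (size * q - inner(Y, A)) / Fraction(size)
        if t > bound:
            t = bound
    return t

# Toy data (purely illustrative): one slightly negative rounded eigenvalue.
Y_prime = rational_psd(
    [Fraction(1, 2), Fraction(-1, 1000)],
    [(1, 1), (1, -1)],
)
toy_constraints = [
    (2, 3, [[1, 0], [0, 1]]),
    (4, 1, [[0, 1], [1, 0]]),
]
t_prime = repair_t(Fraction(1), Y_prime, toy_constraints)
```

Because every operation is carried out on `Fraction` objects, the final pair \((t', Y')\) satisfies all inequalities exactly, which is what makes the resulting bound rigorous.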