We now consider symmetric sums of squares. It was already observed in [12] that invariance under a group action allows one to demand sum of squares decompositions that respect the group action, which places strong restrictions on the underlying squares. First, we explain the general approach, which uses representation theory and can be applied to other groups as well. Our presentation follows the ideas of [12], which we present in a slightly different way. The interested reader is advised to consult that article for more details.
Invariant Sums of Squares
Let G be a finite group acting linearly on \({\mathbb {R}}^{n}\). Since G acts linearly on \({\mathbb {R}}^n\), the \({\mathbb {R}}\)-vector space \({\mathbb {R}}[X]\) can also be viewed as a G-module, and by Maschke’s theorem (the reader may consult for example [34] for the basics of linear representation theory) there exists a decomposition of the form
$$\begin{aligned} {\mathbb {R}}[X] = V^{(1)} \oplus V^{(2)} \oplus \cdots \oplus V^{(h)} \end{aligned}$$
(4.1)
with \(V^{(j)} = W^{(j)}_1 \oplus \cdots \oplus W^{(j)}_{\eta _j}\) and \(\nu _j := \dim W^{(j)}_i\). Here, the \(W^{(j)}_i\) are the irreducible components and the \(V^{(j)}\) are the isotypic components, i.e., the direct sums of isomorphic irreducible components. The component with respect to the trivial irreducible representation is the invariant ring \({\mathbb {R}}[X]^G\). The elements of the other isotypic components are called semi-invariants. It is classically known that each isotypic component is a finitely generated \({\mathbb {R}}[X]^{G}\)-module (see [36, Theorem 1.3]). To any element \(f\in H_{n,d}\) we can associate a symmetrization, by which we mean its image under the following linear map:
Definition 4.1
For a finite group G the linear map \({\mathcal {R}}_G:H_{n,d}\rightarrow H_{n,d}^{G}\) defined by
$$\begin{aligned} {\mathcal {R}}_G(f):=\frac{1}{|G|}\sum _{\sigma \in G}\sigma (f) \end{aligned}$$
is called the Reynolds operator of G. In the case of \(G={\mathcal {S}}_n\) we say that \({\mathcal {R}}_{{\mathcal {S}}_n}(f)\) is a symmetrization of f and we write \({{\,\mathrm{sym}\,}}f\) in this case.
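For readers who wish to experiment, the Reynolds operator is straightforward to implement. The following Python sketch is our own illustration (not part of the original text); the dictionary encoding of polynomials as maps from exponent tuples to coefficients is an assumption of this sketch.

```python
from fractions import Fraction
from itertools import permutations

def act(sigma, poly):
    """Apply a permutation of the variables: X_i -> X_{sigma(i)}."""
    out = {}
    for expo, c in poly.items():
        new = [0] * len(expo)
        for i, e in enumerate(expo):
            new[sigma[i]] = e
        key = tuple(new)
        out[key] = out.get(key, 0) + c
    return out

def reynolds(poly, n):
    """R_{S_n}(f) = (1/n!) * sum over all permutations sigma of sigma(f)."""
    acc = {}
    perms = list(permutations(range(n)))
    for sigma in perms:
        for expo, c in act(sigma, poly).items():
            acc[expo] = acc.get(expo, 0) + Fraction(c)
    return {e: c / len(perms) for e, c in acc.items() if c != 0}

# sym(X_1^2 X_2) in three variables: the average over the S_3-orbit,
# six monomials each with coefficient 1/6.
f = {(2, 1, 0): 1}
print(reynolds(f, 3))
```

Note that applying `reynolds` twice gives the same result as applying it once, reflecting that \({\mathcal {R}}_G\) is a projection onto the invariant ring.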
For a set of polynomials \(f_1,\ldots ,f_l\) we will write \(\sum {\mathbb {R}}\{f_1,\ldots ,f_l\}^2\) to refer to the sums of squares of elements in the linear span of the polynomials \(f_1,\ldots ,f_l\). It has already been observed by Gatermann and Parrilo [12] that invariant sums of squares can be written as sums of squares of semi-invariants using Schur’s Lemma. However, a closer inspection of the situation allows in many cases—as for example in the case of \({\mathcal {S}}_n\)—a finer analysis of the decomposition into sums of squares. Consider a set of forms \(\{f_{1,1},\ldots ,f_{1,\eta _1},f_{2,1},\ldots ,f_{h,\eta _h}\}\) such that for fixed j the forms \(f_{j,i}\) generate the irreducible components of \(V^{(j)}\). Further assume that they are chosen in such a way that for each j and each pair (l, k) there exists a G-isomorphism \(\rho _{l,k}^{(j)}:V^{(j)}\rightarrow V^{(j)}\) which maps \(f_{j,l}\) to \(f_{j,k}\). Now for every j we consider the set \(\{f_{j,1},\ldots ,f_{j,\eta _j}\}\), which contains exactly one polynomial per irreducible module. Since every irreducible module is generated by the G-orbit of a single element, every such set uniquely describes the chosen decomposition. We call such a set a symmetry basis and show that invariant sums of squares are in fact symmetrizations of sums of squares of a symmetry basis. The following theorem, which we state in a slightly more general setup, highlights the use of a symmetry basis.
Theorem 4.2
Let G be a finite group and assume that all real irreducible representations \(V\subset H_{n,d}\) remain irreducible after complexification. Let p be a form of degree 2d that is invariant with respect to G. If p is a sum of squares, then p can be written in the form
$$\begin{aligned} p=\sum _{j=1}^{h} q_j,\quad \text {where each}\quad q_j\in \sum {\mathbb {R}}\{f_{j,1},\ldots ,f_{j,\eta _j}\}^2. \end{aligned}$$
The main tool for the proof is Schur’s Lemma, and we remark that a dual version of this theorem can be found in [28, Thm. 3.4] and [25].
Proof
Let \(p\in H_{n,2d}\) be a G-invariant sum of squares. Then there exists a symmetric positive semidefinite bilinear form
$$\begin{aligned} B:H_{n,d}\times H_{n,d}\rightarrow {\mathbb {R}}\end{aligned}$$
which is a Gram matrix for p, i.e., for every \(x\in {\mathbb {R}}^{n}\) we can write \(p(x)=B(x^{d},x^{d})\), where \(x^{d}\) stands for the d-th power of x in the symmetric algebra of \({\mathbb {R}}^{n}\). Since p is G-invariant, we have \(p={\mathcal {R}}_G(p)\) and by linearity we may assume that B is a G-invariant bilinear form. Now decompose \(H_{n,d}\) as in (4.1) and consider the restriction of B to
$$\begin{aligned} B^{ij}:V^{(i)}\times V^{(j)}\rightarrow {\mathbb {R}}\quad \text {with}\quad i\ne j. \end{aligned}$$
For every \(v\in V^{(i)}\) the bilinear form \(B^{ij}\) defines a linear map \(\phi _v:V^{(j)}\rightarrow {\mathbb {R}}\) via \(\phi _v(w):=B^{ij}(v,w)\), and so the form \(B^{ij}\) can naturally be seen as an element of \({{\,\mathrm{Hom}\,}}^{G}\bigl ({V^{(i)}}^{*},V^{(j)}\bigr )\). Since real representations are self-dual, \({V^{(i)}}^{*}\) and \(V^{(j)}\) are not isomorphic for \(i\ne j\), and thus by Schur’s Lemma we find that \(B^{ij}(v,w)=0\) for all \(v\in V^{(i)}\) and \(w\in V^{(j)}\). So the isotypic components are orthogonal with respect to B and hence it suffices to look at
$$\begin{aligned} B^{jj}:V^{(j)}\times V^{(j)}\rightarrow {\mathbb {R}}\end{aligned}$$
individually. We have \(V^{(j)}=\bigoplus _{k=1}^{\eta _j} W^{(j)}_{k}\), where each \(W^{(j)}_k\) is generated by a semi-invariant \(f_{j,k}\), i.e., there is a basis \(f_{j,k,1},\ldots ,f_{j,k,\nu _j}\) for every \(W^{(j)}_k\) such that the basis elements \(f_{j,k,i}\) are taken from the orbit of \(f_{j,k}\) under G. To again use Schur’s Lemma we identify \(B^{jj}\) with its complexification, which is possible since we assumed that all representations remain irreducible over \({\mathbb {C}}\). Consider a pair \(W^{(j)}_{k_1}, W^{(j)}_{k_2}\), where we allow \(k_1=k_2\). To apply Schur’s Lemma we relate the bilinear form \(B^{jj}\) to a linear map \(\psi ^{(j)}_{k_1,k_2}:W^{(j)}_{k_1}\rightarrow W^{(j)}_{k_2}\) defined on the basis \(f_{j,k_1,1},\ldots ,f_{j,k_1,\nu _j}\) by
$$\begin{aligned} \psi ^{(j)}_{k_1,k_2}(f_{j,k_1,u}):=\sum _{v}B^{jj}(f_{j,k_1,u},f_{j,k_2,v}) f_{j,k_2,v}. \end{aligned}$$
Since we assumed that \(W^{(j)}_{k}\) are absolutely irreducible we have by Schur’s Lemma
$$\begin{aligned} \dim {{{\,\mathrm{Hom}\,}}^G\bigl (W^{(j)}_{k_1},W^{(j)}_{k_2}\bigr )}=1 \end{aligned}$$
and we can conclude that this map is unique up to scalar multiplication. Therefore it can be represented in the form \(\psi ^{(j)}_{k_1,k_2}=c_{k_1,k_2}\rho _{k_1,k_2}\), where \(\rho _{k_1,k_2}\) is the G-isomorphism with \(\rho _{k_1,k_2}(f_{j,k_1})=f_{j,k_2}\) as above. It therefore follows that
$$\begin{aligned} B^{jj}(f_{j,k_1,u},f_{j,k_2,v})=\delta _{u,v}c_{k_1,k_2}, \end{aligned}$$
where \(\delta _{u,v}\) denotes the Kronecker delta. By considering the matrix of B with respect to the basis \(f_{j,k,l}\) of \(H_{n,d}\) we see that p has the desired decomposition. \(\square \)
Remark 4.3
The above statement also holds true in the situation where one looks at sums of squares of elements of an arbitrary G-closed submodule \(T\subset {\mathbb {R}}[X]\).
In some situations it is convenient to formulate the above Theorem 4.2 in terms of matrix polynomials, i.e., matrices with polynomial entries. Given two \(k\times k\) symmetric matrices A and B define their inner product as \(\langle A,B \rangle ={\text {trace}}{(AB)}\). Define a block-diagonal symmetric matrix A with h blocks \(A^{(1)},\dots ,A^{(h)}\) with the entries of each block given by:
$$\begin{aligned} A^{(j)}_{ik}=g_{ik}^{(j)}={\mathcal {R}}_G(f_{j,i}\cdot f_{j,k}). \end{aligned}$$
Then Theorem 4.2 is equivalent to the following statement:
Corollary 4.4
With the conditions as in Theorem 4.2, let \(p\in {\mathbb {R}}[X]^G\) and let \(T\subset {\mathbb {R}}[X]\) be a G-closed submodule as in Remark 4.3. Then p is a sum of squares of polynomials in T if and only if p can be written as \(p=\langle A,B \rangle \), where B is a positive semidefinite matrix with real entries.
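To make the matrix formulation concrete, we add the following worked example (not part of the original text) in the smallest symmetric case \(n=2\), \(2d=2\), \(G={\mathcal {S}}_2\).

```latex
% H_{2,1} = span{X_1 + X_2} \oplus span{X_1 - X_2}: the trivial and the sign
% representation, so h = 2, eta_1 = eta_2 = 1, with symmetry basis
% f_{1,1} = X_1 + X_2 and f_{2,1} = X_1 - X_2. The blocks of A are
\begin{aligned}
A^{(1)} &= {\mathcal {R}}_{{\mathcal {S}}_2}\bigl((X_1+X_2)^2\bigr) = (X_1+X_2)^2,\\
A^{(2)} &= {\mathcal {R}}_{{\mathcal {S}}_2}\bigl((X_1-X_2)^2\bigr) = (X_1-X_2)^2.
\end{aligned}
% A symmetric form p = a(X_1^2 + X_2^2) + b X_1 X_2 equals <A, B> with
% B = diag(b_1, b_2), namely
\begin{aligned}
p = b_1(X_1+X_2)^2 + b_2(X_1-X_2)^2,\qquad
b_1 = \frac{2a+b}{4},\quad b_2 = \frac{2a-b}{4},
\end{aligned}
% and p is a sum of squares iff B >= 0, i.e., iff |b| <= 2a.
```

Note how Schur’s Lemma forces B to be block diagonal: no cross terms between the trivial and the sign component can occur.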
We now aim to apply Theorem 4.2 to a symmetric form \(p \in H_{n,2d}^S\). In order to do this we need to identify an explicit representative in every irreducible \({\mathcal {S}}_n\)-submodule of \(H_{n,d}\). We first recall some useful facts from the representation theory of \({\mathcal {S}}_n\). The irreducible representations in this case are the so-called Specht modules, which we will define in the following section. We refer to [18, 31] for more details.
Specht Modules as Polynomials
Let \(\lambda =(\lambda _1,\lambda _2,\ldots ,\lambda _l)\vdash n\) be a partition of n. A Young tableau of shape \(\lambda \) consists of l rows, with \(\lambda _i\) entries in the i-th row. Each entry is an element in \(\{1, \ldots , n\}\), and each of these numbers occurs exactly once. A standard Young tableau is a Young tableau in which all rows and columns are increasing. An element \(\sigma \in {\mathcal {S}}_n\) acts on a Young tableau by replacing each entry by its image under \(\sigma \). Two Young tableaux \(T_1\) and \(T_2\) are called row-equivalent if the corresponding rows of the two tableaux contain the same numbers. The classes of row-equivalent Young tableaux are called tabloids, and the equivalence class of a tableau T is denoted by \(\{T\}\). The stabilizer of a row-equivalence class is called the row-stabilizer, denoted by \({{\,\mathrm{RStab}\,}}_T\). If \(R_1,\ldots ,R_l\) are the rows of a given Young tableau T, this group can be written as
$$\begin{aligned} {{\,\mathrm{RStab}\,}}_T = {\mathcal {S}}_{R_1}\times {\mathcal {S}}_{R_2}\times \cdots \times {\mathcal {S}}_{R_l}, \end{aligned}$$
where \({\mathcal {S}}_{R_i}\) is the symmetric group on the elements of row i. The action of \({\mathcal {S}}_n\) on the equivalence classes of row-equivalent Young tableaux gives rise to the permutation module \(M^{\lambda }\) corresponding to \(\lambda \) which is the \({\mathcal {S}}_n\)-module defined by
$$\begin{aligned} M^\lambda ={\mathbb {R}}\{ \{T_1\}, \ldots ,\{T_s\}\}, \end{aligned}$$
where \(\{T_1\}, \ldots , \{T_s\}\) is a complete list of \(\lambda \)-tabloids and \({\mathbb {R}}\{ \{T_1\}, \ldots ,\{T_s\}\}\) denotes their \({\mathbb {R}}\)-linear span.
Let T be a Young tableau for \(\lambda \vdash n\), and let \(C_i\) be the entries in the i-th column of T. The group
$$\begin{aligned} {{\,\mathrm{CStab}\,}}_T= {\mathcal {S}}_{C_1}\times {\mathcal {S}}_{C_2}\times \cdots \times {\mathcal {S}}_{C_\nu }, \end{aligned}$$
where \({\mathcal {S}}_{C_i}\) is the symmetric group on the elements of column i, is called the column stabilizer of T. The irreducible representations of the symmetric group \({\mathcal {S}}_n\) are in one-to-one correspondence with the partitions of n, and they are given by the Specht modules, as explained below. For \(\lambda \vdash n\), the polytabloid associated with T is defined by
$$\begin{aligned} e_T \,= \sum _{\sigma \in {{\,\mathrm{CStab}\,}}_T}{{\,\mathrm{sgn}\,}}(\sigma )\sigma \{T\}. \end{aligned}$$
Then for a partition \(\lambda \vdash n\), the Specht module \(S^{\lambda }\) is the submodule of the permutation module \(M^\lambda \) spanned by the polytabloids \(e_T\). The dimension of \(S^{\lambda }\) is given by the number of standard Young tableaux for \(\lambda \vdash n\), which we will denote by \(s_\lambda \).
A classical construction of Specht realizes Specht modules as submodules of the polynomial ring (see [35]): For \(\lambda \vdash n\) let \(T_{\lambda }\) be a standard Young tableau of shape \(\lambda \) and \({\mathcal {C}}_1,\ldots ,{\mathcal {C}}_{\nu }\) be the columns of \(T_\lambda \). To \(T_\lambda \) we associate the monomial \(X^{T_{\lambda }}:=\prod _{i=1}^{n}X_i^{m(i)-1}\), where m(i) is the index of the row of \(T_{\lambda }\) containing i. Note that for any \(\lambda \)-tabloid \(\{T_{\lambda }\}\) the monomial \(X^{T_{\lambda }}\) is well defined, and the mapping \(\{T_{\lambda }\} \mapsto X^{T_{\lambda }}\) is an \({\mathcal {S}}_n\)-isomorphism. For any column \({\mathcal {C}}_i\) of \(T_\lambda \) we denote by \({\mathcal {C}}_i(j)\) the element in the j-th row and we associate to it a Vandermonde determinant:
$$\begin{aligned} {{\,\mathrm{Van}\,}}_{{\mathcal {C}}_{i}}:=\det {\begin{pmatrix} X_{{\mathcal {C}}_i(1)}^0 &{}\quad \ldots &{}\quad X_{{\mathcal {C}}_i(k)}^0 \\ \vdots &{}\quad \ddots &{}\quad \vdots \\ X_{{\mathcal {C}}_i(1)}^{k-1} &{}\quad \ldots &{}\quad X_{{\mathcal {C}}_i(k)}^{k-1} \end{pmatrix}}=\prod _{j<l}\bigl (X_{{\mathcal {C}}_i(l)}-X_{{\mathcal {C}}_i(j)}\bigr ), \end{aligned}$$
where k denotes the length of the column \({\mathcal {C}}_i\).
The Specht polynomial \(sp_{T_{\lambda }}\) associated to \(T_\lambda \) is defined as
$$\begin{aligned} sp_{T_{{\lambda }}} := \prod _{i=1}^{\nu } {{\,\mathrm{Van}\,}}_{{\mathcal {C}}_{i}}\,=\sum _{\sigma \in {{\,\mathrm{CStab}\,}}_{T_{\lambda }}}{{\,\mathrm{sgn}\,}}(\sigma )\sigma (X^{T_{\lambda }}), \end{aligned}$$
where \({{\,\mathrm{CStab}\,}}_{T_{\lambda }}\) is the column stabilizer of \(T_\lambda \). Via the \({\mathcal {S}}_n\)-isomorphism \(\{T_{\lambda }\} \mapsto X^{T_{\lambda }}\), \({\mathcal {S}}_n\) acts on \(sp_{T_{{\lambda }}}\) in the same way as on the polytabloid \(e_{T_{\lambda }}\). If \(T_{\lambda ,1},\ldots ,T_{\lambda ,k}\) denote all standard Young tableaux associated to \(\lambda \), then the polynomials \(sp_{T_{\lambda ,1}},\ldots ,sp_{T_{\lambda ,k}}\) are called the Specht polynomials associated to \(\lambda \). We then have the following proposition; see [35].
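As an illustration (our own addition, not from the original text), the following Python sketch builds a Specht polynomial in both ways: as the signed sum over the column stabilizer and as a product of Vandermonde determinants. Sign conventions for the Vandermonde factors vary in the literature; the code uses \(\prod _{j<l}(X_{{\mathcal {C}}_i(l)}-X_{{\mathcal {C}}_i(j)})\), the one matching the column-stabilizer sum.

```python
from itertools import permutations, product

def poly_mul(p, q):
    """Multiply polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c}

def columns(rows):
    return [[row[c] for row in rows if c < len(row)]
            for c in range(len(rows[0]))]

def vandermonde(col, n):
    """prod_{j<l} (X_{col[l]} - X_{col[j]}) in n variables (1-based)."""
    p = {(0,) * n: 1}
    for j in range(len(col)):
        for l in range(j + 1, len(col)):
            lin = {tuple(int(i == col[l] - 1) for i in range(n)): 1,
                   tuple(int(i == col[j] - 1) for i in range(n)): -1}
            p = poly_mul(p, lin)
    return p

def sign(seq):
    """Sign of the permutation rearranging sorted(seq) into seq."""
    s = 1
    for a in range(len(seq)):
        for b in range(a + 1, len(seq)):
            if seq[a] > seq[b]:
                s = -s
    return s

def specht_polynomial(rows, n):
    """sp_T = sum over the column stabilizer of sgn(sigma)*sigma(X^T)."""
    base = [0] * n                      # X^T: exponent of X_i is row(i) - 1
    for r, row in enumerate(rows):
        for entry in row:
            base[entry - 1] = r
    poly, cols = {}, columns(rows)
    for images in product(*[permutations(c) for c in cols]):
        sigma, sgn = {}, 1
        for col, img in zip(cols, images):
            sigma.update(dict(zip(col, img)))
            sgn *= sign(img)
        expo = [0] * n                  # sigma(X^T)
        for i in range(1, n + 1):
            expo[sigma.get(i, i) - 1] = base[i - 1]
        key = tuple(expo)
        poly[key] = poly.get(key, 0) + sgn
    return {e: c for e, c in poly.items() if c}

# Tableau with rows (1,2) and (3): both constructions give X_3 - X_1.
rows = [[1, 2], [3]]
via_stabilizer = specht_polynomial(rows, 3)
via_vandermonde = {(0, 0, 0): 1}
for col in columns(rows):
    via_vandermonde = poly_mul(via_vandermonde, vandermonde(col, 3))
assert via_stabilizer == via_vandermonde
```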
Proposition 4.5
The Specht polynomials \(sp_{T_{\lambda ,1}},\ldots ,sp_{T_{\lambda ,k}}\) span an \({\mathcal {S}}_n\)-submodule of \({\mathbb {R}}[X]\) which is isomorphic to the Specht module \(S^{\lambda }\).
The Specht polynomials identify a submodule of \({\mathbb {R}}[X]\) isomorphic to \(S^{\lambda }\). In order to get a decomposition of the entire ring \({\mathbb {R}}[X]\) we will use a generalization of this construction, which is described in the next section.
Higher Specht Polynomials and the Decomposition of \({\mathbb {R}}[X]\)
In what follows we will need to understand the decomposition of the polynomial ring \({\mathbb {R}}[X]\) and of the \({\mathcal {S}}_n\)-module \(H_{n,d}\) in terms of irreducible \({\mathcal {S}}_n\)-representations. Notice that such a decomposition is not unique. It is classically known that the ring \({\mathbb {R}}[X]\) is a free module of rank n! over the ring of symmetric polynomials. Similarly, every isotypic component is a free \({\mathbb {R}}[X]^{{\mathcal {S}}_n}\)-module. Therefore, one general strategy to obtain a symmetry basis of \({\mathbb {R}}[X]\) consists in building a free module basis for \({\mathbb {R}}[X]\) over \({\mathbb {R}}[X]^{{\mathcal {S}}_n}\) which additionally is symmetry adapted, i.e., which respects a decomposition into irreducible \({\mathcal {S}}_n\)-modules. One such construction, which generalizes Specht’s original construction presented above, is due to Ariki et al. [1].
Definition 4.6
Let \(n\in {\mathbb {N}}\).
-
(i)
A finite sequence \(w=(w_1,\ldots ,w_n)\) of non-negative integers is called a word of length n. A word w of length n is called a permutation if its entries are exactly the numbers \(1,\ldots ,n\), each occurring exactly once.
-
(ii)
Given a word w and a permutation u we define the monomial associated to the pair as \(X_u^{w}:=X_{u_1}^{w_1}\cdots X_{u_n}^{w_n}\).
-
(iii)
Given a permutation w we associate to w its index, denoted by i(w), which is the following word of length n. The word i(w) contains 0 exactly at the position where 1 occurs in w; the remaining entries are defined recursively by the following rule: Suppose that the entry k of w has been assigned the value c, i.e., i(w) equals c at the position of k. Then the entry \(k+1\) is assigned the value c if it occurs to the right of k in w, and the value \(c+1\) if it occurs to the left of k.
-
(iv)
For \(\lambda \vdash n\) and T being a standard Young tableau of shape \(\lambda \), we define the word of T, denoted by w(T), by collecting the entries of T from the bottom to the top in consecutive columns starting from the left.
-
(v)
For a pair (T, V) of standard \(\lambda \)-tableaux we define the monomial associated to this pair as \(X_{w(V)}^{i(w(T))}\).
Example 4.7
Consider the standard Young tableau T of shape \(\lambda =(3,2)\) with rows (1, 2, 4) and (3, 5). Reading the columns from bottom to top, from left to right, the resulting word is given by \(w(T)=31524\), with \(i(w(T))=10201\). Taking V to be the standard Young tableau of the same shape with rows (1, 3, 5) and (2, 4), so that \(w(V)=21435\), we obtain \(X_{w(V)}^{i(w(T))}=X_1^0X_2^1X_3^0X_4^2X_5^1\).
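The construction of w(T), i(w), and the associated monomial is easy to mechanize. The following Python sketch (our own illustration) recomputes the word \(w(T)=31524\) of the tableau T with rows (1, 2, 4) and (3, 5), its index, and the monomial obtained from a second standard tableau V with rows (1, 3, 5) and (2, 4).

```python
def word(rows):
    """w(T): read each column from bottom to top, columns left to right."""
    return [entry
            for c in range(len(rows[0]))
            for entry in reversed([row[c] for row in rows if c < len(row)])]

def index_word(w):
    """i(w): 1 gets 0; k+1 gets the value of k if it stands to the right
    of k in w, and that value plus one if it stands to the left."""
    pos = {v: p for p, v in enumerate(w)}
    val = {1: 0}
    for k in range(1, len(w)):
        val[k + 1] = val[k] + (0 if pos[k + 1] > pos[k] else 1)
    return [val[v] for v in w]

T = [[1, 2, 4], [3, 5]]
V = [[1, 3, 5], [2, 4]]
wT, wV, iT = word(T), word(V), index_word(word(T))
exponents = [0] * 5                  # X_{w(V)}^{i(w(T))}
for variable, e in zip(wV, iT):
    exponents[variable - 1] = e
print(wT, iT, exponents)
# -> [3, 1, 5, 2, 4] [1, 0, 2, 0, 1] [0, 1, 0, 2, 1]
```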
Definition 4.8
Let \(\lambda \vdash n\) and T be a \(\lambda \)-tableau. Then the Young symmetrizer associated to T is the element in the group algebra \({\mathbb {R}}[{\mathcal {S}}_n]\) defined to be
$$\begin{aligned} \varepsilon _T\,=\sum _{\sigma \in {{\,\mathrm{RStab}\,}}_T}\sum _{\tau \in {{\,\mathrm{CStab}\,}}_T} {{\,\mathrm{sgn}\,}}( \tau ) \tau \sigma . \end{aligned}$$
Now let T be a standard Young tableau, and define the higher Specht polynomial associated with the pair (T, V) to be
$$\begin{aligned} F_V^T(X_1,\ldots ,X_n):=\varepsilon _V\bigl (X_{w(V)}^{i(w(T))}\bigr ). \end{aligned}$$
For \(\lambda \vdash n\) we will denote by
$$\begin{aligned} {\mathcal {F}}_\lambda =\bigl \{F_V^{T}:T,V \text { run over all standard } \lambda \text {-tableaux}\bigr \} \end{aligned}$$
the set of all standard higher Specht polynomials corresponding to \(\lambda \) and by
$$\begin{aligned} {\mathcal {F}}=\bigcup _{\lambda \vdash n} {\mathcal {F}}_\lambda \end{aligned}$$
the set of all standard higher Specht polynomials.
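To make Definition 4.8 concrete, here is a Python sketch (our own illustration, not from the original text) that applies the Young symmetrizer \(\varepsilon _V\) to the monomial \(X_{w(V)}^{i(w(T))}\) exactly as in the definition. Higher Specht polynomials are in general only scalar multiples of the classical Specht polynomials; for T = V the standard tableau with rows (1, 2) and (3) the sketch returns twice the Specht polynomial \(X_3-X_1\).

```python
from itertools import permutations, product

def word(rows):
    return [entry
            for c in range(len(rows[0]))
            for entry in reversed([row[c] for row in rows if c < len(row)])]

def index_word(w):
    pos = {v: p for p, v in enumerate(w)}
    val = {1: 0}
    for k in range(1, len(w)):
        val[k + 1] = val[k] + (0 if pos[k + 1] > pos[k] else 1)
    return [val[v] for v in w]

def sign(seq):
    s = 1
    for a in range(len(seq)):
        for b in range(a + 1, len(seq)):
            if seq[a] > seq[b]:
                s = -s
    return s

def block_perms(blocks):
    """All permutations moving each block within itself, as (map, sign)."""
    result = []
    for images in product(*[permutations(b) for b in blocks]):
        perm, sgn = {}, 1
        for blk, img in zip(blocks, images):
            perm.update(dict(zip(blk, img)))
            sgn *= sign(img)
        result.append((perm, sgn))
    return result

def apply_perm(perm, expo):
    """sigma(X^e): the exponent of X_{sigma(i)} equals e_i."""
    out = [0] * len(expo)
    for i, e in enumerate(expo):
        out[perm.get(i + 1, i + 1) - 1] = e
    return tuple(out)

def higher_specht(T, V, n):
    """F_V^T = eps_V(X_{w(V)}^{i(w(T))}), following Definition 4.8."""
    iT, wV = index_word(word(T)), word(V)
    expo = [0] * n
    for variable, e in zip(wV, iT):
        expo[variable - 1] = e
    rows = [list(r) for r in V]
    cols = [[r[c] for r in rows if c < len(r)] for c in range(len(rows[0]))]
    poly = {}
    for sigma, _ in block_perms(rows):        # row stabilizer (unsigned)
        for tau, sgn in block_perms(cols):    # column stabilizer (signed)
            key = apply_perm(tau, apply_perm(sigma, tuple(expo)))
            poly[key] = poly.get(key, 0) + sgn
    return {e: c for e, c in poly.items() if c}

F = higher_specht([[1, 2], [3]], [[1, 2], [3]], 3)
print(F)   # {(0, 0, 1): 2, (1, 0, 0): -2}, i.e. 2*(X_3 - X_1)
```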
Remark 4.9
Let \(s_{\lambda }\) denote the number of standard Young tableaux of shape \(\lambda \). It follows from the so-called Robinson–Schensted correspondence (see [31]) that
$$\begin{aligned} \sum _{\lambda \vdash n} s_\lambda ^2=n! \end{aligned}$$
Therefore the cardinality of \({\mathcal {F}}\) is exactly \(n!\).
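Remark 4.9 can be checked by brute force for small n. The following Python sketch (illustration only) counts standard Young tableaux recursively, peeling off the removable corner containing the largest entry, and verifies \(\sum _{\lambda \vdash n} s_\lambda ^2=n!\) for \(n=5\).

```python
from functools import lru_cache

def partitions(n, maxpart=None):
    """All partitions of n as weakly decreasing tuples."""
    if maxpart is None:
        maxpart = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, maxpart), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

@lru_cache(maxsize=None)
def num_syt(shape):
    """Number of standard Young tableaux of the given shape: the cell
    containing n must be a removable corner, so recurse over corners."""
    if sum(shape) == 0:
        return 1
    total = 0
    for i in range(len(shape)):
        if i == len(shape) - 1 or shape[i] > shape[i + 1]:
            smaller = tuple(p for p in
                            shape[:i] + (shape[i] - 1,) + shape[i + 1:] if p)
            total += num_syt(smaller)
    return total

n = 5
counts = {lam: num_syt(lam) for lam in partitions(n)}
print(sum(s * s for s in counts.values()))   # 120 = 5!
```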
The importance of the higher Specht polynomials now is summarized in the following theorem which can be found in [1, Thm. 1].
Theorem 4.10
The following holds for the set of higher Specht polynomials.
-
(i)
The set \({\mathcal {F}}\) is a free basis of the ring \({\mathbb {R}}[X]\) over the invariant ring \({\mathbb {R}}[X]^{{\mathcal {S}}_n}\).
-
(ii)
For any \(\lambda \vdash n\) and standard \(\lambda \)-tableau T, the space spanned by the polynomials in
$$\begin{aligned} {\mathcal {F}}^T_\lambda :=\bigl \{F^T_V:V \text { runs over all standard } \lambda \text {-tableaux}\bigr \} \end{aligned}$$
is an irreducible \({\mathcal {S}}_n\)-module isomorphic to the Specht module \(S^{\lambda }\).
For every \(\lambda \vdash n\) we denote by \(V_0^{\lambda }\) the standard \(\lambda \)-tableau with entries \(\{1,\ldots ,\lambda _1\}\) in the first row, \(\{\lambda _1+1,\ldots ,\lambda _1+\lambda _2\}\) in the second row, and so on. Consider the set
$$\begin{aligned} {\mathcal {Q}}_\lambda :=\bigl \{F^T_{V_0^{\lambda }}:T \text { runs over all standard } \lambda \text {-tableaux}\bigr \}, \end{aligned}$$
which is of cardinality \(s_\lambda \). The set \({\mathcal {Q}}_\lambda \) is a symmetry basis of the \(\lambda \)-isotypic component of the vector space spanned by \({\mathcal {F}}\). Using these polynomials we define the \(s_\lambda \times s_\lambda \) matrix polynomial \(Q^{\lambda }\) by
$$\begin{aligned} Q^\lambda ({T,T'}):={{\,\mathrm{sym}\,}}{F^T_{V_0^{\lambda }}F^{T'}_{V_0^{\lambda }}}, \end{aligned}$$
(4.2)
where \(T,T'\) run over all standard \(\lambda \)-tableaux. Since by (i) in Theorem 4.10 we know that every polynomial \(h\in {\mathbb {R}}[X]\) can be uniquely written as a linear combination of elements in \({\mathcal {F}}\) with coefficients in \({\mathbb {R}}[X]^{{\mathcal {S}}_n}\), the following theorem can be thought of as a generalization of Corollary 4.4 to sums of squares from an \({\mathcal {S}}_n\)-module with coefficients in an \({\mathcal {S}}_n\)-invariant ring (see also [12, Thm. 6.2]):
Theorem 4.11
Let \(p\in {\mathbb {R}}[X]^{{\mathcal {S}}_n}\) be a symmetric polynomial. Then p is a sum of squares if and only if it can be written in the form
$$\begin{aligned} p=\sum _{\lambda \vdash n}\langle B^\lambda ,Q^\lambda \rangle , \end{aligned}$$
where \(Q^\lambda \) is defined in (4.2) and each \(B^\lambda \in {\mathbb {R}}[X]^{s_\lambda \times s_\lambda }\) is a sum of squares matrix polynomial with symmetric entries, i.e., \(B^\lambda (x)=L^{t}(x)L(x)\) for some matrix polynomial L(x) whose entries are symmetric polynomials.
Each entry of the matrix \(Q^{\lambda }\) is a symmetric polynomial and thus can be represented as a polynomial in any set of generators of the ring of symmetric polynomials. We will use the power sums \(p_1,\ldots ,p_n\) to phrase the next theorem; however, any other choice works similarly. With this choice of generators it follows that there exists a matrix polynomial \({\tilde{Q}}^{\lambda }(z_1,\ldots ,z_n)\) in n variables \(z_1,\ldots ,z_n\) such that
$$\begin{aligned} {\tilde{Q}}^{\lambda }(p_1(x),\ldots ,p_n(x))=Q^{\lambda }(x). \end{aligned}$$
(4.3)
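As a small numeric sanity check on this rewriting step (our own illustration), every symmetric polynomial can indeed be evaluated through the power sums. For \(n=3\), the elementary symmetric polynomial \(e_2=X_1X_2+X_1X_3+X_2X_3\) satisfies \(e_2=(p_1^2-p_2)/2\) by Newton's identities.

```python
from fractions import Fraction

def power_sum(k, x):
    """p_k(x) = x_1^k + ... + x_n^k."""
    return sum(Fraction(v) ** k for v in x)

def e2(x):
    """e_2(x) = sum_{i<j} x_i x_j."""
    return sum(Fraction(x[i]) * x[j]
               for i in range(len(x)) for j in range(i + 1, len(x)))

# Check e_2 = (p_1^2 - p_2)/2 at several sample points.
for x in [(2, -1, 5), (0, 1, 7), (3, 3, 3)]:
    p1, p2 = power_sum(1, x), power_sum(2, x)
    assert e2(x) == (p1 ** 2 - p2) / 2
```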
With this notation one can restate Theorem 4.11 in the following way:
Theorem 4.12
Let \(f\in {\mathbb {R}}[X]^{{\mathcal {S}}_n}\) be a symmetric polynomial and \(g\in {\mathbb {R}}[z_1,\ldots ,z_n]\) such that \(f=g(p_1,\ldots ,p_n)\). Then f is a sum of squares if and only if g can be written in the form
$$\begin{aligned} g=\sum _{\lambda \vdash n} \langle B^\lambda ,{\tilde{Q}}^\lambda \rangle , \end{aligned}$$
where \({\tilde{Q}}^\lambda \) is defined in (4.3) and each \(B^\lambda \in {\mathbb {R}}[z]^{s_\lambda \times s_\lambda }\) is a sum of squares matrix polynomial, i.e., \(B^\lambda :=L(z)^{t}L(z)\) for some matrix polynomial L.
While Theorems 4.11 and 4.12 give a characterization of symmetric sums of squares in a given number of variables, we need to understand the behavior of the \({\mathcal {S}}_n\)-module \(H_{n,d}\) for polynomials of a fixed degree d in a growing number of variables n. This will be done in the next section.
The Cone \(\Sigma _{n,2d}^S\)
A symmetric sum of squares \(f \in \Sigma ^S_{n,2d}\) has to be a sum of squares from \(H_{n,d}\). Therefore we now consider restricting the degree of the squares in the underlying sum of squares representation. With a slight abuse of notation we denote by \({\mathcal {F}}_{n,d}\) the vector space spanned by the higher Specht polynomials for the group \({\mathcal {S}}_n\) of degree at most d. Further, for a partition \(\lambda \vdash n\) let \({\mathcal {F}}_{\lambda ,d}\) denote the span of the higher Specht polynomials of degree at most d corresponding to the Specht module \(S^{\lambda }\), i.e., \({\mathcal {F}}_{\lambda ,d}\) is exactly the isotypic component of \({\mathcal {F}}_{n,d}\) corresponding to \(S^{\lambda }\). In order to describe this isotypic component combinatorially, recall that the degree of the higher Specht polynomial \(F_T^{S}\) is given by the charge c(S) of S, i.e., the sum of the entries of the index word i(w(S)). Thus, it follows from the above construction that
$$\begin{aligned} {\mathcal {F}}_{\lambda ,d}={\text {span}}{\bigl \{F_T^{S}: S,T \text { are standard } \lambda \text {-tableaux and } c(S)\le d \bigr \}}. \end{aligned}$$
We now show that sums of squares of degree 2d in n variables can be constructed by symmetrizing sums of squares in 2d variables. So we first consider the case \(n=2d\). Let
$$\begin{aligned} {\mathcal {F}}_{2d,d}=\bigoplus _{\lambda \vdash 2d} m_{\lambda }S^{\lambda } \end{aligned}$$
be the decomposition of \({\mathcal {F}}_{2d,d}\) as an \({\mathcal {S}}_{2d}\)-module. The following proposition gives the multiplicities of the different \({\mathcal {S}}_n\)-modules appearing in the vector space of homogeneous polynomials of degree d.
Proposition 4.13
The multiplicities \(m_{\lambda }\) of the Specht modules \(S^\lambda \) which appear in the isotypic decomposition of \(H_{n,d}\) coincide with the number of standard \(\lambda \)-tableaux S of charge at most d, i.e., with \(c(S)\le d\).
For a partition \(\lambda \vdash 2d\) and \(n \ge 2d\) define a new partition \(\lambda ^{(n)} \vdash n\) by simply increasing the first part of \(\lambda \) by \(n-2d\): \(\lambda ^{(n)}_1=\lambda _1+n-2d\) and \(\lambda ^{(n)}_i=\lambda _i\) for \(i\ge 2\). Then the decomposition Theorem 4.10 in combination with [28, Thm. 4.7] yields that
$$\begin{aligned} {\mathcal {F}}_{n,d}=\bigoplus _{\lambda \vdash 2d} m_{\lambda }{\mathcal {S}}^{\lambda ^{(n)}}. \end{aligned}$$
For every \(\lambda \vdash 2d\) we choose \(m_\lambda \) many higher Specht polynomials \(q_1^{\lambda },\ldots ,q_{m_\lambda }^{\lambda }\) that form a symmetry basis of the \(\lambda \)-isotypic component of \({\mathcal {F}}_{2d,d}\). Let \(q_\lambda =(q_1^{\lambda },\ldots ,q_{m_\lambda }^{\lambda })\) be the vector with entries \(q_i^{\lambda }\). As before we construct the matrix \(Q^{\lambda }_{2d}\) by
$$\begin{aligned} Q^{\lambda }_{2d}={{\,\mathrm{sym}\,}}_{2d}q_{\lambda }^t q_\lambda , \qquad Q_{2d}^\lambda ({i,j})={{\,\mathrm{sym}\,}}_{2d}q^{\lambda }_{i}q^{\lambda }_{j}. \end{aligned}$$
Further, we define the matrix \(Q_n^{\lambda }\) by
$$\begin{aligned} Q_{n}^{\lambda }={{\,\mathrm{sym}\,}}_{n}q_{\lambda }^t q_\lambda , \qquad Q_n^{\lambda }(i,j)={{\,\mathrm{sym}\,}}_nq_i^{\lambda }q_j^{\lambda }. \end{aligned}$$
By construction we have the following:
Proposition 4.14
The matrix \(Q_n^{\lambda }\) is the \({\mathcal {S}}_n\)-symmetrization of the matrix \(Q_{2d}^{\lambda }\):
$$\begin{aligned} Q_n^{\lambda }={{\,\mathrm{sym}\,}}_n Q_{2d}^{\lambda }. \end{aligned}$$
We now give a parametric description of the family of cones \(\Sigma ^S_{n,2d}\). Note again that this statement is given in terms of a particular basis, but similarly can be stated with any set of generators.
Theorem 4.15
Let \(f:=\sum _{\lambda \vdash 2d} c_\lambda p_\lambda ^{(n)}\in H_{n,2d}^S\). Then f is a sum of squares if and only if it can be written in the form
$$\begin{aligned} f=\sum _{\lambda \vdash 2d} \langle B^\lambda ,Q_n^\lambda \rangle , \end{aligned}$$
where each \(B^\lambda \in {\mathbb {R}}\bigl [p_1^{(n)},\ldots ,p_{d}^{(n)}\bigr ]^{m_\lambda \times m_\lambda }\) is a sum of squares matrix of power sum polynomials, i.e., \(B^\lambda =L_{\lambda }^{t}L_{\lambda }\) for some matrix polynomial \(L_{\lambda }\bigl (p_1^{(n)},\ldots ,p_{d}^{(n)}\bigr )\) whose entries are weighted homogeneous forms. Additionally, we have for every column k of \(L_{\lambda }\),
$$\begin{aligned} \deg _w Q_n^{\lambda }(i,k)+2\deg _w L_{\lambda }(k,i)=2d \end{aligned}$$
or, equivalently, every entry \(B^{\lambda }(i,j)\) of \(B^{\lambda }\) is a weighted homogeneous form such that
$$\begin{aligned} \deg _w Q_n^{\lambda }(i,j)+\deg _w B^{\lambda }(i,j)=2d. \end{aligned}$$
Proof
In order to apply Theorem 4.11 to our fixed degree situation we have to show that the forms \(\{q_1^{\lambda },\ldots ,q_{m_\lambda }^{\lambda }\}\) when viewed as functions in n variables also form a symmetry basis of the \(\lambda ^{(n)}\)-isotypic component of \({\mathcal {F}}_{n,d}\) for all \(n \ge 2d\). Indeed, consider a standard Young tableau \(t_{\lambda }\) of shape \(\lambda \) and construct a standard Young tableau \(t_{\lambda ^{(n)}}\) of shape \(\lambda ^{(n)}\) by adding numbers \(2d+1,\dots ,n\) as rightmost entries of the top row of \(t_{\lambda ^{(n)}}\), while keeping the rest of the filling of \(t_{\lambda ^{(n)}}\) the same as for \(t_{\lambda }\). It follows by construction of the Specht polynomials that
$$\begin{aligned} sp_{t_{\lambda }}=sp_{t_{\lambda ^{(n)}}}. \end{aligned}$$
We may assume that the \(q_k^{\lambda }\) were chosen so that they map to \(sp_{t_{\lambda }}\) under an \({\mathcal {S}}_{2d}\)-isomorphism. We observe that \(sp_{t_{\lambda }}\) (and therefore \(sp_{t_{\lambda ^{(n)}}}\)) and \(q_k^{\lambda }\) do not involve any of the variables \(X_{j}\), \(j>2d\). Therefore both are stabilized by \({\mathcal {S}}_{n-2d}\) (operating on the last \(n-2d\) variables), and further the action on the first 2d variables is exactly the same. Thus there is an \({\mathcal {S}}_{n}\)-isomorphism mapping \(q_k^{\lambda }\) to \(sp_{t_{\lambda ^{(n)}}}\), and the \({\mathcal {S}}_{n}\)-modules generated by the two polynomials are isomorphic. Therefore it follows that the \(q_k^{\lambda }\) also form a symmetry basis of the \(\lambda ^{(n)}\)-isotypic component of \({\mathcal {F}}_{n,d}\). \(\square \)
Remark 4.16
We remark that the sum of squares decomposition of \(f=\sum _{\lambda \vdash 2d} \langle B^{\lambda },Q^{\lambda }_n \rangle \), with \(B^{\lambda }=L_\lambda ^tL_{\lambda }\) can be read off as follows:
$$\begin{aligned} f=\sum _{\lambda \vdash 2d} {{\,\mathrm{sym}\,}}_n q_{\lambda }^t B^{\lambda }q_{\lambda } =\sum _{\lambda \vdash 2d} {{\,\mathrm{sym}\,}}_n ( L_{\lambda }q_{\lambda })^tL_{\lambda }q_{\lambda } . \end{aligned}$$
(4.4)
In particular, if for a fixed \(\lambda \vdash 2d\) and for every \(1\le i\le m_\lambda \) we denote \(\delta _i:=d-\deg q_i^{\lambda }\), then the set of polynomials
$$\begin{aligned} \bigcup _{i=1}^{m_\lambda }\bigcup _{\nu \vdash \delta _i}\left\{ q_i^{\lambda }p_\nu \right\} \end{aligned}$$
(4.5)
is a symmetry basis of the isotypic component of \(H_{n,d}\) corresponding to \(\lambda ^{(n)}\).
The Dual Cone of Symmetric Sums of Squares
Recall that for a convex cone \(K\subset {\mathbb {R}}^n\) the dual cone \(K^*\) is defined as
$$\begin{aligned} K^{*}:=\{\ell \in {\text {Hom}}{({\mathbb {R}}^n,{\mathbb {R}})}:\ell (x)\ge 0\ \text {for all}\ x\in K\}. \end{aligned}$$
Our analysis of the dual cone \((\Sigma _{n,2d}^S)^*\) proceeds similarly to the analysis of the dual cone in the non-symmetric situation given in [4, 6].
Let \(S_{n,d}\) be the vector space of real quadratic forms on \(H_{n,d}\). Let \(S_{n,d}^+\) be the cone of positive semidefinite quadratic forms in \(S_{n,d}\). An element \({\mathcal {Q}}\in S_{n,d}\) is said to be \({\mathcal {S}}_n\)-invariant if \({\mathcal {Q}}(f)={\mathcal {Q}}(\sigma (f))\) for all \(\sigma \in {\mathcal {S}}_n\), \(f \in H_{n,d}\). We will denote by \({\bar{S}}_{n,d}\) the space of \({\mathcal {S}}_n\)-invariant quadratic forms on \(H_{n,d}\). Further we can identify a linear functional \(\ell \in (H_{n,2d}^{S})^*\) with a quadratic form \({\mathcal {Q}}_{\ell }\) defined by
$$\begin{aligned} {\mathcal {Q}}_\ell (f) = \ell ({{\,\mathrm{sym}\,}}f^2). \end{aligned}$$
Let \({\bar{S}}_{n,d}^+\) be the cone of positive semidefinite forms in \({\bar{S}}_{n,d}\), i.e.,
$$\begin{aligned} {\bar{S}}_{n,d}^+:=\{{\mathcal {Q}}\in {\bar{S}}_{n,d}:{\mathcal {Q}}(f)\ge 0\ \text {for all}\ f\in H_{n,d}\}. \end{aligned}$$
The following lemma is straightforward, but very important, as it allows us to identify the elements \(\ell \in (\Sigma _{n,2d}^S)^*\) of the dual cone with quadratic forms \({\mathcal {Q}}_{\ell }\) in \({\bar{S}}_{n,d}^+\).
Lemma 4.17
A linear functional \(\ell \in (H_{n,2d}^S)^*\) belongs to the dual cone \((\Sigma _{n,2d}^S)^*\) if and only if the quadratic form \({\mathcal {Q}}_{\ell }\) is positive semidefinite.
Since for \(\ell \in (H_{n,2d}^S)^*\) we have \({\mathcal {Q}}_{\ell }\in {\bar{S}}_{n,d}\), Schur’s Lemma again applies and we can use the symmetry basis constructed above to simplify the condition that \({\mathcal {Q}}_{\ell }\) is positive semidefinite. In order to arrive at a dual statement of Theorem 4.15 we construct the following matrices:
Definition 4.18
For a partition \(\lambda \vdash 2d\) consider the block-matrix \(M_{n,\lambda }\) defined by
$$\begin{aligned} M_{n,\lambda }^{(i,j)}(\alpha ,\beta ):=\ell \bigl (p_{\alpha }\cdot p_{\beta }\cdot Q^{\lambda }_n(i,j)\bigr ), \end{aligned}$$
where in each block i, j the indices \((\alpha ,\beta )\) run through all pairs of weakly decreasing sequences \(\alpha =(\alpha _1,\ldots ,\alpha _a)\) and \(\beta =(\beta _1,\ldots ,\beta _b)\) such that
$$\begin{aligned} 2d-\deg _w Q^{\lambda }_n(i,j)=\alpha _1+\dots +\alpha _a+\beta _1+\dots +\beta _b. \end{aligned}$$
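The index set of each block of \(M_{n,\lambda }\) is purely combinatorial: pairs of weakly decreasing sequences with a prescribed total size. The following Python sketch (our own illustration; the function names are ours) enumerates these pairs for a given total m.

```python
def weakly_decreasing(total, maxpart=None):
    """All weakly decreasing sequences of positive integers summing to
    total (the empty sequence for total = 0)."""
    if maxpart is None:
        maxpart = total
    if total == 0:
        yield ()
        return
    for first in range(min(total, maxpart), 0, -1):
        for rest in weakly_decreasing(total - first, first):
            yield (first,) + rest

def block_indices(m):
    """All pairs (alpha, beta) with |alpha| + |beta| = m, indexing one
    block (i, j), where m = 2d - deg_w Q_n^lambda(i, j)."""
    return [(alpha, beta)
            for a in range(m + 1)
            for alpha in weakly_decreasing(a)
            for beta in weakly_decreasing(m - a)]

print(block_indices(2))   # 5 pairs for m = 2
```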
With this notation the following lemma is just the dual version of Theorem 4.15 and is established by expressing Lemma 4.17 in the basis given in (4.5):
Lemma 4.19
Let \(\ell \in (H_{n,2d}^{S})^{*}\) be a linear functional. Then \(\ell \in (\Sigma _{n,2d}^{S})^{*}\) if and only if for all \(\lambda \vdash 2d\) the above matrices \(M_{n,\lambda }\) are positive semidefinite.
In order to examine the kernels of quadratic forms we use the following construction. Let \(W\subset H_{n,d}\) be any linear subspace. We define \(W^{<2>}\) to be the symmetrization of the degree 2d part of the ideal generated by W:
$$\begin{aligned} W^{<2>}:=\Bigl \{h\in H_{n,2d}^S:h={{\,\mathrm{sym}\,}}\sum f_ig_i\ \text {with}\ f_i \in W,\, g_i \in H_{n,d}\Bigr \}. \end{aligned}$$
In Lemma 4.17 we identified the dual cone \((\Sigma _{n,2d}^{S})^*\) with the linear section of the cone of positive semidefinite quadratic forms \(S_{n,d}^+\) with the subspace \({\bar{S}}_{n,d}\) of \({\mathcal {S}}_n\)-invariant quadratic forms. By a slight abuse of terminology we think of the positive semidefinite forms \({\mathcal {Q}}_{\ell }\) as elements of the dual cone \((\Sigma _{n,2d}^{S})^*\). The following important proposition is a straightforward adaptation of the corresponding result in the non-symmetric case [6, Proposition 2.1]:
Proposition 4.20
Let \(\ell \in (\Sigma ^{S}_{n,2d})^*\) and let \(W_{\ell }\subset H_{n,d}\) be the kernel of the quadratic form \({\mathcal {Q}}_{\ell }\). The linear functional \(\ell \) spans an extreme ray of \((\Sigma _{n,2d}^S)^*\) if and only if \(W_{\ell }^{<2>}\) is a hyperplane in \(H_{n,2d}^S\). Equivalently, \(\ell \) spans an extreme ray if and only if the kernel of \({\mathcal {Q}}_\ell \) is maximal, i.e., if \(\ker {\mathcal {Q}}_\ell \subseteq \ker {\mathcal {Q}}_m\) for some \(m \in (H_{n,2d}^{S})^*\), then \(m=c\ell \) for some \(c \in {\mathbb {R}}\).
The dual correspondence yields that any facet F of a cone K, i.e., any maximal face of K, is given by an extreme ray of the dual cone \(K^*\). More precisely, for any maximal face F of K there exists an extreme ray of \(K^*\) spanned by a linear functional \(\ell \in K^*\) such that
$$\begin{aligned} F=\{x\in K:\ell (x)=0\}. \end{aligned}$$
We now aim to characterize the extreme rays of \((\Sigma _{n,2d}^{S})^*\) that are not extreme rays of the cone \(({\mathcal {P}}_{n,2d}^{S})^*\). For \(v \in {\mathbb {R}}^n\) define a linear functional
$$\begin{aligned} \ell _v:H^S_{n,2d} \rightarrow {\mathbb {R}},\quad \ell _v(f)=f(v). \end{aligned}$$
We say that the linear functional \(\ell _v\) corresponds to point evaluation at v. It can be shown, with the same proof as in the non-symmetric case, that the extreme rays of the cone \(({\mathcal {P}}_{n,2d}^{S})^*\) are precisely the point evaluations \(\ell _v\) (see [5, Chap. 4] for the non-symmetric case). Therefore we need to identify extreme rays of \((\Sigma _{n,2d}^{S})^*\) which are not point evaluations. We now examine the case of degree 4 in detail and give an explicit construction of an element of \((\Sigma _{n,4}^{S})^*\) which does not belong to \(({\mathcal {P}}_{n,4}^{S})^*\).