1 Introduction

A central problem in dimension reduction, distributed sensing, and many statistical applications is the identification of properties of a high-dimensional random vector from knowledge of marginal distributions, i.e., the distributions of one or more lower-dimensional projections of the random vector. A simple example in statistics is the problem of computing the lowest moments of the random vector. However, knowledge of some marginal distributions may not suffice to identify the first few high-dimensional moments. Here, we address the problem of designing low-dimensional projections of the random vector so that its high-dimensional moments can be computed from the lower-dimensional ones by an explicit formula.

We consider a random vector X in \(\mathbb {R}^d\), distributed according to some Borel probability measure. In practice, X could be a random signal that is observed by distributed sensors, each measuring a certain piece of information. Inspired by [8, 20], each sensor is modeled as a matrix \(Q_j\in \mathbb {R}^{k\times d}\) with full rank \(k< d\). Computing with \(P_j:=Q_j^*(Q_jQ_j^*)^{-1}Q_j\) instead of \(Q_j\), we can effectively turn our measurement matrices into orthogonal projectors \(\{P_j\}_{j=1}^n\subset \mathcal {G}_{k,d}\), where \(\mathcal {G}_{k,d}\) denotes the set of orthogonal projectors on \(\mathbb {R}^d\) with rank k, i.e., \(P_j\) is the orthogonal projector onto the row-space of \(Q_j\). A variant of the Cramér–Wold Theorem says that two random vectors \(X,Y\in \mathbb {R}^d\) are identically distributed if and only if, for all \(P\in \mathcal {G}_{k,d}\), the two random vectors PX and PY are identically distributed, cf. [26]. For further related results on projected distributions, we refer to [4, 10, 14, 17]. Here, we do not wish to identify the distribution of X but restrict ourselves to recovering its first few moments. We want to achieve this by observing the moments of a number of low-dimensional projections and combining the information in a process we call moment fusion.
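In numerical terms, passing from a measurement matrix \(Q_j\) to the projector \(P_j\) is a one-liner; the following numpy sketch (with arbitrarily chosen dimensions) illustrates it:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 2
Q = rng.standard_normal((k, d))          # a generic full-rank k x d measurement matrix

# Orthogonal projector onto the row space of Q.
P = Q.T @ np.linalg.inv(Q @ Q.T) @ Q
```

Since \(P_j\) coincides with the pseudoinverse construction \(Q_j^+Q_j\), the projection \(P_jX\) carries exactly the information contained in the measurement \(Q_jX\).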

1.1 Moment Fusion

Suppose X is a random vector in \(\mathbb {R}^d\) distributed according to some unknown Borel probability measure on \(\mathbb {R}^d\). For a fixed integer \(p>0\), our goal is to determine the low-order moments

$$\begin{aligned} \mathbb {E}X^s, \quad s\in \mathbb {N}^d,\; |s|\le p, \end{aligned}$$
(1)

from low-order moments of lower-dimensional projections. Here we use the multi-index notation \(X^s=X_1^{s_1}\cdots X_d^{s_d}\) and \(|s| = \sum _{j=1}^d s_j\). More specifically, we suppose that we have access only to the first p moments of low-dimensional linear measurements, i.e., for certain matrices \(\{Q_j\}_{j=1}^n\subset \mathbb {R}^{k\times d}\) with fixed rank \(k<d\), we suppose that we know

$$\begin{aligned} \mathbb {E} (Q_jX)^s , \quad s\in \mathbb {N}^k,\; |s|\le p. \end{aligned}$$
(2)

From knowledge of \(\{Q_j\}_{j=1}^n\) and the first p moments of the dimension reduced random vectors \(Q_jX\), \(j=1,\ldots ,n\), in (2), we aim to reconstruct the high-dimensional moments of X in (1).
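For illustration, the data (2) can be estimated empirically from samples of X; the sketch below (with an arbitrarily chosen standard normal X) tabulates all projected moments of order at most p:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
d, k, p, n = 4, 2, 2, 3
Qs = [rng.standard_normal((k, d)) for _ in range(n)]   # measurement matrices Q_j
X = rng.standard_normal((200_000, d))                  # samples of X (illustrative choice)

def projected_moments(Q, X, p):
    """Empirical moments E (QX)^s for all multi-indices s with 0 < |s| <= p."""
    Y = X @ Q.T                                        # samples of QX
    return {s: np.mean(np.prod(Y ** np.array(s), axis=1))
            for s in product(range(p + 1), repeat=Q.shape[0]) if 0 < sum(s) <= p}

moments = [projected_moments(Q, X, p) for Q in Qs]
```

Moment fusion then asks how to combine the dictionaries `moments` into the moments (1) of X itself.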

1.2 Special Examples

Suppose that \(x\in \mathbb {R}^d\) is a vector of unknowns. If \(\{Q_j\}_{j=1}^n\) are chosen such that

$$\begin{aligned} \{(Q_jx)^{s} : j=1,\ldots ,n,\; s\in \mathbb {N}^{k},\; |s|\le p\} \end{aligned}$$

spans the space of polynomials in x of degree at most p, then (1) can be computed from (2) by expressing each expected value \(\mathbb E X^s\), \(s\in \mathbb {N}^d\), \(|s|\le p\), as a suitable linear combination of the expected values \(\{\mathbb E (Q_j X)^{s'}: j=1,\ldots ,n,\; s'\in \mathbb {N}^{k},\; |s'|\le p\}\). We provide an example for \(k=1\):

Example 1.1

  1. (1)

    If \(p=1\), then we can simply choose \(Q_j:=e_j^*\), \(j=1,\ldots ,d\), where \(\{e_j\}_{j=1}^d\) is the standard orthonormal basis for \(\mathbb {R}^d\). So, reconstruction is possible with d projectors.

  2. (2)

For \(p=2\), the \(\{Q_j\}_{j=1}^d\) together with \(Q^+_{i,j}:=e^*_i+e^*_j\), \(i<j\), allow us to reconstruct (1) from (2) with \(\left( {\begin{array}{c}d+1\\ 2\end{array}}\right) \) many low-dimensional measurements.

  3. (3)

    If \(p=3\), one can check that \(\{Q_j\}_{j=1}^n\) with \(Q^+_{i,j}\), \(Q^-_{i,j}:=e^*_i-e^*_j\), \(i<j\), and \(Q_{i,j,k}:=e^*_i+e^*_j+e^*_k\), \(i<j<k\), allow reconstruction, so that we use \(\left( {\begin{array}{c}d+2\\ 3\end{array}}\right) \) many linear measurements.

  4. (4)

    For \(p=4\), we can choose \(\{Q_j\}_{j=1}^n\) with \(Q^+_{i,j}\), \(Q^-_{i,j}\), \(\tilde{Q}^+_{i,j}:=e^*_i+2e^*_j\), \(i<j\), and \(Q_{i,j,k}\), \(Q^{-}_{i,j,k}:=e^*_i-e^*_j+e^*_k\), \(\tilde{Q}^{-}_{i,j,k}:=e^*_i+e^*_j-e^*_k\), \(i<j<k\), and \(Q_{i,j,k,\ell }:=e^*_i+e^*_j+e^*_k+e^*_\ell \), \(i<j<k<\ell \), allow reconstruction, so that we use \(\left( {\begin{array}{c}d+3\\ 4\end{array}}\right) \) many linear measurements.
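The case \(p=2\) above rests on the polarization \(2x_ix_j=((e^*_i+e^*_j)x)^2-x_i^2-x_j^2\); the following sketch fuses the \(\left( {\begin{array}{c}d+1\\ 2\end{array}}\right) \) one-dimensional second moments into the full second-moment matrix (the sample distribution is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
X = rng.standard_normal((50_000, d)) @ rng.standard_normal((d, d))  # correlated samples

# One-dimensional measurements: E (e_i^* X)^2 and E ((e_i^* + e_j^*) X)^2.
m2 = [np.mean(X[:, i] ** 2) for i in range(d)]
m2plus = {(i, j): np.mean((X[:, i] + X[:, j]) ** 2)
          for i in range(d) for j in range(i + 1, d)}

# Fuse: E X_i X_j = (E ((e_i^* + e_j^*) X)^2 - E X_i^2 - E X_j^2) / 2.
M = np.diag(m2)
for (i, j), v in m2plus.items():
    M[i, j] = M[j, i] = (v - m2[i] - m2[j]) / 2
```

Since the fusion step is an algebraic identity, the fused matrix agrees exactly with the empirical second-moment matrix of the same samples.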

Note that the number of linear measurements in Example 1.1 is exactly the dimension of the homogeneous polynomials of degree p in d variables. Similar examples can be derived for more general situations, and the following example deals with \(k=2\):

Example 1.2

  1. (1)

    If \(p=1\), then the choice \(\left( {\begin{array}{c}e_1^*\\ e_2^*\end{array}}\right) \), \(\left( {\begin{array}{c}e_3^*\\ e_4^*\end{array}}\right) ,\ldots \), up to \(\left( {\begin{array}{c}e_{d-1}^*\\ e_{d}^*\end{array}}\right) \), for d even or up to \(\left( {\begin{array}{c}e_{d}^*\\ e_{1}^*\end{array}}\right) \), for d odd, enables us to reconstruct the high-dimensional mean from the lower-dimensional means.

  2. (2)

    For \(p=2\), moment reconstruction works with the \(\left( {\begin{array}{c}d\\ 2\end{array}}\right) \) projectors \(\left( {\begin{array}{c}e_i^*\\ e_j^*\end{array}}\right) \), for \(i<j\).

1.3 Outline and Contribution of this Paper

The present paper goes beyond the explicit Examples 1.1 and 1.2 and provides a general strategy for moment reconstruction. Our main contribution is the identification of conditions on the projectors that yield explicit reconstruction formulas. Moreover, these conditions are compatible with numerical schemes, meaning that suitable projectors can be constructed explicitly by minimizing a certain potential function, as discussed in Sects. 4 and 5. We also discuss randomized constructions. Our approach stems from applied harmonic analysis and relates to the concept of so-called Grassmannian cubatures, see, for instance, [2, 3].

The remainder of the paper is organized as follows: The condition on the projections and the associated reconstruction formula are formulated in Sect. 2 for random vectors X on the unit sphere \(\mathbb {S}^{d-1}\). In Sect. 3 we deal with \(X\in \mathbb {R}^d\) and either restrict ourselves to moments up to order three or use rank-one projections. Sections 4 and 5 are dedicated to the construction of suitable projectors based on numerical optimization and on a randomization strategy, respectively. The results in Sect. 6 imply that suitably randomized constructions can provide approximate moment recovery, with an error bound that holds with overwhelming probability.

2 Main Reconstruction Results for \(X\in \mathbb {S}^{d-1}\)

In this section, we focus on random vectors X with values in the unit sphere \(\mathbb {S}^{d-1}\) of \(\mathbb R^d\). We will rely on some results on cubatures for polynomial spaces on the Grassmannian manifold, see [1,2,3, 12, 13].

2.1 General Moment Reconstruction

We shall make use of the trace inner product \(\langle M_1 ,M_2\rangle :={{\mathrm{tr}}}(M_1M_2)\), for \(M_1,M_2\in \mathscr {H}_d:=\{M\in \mathbb {R}^{d\times d}:M^\top =M\}\). The Grassmann space

$$\begin{aligned} \mathcal {G}_{k,d}:=\{P\in \mathscr {H}_d:P^2=P,\; {{\mathrm{tr}}}(P)=k\} \end{aligned}$$

is the set of rank-k orthogonal projections on \(\mathbb {R}^{d}\). Note that \(\{Q_j\}_{j=1}^n\subset \mathbb {R}^{k\times d}\) with all matrices having full rank \(k<d\), and \(P_j:=Q_j^*(Q_jQ_j^*)^{-1}Q_j\), for \(j=1,\ldots ,n\), yield \(\{P_j\}_{j=1}^n\subset \mathcal {G}_{k,d}\). In place of \(\{Q_j\}_{j=1}^n\) we shall find conditions on \(\{P_j\}_{j=1}^n\) that enable moment reconstruction, i.e., conditions on the respective row-spaces of \(\{Q_j\}_{j=1}^n\).

The orthogonal group \(\mathcal {O}(d)\) acts transitively on \(\mathcal {G}_{k,d}\) by conjugation \(P\mapsto UPU^*\), for \(P\in \mathcal {G}_{k,d}\) and \(U\in \mathcal {O}(d)\). Thus, there is an orthogonally invariant probability measure \(\sigma _{k,d}\) on \(\mathcal {G}_{k,d}\), which is induced by the Haar measure on \(\mathcal {O}(d)\). This measure leads to the trace moments

$$\begin{aligned} \mu _{k,d}(M_1,\ldots ,M_t):=\int _{\mathcal {G}_{k,d}}\langle P,M_1\rangle \cdots \langle P,M_t\rangle d\sigma _{k,d}(P),\qquad \{M_i\}_{i=1}^t\subset \mathscr {H}_d, \end{aligned}$$

which were introduced in [12, 13]. In the present section, we can restrict ourselves to

$$\begin{aligned} \mu ^t_{k,d}(M):=\mu _{k,d}(M,\ldots ,M), \end{aligned}$$

where M occurs t times, and in Sect. 3 we shall make use of the more general case. In the following result, we use the notation \(E_{x,y}:=\frac{1}{2}(xy^*+yx^*)\), for \(x,y\in \mathbb {R}^d\).

Theorem 2.1

For \(\alpha \in \mathbb {N}^d\) with \(|\alpha |=t\), there are an integer \(m\), points \(y^\alpha _{s,i}\in \mathbb {S}^{d-1}\), and coefficients \(f^\alpha _{s,i}\in \mathbb {R}\), such that

$$\begin{aligned} x^\alpha = \sum _{s=1}^t \sum _{i=1}^m f^\alpha _{s,i}\mu ^s_{k,d}(E_{x,y^\alpha _{s,i}}),\quad \text {for all } x\in \mathbb {S}^{d-1}. \end{aligned}$$

Proof

Note that [13, Lemma 7.1] yields

$$\begin{aligned} x_{i_1}\ldots x_{i_t}&= \frac{1}{t!}\sum _{ \emptyset \ne J \subset \{1,\dots ,t\} }(-1)^{t+\# J}\left( \sum _{j\in J} x_{i_j} \right) ^t, \end{aligned}$$
(3)

and \(\sum _{j\in J} x_{i_j}=\sum _{j\in J} \langle x,e_{i_j}\rangle \) leads to

$$\begin{aligned} x_{i_1}\ldots x_{i_t}&=\frac{1}{t!}\sum _{\emptyset \ne J \subset \{1,\dots ,t\}}(-1)^{t+\# J}\left( \left\langle x,\sum _{j\in J} e_{i_j}\right\rangle \right) ^t. \end{aligned}$$
(4)

Thus, it is sufficient to check that each \(\langle x,y\rangle ^t\), for \(x,y\in \mathbb {S}^{d-1}\), can be written as a linear combination of terms \(\mu ^s_{k,d}(E_{x,y})\), \(s=1,\ldots ,t\).
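The identity (3) is easy to verify numerically; in the sketch below, the point x and the index tuple are arbitrary choices:

```python
import numpy as np
from math import factorial
from itertools import chain, combinations

def polarized_product(x, idx):
    """Right-hand side of (3): recover x_{i_1} ... x_{i_t} from t-th powers of sums."""
    t = len(idx)
    subsets = chain.from_iterable(combinations(range(t), r) for r in range(1, t + 1))
    return sum((-1) ** (t + len(J)) * sum(x[idx[j]] for j in J) ** t
               for J in subsets) / factorial(t)

rng = np.random.default_rng(3)
x = rng.standard_normal(5)
idx = [0, 2, 2, 4]                       # i_1, ..., i_t; repeated indices are allowed
```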

We shall prove the statement by induction over t. The case \(t=1\) is covered by

$$\begin{aligned} \langle x,y\rangle =\frac{d}{k}\mu _{k,d}^1(E_{x,y}), \end{aligned}$$

see, for instance [3].
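The base case can be checked by Monte Carlo integration over \(\mathcal {G}_{k,d}\); uniform projectors are obtained from QR decompositions of Gaussian matrices (dimensions and sample size below are pragmatic choices):

```python
import numpy as np

rng = np.random.default_rng(4)
d, k, n = 6, 2, 20_000
x = rng.standard_normal(d); x /= np.linalg.norm(x)
y = rng.standard_normal(d); y /= np.linalg.norm(y)
E = 0.5 * (np.outer(x, y) + np.outer(y, x))           # E_{x,y}

acc = 0.0
for _ in range(n):
    V, _ = np.linalg.qr(rng.standard_normal((d, k)))  # Haar-random orthonormal k-frame
    acc += np.sum((V @ V.T) * E)                      # <P, E_{x,y}> for P = V V^*
mu1 = acc / n                                         # estimate of mu^1_{k,d}(E_{x,y})
```

Up to sampling error, \((d/k)\,\mu ^1_{k,d}(E_{x,y})\approx \langle x,y\rangle \).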

To consider general t, we need some preparations. A partition of t is an integer vector \(\pi =(\pi _1,\dots ,\pi _t)\) whose entries are ordered by \(\pi _1\ge \ldots \ge \pi _t\ge 0\) and sum up to \(t=\sum _{i=1}^t\pi _i\). We denote the number of nonzero entries by \(l(\pi )\), and the set of partitions \(\pi \) of t with \(l(\pi )\le d\) is denoted by \(\mathscr {P}_{t,d}\).

According to invariant theory, cf. [25, Theorem 7.1], the expansion

$$\begin{aligned} \mu ^t_{k,d}(M) = \frac{1}{q_{t,d}}\sum _{\pi \in \mathscr {P}_{t,d}} \alpha _\pi {{\mathrm{tr}}}(M^{\pi _1})\cdots {{\mathrm{tr}}}(M^{\pi _{l(\pi )}}) \end{aligned}$$
(5)

holds with suitable real-valued coefficients \(q_{t,d}\) and \(\alpha _\pi \). For \(x,y\in \mathbb {S}^{d-1}\), we observe that \({{\mathrm{tr}}}(E_{x,y}^s)\) is a polynomial of degree s in \(\langle x,y\rangle \) with leading coefficient \((\frac{1}{2})^{s-1}\). The latter yields, for \(x,y\in \mathbb {S}^{d-1}\), that \(\mu ^t_{k,d}(E_{x,y})\) is a polynomial in \(\langle x,y\rangle \) of degree t, i.e.,

$$\begin{aligned} \mu ^t_{k,d}(E_{x,y})&= \frac{1}{q_{t,d}}\sum _{\pi \in \mathscr {P}_{t,d}} \alpha _\pi \left( (\frac{1}{2})^{t-l(\pi )}\langle x,y\rangle ^t+\sum _{s=1}^{t-1}c_{\pi ,s}\langle x,y\rangle ^s \right) \\&= \frac{\langle x,y\rangle ^t}{2^t q_{t,d}} \left( \sum _{\pi \in \mathscr {P}_{t,d}} \alpha _\pi 2^{l(\pi )}\right) +\sum _{\pi \in \mathscr {P}_{t,d}} \sum _{s=1}^{t-1}c_{\pi ,s}\langle x,y\rangle ^s. \end{aligned}$$

One can check that its leading coefficient does not vanish. Indeed, denoting by \(D_2\) a diagonal rank-2 projection matrix and comparing with (5), we observe

$$\begin{aligned} \frac{1}{2^t q_{t,d}} \left( \sum _{\pi \in \mathscr {P}_{t,d}} \alpha _\pi 2^{l(\pi )}\right) = \frac{1}{2^t}\mu ^t_{k,d}(D_2)=\frac{1}{2^t}\int _{\mathcal {G}_{k,d}}\langle P,D_2\rangle ^td\sigma _{k,d}(P) \end{aligned}$$

and the right-hand side is positive since the function \(\langle \cdot ,D_2\rangle ^t\ge 0\) does not vanish entirely on \(\mathcal {G}_{k,d}\). Therefore, we can isolate \(\langle x,y\rangle ^t\) and write it as a linear combination of \(\mu ^t_{k,d}(E_{x,y})\) and terms \(\langle x,y\rangle ^s\), for \(s=1,\ldots ,t-1\). By induction, each term \(\langle x,y\rangle ^s\), \(s=1,\ldots ,t-1\) can be written as a linear combination of terms \(\mu ^s_{k,d}(E_{x,y})\), \(s=1,\ldots ,t-1\), which concludes the proof. \(\square \)

Theorem 2.1 represents monomials by linear combinations of \(\mu ^s_{k,d}(E_{x,y^\alpha _{s,i}})\), for some \(y^\alpha _{s,i}\in \mathbb {S}^{d-1}\). Next, we shall aim to replace the latter with projected monomials. Let us define a function space on \(\mathcal {G}_{k,d}\) by

$$\begin{aligned} {{\mathrm{Pol}}}^\ell _t(\mathcal {G}_{k,d})&:={{\mathrm{span}}}\left\{ \langle M,\cdot \rangle ^s\big |_{\mathcal G_{k,d}} : M\in \mathscr {H}_d ,\;{{\mathrm{rank}}}(M)\le \ell ,\; s\le t \right\} \end{aligned}$$
(6)

and introduce the concept of cubatures on \(\mathcal {G}_{k,d}\).

Definition 2.2

For \(\{P_j\}_{j=1}^n\subset \mathcal {G}_{k,d}\) and \(\{\omega _j\}_{j=1}^n\subset \mathbb {R}\), we say that \(\{(P_j,\omega _j)\}_{j=1}^n\) is a cubature for \({{\mathrm{Pol}}}^{\ell }_{t}(\mathcal {G}_{k,d})\) if

$$\begin{aligned} \sum _{j=1}^n\omega _jf(P_j) =\int _{\mathcal {G}_{k,d}} f(P)d\sigma _{k,d}(P),\quad \text {for all } f\in {{\mathrm{Pol}}}^{\ell }_{t}(\mathcal {G}_{k,d}). \end{aligned}$$

Note that the construction of cubatures for \({{\mathrm{Pol}}}^{\ell }_{t}(\mathcal {G}_{k,d})\) and the properties of the function space \({{\mathrm{Pol}}}^\ell _t(\mathcal {G}_{k,d})\) are discussed in more detail in Sects. 4 and 5, respectively. We can now formulate our first result on moment reconstruction, which is a direct consequence of Theorem 2.1.

Corollary 2.3

If \(\{(P_j,\omega _j)\}_{j=1}^n\) is a cubature for \({{\mathrm{Pol}}}^2_{t}(\mathcal {G}_{k,d})\), then, for \(\alpha \in \mathbb {N}^d\) with \(|\alpha |=t\), there are coefficients \(a^\alpha _\beta \in \mathbb {R}\), such that, for any random vector \(X\in \mathbb {S}^{d-1}\),

$$\begin{aligned} \mathbb {E}X^\alpha =\sum _{|\beta |\le t} a^\alpha _\beta \sum _{j=1}^n \omega _j\mathbb {E} (P_jX)^\beta . \end{aligned}$$
(7)

Proof

We observe that \(\langle P_j x,y\rangle = \langle P_j,E_{x,y}\rangle \) holds, for all \(x,y\in \mathbb {R}^d\). According to Theorem 2.1 and since \(\langle \cdot ,E_{x,y}\rangle ^t|_{\mathcal {G}_{k,d}}\in {{\mathrm{Pol}}}^{2}_{t}(\mathcal {G}_{k,d})\), the cubature property yields

$$\begin{aligned} \mathbb {E}X^\alpha = \sum _{s=1}^t \sum _{i=1}^m f^\alpha _{s,i}\sum _{j=1}^n\omega _j \mathbb {E}\left\langle P_jX,y^\alpha _{s,i}\right\rangle ^s \end{aligned}$$

and the assertion follows by observing that the terms \(\mathbb {E}\langle P_jX,y^\alpha _{s,i}\rangle ^s\) are linear combinations of moments of order s of \(P_jX\). \(\square \)

Since we are originally given the moments of \(Q_jX\), we must still express \(\mathbb {E}(P_jX)^\beta \), where \(P_j=Q_j^*(Q_jQ_j^*)^{-1}Q_j\) and \(\beta \in \mathbb {N}^d\), \(|\beta |\le t\), by means of moments of \(Q_jX\). If we suppress the index j, we obtain the multilinear relation

$$\begin{aligned} \mathbb {E}(PX)_{i_1}\cdots (PX)_{i_t} =\sum _{j_1,\ldots ,j_t=1}^k \mathbb {E}\left( (QX)_{j_1}\cdots (QX)_{j_t}\right) z_{i_1,j_1}\cdots z_{i_t,j_t}, \end{aligned}$$

where \(z_{i,j}:=(Q^*(QQ^*)^{-1})_{i,j}\). Thus, the moments of \(Q_jX\) enable us to apply (7).
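For \(t=2\), the multilinear relation is simply a congruence with the matrix \(Z:=Q^*(QQ^*)^{-1}\); a sketch on empirical moments (the distribution of the samples is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
d, k = 5, 2
Q = rng.standard_normal((k, d))
Z = Q.T @ np.linalg.inv(Q @ Q.T)      # P = Z Q, so (PX)_i = sum_j Z[i, j] (QX)_j

X = rng.standard_normal((10_000, d))  # samples of X
Y = X @ Q.T                           # the observed samples of QX

MQ = Y.T @ Y / len(Y)                 # empirical E (QX)_{j_1} (QX)_{j_2}
MP = Z @ MQ @ Z.T                     # the multilinear relation for t = 2
```

In this way, the second moments of \(PX\) are obtained without ever forming samples of \(PX\).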

2.2 Explicit Moment Reconstruction

This section is dedicated to explicitly computing the expansion in Corollary 2.3 for \(t=1,2,3\) by using a very particular class of polynomial functions. Indeed, zonal polynomials, cf. [9, 16, 18, 19, 24], are special multivariate homogeneous polynomials on \(\mathscr {H}_d\). These polynomials \(C_\pi \) are indexed by the partitions \(\pi \) of natural numbers and are invariant under orthogonal conjugation. According to [18], see also [12, 13], we obtain

$$\begin{aligned} \mu ^t_{k,d}(M) = \sum _{\pi \in \mathscr {P}_{t,d}}\frac{C_\pi (M)C_\pi (D_k)}{C_\pi (I_d)},\quad \text {for all } M\in \mathscr {H}_d, \end{aligned}$$
(8)

where \(D_k\) is a diagonal matrix in \(\mathbb {R}^{d\times d}\) with k ones and zeros elsewhere and \(I_d\) denotes the \(d\times d\) identity matrix. Knowledge of the zonal polynomials enabled us in [13] to compute the trace moments for arbitrarily large t and with explicit expressions for \(t=1,2,3\):

Theorem 2.4

([13]) For all \(d\ge 3\) and \(k<d\), we have

$$\begin{aligned} \mu ^1_{k,d}(M)&=\frac{1}{q_{1,d}}\alpha _{(1)}{{\mathrm{tr}}}(M),\\ \mu ^2_{k,d}(M)&=\frac{1}{q_{2,d}}\left( \alpha _{(1,1)}{{\mathrm{tr}}}^2(M)+\alpha _{(2)}{{\mathrm{tr}}}(M^2)\right) ,\\ \mu ^3_{k,d}(M)&=\frac{1}{q_{3,d}}\left( \alpha _{(1,1,1)}{{\mathrm{tr}}}^3(M)+\alpha _{(2,1)}{{\mathrm{tr}}}(M){{\mathrm{tr}}}(M^2)+\alpha _{(3)}{{\mathrm{tr}}}(M^3)\right) , \end{aligned}$$

holds for all \(M\in \mathscr {H}_d\), where \(q_{1,d}=d\), \(\alpha _{(1)}=k\), and

$$\begin{aligned} q_{2,d}&=(d-1)d(d+2),\\ \alpha _{(1,1)}&=(d+1)k^2-2k\\ \alpha _{(2)}&= 2k(d-k),\\ q_{3,d}&= (d-2)(d-1)d(d+2)(d+4),\\ \alpha _{(1,1,1)}&=(d^2+3d-2)k^3-6(d+2)k^2+16k,\\ \alpha _{(2,1)}&=-6(d+2)k^3 + 6(d^2+2d+4) k^2-24dk ,\\ \alpha _{(3)}&= 16k^3-24dk^2+8d^2k. \end{aligned}$$

For \(d=2\) and \(k=1\) in Theorem 2.4, the constant \(q_{3,d}\) would be zero, but so are \(\alpha _{(1,1,1)}\), \(\alpha _{(2,1)}\), and \(\alpha _{(3)}\). The identity for \(\mu ^3_{1,2}(M)\) still holds with the modified coefficients

$$\begin{aligned} q_{3,2}= 48,\quad \alpha _{(1,1,1)}=1,\quad \alpha _{(2,1)}=6,\quad \alpha _{(3)}=8. \end{aligned}$$
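These closed forms can be validated by Monte Carlo integration over Haar-random projectors; the sketch below checks \(\mu ^2_{k,d}\) for an arbitrary symmetric test matrix (sample size and tolerance are pragmatic choices):

```python
import numpy as np

rng = np.random.default_rng(6)
d, k, n = 5, 2, 50_000
M = rng.standard_normal((d, d)); M = M + M.T          # a symmetric test matrix

vals = np.empty(n)
for i in range(n):
    V, _ = np.linalg.qr(rng.standard_normal((d, k)))  # Haar-random rank-k projector V V^*
    vals[i] = np.sum((V @ V.T) * M)                   # <P, M>
mc2 = np.mean(vals ** 2)                              # Monte Carlo estimate of mu^2_{k,d}(M)

# Closed form from Theorem 2.4.
a11, a2 = (d + 1) * k**2 - 2 * k, 2 * k * (d - k)
q2 = (d - 1) * d * (d + 2)
mu2 = (a11 * np.trace(M) ** 2 + a2 * np.trace(M @ M)) / q2
```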

Theorem 2.4 and the proof of Theorem 2.1 lead to the following explicit moment recovery formulas.

Corollary 2.5

Let \(X\in \mathbb {S}^{d-1}\) be a random vector with \(d \ge 3\).

  1. (i)

    If \(\{(P_j,\omega _j)\}_{j=1}^n\) is a cubature for \({{\mathrm{Pol}}}^2_{1}(\mathcal {G}_{k,d})\), then, for \(i=1,\ldots ,d\),

    $$\begin{aligned} \mathbb {E}X_i=A_1\sum _{j=1}^n\omega _j \mathbb {E}(P_jX)_i, \qquad \text {where } A_1=\frac{d}{k}. \end{aligned}$$
    (9)
  2. (ii)

    If \(\{(P_j,\omega _j)\}_{j=1}^n\) is a cubature for \({{\mathrm{Pol}}}^2_{2}(\mathcal {G}_{k,d})\), then (9) holds and, for \(i,\ell =1,\ldots ,d\),

    $$\begin{aligned} \mathbb {E}X_iX_\ell =A_2 \sum _{j=1}^n\omega _j \mathbb {E}(P_jX)_i (P_jX)_\ell -B_2\frac{k}{d}\delta _{i,\ell }, \end{aligned}$$
    (10)

    where

    $$\begin{aligned} A_2=\frac{(d-1)d(d+2)}{k(dk+d-2)},\qquad B_2=\frac{(d-k)d}{kd(k+1)-2k}. \end{aligned}$$
  3. (iii)

    If \(\{(P_j,\omega _j)\}_{j=1}^n\) is a cubature for \({{\mathrm{Pol}}}^{2}_{3}(\mathcal {G}_{k,d})\), then (9) and (10) hold and, for \(i,\ell ,m=1,\ldots ,d\),

    $$\begin{aligned} \mathbb {E}X_iX_\ell X_m&=A_3\sum _{j=1}^n\omega _j \mathbb {E}(P_jX)_i(P_jX)_\ell (P_jX)_m\\&\quad -\frac{B_3}{3}\sum _{j=1}^n\omega _j \left( \mathbb {E}(P_jX)_i\delta _{\ell ,m}+\mathbb {E}(P_jX)_\ell \delta _{i,m}+\mathbb {E}(P_jX)_m\delta _{i,\ell }\right) \frac{k}{d}, \end{aligned}$$

    where

    $$\begin{aligned} A_3&= \frac{(d-2)(d-1)d(d+2)(d+4)}{k(d^2k^2+3d^2k+2d^2-6dk-12d-4k^2+16)}, \\ B_3&= \frac{3d^2(2d+(d-2)k-k^2)}{k^2((d+2)k^2+3dk+2d-8)} . \end{aligned}$$
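An explicit example for \(d=3\), \(k=1\): the six diagonals of the regular icosahedron are known to form a projective 2-design and hence, with equal weights \(\omega _j=1/6\), a cubature for \({{\mathrm{Pol}}}^2_{2}(\mathcal {G}_{1,3})\). Assuming this, the sketch below checks (9) and (10) for a point mass \(X=x\in \mathbb {S}^{2}\), where both recoveries are exact:

```python
import numpy as np

phi = (1 + np.sqrt(5)) / 2
# The six diagonals of the icosahedron, normalized: a projective 2-design in R^3.
V = np.array([[0, 1, phi], [0, 1, -phi], [1, phi, 0],
              [1, -phi, 0], [phi, 0, 1], [-phi, 0, 1]], dtype=float)
V /= np.linalg.norm(V, axis=1, keepdims=True)
Ps = [np.outer(v, v) for v in V]                     # rank-1 projectors, weights 1/6
w, d, k = 1 / 6, 3, 1

A1 = d / k
A2 = (d - 1) * d * (d + 2) / (k * (d * k + d - 2))
B2 = (d - k) * d / (k * d * (k + 1) - 2 * k)

rng = np.random.default_rng(7)
x = rng.standard_normal(d); x /= np.linalg.norm(x)   # X = x, a point mass on the sphere

mean_rec = A1 * sum(w * (P @ x) for P in Ps)                    # formula (9)
M_rec = (A2 * sum(w * np.outer(P @ x, P @ x) for P in Ps)
         - B2 * (k / d) * np.eye(d))                            # formula (10)
```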

Remark 2.6

Note that (9) is proved via monomial identities, so that it also holds pointwise, i.e., when the expectation is removed on both sides.

Proof of Corollary 2.5

For \(x,y\in \mathbb {R}^d\), we obtain

$$\begin{aligned} {{\mathrm{tr}}}(E_{x,y})&= \langle x,y\rangle ,\\ {{\mathrm{tr}}}(E_{x,y}^2)&=\frac{1}{2}(\langle x,y\rangle ^2+\Vert x\Vert ^2\Vert y\Vert ^2),\\ {{\mathrm{tr}}}(E_{x,y}^3)&=\frac{1}{4}(\langle x,y\rangle ^3+3\langle x,y\rangle \Vert x\Vert ^2\Vert y\Vert ^2), \end{aligned}$$

so that Theorem 2.4 implies, for all \(d\ge 3\) and \(x,y\in \mathbb {R}^d\),

$$\begin{aligned} \mu ^1_{k,d}(E_{x,y})&= \frac{\alpha _{(1)}}{q_{1,d}}\langle x,y\rangle , \end{aligned}$$
(11)
$$\begin{aligned} \mu ^2_{k,d}(E_{x,y})&= \frac{2\alpha _{(1,1)}+\alpha _{(2)}}{2q_{2,d}}\langle x,y\rangle ^2+\frac{\alpha _{(2)}}{2q_{2,d}}\Vert x\Vert ^2\Vert y\Vert ^2, \end{aligned}$$
(12)
$$\begin{aligned} \mu ^3_{k,d}(E_{x,y})&= \frac{4\alpha _{(1,1,1)}+2\alpha _{(2,1)}+\alpha _{(3)}}{4q_{3,d}}\langle x,y\rangle ^3+\frac{2\alpha _{(2,1)}+3\alpha _{(3)}}{4q_{3,d}}\langle x,y\rangle \Vert x\Vert ^2\Vert y\Vert ^2,\qquad \qquad \nonumber \\ \end{aligned}$$
(13)

where the constants \(q_{1,d},q_{2,d},q_{3,d}\) and \(\alpha _{(1)},\alpha _{(2)},\alpha _{(3)},\alpha _{(1,1,1)},\alpha _{(2,1)}\) are specified in Theorem 2.4.

Suppose now that \(\{(P_j,\omega _j)\}_{j=1}^n\) is a cubature for \({{\mathrm{Pol}}}^{2}_{t}(\mathcal {G}_{k,d})\), so that we obtain

$$\begin{aligned} \mu _{k,d}^t(E_{x,y})&=\sum _{j=1}^n\omega _j \langle P_{j},E_{x,y}\rangle ^t = \sum _{j=1}^n\omega _j \langle P_{j}x,y\rangle ^t, \end{aligned}$$

and the left-hand sides in (11), (12), and (13) can be replaced with \(\sum _{j=1}^n\omega _j \langle P_{j}x,y\rangle ^t\), for \(t=1,2,3\), respectively. In the following, we assume \(x\in \mathbb {S}^{d-1}\) and \(y \in \mathbb R^d\), and the constants \(A_1,A_2,A_3\) and \(B_2,B_3\) are as in Corollary 2.5. Rearranging terms then leads to the following identities. If \(\{(P_j,\omega _j)\}_{j=1}^n\) is a cubature for \({{\mathrm{Pol}}}^2_{1}(\mathcal {G}_{k,d})\), then

$$\begin{aligned} \langle x,y\rangle =A_1\sum _{j=1}^n\omega _j \langle P_{j}x,y\rangle . \end{aligned}$$
(14)

If \(\{(P_j,\omega _j)\}_{j=1}^n\) is a cubature for \({{\mathrm{Pol}}}^2_{2}(\mathcal {G}_{k,d})\), then (14) holds and

$$\begin{aligned} \langle x,y\rangle ^2=A_2 \sum _{j=1}^n\omega _j \langle P_{j}x,y\rangle ^2-B_2\Vert y\Vert ^2\frac{k}{d}. \end{aligned}$$
(15)

If \(\{(P_j,\omega _j)\}_{j=1}^n\) is a cubature for \({{\mathrm{Pol}}}^{2}_{3}(\mathcal {G}_{k,d})\), then (14), (15) hold, so that

$$\begin{aligned} \langle x,y\rangle ^3= A_3\sum _{j=1}^n\omega _j \langle P_{j}x,y\rangle ^3 - B_3\Vert y\Vert ^2\sum _{j=1}^n\omega _j \langle P_{j}x,y\rangle \frac{k}{d}. \end{aligned}$$
(16)

As at the beginning of the proof of Theorem 2.1, [13, Lemma 7.1] yields

$$\begin{aligned} x_{i_1}\cdots x_{i_t}&=\frac{1}{t!}\sum _{\emptyset \ne J\subset \{1,\ldots ,t\}}(-1)^{t+\# J}\left( \left\langle x,\sum _{j\in J} e_{i_j}\right\rangle \right) ^t. \end{aligned}$$

In order to compute the term \(x_{i_1}\ldots x_{i_t}\), we can repeatedly apply (14), (15), (16), respectively, with \(y=\sum _{j\in J} e_{i_j}\). Such rearrangements yield the formulas and constants in Corollary 2.5. \(\square \)

Remark 2.7

The framework presented in this paper also allows explicit computation of higher-order moments beyond \(t=1,2,3\). Indeed, if \(\{(P_j,\omega _j)\}_{j=1}^n\) is a cubature for \({{\mathrm{Pol}}}^2_{t}(\mathcal {G}_{k,d})\), then we can compute all moments of order t by using the zonal polynomials. However, we do not have one closed formula incorporating t as a variable, but need to compute the expressions for each t separately. Note that the formulas in Corollary 2.5 are merely based on identities between the corresponding polynomials in the entries of a unit vector \(x\in \mathbb R^d\).

3 Moment Fusion for \(X\in \mathbb {R}^d\)

3.1 The General Case for up to Cubic Moments

A homogeneity argument yields that (9) even holds for random \(X\in \mathbb {R}^d\). Analogously, considering (10) as a monomial identity with \(B_2\frac{k}{d}\delta _{i,\ell }=B_2\frac{k}{d}\Vert x\Vert ^2\delta _{i,\ell }\), for \(x\in \mathbb {S}^{d-1}\), a homogeneity argument yields that, for random \(X\in \mathbb {R}^d\),

$$\begin{aligned} \mathbb {E}X_iX_\ell =A_2 \sum _{j=1}^n\omega _j \mathbb {E}(P_jX)_i (P_jX)_\ell -B_2\sum _{r=1}^d\sum _{j=1}^n\omega _j \mathbb {E}(P_jX)^2_r\delta _{i,\ell }, \end{aligned}$$

provided that \(\{(P_j,\omega _j)\}_{j=1}^n\) is a cubature for \({{\mathrm{Pol}}}^2_{2}(\mathcal {G}_{k,d})\).
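With the six icosahedral diagonals and equal weights \(1/6\), which are known to form a cubature for \({{\mathrm{Pol}}}^2_{2}(\mathcal {G}_{1,3})\), the displayed formula recovers the full second-moment matrix of an arbitrary \(X\in \mathbb {R}^3\) from the projected samples alone (the sample distribution below is an arbitrary choice):

```python
import numpy as np

phi = (1 + np.sqrt(5)) / 2
V = np.array([[0, 1, phi], [0, 1, -phi], [1, phi, 0],
              [1, -phi, 0], [phi, 0, 1], [-phi, 0, 1]], dtype=float)
V /= np.linalg.norm(V, axis=1, keepdims=True)
Ps = [np.outer(v, v) for v in V]                     # cubature points, weights 1/6
w, d, k = 1 / 6, 3, 1
A2 = (d - 1) * d * (d + 2) / (k * (d * k + d - 2))
B2 = (d - k) * d / (k * d * (k + 1) - 2 * k)

rng = np.random.default_rng(8)
X = rng.standard_normal((5_000, d)) @ rng.standard_normal((d, d))  # X need not lie on the sphere

# Weighted empirical second moments of the projections P_j X.
S = sum(w * (X @ P).T @ (X @ P) for P in Ps) / len(X)
M_rec = A2 * S - B2 * np.trace(S) * np.eye(d)        # the displayed formula
```

Since the underlying identity holds pointwise, the recovery agrees exactly with the empirical second-moment matrix of the same samples.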

In order to deal with \(X\in \mathbb {R}^d\) for \(t=3\) as well, we observe that the formulas in (11), (12), and (13) hold in more generality, see [13, Theorem 7.3],

$$\begin{aligned} \mu _{k,d}(X_1)&=\frac{1}{q_{1,d}}\alpha _{(1)}{{\mathrm{tr}}}(X_1),\\ \mu _{k,d}(X_1,X_2)&= \frac{1}{q_{2,d}}\left( \alpha _{(1,1)}{{\mathrm{tr}}}(X_1){{\mathrm{tr}}}(X_2)+\alpha _{(2)}{{\mathrm{tr}}}(X_1X_2)\right) ,\\ \mu _{k,d}(X_1,X_2,X_3)&=\frac{1}{q_{3,d}}\left( \alpha _{(1,1,1)}{{\mathrm{tr}}}(X_1){{\mathrm{tr}}}(X_2){{\mathrm{tr}}}(X_3)\right. \\&\left. \quad \quad \!\! +\frac{\alpha _{(2,1)}}{3}({{\mathrm{tr}}}(X_1){{\mathrm{tr}}}(X_2X_3)+{{\mathrm{tr}}}(X_2){{\mathrm{tr}}}(X_1X_3)+{{\mathrm{tr}}}(X_3){{\mathrm{tr}}}(X_1X_2))\right. \\&\left. \quad \quad \!\! +\alpha _{(3)} {{\mathrm{tr}}}(X_1X_2X_3) \right) . \end{aligned}$$

For \(x\in \mathbb {R}^d\) and \(y\in \mathbb {S}^{d-1}\), using the above relation gives

$$\begin{aligned} \mu _{k,d}(E_{x,y},xx^*) = \frac{\alpha _{(1,1)} +\alpha _{(2)}}{q_{2,d}} \langle x, y \rangle \Vert x\Vert ^2 = \frac{k(k+2)}{d(d+2)} \langle x, y \rangle \Vert x\Vert ^2 \end{aligned}$$

and combined with identity (13), we obtain

$$\begin{aligned} \langle x,y\rangle ^3 = C_{3,d}^{(1)} \mu ^3_{k,d}(E_{x,y}) - C_{3,d}^{(2)} \mu _{k,d}(E_{x,y},xx^*) \end{aligned}$$
(17)

with \(C_{3,d}^{(1)} = \frac{4q_{3,d}}{4\alpha _{(1,1,1)}+2\alpha _{(2,1)}+\alpha _{(3)}}\) and \(C_{3,d}^{(2)} = \frac{2\alpha _{(2,1)}+3\alpha _{(3)}}{(4\alpha _{(1,1,1)}+2\alpha _{(2,1)}+\alpha _{(3)}) \frac{k(k+2)}{d(d+2)} }\). Indeed, one can check that the term \(4\alpha _{(1,1,1)}+2\alpha _{(2,1)}+\alpha _{(3)}\) is nonzero. If \(\{(P_j,\omega _j)\}_{j=1}^n\) is a cubature for \({{\mathrm{Pol}}}_3^2(\mathcal {G}_{k,d})\), then we can apply

$$\begin{aligned} \mu ^3_{k,d}(E_{x,y}) = \sum _{j=1}^n\omega _j \langle P_jx,y\rangle ^3, \end{aligned}$$
(18)

because the mapping \(P\mapsto \langle P,E_{x,y}\rangle ^3\) is contained in \({{\mathrm{Pol}}}_3^2(\mathcal {G}_{k,d})\). In Proposition 5.1 we shall check that \(P\mapsto \langle P, E_{x,y}\rangle \langle P, xx^*\rangle \) is also contained in \({{\mathrm{Pol}}}_3^2(\mathcal {G}_{k,d})\), so that also

$$\begin{aligned} \mu _{k,d}(E_{x,y},xx^*) = \sum _{j=1}^n\sum _{m=1}^d \omega _j \langle P_jx,y\rangle (P_jx)_m^2 \end{aligned}$$
(19)

holds. The actual moments of order 3 can now be computed from (4) by observing that \(\langle P_jx,y\rangle \) yields a linear combination of the terms \((P_jx)_1,\ldots ,(P_jx)_d\).

We now collect the resulting expressions for all third moments:

Corollary 3.1

Let \(X \in \mathbb R^d\) be a random vector with \(d\ge 3\), let the constants \(A_1\), \(A_2\), and \(B_2\) be as above, and let \(i,h,\ell \in \{1, 2, \dots , d\}\) be pairwise distinct.

  1. (i)

    If \(\{(P_j,\omega _j)\}_{j=1}^n\) is a cubature for \({{\mathrm{Pol}}}^2_{1}(\mathcal {G}_{k,d})\), then

    $$\begin{aligned} \mathbb {E}X_i=A_1\sum _{j=1}^n\omega _j \mathbb {E}(P_jX)_i. \end{aligned}$$
    (20)
  2. (ii)

    If \(\{(P_j,\omega _j)\}_{j=1}^n\) is a cubature for \({{\mathrm{Pol}}}^2_{2}(\mathcal {G}_{k,d})\), then (20) holds and

    $$\begin{aligned} \mathbb {E}X^2_i&= A_2 \sum _{j=1}^n\omega _j \mathbb {E}(P_jX)^2_i -B_2\sum _{j=1}^n\sum _{m=1}^d\omega _j \mathbb {E}(P_jX)^2_m, \end{aligned}$$
    (21)
    $$\begin{aligned} \mathbb {E}X_iX_\ell&= A_2 \sum _{j=1}^n\omega _j \mathbb {E}(P_jX)_i (P_jX)_\ell . \end{aligned}$$
    (22)
  3. (iii)

    If \(\{(P_j,\omega _j)\}_{j=1}^n\) is a cubature for \({{\mathrm{Pol}}}^{2}_{3}(\mathcal {G}_{k,d})\), then (20), (21), (22) hold and

    $$\begin{aligned} \mathbb E X_i^3&= C_{3,d}^{(1)} \sum _{j=1}^n \omega _j \mathbb E (P_j X)_i^3 - C_{3,d}^{(2)} \sum _{j=1}^n \sum _{m=1}^d \omega _j \mathbb E (P_jX)_i (P_jX)_m^2 , \end{aligned}$$
    (23)
    $$\begin{aligned} \mathbb E X_i^2 X_h =&\, C_{3,d}^{(1)} \sum _{j=1}^n \omega _j \mathbb E (P_j X)_i^2 (P_jX)_h - \frac{1}{3} C_{3,d}^{(2)} \sum _{j=1}^n \sum _{m=1}^d \omega _j \mathbb E (P_j X)_h (P_j X)_m^2 , \end{aligned}$$
    (24)
    $$\begin{aligned} \mathbb E X_i X_h X_\ell&= C_{3,d}^{(1)} \sum _{j=1}^n \omega _j \mathbb E (P_j X)_i (P_j X)_h (P_j X)_\ell . \end{aligned}$$
    (25)

Proof

The first and second moments have already been discussed prior to the statement of the corollary. For the third moments, the expression for \(\mathbb E X_i^3\) results from choosing \(y=e_i\) in (17), (18), and (19), which yields (23).

Next, we address (24). The choices \(y_+ = \frac{1}{\sqrt{2}} (e_i + e_h)\) and \(y_- = \frac{1}{\sqrt{2}}(e_i - e_h)\) yield

$$\begin{aligned} x_i^2 x_h&= \frac{\sqrt{2}}{ 3 } [ \langle x , y_+ \rangle ^3 - \langle x, y_-\rangle ^3] - \frac{1}{3} \langle x, e_h \rangle ^3 . \end{aligned}$$

Applying (17), (18), and (19) leads to

$$\begin{aligned} \mathbb E X_i^2 X_h =&\, C_{3,d}^{(1)} \sum _{j=1}^n \omega _j \mathbb E \left( (P_jX)_i^2 (P_j X)_h + \frac{1}{3} (P_j X)_h^3 \right) \\&- \frac{2}{3} C_{3,d}^{(2)} \sum _{j=1}^n \sum _{m=1}^d \omega _j \mathbb E (P_jX)_h (P_jX)_m^2 - \frac{1}{3} \mathbb E X_h^3 \end{aligned}$$

and inserting the expression for \(\mathbb E X_h^3\) then reduces this to (24).

Finally, for \(\mathbb E X_i X_h X_\ell \), we observe

$$\begin{aligned} x_i x_h x_\ell = \frac{1}{24} ( (x_i + x_h +x_\ell )^3 + (x_i - x_h -x_\ell )^3 - (x_i + x_h -x_\ell )^3 - (x_i - x_h +x_\ell )^3 ). \end{aligned}$$

By using \(y_{+++} = \frac{1}{\sqrt{3}} (e_i + e_h + e_\ell )\), \(y_{+--} = \frac{1}{\sqrt{3}} (e_i - e_h - e_\ell )\), \(y_{--+} = - \frac{1}{\sqrt{3}} (e_{i}+e_{h}-e_\ell )\), and \(y_{-+-} = -\frac{1}{\sqrt{3}} (e_i - e_h + e_\ell )\), we obtain

$$\begin{aligned} \mathbb E X_i X_h X_\ell&= \frac{ \sqrt{3}}{8} \mathbb E[ \langle X, y_{+++}\rangle ^3 + \langle X, y_{+--}\rangle ^3 + \langle X, y_{--+}\rangle ^3 + \langle X, y_{-+-}\rangle ^3 ], \end{aligned}$$

and a calculation using (17), (18), and (19) leads to (25). \(\square \)

3.2 All Moments from Projections onto One-Dimensional Subspaces

In the previous section, we outlined the recovery of the moments for \(t=1,2,3\) and general k. To address all moments \(t>3\), we now restrict ourselves to \(k=1\).

Let us denote the permutation group of \(\{1,\ldots ,t\}\) by \(S_t\). We say a permutation \(s\in S_t\) is associated to a partition \(\pi \) and denote this by \(s\sim \pi \) if there is a set of cycles \(\{c_i\}_{i=1}^m\) such that \(s=(c_1)\cdots (c_m)\) and the cardinality of \(c_i\) equals \(\pi _i\), for \(i=1,\ldots ,m\). Note that we also use the standard notation \(c_i\in s\) for a cycle \(c_i\) occurring in s. For \(\{M_i\}_{i=1}^t\subset \mathscr {H}_d\), we use a cycle index \(M_{c_i}:=M_{c_{i,1}}\cdots M_{c_{i,\ell _i}}\), where \(c_i=(c_{i,1}\ldots c_{i,\ell _i})\).

To clarify notation, we provide a simple example.

Example 3.2

For \(t=4\), let the permutation s be given by \(\begin{pmatrix} 1&{}2&{}3&{}4\\ 3&{}1&{}2&{}4 \end{pmatrix} \), and suppose we have a set of matrices \(\{M_i\}_{i=1}^4\), then s has the cyclic representation \((c_1)(c_2)=(1 3 2)(4)\) and is associated to the partition (3, 1, 0, 0). This implies \(M_{c_1}=M_1M_3M_2\) and \(M_{c_2}=M_4\).
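The cycle bookkeeping can be reproduced in a few lines; the sketch below encodes the permutation from Example 3.2 in 0-based one-line notation:

```python
import numpy as np

def cycles(perm):
    """Cycle decomposition of a permutation given in one-line notation (0-based)."""
    seen, out = set(), []
    for start in range(len(perm)):
        if start not in seen:
            c, i = [], start
            while i not in seen:
                seen.add(i)
                c.append(i)
                i = perm[i]
            out.append(tuple(c))
    return out

s = [2, 0, 1, 3]                         # Example 3.2: 1,2,3,4 -> 3,1,2,4, shifted to 0-based
rng = np.random.default_rng(9)
Ms = []
for _ in range(4):
    A = rng.standard_normal((3, 3))
    Ms.append(A + A.T)                   # symmetric matrices M_1, ..., M_4

# M_c for each cycle c: the ordered product of the matrices indexed by the cycle.
prods = {c: np.linalg.multi_dot([Ms[i] for i in c]) if len(c) > 1 else Ms[c[0]]
         for c in cycles(s)}
```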

Due to the orthogonal invariance of the Haar measure, the Grassmannian trace moments are invariant under conjugation with the orthogonal group \(\mathcal {O}(d)\), i.e.,

$$\begin{aligned} \mu _{k,d}(UM_1U^*,\dots , UM_tU^*) = \mu _{k,d}(M_1,\dots , M_t),\quad \text {for all } U\in \mathcal {O}(d). \end{aligned}$$

A general result in invariant theory, cf. [25], and the invariance of \(\mu _{k,d}\) under permutations yield

$$\begin{aligned} \mu _{k,d}(M_1,\dots , M_t)= \sum _{\pi \in \mathscr {P}_t} \alpha _\pi \sum _{\begin{array}{c} s\in S_t\\ s\sim \pi \end{array}} \prod _{c\in s} {{\mathrm{tr}}}(M_c), \end{aligned}$$
(26)

where \(\alpha _\pi \in \mathbb {R}\), see also (5) for \(M_1=\ldots =M_t\).

Proposition 3.3

For \(d \ge t\), \(t \in \mathbb {N}_{0}\), and provided that \(k=1\), the expansion (26) of the trace moments possesses only positive coefficients \(\alpha _{\pi }\), \(\pi \in \mathscr {P}_{t}\).

Proof

For any fixed permutation \(\sigma :\{1,\dots ,t\}\rightarrow \{1,\dots ,t\}\), we consider the matrices

$$\begin{aligned} M_{i} = {\left\{ \begin{array}{ll} e_{i}e_{\sigma (i)}^{*} + e_{\sigma (i) }e_{i}^{*},&{} i\ne \sigma (i) \\ e_{i}e_{i}^{*},&{} i = \sigma (i), \end{array}\right. } \qquad i=1,\dots ,t, \end{aligned}$$

where \(\{e_{i}\}_{i=1}^d \subset \mathbb {R}^{d}\) is the standard basis.

Now, let \(s \in S_{t}\) be another arbitrary permutation with some cycle \(c \in s\). We denote the cardinality of c by l. From

$$\begin{aligned} {{\mathrm{tr}}}(M_{c}) = {{\mathrm{tr}}}( M_{c_{1}} \cdots M_{c_{l}} ) = \sum _{k_{1},\dots ,k_{l}=1}^{d} (M_{c_{1}})_{k_{1},k_{2}} (M_{c_{2}})_{k_{2},k_{3}} \cdots (M_{c_{l-1}})_{k_{l-1},k_{l}}(M_{c_{l}})_{k_{l},k_{1}} \end{aligned}$$

we conclude by the definition of \(M_{i}\) that the indices \(k_{i}\) contribute to the sum if and only if \(k_{i}, k_{i+1\, \mathrm {mod}\, l} \in \{ c_{i}, \sigma (c_{i}) \}\) for all \(i=1,\dots ,l\). Equivalently, it must hold that \(k_{i} \in \{ c_{i}, \sigma (c_{i}) \} \cap \{ c_{i-1\,\mathrm {mod}\,l}, \sigma (c_{i-1\,\mathrm {mod}\,l}) \}\), \(i=1,\dots ,l\). Since \(c_{i} \ne c_{j}\) and \(\sigma (c_{i}) \ne \sigma (c_{j})\) for \(i \ne j\), this can only happen if \(c_{i} = \sigma (c_{i-1\,\mathrm {mod}\,l})\) for all \(i=1,\dots ,l\) or \(c_{i-1\,\mathrm {mod}\,l} = \sigma (c_{i})\) for all \(i=1,\dots ,l\). Hence, the trace of \(M_{c}\) vanishes if and only if neither the cycle c nor its inverse \(c^{-1}\) is contained in \(\sigma \). More precisely,

$$\begin{aligned} {{\mathrm{tr}}}(M_{c}) = {\left\{ \begin{array}{ll} 1, &{} c \in \sigma \text { or } c^{-1} \in \sigma ,\\ 0, &{} \text {else}. \end{array}\right. } \end{aligned}$$
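This indicator formula for \({{\mathrm{tr}}}(M_{c})\) is easy to test numerically. A small sketch (ours, for illustration) with \(\sigma = (1\,2\,3)(4)\) and the matrices \(M_{i}\) defined as above:

```python
import numpy as np

d, t = 5, 4
e = np.eye(d)
sigma = {1: 2, 2: 3, 3: 1, 4: 4}          # sigma = (1 2 3)(4)

def M_of(i):
    """M_i built from sigma as in the proof of Proposition 3.3."""
    j = sigma[i]
    if i == j:
        return np.outer(e[i-1], e[i-1])
    return np.outer(e[i-1], e[j-1]) + np.outer(e[j-1], e[i-1])

def tr_cycle(c):
    """tr(M_c) = tr(M_{c_1} ... M_{c_l})."""
    out = np.eye(d)
    for i in c:
        out = out @ M_of(i)
    return np.trace(out)

assert tr_cycle((1, 2, 3)) == 1.0   # the cycle (1 2 3) occurs in sigma
assert tr_cycle((1, 3, 2)) == 1.0   # ... and so does its inverse
assert tr_cycle((1, 2)) == 0.0      # (1 2) is in neither sigma nor its inverse
assert tr_cycle((4,)) == 1.0        # the fixed point (4) occurs in sigma
```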

Using these observations we obtain

$$\begin{aligned} \begin{aligned} \mu _{1,d}(M_{1},\dots ,M_{t})&= \sum _{\pi \in \mathscr {P}_t} \alpha _\pi \sum _{\begin{array}{c} s\in S_t\\ s\sim \pi \end{array}} \prod _{c\in s} {{\mathrm{tr}}}(M_c) \\&= \alpha _{\pi _\sigma } \#\{ s \in S_{t} : c \text { or } c^{-1} \in \sigma , \forall c\in s\}, \end{aligned} \end{aligned}$$

where \(\pi _\sigma \) is the partition associated to \(\sigma \). Hence, \(\alpha _{\pi _\sigma }\) equals the trace moment \(\mu _{1,d}(M_{1},\dots ,M_{t})\) divided by a positive integer, and it remains to verify that the latter is positive.

Together with the definition of the trace moments \(\mu _{1,d}(M_{1},\dots ,M_{t})\) and those of \(M_{i}\) we arrive at

$$\begin{aligned} \mu _{1,d}(M_{1},\dots ,M_{t})&= \int _{\mathcal O_{d}} \prod _{i=1}^{t} \langle O D_{1} O^{*}, M_{i} \rangle d O \\&= \int _{\mathcal O_{d}} \prod _{i=1}^{t} \langle O e_{1} (O e_{1})^{*}, M_{i} \rangle d O \\&= \int _{\mathcal O_{d}} \prod _{i=1}^{t} 2^{\#\{i,\sigma (i)\}-1} O_{1,i} O_{1,\sigma (i)} d O \\&= \int _{\mathcal O_{d}} \Big (\prod _{i=1}^{t} 2^{\#\{i,\sigma (i)\}-1} O_{1,i}\Big ) \Big (\prod _{i=1}^{t} O_{1,\sigma (i)} \Big )d O. \end{aligned}$$

Since \(\sigma \) is a permutation, we obtain

$$\begin{aligned} \mu _{1,d}(M_{1},\dots ,M_{t})&= \int _{\mathcal O_{d}} \Big (\prod _{i=1}^{t} 2^{\#\{i,\sigma (i)\}-1} O_{1,i}\Big ) \Big (\prod _{i=1}^{t} O_{1,i} \Big )d O\\&=\int _{\mathcal O_{d}} \prod _{i=1}^{t} 2^{\#\{i,\sigma (i)\}-1} (O_{1,i})^{2} d O > 0 \end{aligned}$$

and the assertion follows. \(\square \)
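The positivity of \(\mu _{1,d}(M_{1},\dots ,M_{t})\) can also be observed by sampling: since \(Oe_{1}\) is uniformly distributed on the sphere when O is Haar-distributed, the trace moment is the mean of \(\prod _{i}\langle uu^{*},M_{i}\rangle \) over uniform unit vectors u. A Monte Carlo sketch (ours, for illustration) with \(\sigma =(1\,2)(3)\) and \(d=4\):

```python
import numpy as np

rng = np.random.default_rng(1)
d, t = 4, 3
e = np.eye(d)
sigma = {1: 2, 2: 1, 3: 3}                 # sigma = (1 2)(3), so t = 3
M = {}
for i in range(1, t + 1):
    j = sigma[i]
    M[i] = (np.outer(e[i-1], e[j-1]) + np.outer(e[j-1], e[i-1])
            if i != j else np.outer(e[i-1], e[i-1]))

# u = O e_1 is uniform on S^{d-1} for Haar-distributed O in O_d
N = 20000
g = rng.standard_normal((N, d))
u = g / np.linalg.norm(g, axis=1, keepdims=True)

# sample mean of prod_i <uu^*, M_i> estimates mu_{1,d}(M_1,...,M_t)
vals = np.prod([np.einsum('ni,ij,nj->n', u, M[i], u) for i in range(1, t+1)], axis=0)
assert vals.mean() > 0                     # positive, as Proposition 3.3 predicts
```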

Proposition 3.4

For fixed \(m,\ell \in \mathbb {N}_{0}\) and \(d \in \mathbb {N}\) with \(d \ge m + \ell \) there are coefficients \(\{a_i\}_{i=0}^{\lfloor m/2\rfloor }\in \mathbb {R}\) such that, for all \(x\in \mathbb {R}^d\), \(y \in \mathbb {S}^{d-1}\), it holds

$$\begin{aligned} \langle x,y\rangle ^m\Vert x\Vert ^{2\ell } = \sum _{i=0}^{\lfloor m/2\rfloor } a_i \mu ^{(m-2i,\ell +i)}_{1,d}(E_{x,y},xx^*). \end{aligned}$$
(27)

Proof

Let us first note that by the identity (26) the trace moments \(\mu ^{(m,\ell )}_{1,d}(E_{x,y},xx^{*})\), \(m,\ell \in \mathbb {N}_{0}\), can be written as polynomials in \(\langle x, y \rangle \), \(\Vert x\Vert ^{2}\) and \(\Vert y\Vert ^{2}\). Hence, together with the homogeneity in x and y we infer the representation

$$\begin{aligned} \begin{aligned} \mu ^{(m,\ell )}_{1,d}(E_{x,y},xx^*) = \sum _{i=0}^{\lfloor m/2 \rfloor }b_i^{(m,\ell )}\langle x,y\rangle ^{m-2i}\Vert x\Vert ^{2(i+\ell )},\qquad x \in \mathbb {R}^{d}, y \in \mathbb {S}^{d-1}, \end{aligned} \end{aligned}$$
(28)

for some coefficients \(b_{i}^{(m,\ell )} \in \mathbb {R}\). Moreover, we have

$$\begin{aligned} b_{0}^{(m,\ell )} > 0, \qquad d \ge m + \ell , \end{aligned}$$
(29)

which follows from Proposition 3.3 and the fact that the coefficients of \(\langle x,y \rangle ^{m} \Vert x \Vert ^{2\ell }\) in any term of the form

$$\begin{aligned} \prod _{i=1}^{r}{{\mathrm{tr}}}\big (\prod _{j=1}^{s_{i}} (E_{x,y})^{m_{i,j}} (xx^{*})^{\ell _{i,j}}\big ), \quad \text { with } \quad \sum _{i=1}^{r}\sum _{j=1}^{s_{i}} m_{i,j} = m, \quad \sum _{i=1}^{r} \sum _{j=1}^{s_{i}} \ell _{i,j} = \ell \end{aligned}$$

are positive.

Now, the statement (27) will follow by induction over m. Therefore, let \(m \ge 2\), \(\ell \in \mathbb {N}_{0}\) with \(d \ge m + \ell \) be given and assume that (27) holds for all \(m' = m-2i\), \(\ell '=\ell + i\), \(i=1,\dots ,\lfloor m/2 \rfloor \). Using equation (28) we obtain

$$\begin{aligned} \mu ^{(m,\ell )}_{1,d}(E_{x,y},xx^*) = b_{0}^{(m,\ell )} \langle x,y\rangle ^{m} \Vert x\Vert ^{2 \ell } + \sum _{i=1}^{\lfloor m/2 \rfloor }b_i^{(m,\ell )}\langle x,y\rangle ^{m-2i}\Vert x\Vert ^{2(\ell +i)}. \end{aligned}$$

Since \(d \ge m + \ell > m' + \ell ' = m + \ell - i\), \(i=1,\dots ,\lfloor m/2 \rfloor \), we can expand the sum on the right-hand side by the induction hypothesis into trace moments \(\mu _{1,d}^{(m-2i,\ell +i)}(E_{x,y},xx^*)\), \(i=1,\dots , \lfloor m/2 \rfloor \). Hence, using \(b^{(m,\ell )}_{0} \ne 0\), see (29), we can rearrange terms and arrive at the statement (27). It remains to show the induction base with the cases \(m \in \{0,1\}\), \(\ell \in \mathbb {N}_{0}\), \(d \ge m + \ell \).

For \(m=0\), \(\ell \in \mathbb {N}_{0}\) and \(d \in \mathbb {N}\) we observe by orthogonal invariance

$$\begin{aligned} \mu ^{(0,\ell )}_{1,d}(E_{x,y},xx^*) = \mu ^{\ell }_{1,d} (xx^{*})= \mu ^{\ell }_{1,d}(D_{1})\Vert x\Vert ^{2\ell } ,\qquad x\in \mathbb {R}^d. \end{aligned}$$

The term \(\mu ^{\ell }_{1,d}(D_{1})\) is positive and has been explicitly computed in [3]:

$$\begin{aligned} \mu ^{\ell }_{1,d}(D_1)=\frac{(1/2)_\ell }{(d/2)_\ell },\qquad (a)_\ell :=a(a+1)\cdots (a+\ell -1). \end{aligned}$$

Hence, the assertion follows for \(m=0\), \(\ell \in \mathbb {N}_{0}\).
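This closed form is easy to check against sampling, since \(\mu ^{\ell }_{1,d}(D_{1})=\mathbb {E}\,u_{1}^{2\ell }\) for u uniform on \(\mathbb {S}^{d-1}\). A sketch (ours, for illustration) with the rising factorial \((a)_\ell =a(a+1)\cdots (a+\ell -1)\) implemented directly:

```python
import numpy as np

def pochhammer(a, l):
    """Rising factorial (a)_l = a (a+1) ... (a+l-1)."""
    out = 1.0
    for i in range(l):
        out *= a + i
    return out

d, l = 5, 3
exact = pochhammer(0.5, l) / pochhammer(d / 2, l)   # (1/2)_l / (d/2)_l

# Monte Carlo: mu^l_{1,d}(D_1) = E <uu^*, D_1>^l = E u_1^{2l}, u uniform on S^{d-1}
rng = np.random.default_rng(0)
g = rng.standard_normal((200000, d))
u1 = g[:, 0] / np.linalg.norm(g, axis=1)
mc = np.mean(u1 ** (2 * l))
assert abs(mc - exact) < 5e-3
```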

For \(m=1\), \(\ell \in \mathbb {N}_{0}\) and \(d \in \mathbb {N}\) we find by (26) that

$$\begin{aligned} \mu ^{(1,\ell )}_{1,d}(E_{x,y},xx^*) = \langle x,y\rangle \Vert x\Vert ^{2\ell } \sum _{\pi \in \mathscr {P}_{\ell +1}} \alpha _\pi \sum _{\begin{array}{c} s\in S_{\ell + 1}\\ s\sim \pi \end{array}} 1, \qquad x\in \mathbb {R}^d,\quad y \in \mathbb {S}^{d-1}. \end{aligned}$$

Moreover, we can check that the coefficient of \(\langle x,y\rangle \Vert x\Vert ^{2\ell } \) is nonzero by observing

$$\begin{aligned} \sum _{\pi \in \mathscr {P}_{\ell +1}} \alpha _\pi \sum _{\begin{array}{c} s\in S_{\ell +1}\\ s\sim \pi \end{array}} 1 = \mu ^{\ell +1}_{1,d}(D_1)>0, \end{aligned}$$

which concludes the proof. \(\square \)

Corollary 3.5

If \(\{(P_j,\omega _j)\}_{j=1}^n\) is a cubature for \({{\mathrm{Pol}}}^3_{t}(\mathcal {G}_{1,d})\), then, for \(\alpha \in \mathbb {N}^d\) with \(|\alpha |=t\le d\), there are coefficients \(a^\alpha _\beta \in \mathbb {R}\), such that, for any random vector \(X\in \mathbb {R}^d\),

$$\begin{aligned} \mathbb {E}X^\alpha =\sum _{|\beta |= t} a^\alpha _\beta \sum _{j=1}^n \omega _j\mathbb {E} (P_jX)^\beta . \end{aligned}$$
(30)

Proof

For any \(i=0,\ldots ,\lfloor t/2\rfloor \), the function \(F:\mathcal {G}_{1,d}\rightarrow \mathbb {R}\) defined by \(P\mapsto \langle P,E_{x,y}\rangle ^{t-2i}\langle P,xx^*\rangle ^{i}\) is contained in \({{\mathrm{Pol}}}^3_{t}(\mathcal {G}_{1,d})\), see part two of Proposition 4.1 in the subsequent Sect. 4 for details. Thus, the cubature property yields

$$\begin{aligned} \mu ^{(t-2i,i)}_{1,d}(E_{x,y},xx^*)&=\sum _{j=1}^n \omega _j \langle P_j,E_{x,y}\rangle ^{t-2i} \langle P_j,xx^*\rangle ^{i}\\&= \sum _{j=1}^n \omega _j \langle P_jx,y\rangle ^{t-2i} \Vert P_jx\Vert ^{2i}. \end{aligned}$$

Note that \(\langle P_jx,y\rangle ^{t-2i}\) and \(\Vert P_jx\Vert ^{2i}\) are linear combinations of monomials in \(P_jx\) of degree \(t-2i\) and 2i, respectively. Hence, their product is a linear combination of monomials in \(P_jx\) of degree t. Applying (4) and invoking Proposition 3.4 for \(\ell =0\) concludes the proof. \(\square \)

4 Constructing Cubatures for \({{\mathrm{Pol}}}^{2}_{t}(\mathcal {G}_{k,d})\)

In this section, we shall derive a general framework for the construction of cubatures for \({{\mathrm{Pol}}}^{2}_{t}(\mathcal {G}_{k,d})\) that are needed to apply our results in Theorem 2.5. For general existence results of cubatures, we refer to [11], and explicit group theoretical constructions are provided in [7]. In the following, we shall discuss random constructions as well as deterministic constructions based on the solution of an optimization problem.

4.1 Random Construction

For \(n,m\ge \dim ({{{\mathrm{Pol}}}}^\ell _{t}(\mathcal {G}_{k,d}))\), it follows from classical arguments that there are \(\{M_i\}_{i=1}^m\subset \mathscr {H}^\ell _d\) and \(\{P_j\}_{j=1}^n\subset \mathcal {G}_{k,d}\) such that the matrix

$$\begin{aligned} (\langle M_i,P_j\rangle ^t)_{\begin{array}{c} i=1,\ldots ,m\\ j=1,\ldots ,n \end{array}} \end{aligned}$$
(31)

has rank \(\dim ({{{\mathrm{Pol}}}}^\ell _{t}(\mathcal {G}_{k,d}))\). We can now compute weights \(\omega :=\{\omega _j\}_{j=1}^n\) by solving the linear system of equations

$$\begin{aligned} \sum _{j=1}^n\langle M_i,P_j\rangle ^t \omega _j = \mu _{k,d}^t(M_i),\quad i=1,\ldots ,m, \end{aligned}$$

which yields a cubature \(\{(P_j,\omega _j)\}_{j=1}^n\). Note that the weights \(\{\omega _j\}_{j=1}^n\) are not necessarily nonnegative.

We claim that \(M_i\) and \(P_j\) can be chosen in a random fashion. Indeed, we observe that both spaces, \(\mathcal {G}_{k,d}\) and \(\mathscr {H}^\ell _d\), can be parametrized analytically, so that there is \(D>0\) and a surjective analytic mapping \(F:\mathbb {R}^D\rightarrow (\mathscr {H}^\ell _d)^m\times (\mathcal {G}_{k,d})^n\). Let us assume \(n=m=\dim ({{{\mathrm{Pol}}}}^\ell _{t}(\mathcal {G}_{k,d}))\) for simplicity. Otherwise, we can extract a submatrix. We now define

$$\begin{aligned} G:(\mathscr {H}^\ell _d)^n\times (\mathcal {G}_{k,d})^n&\rightarrow \mathbb {R}\\ \big ((M_i)_{i=1}^n , (P_j)_{j=1}^n \big )&\mapsto \det \big ((\langle M_i,P_j\rangle ^t)_{i,j}\big ). \end{aligned}$$

Since F is surjective, the mapping \(G\circ F:\mathbb {R}^D\rightarrow \mathbb {R}\) is not identically zero. Moreover, \(G\circ F\) is analytic, so that \((G\circ F)^{-1}(\{0\})\subset \mathbb {R}^D\) has Lebesgue measure zero and, hence, is a zero set with respect to any continuous probability measure \(\nu \) on \(\mathbb {R}^D\). In other words, \(G^{-1}(\{0\})\) is a zero set with respect to the induced probability measure \(\nu _F\) on \((\mathscr {H}^\ell _d)^m\times (\mathcal {G}_{k,d})^n\). Thus, the parametrization F enables a random choice in \((\mathscr {H}^\ell _d)^m\times (\mathcal {G}_{k,d})^n\) such that the matrix (31) has rank \(\dim ({{{\mathrm{Pol}}}}^\ell _{t}(\mathcal {G}_{k,d}))\) with probability one, and the weights \(\{\omega _j\}_{j=1}^n\) can then be computed.
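The random construction suggests a simple experiment: draw the \(M_i\) and \(P_j\) at random and check that the matrix (31) has full rank. The sketch below is ours and purely illustrative; the sizes are arbitrary (we do not claim \(n=\dim {{\mathrm{Pol}}}^\ell _t\)), and for simplicity we draw full symmetric \(M_i\), i.e., \(\ell =d\), and rank-one projectors, i.e., \(k=1\):

```python
import numpy as np

rng = np.random.default_rng(0)
d, t, n = 4, 2, 6        # illustrative sizes only

# random symmetric matrices M_i and random rank-1 projectors P_j = p_j p_j^*
M = [(A + A.T) / 2 for A in rng.standard_normal((n, d, d))]
p = rng.standard_normal((n, d))
p /= np.linalg.norm(p, axis=1, keepdims=True)
P = [np.outer(q, q) for q in p]

# the matrix (31) with entries <M_i, P_j>^t = tr(M_i P_j)^t
G = np.array([[np.trace(Mi @ Pj) ** t for Pj in P] for Mi in M])
assert np.linalg.matrix_rank(G) == n     # nonsingular with probability one
```

With G nonsingular, the weights solve the well-posed linear system from the construction above.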

Let us also verify that (31) having rank \(\dim ({{{\mathrm{Pol}}}}^\ell _{t}(\mathcal {G}_{k,d}))\) is a generic property. Indeed, both spaces, \(\mathcal {G}_{k,d}\) and \(\mathscr {H}^\ell _d\), are real algebraic varieties that are irreducible, cf. [5, 21], so that also \((\mathscr {H}^\ell _d)^m\times (\mathcal {G}_{k,d})^n\) is irreducible. Without loss of generality, we can restrict ourselves to \(n=m=\dim ({{{\mathrm{Pol}}}}^\ell _{t}(\mathcal {G}_{k,d}))\) again. Note that G is a polynomial map and, hence, is Zariski continuous. Therefore, the set \(U:=\{u\in (\mathscr {H}^\ell _d)^m\times (\mathcal {G}_{k,d})^n : G(u)\ne 0\}\) is Zariski open. Classical arguments yield that U cannot be empty, so that irreducibility yields that U is Zariski dense. Thus, we have verified that, for \(n,m\ge \dim ({{{\mathrm{Pol}}}}^\ell _{t}(\mathcal {G}_{k,d}))\), there is a nonempty Zariski open and dense subset U in \((\mathscr {H}^\ell _d)^m\times (\mathcal {G}_{k,d})^n\) such that the matrix

$$\begin{aligned} (\langle M_i,P_j\rangle ^t)_{\begin{array}{c} i=1,\ldots ,m\\ j=1,\ldots ,n \end{array}},\quad \text { where} \quad \big ((M_i)_{i=1}^m,(P_j)_{j=1}^n\big )\in U, \end{aligned}$$

has rank \(\dim ({{{\mathrm{Pol}}}}^\ell _{t}(\mathcal {G}_{k,d}))\).

4.2 Deterministic Construction

Here, we present the design of cubatures as the solution of an optimization problem. As in [12], we shall apply the theory of reproducing kernel Hilbert spaces. We first define a measure \(\nu _{\ell ,d}\) on \(\mathscr {H}^\ell _d:=\{M\in \mathscr {H}_d: {{\mathrm{rank}}}(M)\le \ell \}\) by

$$\begin{aligned} \nu _{\ell ,d}(\mathcal {A}):=\int _{\mathbb {S}^{\ell -1}} \int _{\mathcal {O}_d} 1_{\mathcal {A}}(O^*{{\mathrm{diag}}}(\lambda _1,\ldots ,\lambda _\ell ,0,\ldots ,0)O) d\lambda dO \, , \end{aligned}$$

where \(1_{\mathcal A}\) is the indicator function of the set \(\mathcal A\). It is not hard to see that the mapping

$$\begin{aligned} K^\ell _t:\mathcal {G}_{k,d}\times \mathcal {G}_{k,d}&\rightarrow \mathbb {R}\\ (P_1,P_2)&\mapsto \int _{\mathscr {H}^\ell _{d}}\langle P_1,M\rangle ^t\langle M,P_2\rangle ^t d\nu _{\ell ,d}(M) \end{aligned}$$

is a positive definite kernel on \(\mathcal {G}_{k,d}\). Next, we check that the function spaces under consideration are spanned by the shifts of \(K^\ell _t\).

Proposition 4.1

If \(\ell \) and t are nonnegative integers, then

$$\begin{aligned} {{\mathrm{Pol}}}_t^\ell (\mathcal {G}_{k,d})&= {{\mathrm{span}}}\{\langle M,\cdot \rangle ^t\big |_{\mathcal G_{k,d}} : M\in \mathscr {H}_d,\;{{\mathrm{rank}}}(M)=\ell \}\\&= {{\mathrm{span}}}\{K^\ell _t(P,\cdot )|_{\mathcal {G}_{k,d}} : P\in \mathcal {G}_{k,d} \}.\\ \end{aligned}$$

If \(\ell _1,\ell _2\) and \(t_1,t_2\) are nonnegative integers, then

$$\begin{aligned} {{\mathrm{Pol}}}^{\ell _1}_{t_1}(\mathcal {G}_{k,d}) \cdot {{\mathrm{Pol}}}^{\ell _2}_{t_2}(\mathcal {G}_{k,d})&\subset {{\mathrm{Pol}}}^{\ell _1+\ell _2}_{t_1+t_2}(\mathcal {G}_{k,d}). \end{aligned}$$

Proof

To verify the first equality, we must check that the left-hand side is contained in the right-hand side. We first define

$$\begin{aligned} \widetilde{{{\mathrm{Pol}}}}^\ell _t(\mathcal {G}_{k,d}):={{\mathrm{span}}}\{\langle M,\cdot \rangle ^t\big |_{\mathcal G_{k,d}} : M\in \mathscr {H}^\ell _d\}. \end{aligned}$$

Since the rank \(\ell \) matrices are dense in \(\mathscr {H}^\ell _d\), we obtain

$$\begin{aligned} \widetilde{{{\mathrm{Pol}}}}^{\ell }_t(\mathcal {G}_{k,d}) = {{\mathrm{span}}}\{\langle M,\cdot \rangle ^t\big |_{\mathcal G_{k,d}} : M\in \mathscr {H}_d,\; {{\mathrm{rank}}}(M) =\ell \}. \end{aligned}$$

Thus, the first equality holds if we can verify that the spaces \(\widetilde{{{\mathrm{Pol}}}}^{\ell }_t(\mathcal {G}_{k,d}) \) are an ascending sequence in t, i.e., \(\widetilde{{{\mathrm{Pol}}}}^{\ell }_t(\mathcal {G}_{k,d}) \subset \widetilde{{{\mathrm{Pol}}}}^{\ell }_{t+1}(\mathcal {G}_{k,d})\). To do so, we first aim to verify

$$\begin{aligned} \widetilde{{{\mathrm{Pol}}}}^{\ell }_t(\mathcal {G}_{k,d}) = \bigoplus _{t_1+\ldots +t_\ell =t}\widetilde{{{\mathrm{Pol}}}}^{1}_{t_1}(\mathcal {G}_{k,d})\cdots \widetilde{{{\mathrm{Pol}}}}^{1}_{t_\ell }(\mathcal {G}_{k,d}). \end{aligned}$$
(32)

The spectral decomposition yields that the left-hand side is contained in the right-hand side. To verify the reverse set inclusion, we must check that

$$\begin{aligned} P\mapsto \langle P,x_1x^*_1\rangle ^{t_1} \cdots \langle P,x_\ell x^*_\ell \rangle ^{t_\ell } \in \widetilde{{{\mathrm{Pol}}}}^{\ell }_t(\mathcal {G}_{k,d}),\quad \text {for all } t_1+\ldots +t_\ell =t. \end{aligned}$$
(33)

We now observe that [13, Lemma 7.1] as already used in (3) yields (33). Thus, (32) is satisfied. It was checked in [3] that \(\widetilde{{{\mathrm{Pol}}}}^{1}_{t}(\mathcal {G}_{k,d})\subset \widetilde{{{\mathrm{Pol}}}}^{1}_{t+1}(\mathcal {G}_{k,d})\) holds, so that (32) implies \(\widetilde{{{\mathrm{Pol}}}}^{\ell }_t(\mathcal {G}_{k,d}) \subset \widetilde{{{\mathrm{Pol}}}}^{\ell }_{t+1}(\mathcal {G}_{k,d})\), which yields the first equality.

The second part of the proposition follows from the first equality and (32).

We now take care of the second equality. Since \(\mathcal {M}:={{\mathrm{span}}}\{ \langle \cdot ,P_1\rangle ^t \langle \cdot ,P_2\rangle ^t|_{\mathscr {H}^\ell _d} : P_1,P_2\in \mathcal {G}_{k,d}\}\) is finite-dimensional, classical arguments let us infer that there are \(\{M_j\}_{j=1}^m\subset \mathscr {H}^\ell _d\) and numbers \(\{\omega _j\}_{j=1}^m\subset \mathbb {R}\) such that, for all \(P_1,P_2\in \mathcal {G}_{k,d}\),

$$\begin{aligned} \sum _{j=1}^m \omega _j \langle M_j,P_1\rangle ^t \langle M_j,P_2\rangle ^t = \int _{\mathscr {H}^\ell _d} \langle M,P_1\rangle ^t \langle M,P_2\rangle ^t d\nu _{\ell ,d}(M), \end{aligned}$$
(34)

cf. [15, Theorem 6.1]. By applying (34), we derive

$$\begin{aligned} K^\ell _t(P,\cdot )&= \int _{\mathscr {H}^\ell _d} \langle M,P\rangle ^t \langle M,\cdot \rangle ^t d\nu _{\ell ,d}(M)\\&= \sum _{j=1}^m \omega _j \langle M_j,P\rangle ^t \langle M_j,\cdot \rangle ^t\in {{\mathrm{Pol}}}^\ell _t(\mathcal {G}_{k,d}). \end{aligned}$$

Thus, we have verified that \({{\mathrm{span}}}\{K^\ell _t(P,\cdot )|_{\mathcal {G}_{k,d}} : P\in \mathcal {G}_{k,d} \}\subset {{\mathrm{Pol}}}^\ell _t(\mathcal {G}_{k,d})\). To verify the reverse inclusion, we shall check that \(\dim ({{\mathrm{span}}}\{K^\ell _t(P,\cdot )|_{\mathcal {G}_{k,d}} : P\in \mathcal {G}_{k,d} \})\ge \dim ({{\mathrm{Pol}}}^\ell _t(\mathcal {G}_{k,d}))\).

We first observe that

$$\begin{aligned} \dim ({{\mathrm{span}}}\{\langle \cdot ,M\rangle ^t|_{\mathcal {G}_{k,d}} : M\in \mathscr {H}^\ell _d \}) = \dim ({{\mathrm{span}}}\{\langle P,\cdot \rangle ^t|_{\mathscr {H}^\ell _d} : P\in \mathcal {G}_{k,d} \}), \end{aligned}$$
(35)

which is a general principle that holds in much more generality, see [13, Proof of Lemma 5.5] for details. Note that the left-hand side of (35) is \(\dim ({{\mathrm{Pol}}}^\ell _t(\mathcal {G}_{k,d}))\), and we shall denote this number by r here. Then according to (35), there are \(\{P_j\}_{j=1}^r\subset \mathcal {G}_{k,d}\) such that \(\{\langle P_j,\cdot \rangle ^t|_{\mathscr {H}^\ell _d}\}_{j=1}^r\) is a basis for \({{\mathrm{span}}}\{\langle P,\cdot \rangle ^t|_{\mathscr {H}^\ell _d} : P\in \mathcal {G}_{k,d} \}\). If we can verify that the matrix \(K:=\big (K^\ell _t(P_i,P_j) \big )_{i,j=1}^r\) is nonsingular, then \(\{K^\ell _t(P_j,\cdot )\}_{j=1}^r\) is linearly independent, which concludes the proof. Indeed, suppose that \(\alpha ^*K\alpha =0\), then we obtain

$$\begin{aligned} 0&= \sum _{i,j}\alpha _i\alpha _j K^\ell _t(P_i,P_j)\\&= \int _{\mathscr {H}^\ell _{d}} \sum _{i,j}\alpha _i\alpha _j\langle P_i,M\rangle ^t\langle M,P_j\rangle ^t d\nu _{\ell ,d}(M)\\&= \int _{\mathscr {H}^\ell _{d}} \big ( \sum _{i}\alpha _i\langle P_i,M\rangle ^t\big )^2 d\nu _{\ell ,d}(M). \end{aligned}$$

This implies \(\sum _{i}\alpha _i\langle P_i,M\rangle ^t=0\), for all \(M\in {{\mathrm{supp}}}(\nu _{\ell ,d})\). Since \(\langle P_j,\cdot \rangle ^t|_{\mathscr {H}^\ell _d}\) are homogeneous polynomials, the latter also holds for all \(M\in \mathscr {H}^\ell _d\). The linear independence of \(\{\langle P_j,\cdot \rangle ^t|_{\mathscr {H}^\ell _d}\}_{j=1}^r\) implies that we must have \(\alpha _1,\ldots ,\alpha _r=0\). Thus, K is indeed nonsingular, and this concludes the proof. \(\square \)

Remark 4.2

The end of the above proof shows that the special form of \(\nu _{\ell ,d}\) is not important, and any measure with sufficiently large support would work.

The kernel \(K^\ell _t\) induces an inner product (and hence also a norm \(\Vert \cdot \Vert _{K^\ell _t}\)) on \({{\mathrm{Pol}}}^\ell _t(\mathcal {G}_{k,d})\) by

$$\begin{aligned} \langle f,g\rangle _{{{{\mathrm{Pol}}}}^\ell _t}:= \sum _{i,j} \alpha _i\beta _j K^\ell _t(P_i,\tilde{P}_j), \end{aligned}$$
(36)

where \(f=\sum _{i}\alpha _i K^\ell _t(P_i,\cdot )\) and \(g=\sum _{j}\beta _j K^\ell _t(\tilde{P}_j,\cdot )\). Note that the expression (36) does not depend on the particular choice of the \(P_i\) and \(\tilde{P}_j\). The induced norm enables us to introduce approximate cubatures:

Definition 4.3

We say that \(\{(P_j,\omega _j)\}_{j=1}^n\) is an \(\epsilon \)-approximate cubature for \( {{{\mathrm{Pol}}}}^{\ell }_{t}(\mathcal {G}_{k,d})\) with respect to \(K^\ell _t\) if

$$\begin{aligned} \sup _{f\in {{{\mathrm{Pol}}}}^\ell _{t}(\mathcal {G}_{k,d}),\; \Vert f\Vert _{K^\ell _t}= 1} |\sum _{j=1}^n\omega _jf(P_j) -\int _{\mathcal {G}_{k,d}} f(P)d\sigma _{k,d}(P)|\le \epsilon . \end{aligned}$$
(37)

Clearly, an \(\epsilon \)-approximate cubature for \( {{{\mathrm{Pol}}}}^{\ell }_{t}(\mathcal {G}_{k,d})\) yields

$$\begin{aligned} \left| \sum _{j=1}^n\omega _jf(P_j) -\int _{\mathcal {G}_{k,d}} f(P)d\sigma _{k,d}(P)\right| \le \epsilon \Vert f\Vert _{K^\ell _t},\quad \text {for all }f\in {{{\mathrm{Pol}}}}^\ell _{t}(\mathcal {G}_{k,d}). \end{aligned}$$

In order to numerically find \(\epsilon \)-approximate cubatures, we consider the modified fusion frame potential

$$\begin{aligned} \sum _{i,j} \omega _i\omega _jK^\ell _t(P_i,P_j). \end{aligned}$$
(38)

By following the lines in [12] for the standard fusion frame potential, see also [3], we derive that

$$\begin{aligned} c^\ell _t:= \int _{\mathcal {G}_{k,d}} \int _{\mathcal {G}_{k,d}} K^\ell _t(P,Q) d\sigma _{k,d}(P)d\sigma _{k,d}(Q) \end{aligned}$$
(39)

is a lower bound on (38), and the gap

$$\begin{aligned} \sum _{i,j} \omega _i\omega _jK^\ell _t(P_i,P_j)-c^\ell _t \ge 0 \end{aligned}$$

is exactly the squared cubature error, i.e.,

$$\begin{aligned} \sum _{i,j} \omega _i\omega _jK^\ell _t(P_i,P_j)-c^\ell _t = \sup _{f\in {{{\mathrm{Pol}}}}^\ell _{t}(\mathcal {G}_{k,d}),\; \Vert f\Vert _{K^\ell _t}= 1} \left| \sum _{j=1}^n\omega _jf(P_j) -\int _{\mathcal {G}_{k,d}} f(P)d\sigma _{k,d}(P)\right| ^2. \end{aligned}$$

Indeed, if (38) can be minimized numerically, then a proper cubature or at least an \(\epsilon \)-approximate cubature can be obtained, where \(\epsilon \) relates to machine precision provided there exists a corresponding cubature for this choice of n. However, numerical evaluation of the kernel \(K^\ell _t\) may be difficult in practice. In the subsequent section, we shall circumvent such difficulties by considering cubatures for larger spaces that enable us to work with a simpler kernel.

5 Construction of Cubatures for \({{\mathrm{Pol}}}_{t}(\mathcal {G}_{k,d})\)

5.1 Cubatures from Optimization Procedures

This section is dedicated to deriving cubatures from a numerical scheme that is indeed easy to implement. We define polynomials of degree at most t on \(\mathcal {G}_{k,d}\) by

$$\begin{aligned} {{\mathrm{Pol}}}_t(\mathcal {G}_{k,d}) :=\{\text {polynomials of degree at most } t \text { on } \mathscr {H}_d \text { restricted to } \mathcal {G}_{k,d}\}. \end{aligned}$$
(40)

Note that \({{\mathrm{Pol}}}_t(\mathcal {G}_{k,d})\) satisfies the product property that is usually associated with polynomial spaces, i.e.,

$$\begin{aligned} {{\mathrm{span}}}\big ({{\mathrm{Pol}}}_{t_1}(\mathcal {G}_{k,d})\cdot {{\mathrm{Pol}}}_{t_2}(\mathcal {G}_{k,d}) \big ) = {{\mathrm{Pol}}}_{t_1+t_2}(\mathcal {G}_{k,d}), \end{aligned}$$

see, for instance, [12]. It is known that these spaces can be rewritten as

$$\begin{aligned} {{\mathrm{Pol}}}_t(\mathcal {G}_{k,d}) ={{\mathrm{span}}}\{\langle M,\cdot \rangle ^t\big |_{\mathcal G_{k,d}} : M\in \mathscr {H}_d\}, \end{aligned}$$

see, for instance, [3, 12]. Obviously, \({{\mathrm{Pol}}}^\ell _{t}(\mathcal {G}_{k,d})\) is contained in \({{\mathrm{Pol}}}_{t}(\mathcal {G}_{k,d})\). In the following proposition, we explore when equality holds:

Proposition 5.1

For \(0\le \ell <d\) and \(0\le t\), we have

$$\begin{aligned} {{\mathrm{Pol}}}^{\ell }_t(\mathcal {G}_{k,d}) = {{\mathrm{Pol}}}_t(\mathcal {G}_{k,d}),\quad \text {for }\ell \ge \min \{k,t\}. \end{aligned}$$

Proof

For \(\ell \ge k\), the equality is standard, cf. [12]. For \(\ell \ge t\), we derive from (32) that

$$\begin{aligned} {{\mathrm{Pol}}}^{t}_t(\mathcal {G}_{k,d}) = {{\mathrm{span}}}\big ( {{\mathrm{Pol}}}^{1}_{1}(\mathcal {G}_{k,d})\cdots {{\mathrm{Pol}}}^{1}_{1}(\mathcal {G}_{k,d})\big ), \end{aligned}$$

where the product has t factors, and the findings in [3] yield that the right-hand side equals \({{\mathrm{Pol}}}_t(\mathcal {G}_{k,d})\). Therefore, \(\ell \ge t\) also yields \({{\mathrm{Pol}}}^{\ell }_t(\mathcal {G}_{k,d})={{\mathrm{Pol}}}_t(\mathcal {G}_{k,d})\). \(\square \)

Note that \(\{(P_j,\omega _j)\}_{j=1}^n\) being a cubature for \({{\mathrm{Pol}}}^{2}_{3}(\mathcal {G}_{k,d})\) as used in Theorem 2.5 already implies that it is also a cubature for \({{\mathrm{Pol}}}_{2}(\mathcal {G}_{k,d})={{\mathrm{Pol}}}^2_{2}(\mathcal {G}_{k,d})\) and for \({{\mathrm{Pol}}}_{1}(\mathcal {G}_{k,d})={{\mathrm{Pol}}}^2_{1}(\mathcal {G}_{k,d})\). It should also be mentioned that the space \({{\mathrm{Pol}}}^3_{t}(\mathcal {G}_{1,d})\) in Corollary 3.5 is the same as \({{\mathrm{Pol}}}_{t}(\mathcal {G}_{1,d})\). Hence, for \(k=1\), we were dealing with the space (40) all along.

A computational approach for cubatures for \({{\mathrm{Pol}}}_{t}(\mathcal {G}_{k,d})\) is discussed in [12]. Since \({{\mathrm{Pol}}}^{2}_{t}(\mathcal {G}_{k,d})\) is a subset, this approach yields also cubatures for \({{\mathrm{Pol}}}^{2}_{t}(\mathcal {G}_{k,d})\). By refining some ideas in [12], we shall introduce \(\epsilon \)-approximate cubatures for the kernel

$$\begin{aligned} K_t:\mathcal {G}_{k,d}\times \mathcal {G}_{k,d}&\rightarrow \mathbb {R}\\ (P_1,P_2)&\mapsto \langle P_1,P_2\rangle ^t. \end{aligned}$$

Indeed, \(K_t\) is a positive definite kernel on \(\mathcal {G}_{k,d}\) and its shifts generate \({{\mathrm{Pol}}}_{t}(\mathcal {G}_{k,d})\), i.e.,

$$\begin{aligned} {{\mathrm{Pol}}}_t(\mathcal {G}_{k,d}) = {{\mathrm{span}}}\{K_t(P,\cdot ) : P\in \mathcal {G}_{k,d} \}. \end{aligned}$$

The kernel \(K_t\) induces an inner product on \({{\mathrm{Pol}}}_t(\mathcal {G}_{k,d})\) analogously to (36) and, in turn, also a norm \(\Vert \cdot \Vert _{K_t}\).

Definition 5.2

We say that \(\{(P_j,\omega _j)\}_{j=1}^n\) is an \(\epsilon \)-approximate cubature for \( {{{\mathrm{Pol}}}}_{t}(\mathcal {G}_{k,d})\) with respect to \(K_t\) if

$$\begin{aligned} \sup _{f\in {{{\mathrm{Pol}}}}_{t}(\mathcal {G}_{k,d}),\; \Vert f\Vert _{K_t}= 1} \left| \sum _{j=1}^n\omega _jf(P_j) -\int _{\mathcal {G}_{k,d}} f(P)d\sigma _{k,d}(P)\right| \le \epsilon . \end{aligned}$$

In the following, we shall describe how \(\epsilon \)-approximate cubatures for \( {{{\mathrm{Pol}}}}_{t}(\mathcal {G}_{k,d})\) can be computed by numerical schemes as at the end of Sect. 4.2. Indeed, the potential

$$\begin{aligned} \sum _{i,j} \omega _i\omega _j K_t(P_i,P_j) \end{aligned}$$
(41)

can be bounded from below by

$$\begin{aligned} \lambda _t:=\int _{\mathcal {G}_{k,d}}\int _{\mathcal {G}_{k,d}}K_t(P,P')d\sigma _{k,d}(P)d\sigma _{k,d}(P'), \end{aligned}$$
(42)

so that

$$\begin{aligned} \sum _{i,j} \omega _i\omega _jK_t(P_i,P_j)-\lambda _t\ge 0. \end{aligned}$$

As in the previous section, this gap is exactly the squared cubature error, i.e.,

$$\begin{aligned} \sum _{i,j}\omega _i\omega _jK_t(P_i,P_j)-\lambda _t = \sup _{f\in {{{\mathrm{Pol}}}}_{t}(\mathcal {G}_{k,d}),\; \Vert f\Vert _{K_t}= 1} \left| \sum _{j=1}^n\omega _jf(P_j) -\int _{\mathcal {G}_{k,d}} f(P)d\sigma _{k,d}(P)\right| ^2. \end{aligned}$$

It is remarkable that (42) can be computed exactly by analytical tools, so that the outcome of numerical optimization schemes minimizing (41) can be compared with \(\lambda _t\), see [12] for further details and examples of successful minimization outcomes. Indeed, the easier structure of the kernel \(K_t\) generating \({{\mathrm{Pol}}}_t(\mathcal {G}_{k,d})\) makes this approach more amenable to numerical optimization than the setting of \({{\mathrm{Pol}}}^\ell _t(\mathcal {G}_{k,d})\) presented in the previous section.
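As an illustration of this minimization, the following sketch (ours, with ad hoc step size and iteration count) runs projected gradient descent on the potential (41) for \(k=1\) with equal weights \(1/n\), writing \(P_j=u_ju_j^*\) for unit vectors \(u_j\), so that \(K_t(P_i,P_j)=\langle u_i,u_j\rangle ^{2t}\); for \(k=1\), the constant (42) is \(\lambda _t=(1/2)_t/(d/2)_t\) by the closed form from the proof of Proposition 3.4:

```python
import numpy as np

def potential(U, t):
    """(1/n^2) sum_{i,j} K_t(P_i,P_j) for k = 1, P_j = u_j u_j^*, weights 1/n."""
    return np.mean((U @ U.T) ** (2 * t))

rng = np.random.default_rng(0)
d, t, n = 3, 2, 8
lam = np.prod([(1 + 2*i) / (d + 2*i) for i in range(t)])  # lambda_t = (1/2)_t/(d/2)_t

U = rng.standard_normal((n, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)
start_gap = potential(U, t) - lam

lr = 0.02
for _ in range(1000):
    G = U @ U.T
    grad = (4 * t / n**2) * (G ** (2*t - 1)) @ U   # gradient of the potential in U
    U -= lr * grad
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # project back onto the sphere

end_gap = potential(U, t) - lam                    # = squared cubature error
assert -1e-9 < end_gap < start_gap                 # gap is nonnegative and shrinks
```

If n is large enough that a cubature exists, the final gap approaches zero up to optimization accuracy; otherwise the descent stalls at the best value achievable for this n.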

5.2 Approximate Cubatures from Randomized Projections

We now examine to what extent a random choice of projections gives an approximate cubature. Let us call a Borel probability measure \(\nu _{k,d}\) on \(\mathcal {G}_{k,d}\) a probabilistic cubature for \({{\mathrm{Pol}}}_t(\mathcal {G}_{k,d})\) if

$$\begin{aligned} \int _{\mathcal {G}_{k,d}} f(P)d\nu _{k,d}(P) = \int _{\mathcal {G}_{k,d}} f(P)d\sigma _{k,d}(P),\quad \text {for all } f\in {{\mathrm{Pol}}}_t(\mathcal {G}_{k,d}). \end{aligned}$$
(43)

Note that any cubature for \({{\mathrm{Pol}}}_{t}(\mathcal {G}_{k,d})\) can be considered as a finitely supported probabilistic cubature, provided that the weights are nonnegative. Another example, of course, is \(\sigma _{k,d}\) itself.

In the remainder of this section, we let each \(\omega _j = \frac{1}{n}\) and choose each \(P_j\) according to a probabilistic cubature \(\nu _{k,d}\). In that case, \(\{P_j\}_{j=1}^n\) is a collection of random matrices and the expected value of the gap, that is, the squared cubature error, can be computed explicitly. Denoting the expectation with respect to the random choice of \(\{P_j\}_{j=1}^n\) by \(\mathbb E_P\), and using that \(\mathbb E_P K_t(P_i,P_i) = k^t \) and \(\mathbb E_P K_t(P_i,P_j) = \lambda _t\) if \(i \ne j\), we get

$$\begin{aligned} \mathbb E_P \left[ \frac{1}{n^2} \sum _{i,j=1}^n K_t(P_i, P_j) - \lambda _t \right]&= \frac{k^t}{n} + \frac{n(n-1)}{n^2} \lambda _t - \lambda _t = \frac{1}{n}(k^t - \lambda _t). \end{aligned}$$

Thus, letting n grow faster than \(k^t\) ensures that the expected value of the gap becomes arbitrarily small. In the following theorem, we show that this expected behavior happens with overwhelming probability.
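This expected gap can be checked by simulation: \(\sigma _{k,d}\) itself is a probabilistic cubature, so for \(k=1\) we may sample \(P_j=u_ju_j^*\) with u uniform on the sphere and average the gap over many trials, which should reproduce \((k^t-\lambda _t)/n\). A sketch (ours, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, t, n, trials = 3, 2, 10, 2000
lam = np.prod([(1 + 2*i) / (d + 2*i) for i in range(t)])   # lambda_t for k = 1
expected = (1.0 - lam) / n                                 # (k^t - lambda_t)/n, k = 1

gaps = []
for _ in range(trials):
    U = rng.standard_normal((n, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)          # P_j = u_j u_j^*, Haar
    gaps.append(np.mean((U @ U.T) ** (2 * t)) - lam)       # (1/n^2) sum K_t - lambda_t

assert abs(np.mean(gaps) - expected) < 0.01
```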

Theorem 5.3

If \(\{P_j\}_{j=1}^n\) are chosen independently and identically distributed with respect to a probabilistic cubature \(\nu _{k,d}\) for \({{\mathrm{Pol}}}_t(\mathcal {G}_{k,d})\) and \(\tau >0\), then

$$\begin{aligned} \mathbb P\left( \frac{1}{n^2} \sum _{i,j=1}^n K_t(P_i, P_j) -\lambda _t - \frac{1}{n}(k^t - \lambda _t)\ge \frac{\tau ^2 k^t}{n}\right) \le 4 e^{-\Psi _\tau (n)} r_\tau (n), \end{aligned}$$

where

$$\begin{aligned} \Psi _\tau (n) = \frac{\tau ^2/2}{ (1 - \lambda _t/k^t)+\frac{\tau }{3\sqrt{n}}},\qquad r_\tau (n) = 1 + \frac{6}{ n \tau ^2 \ln ^2(1+\frac{ \tau }{\sqrt{n} (1 - \lambda _t/k^t)})}. \end{aligned}$$

Proof

First, we note that \(\langle P_i, P_j \rangle ^t = \langle P_i^{\otimes t}, P_j^{\otimes t}\rangle \), where the Hilbert–Schmidt inner product on the right-hand side is on the Hilbert space \((\mathbb R^{d\times d})^{\otimes t} \simeq \mathbb R^{d^{2t}}\). Thus, \(\sum _{i,j} K_t(P_i, P_j) = \Vert \sum _{j} P_j^{\otimes t}\Vert _{HS}^2\) holds.

We define the averaged tensor power \(\Lambda _t = \mathbb E_{P} P_1^{\otimes t}\). For \(P_0\in \mathcal {G}_{k,d}\), we can compute

$$\begin{aligned} \langle P_0^{\otimes t},\Lambda _t\rangle&= \left\langle P_0^{\otimes t},\int _{\mathcal {G}_{k,d}} P^{\otimes t}d\nu _{k,d}(P)\right\rangle =\int _{\mathcal {G}_{k,d}} \langle P_0,P\rangle ^t d\nu _{k,d}(P). \end{aligned}$$

Since \(\nu _{k,d}\) is a probabilistic cubature and \(\langle P_0,\cdot \rangle ^t\in {{\mathrm{Pol}}}_t(\mathcal {G}_{k,d})\), we obtain

$$\begin{aligned} \langle P_0^{\otimes t},\Lambda _t\rangle&= \int _{\mathcal {G}_{k,d}} \langle P_0,P\rangle ^t d\sigma _{k,d}(P). \end{aligned}$$

Let \(U\in \mathcal {O}_d\) be such that \(U^*D_kU=P_0\), where \(D_k\) denotes the diagonal matrix with k ones and zeros elsewhere. The commutativity of the trace and the orthogonal invariance of \(\sigma _{k,d}\) yield

$$\begin{aligned} \langle P_0^{\otimes t},\Lambda _t\rangle&=\int _{\mathcal {G}_{k,d}} \langle U^*D_kU,P\rangle ^t d\sigma _{k,d}(P)\\&=\int _{\mathcal {G}_{k,d}} \langle D_k,UPU^*\rangle ^t d\sigma _{k,d}(P)\\&= \int _{\mathcal {G}_{k,d}} \langle D_k,P\rangle ^t d\sigma _{k,d}(P). \end{aligned}$$

By applying the probabilistic cubature property once more, we derive

$$\begin{aligned} \langle P_0^{\otimes t},\Lambda _t\rangle&= \int _{\mathcal {G}_{k,d}} \langle D_k,P\rangle ^t d\nu _{k,d}(P)= \langle D_k^{\otimes t},\Lambda _t\rangle . \end{aligned}$$

Thus, the term \(\langle P_0^{\otimes t},\Lambda _t\rangle \) does not depend on the particular choice of \(P_0\in \mathcal {G}_{k,d}\). Averaging over all \(P_0\) with respect to \(\sigma _{k,d}\) then implies that for each i,

$$\begin{aligned} \langle P_i^{\otimes t},\Lambda _t\rangle = \int _{\mathcal {G}_{k,d}}\langle P_0^{\otimes t},\Lambda _t\rangle d\sigma _{k,d}(P_0)= \Vert \Lambda _t\Vert _{HS}^2. \end{aligned}$$
(44)

Similarly to the above computations, the probabilistic cubature property also yields

$$\begin{aligned} \Vert \Lambda _t\Vert _{HS}^2=\lambda _t. \end{aligned}$$
(45)

Setting \(Y_j = (P_j^{\otimes t} - \Lambda _t)/k^{t/2}\) and applying (44) and (45) then gives

$$\begin{aligned} \Vert Y_j\Vert _{HS}^2 = 1 -\lambda _t/k^t. \end{aligned}$$

Hence, \(\Vert Y_j\Vert _{HS} \le 1\) and \(\mathbb {E}_PY_j = 0\), so that Minsker’s vector-valued Bernstein inequality [23, Corollary 5.1] provides, for all \(\tau >0\),

$$\begin{aligned} \mathbb P\left( \left\| \frac{1}{n} \sum _{j=1}^n Y_j \right\| ^2_{HS} > \tau ^2/n\right) \le 4 e^{-\Psi _{\tau }(n)} r_{\tau }(n), \end{aligned}$$

where \(\Psi _\tau \) and \(r_\tau \) are as stated. To finish the proof, we observe that

$$\begin{aligned} k^t\left\| \frac{1}{n} \sum _{j=1}^n Y_j \right\| ^2_{HS} = \frac{1}{n^2} \sum _{i,j=1}^n K_t(P_i, P_j) -\lambda _t \ge \frac{1}{n^2} \sum _{i,j=1}^n K_t(P_i, P_j) -\lambda _t - \frac{1}{n}(k^t - \lambda _t), \end{aligned}$$

where the equality follows from expanding the Hilbert–Schmidt norm and using (44) and (45), and the inequality from \(\lambda _t \le k^t\).

\(\square \)
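The tensor power identity \(\langle P_i, P_j\rangle ^t = \langle P_i^{\otimes t}, P_j^{\otimes t}\rangle \) used at the start of the proof can be verified numerically; the following sketch (with arbitrary small dimensions) realizes the t-fold tensor power as a Kronecker power:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_projector(d, k, rng):
    # Orthogonal projector onto a uniformly random k-dimensional subspace of R^d.
    U, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return U @ U.T

d, k, t = 3, 2, 3
P, Q = random_projector(d, k, rng), random_projector(d, k, rng)

lhs = np.sum(P * Q)**t            # <P, Q>^t with the Hilbert-Schmidt inner product
Pt, Qt = P, Q
for _ in range(t - 1):            # build the t-fold Kronecker (tensor) powers
    Pt, Qt = np.kron(Pt, P), np.kron(Qt, Q)
rhs = np.sum(Pt * Qt)             # <P^{tensor t}, Q^{tensor t}>

print(abs(lhs - rhs))             # agrees up to floating-point round-off
```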

As n tends to infinity, \(r_\tau (n) \rightarrow 1 + \frac{6(1 -\lambda _t/k^t )^2}{ \tau ^4}\) and \(\Psi _\tau (n) \rightarrow \frac{\tau ^2/2}{1 - \lambda _t/k^t}\). Thus, for large n, the distribution of the gap concentrates near zero at the same rate as its expected value. Since the gap is the square of the maximal cubature error, we obtain a probabilistic construction of approximate cubatures.
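The failure probability bound of Theorem 5.3 is easy to evaluate numerically. In the sketch below, the ratio \(\lambda _t/k^t\) is an illustrative input rather than a computed quantity:

```python
import math

def failure_bound(n, tau, ratio):
    # Evaluate 4 * exp(-Psi_tau(n)) * r_tau(n) from Theorem 5.3,
    # where ratio stands for lambda_t / k^t in (0, 1).
    psi = (tau**2 / 2) / ((1 - ratio) + tau / (3 * math.sqrt(n)))
    r = 1 + 6 / (n * tau**2
                 * math.log(1 + tau / (math.sqrt(n) * (1 - ratio)))**2)
    return 4 * math.exp(-psi) * r

for n in (10**2, 10**4, 10**6):
    print(n, failure_bound(n, tau=4.0, ratio=0.5))
```

For fixed \(\tau \), the bound decreases towards its limit \(4e^{-\tau ^2/(2(1-\lambda _t/k^t))}\bigl (1+\frac{6(1-\lambda _t/k^t)^2}{\tau ^4}\bigr )\) as n grows.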

Corollary 5.4

If \(\{P_j\}_{j=1}^n\) are chosen independently and identically distributed with respect to a probabilistic cubature for \({{\mathrm{Pol}}}_t(\mathcal {G}_{k,d})\) and \(\tau >0\), then a \(\sqrt{\frac{(1+\tau ^2)k^t-\lambda _t}{ n}}\)-approximate cubature for \( {{{\mathrm{Pol}}}}_{t}(\mathcal {G}_{k,d})\) with respect to \(K_t\) is obtained with probability at least \( 1 - 4 e^{-\Psi _\tau (n)} r_\tau (n). \)
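Conversely, Corollary 5.4 can be used to estimate how many random projectors suffice for a target accuracy \(\epsilon \). The helper below simply inverts \(\epsilon = \sqrt{((1+\tau ^2)k^t-\lambda _t)/n}\) for n; all concrete numbers are illustrative assumptions:

```python
import math

def samples_needed(eps, tau, k, t, lam_t):
    # Invert eps = sqrt(((1 + tau^2) * k^t - lam_t) / n) for n.
    return math.ceil(((1 + tau**2) * k**t - lam_t) / eps**2)

# Illustrative parameters: k = 2, t = 2, and lambda_2 taken as 2.0.
n = samples_needed(eps=0.1, tau=1.0, k=2, t=2, lam_t=2.0)
print(n)   # -> 600
```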

For related results on random matrices, we refer to [6, 22, 27, 28].

6 Error Propagation for \(\epsilon \)-Approximate Cubatures

In general, the numerical optimization approach can provide cubatures only up to machine precision. Therefore, we are dealing with \(\epsilon \)-approximate cubatures, which is also what we obtain from the random constructions. In these cases, the moment reconstruction formulas in Corollaries 2.3 and 3.5 hold up to some error term:

Theorem 6.1

Let \(X\in \mathbb {S}^{d-1}\) be a random vector and \(\{(P_j,\omega _j)\}_{j=1}^n\) be an \(\epsilon \)-approximate cubature for \({{\mathrm{Pol}}}_{t}(\mathcal {G}_{k,d})\) with respect to \(K_t\). Then (7) in Corollary 2.3 holds up to a constant \(c_\alpha \) times \(\epsilon \), i.e., for \(\alpha \in \mathbb {N}^d\), \(|\alpha |=t\),

$$\begin{aligned} \left| \mathbb {E} X^\alpha - \sum _{|\beta |\le t} a^\alpha _\beta \sum _{j=1}^n \omega _j\mathbb {E} (P_jX)^\beta \right| \le \epsilon c_\alpha . \end{aligned}$$
(46)

If \(k=1\) and X is a random vector in \(\mathbb {R}^d\), then (30) in Corollary 3.5 holds up to a constant times \(\epsilon \mathbb {E}\Vert X\Vert ^t\).

The above theorem verifies that the cubature error propagates linearly into the moment reconstruction formulas. It should be mentioned, though, that the constant \(c_\alpha \) depends on k and d.

Proof

According to Theorem 2.1, we derive, for \(x\in \mathbb {S}^{d-1}\),

$$\begin{aligned} x^\alpha&= \sum _{s=1}^t \sum _{i=1}^m f^\alpha _{s,i}\mu ^s_{k,d}(E_{x,y^\alpha _{s,i}})\\&= \sum _{s=1}^t \sum _{i=1}^m f^\alpha _{s,i}\int _{\mathcal {G}_{k,d}} \langle P,E_{x,y^\alpha _{s,i}}\rangle ^s d\sigma _{k,d}(P). \end{aligned}$$

Since the function \(F^\alpha _x = \sum _{s=1}^t \sum _{i=1}^m f^\alpha _{s,i} \langle \cdot ,E_{x,y^\alpha _{s,i}}\rangle ^s\) is an element in \({{\mathrm{Pol}}}_t(\mathcal {G}_{k,d})\), the cubature property yields

$$\begin{aligned} \left| x^\alpha - \sum _{s=1}^t \sum _{i=1}^m f^\alpha _{s,i}\sum _{j=1}^n \omega _j \langle P_j,E_{x,y^\alpha _{s,i}}\rangle ^s \right| \le \epsilon \Vert F^\alpha _x\Vert _{K_t}. \end{aligned}$$

Applying the coefficients \(a^\alpha _\beta \) from Corollary 2.3 and setting \(c_\alpha :=\sup _{x\in \mathbb {S}^{d-1}} \Vert F^\alpha _x\Vert _{K_t}\) then yields (46).

The second part of the theorem can be verified in an analogous fashion, so we omit the details. \(\square \)

7 Concluding Remarks

Our results match reasonable characteristics of distributed sensing. We require a rather large set of sensors (projectors), and we assume that the high-dimensional signal is modeled by a probability distribution. The sensors are deterministic and can even be given by the experimental setup, as long as we are able to find weights such that projectors and weights together form a cubature. Each sensor must reconstruct only the first few moments of its projection marginal distribution, which in practice may require fewer data samples than estimating the marginal distribution itself. In the end, the first few moments of the high-dimensional random signal can be computed at low cost by a closed formula.

As far as we know, the present paper is a first attempt to address this type of moment recovery problem with tools from harmonic analysis. Further investigations are necessary to combine these ideas with proper statistical estimation techniques, in which the low-dimensional moments are estimated from acquired data. We intend to address this in forthcoming work.