From low to high-dimensional moments without magic

We aim to compute the first few moments of a high-dimensional random vector from the first few moments of a number of its low-dimensional projections. To this end, we identify algebraic conditions on the set of low-dimensional projectors that yield explicit reconstruction formulas. We also provide a computational framework, with which suitable projectors can be derived by solving an optimization problem. Finally, we show that randomized projections permit approximate recovery.


Introduction
A central problem in dimension reduction, distributed sensing, and many statistical applications is the identification of properties of a high-dimensional random vector from knowledge of marginal distributions, i.e., the distributions of one or more lower-dimensional projections of the random vector. A simple instance in statistics is the problem of computing the lowest moments of the random vector. However, knowledge of some marginal distributions may not be sufficient to identify the first few high-dimensional moments. Here, we address the problem of designing low-dimensional projections of the random vector so that its high-dimensional moments can be computed from the lower-dimensional ones by an explicit formula.
We consider a random vector X in R^d, distributed according to some Borel probability measure. In practice, X could be a random signal that is observed by distributed sensors, each measuring a certain piece of information. Inspired by [8,20], each sensor is modeled as a matrix Q_j ∈ R^{k×d} with full rank k < d. Computing with P_j := Q_j^*(Q_j Q_j^*)^{-1} Q_j instead of Q_j, we can effectively turn our measurement matrices into orthogonal projectors {P_j}_{j=1}^n ⊂ G_{k,d}, where G_{k,d} denotes the set of orthogonal projectors on R^d with rank k; i.e., P_j is the orthogonal projector onto the row space of Q_j. A variant of the Cramér-Wold Theorem says that two random vectors X, Y ∈ R^d are identically distributed if and only if, for all P ∈ G_{k,d}, the two random vectors PX, PY are identically distributed, cf. [26]. For further related results on projected distributions, we refer to [17,14,4,10]. Here, we do not wish to identify the distribution of X, but restrict ourselves to recovering its first few moments. On the other hand, we want to achieve this by observing the moments of a number of low-dimensional projections and combining the information in a process we call moment fusion.
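The passage from a measurement matrix Q_j to the projector P_j = Q_j^*(Q_j Q_j^*)^{-1} Q_j can be sketched in a few lines of NumPy; the function name and the dimensions below are illustrative, not taken from the paper.

```python
import numpy as np

def projector_from_measurement(Q: np.ndarray) -> np.ndarray:
    """Return the rank-k orthogonal projector P = Q^T (Q Q^T)^{-1} Q
    onto the row space of a full-rank measurement matrix Q (k x d)."""
    return Q.T @ np.linalg.solve(Q @ Q.T, Q)

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 5))      # k = 2, d = 5
P = projector_from_measurement(Q)

# P is symmetric, idempotent, and has trace equal to its rank k
assert np.allclose(P, P.T)
assert np.allclose(P @ P, P)
assert np.isclose(np.trace(P), 2.0)
```

Only the row space of Q enters P, which is why the conditions derived below are formulated on the projectors rather than on the matrices Q_j themselves.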
1.1. Moment fusion. Suppose X is a random vector in R^d distributed according to some unknown Borel probability measure on R^d. For a fixed integer p > 0, our goal is to determine the low-order moments

(1) EX^s, s ∈ N^d, |s| ≤ p,

from low-order moments of lower-dimensional projections. We use here the multi-index notation X^s = X_1^{s_1} ··· X_d^{s_d} and |s| = s_1 + ··· + s_d. More specifically, we suppose that we have access only to the first p moments of low-dimensional linear measurements, i.e., for certain matrices {Q_j}_{j=1}^n ⊂ R^{k×d} with fixed rank k < d, we suppose that we know

(2) E(Q_j X)^s, s ∈ N^k, |s| ≤ p.

From knowledge of {Q_j}_{j=1}^n and the first p moments of the dimension-reduced random vectors Q_j X, j = 1, …, n, in (2), we aim to reconstruct the high-dimensional moments of X in (1).

1.2. Special examples. Suppose that x ∈ R^d is a vector of unknowns. If {Q_j}_{j=1}^n are chosen such that {(Q_j x)^s : j = 1, …, n, s ∈ N^k, |s| ≤ p} spans the space of polynomials in x of degree at most p, then (1) can be computed from (2) by expressing each expected value EX^s, s ∈ N^d, |s| ≤ p, as a suitable linear combination of the expected values {E(Q_j X)^{s'} : j = 1, …, n, s' ∈ N^k, |s'| ≤ p}. We provide an example for k = 1:

Example 1.1. If p = 1, then we can simply choose Q_j := e_j^*, j = 1, …, d, where {e_j}_{j=1}^d is the standard orthonormal basis for R^d. So, reconstruction is possible with d projectors.

Note that the number of linear measurements in Example 1.1 is exactly the dimension of the homogeneous polynomials of degree p in d variables. Similar examples can be derived for more general situations, and the following example deals with k = 2:

Example 1.2. (1) If p = 1, then a suitable choice of projectors onto coordinate planes, for d odd, enables us to reconstruct the high-dimensional mean from the lower-dimensional means. (2) For p = 2, moment reconstruction works with the d(d−1)/2 projectors onto span{e_i, e_j}, for i < j.
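The two special cases above can be verified numerically; the sketch below uses an arbitrary test distribution (the shifted Gaussian is illustrative only) and checks that coordinate projections carry the first moments, while coordinate-plane projections carry all mixed second moments.

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 4, 10_000
X = rng.standard_normal((N, d)) + 0.5   # samples of an arbitrary X in R^d

# p = 1, k = 1: the d measurements Q_j = e_j^* already carry the mean
means_1d = np.array([np.mean(X[:, j]) for j in range(d)])
assert np.allclose(means_1d, X.mean(axis=0))

# p = 2, k = 2: the projections onto span{e_i, e_j}, i < j, carry all
# mixed second moments E[X_i X_j] (their marginals give E[X_i^2])
for i in range(d):
    for j in range(i + 1, d):
        pair = X[:, [i, j]]          # the observed 2-d projection Q_{ij} X
        m2 = pair.T @ pair / N       # its empirical second-moment matrix
        assert np.isclose(m2[0, 1], np.mean(X[:, i] * X[:, j]))
```

These checks are exact identities in the samples, not statistical approximations: the moments of the projections are literally sub-collections of the moments of X in these coordinate-aligned cases.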

1.3. Outline and contribution of this paper. The present paper is dedicated to going beyond the explicit Examples 1.1 and 1.2 and to providing a general strategy for moment reconstruction. Our main contribution is the identification of conditions on the projectors that yield explicit reconstruction formulas. Moreover, such conditions are compatible with numerical schemes, meaning that suitable projectors can be constructed explicitly by minimizing a certain potential function, as discussed in Sections 4 and 5. We also discuss randomized constructions. Our approach stems from applied harmonic analysis and relates to the concept of so-called Grassmannian cubatures, see, for instance, [2,3].
The remainder of the paper is organized as follows: The condition on the projections and the associated reconstruction formula are formulated in Section 2 for random vectors X on the unit sphere S^{d−1}. In Section 3 we deal with X ∈ R^d and either restrict ourselves to moments up to order three or use rank-one projections. Sections 4 and 5 are dedicated to the construction of suitable projectors based on numerical optimization and on a randomization strategy. The results in Section 6 imply that suitably randomized constructions can provide approximate moment recovery, with an error bound that holds with overwhelming probability.
2. Main reconstruction results for X ∈ S^{d−1}

In this section, we focus on random vectors X with values in the unit sphere S^{d−1} of R^d. We will rely on some results on cubatures for polynomial spaces on the Grassmannian manifold, see [1,2,3,12,13].
2.1. General moment reconstruction. We shall make use of the trace inner product ⟨M_1, M_2⟩ := tr(M_1 M_2) on the set H_d of symmetric matrices in R^{d×d}; recall that G_{k,d} ⊂ H_d is the set of rank-k orthogonal projectors on R^d. Given {Q_j}_{j=1}^n ⊂ R^{k×d} with all matrices having full rank k < d, and P_j := Q_j^*(Q_j Q_j^*)^{-1} Q_j, we shall, in place of {Q_j}_{j=1}^n, find conditions on {P_j}_{j=1}^n that enable moment reconstruction, i.e., conditions on the respective row spaces of {Q_j}_{j=1}^n. The orthogonal group O(d) acts transitively on G_{k,d} by conjugation P ↦ U P U^*, for P ∈ G_{k,d} and U ∈ O(d). Thus, there is an orthogonally invariant probability measure σ_{k,d} on G_{k,d}, which is induced by the Haar measure on O(d). This measure leads to the trace moments

μ_{k,d}(M_1, …, M_t) := ∫_{G_{k,d}} ⟨P, M_1⟩ ··· ⟨P, M_t⟩ dσ_{k,d}(P), M_1, …, M_t ∈ H_d,

which were introduced in [12,13]. In the present section, we can restrict ourselves to μ^t_{k,d}(M) := μ_{k,d}(M, …, M), where M occurs t times, and in Section 3 we shall make use of the more general case. In the following result, we use the notation E_{x,y} := (1/2)(xy^* + yx^*), for x, y ∈ R^d.

Theorem 2.1. For α ∈ N^d with |α| = t, there are y^α_{s,i} ∈ S^{d−1} and coefficients f^α_{s,i} ∈ R, i = 1, …, m, such that, for all x ∈ S^{d−1},

x^α = Σ_{s=1}^t Σ_{i=1}^m f^α_{s,i} μ^s_{k,d}(E_{x, y^α_{s,i}}).

Proof. Note that [13, Lemma 7.1] provides a polarization identity, and together with x_{i_1} ··· x_{i_t} = Π_{j∈J} ⟨x, e_{i_j}⟩ it expresses each monomial x^α as a linear combination of powers ⟨x, y⟩^t with suitable y ∈ S^{d−1}. Thus, it is sufficient to check that each ⟨x, y⟩^t, for x, y ∈ S^{d−1}, can be written as a linear combination of terms μ^s_{k,d}(E_{x,y}), s = 1, …, t. We shall prove the statement by an induction over t. The case t = 1 is covered by [3].
To consider general t, we need some preparations. A partition of t is an integer vector π = (π_1, …, π_t) whose entries are ordered by π_1 ≥ … ≥ π_t ≥ 0 and sum up to t = π_1 + ··· + π_t. We denote the number of nonzero entries by l(π), and the set of partitions π of t with l(π) ≤ d is denoted by P_{t,d}.
According to invariant theory, cf. [25, Theorem 7.1], the expansion

μ^t_{k,d}(M) = q_{t,d} Σ_{π ∈ P_{t,d}} α_π tr(M^{π_1}) ··· tr(M^{π_{l(π)}})

holds with suitable real-valued coefficients q_{t,d} and α_π. For x, y ∈ S^{d−1}, we observe that tr(E_{x,y}^s) is a polynomial of degree s in ⟨x, y⟩ with leading coefficient (1/2)^{s−1}. The latter yields, for x, y ∈ S^{d−1}, that μ^t_{k,d}(E_{x,y}) is a polynomial in ⟨x, y⟩ of degree t, i.e., μ^t_{k,d}(E_{x,y}) = Σ_{π,s} c_{π,s} ⟨x, y⟩^s.
One can check that its leading coefficient does not vanish. Indeed, denoting by D_2 a diagonal rank-2 projection matrix, we observe that the leading coefficient is a positive multiple of μ^t_{k,d}(D_2) = ∫_{G_{k,d}} ⟨P, D_2⟩^t dσ_{k,d}(P), and the right-hand side is positive since the function ⟨·, D_2⟩^t ≥ 0 does not vanish entirely on G_{k,d}. Therefore, we can isolate ⟨x, y⟩^t and write it as a linear combination of μ^t_{k,d}(E_{x,y}) and terms ⟨x, y⟩^s, for s = 1, …, t − 1. By induction, each term ⟨x, y⟩^s, s = 1, …, t − 1, can be written as a linear combination of terms μ^s_{k,d}(E_{x,y}), s = 1, …, t − 1, which concludes the proof.

Theorem 2.1 represents monomials by linear combinations of μ^s_{k,d}(E_{x, y^α_{s,i}}), for some y^α_{s,i} ∈ S^{d−1}. Next, we shall aim to replace the latter with projected monomials. Let us define a function space on G_{k,d} by

Pol^ℓ_t(G_{k,d}) := span{P ↦ ⟨P, M⟩^t : M ∈ H_d, rank(M) ≤ ℓ}

and introduce the concept of cubatures on G_{k,d}: we call {(P_j, ω_j)}_{j=1}^n ⊂ G_{k,d} × R a cubature for Pol^ℓ_t(G_{k,d}) if Σ_{j=1}^n ω_j f(P_j) = ∫_{G_{k,d}} f dσ_{k,d} holds for all f ∈ Pol^ℓ_t(G_{k,d}).
Note that the construction of cubatures for Pol^ℓ_t(G_{k,d}) and the properties of the function space Pol^ℓ_t(G_{k,d}) are discussed in more detail in Sections 4 and 5, respectively. We can now formulate our first result on moment reconstruction, which is a direct consequence of Theorem 2.1.
Corollary 2.3. If {(P_j, ω_j)}_{j=1}^n is a cubature for Pol^2_t(G_{k,d}), then, for α ∈ N^d with |α| = t, there are coefficients a^α_β ∈ R, such that, for any random vector X ∈ S^{d−1},

(7) EX^α = Σ_{|β|≤t} a^α_β Σ_{j=1}^n ω_j E(P_j X)^β.

Proof. We observe that ⟨P_j x, y⟩ = ⟨P_j, E_{x,y}⟩ holds, for all x, y ∈ R^d. According to Theorem 2.1, and since ⟨·, E_{x,y^α_{s,i}}⟩^s ∈ Pol^2_t(G_{k,d}), the cubature property yields

EX^α = Σ_{s=1}^t Σ_{i=1}^m f^α_{s,i} Σ_{j=1}^n ω_j E⟨P_j X, y^α_{s,i}⟩^s,

and the assertion follows by observing that the terms E⟨P_j X, y^α_{s,i}⟩^s are linear combinations of moments of order at most s of P_j X.
Since we are originally given the moments of Q_j X, we must still express E(P_j X)^β, where P_j = Q_j^*(Q_j Q_j^*)^{-1} Q_j and β ∈ N^d, |β| ≤ t, by means of moments of Q_j X. If we suppress the index j, we obtain the multilinear relation (P X)^β = (Q^*(Q Q^*)^{-1} Q X)^β, so that each E(P X)^β is a linear combination of moments of order |β| of Q X. Thus, the moments of Q_j X enable us to apply (7).
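The multilinear relation above can be checked numerically. The following sketch (all names illustrative) expresses a second moment of P X purely through the second moments of the observed vector Q X, using P X = A (Q X) with A = Q^*(Q Q^*)^{-1}.

```python
import numpy as np

rng = np.random.default_rng(2)
k, d, N = 2, 5, 50_000
Q = rng.standard_normal((k, d))
A = Q.T @ np.linalg.inv(Q @ Q.T)     # P X = A (Q X)

X = rng.standard_normal((N, d))
Y = X @ Q.T                          # samples of the observed vector Q X
PX = Y @ A.T                         # samples of P X, computed from Q X only

# A second moment of P X, e.g. E[(PX)_1 (PX)_2], is a linear combination
# of the second moments E[(QX)_r (QX)_s] with coefficients A[0,r] A[1,s].
M2_QX = Y.T @ Y / N                  # empirical second moments of Q X
lhs = np.mean(PX[:, 0] * PX[:, 1])
rhs = A[0] @ M2_QX @ A[1]
assert np.isclose(lhs, rhs)
```

As in the text, only moments of Q X of the same order enter: the relation is exact sample-by-sample, not merely in expectation.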

2.2. Explicit moment reconstruction.
This section is dedicated to explicitly computing the expansion in Corollary 2.3 for t = 1, 2, 3 by using a very particular class of polynomial functions. Indeed, zonal polynomials, cf. [18,19,16,9,24], are special multivariate homogeneous polynomials on H_d. These polynomials C_π are indexed by integer partitions π and are invariant under orthogonal conjugation. According to [18], see also [12,13], integrals of zonal polynomials over the orbit of D_k under orthogonal conjugation can be evaluated in closed form, where D_k is a diagonal matrix in R^{d×d} with k ones and zeros elsewhere. Knowledge of the zonal polynomials enabled us in [13] to compute the trace moments for arbitrarily large t, with explicit expressions for t = 1, 2, 3:

Theorem 2.4 ([13]). For all d ≥ 3 and k < d, explicit formulas for μ^t_{k,d}(M), t = 1, 2, 3, hold with the coefficients q_{t,d} and α_π given in [13].

For d = 2 and k = 1 in Theorem 2.4, the constant q_{3,d} would be zero, but so are α_{(1,1,1)}, α_{(2,1)}, and α_{(3)}. The identity for μ^3_{1,2}(M) still holds with suitably modified coefficients. Theorem 2.4 and the proof of Theorem 2.1 lead to the explicit moment recovery formulas (9)-(13) in the following Corollary 2.5.
Remark 2.6. Note that (9) is proved by monomial identities, so that it also holds when the expectation is dropped on both sides.
Suppose now that {(P_j, ω_j)}_{j=1}^n is a cubature for Pol^2_t(G_{k,d}), so that we obtain μ^t_{k,d}(E_{x,y}) = Σ_{j=1}^n ω_j ⟨P_j x, y⟩^t, and the left-hand sides in (11), (12), and (13) can be replaced with Σ_{j=1}^n ω_j ⟨P_j x, y⟩^t, for t = 1, 2, 3, respectively. In the following, we assume x ∈ S^{d−1} and y ∈ R^d. Rearranging terms leads to the formulas (14), (15), and (16) with constants A_1, A_2, A_3, where (15) requires that (14) holds and (16) requires that (14) and (15) hold. As at the beginning of the proof of Theorem 2.1, [13, Lemma 7.1] yields a polarization identity, and in order to compute the term x_{i_1} ··· x_{i_t}, we can repeatedly apply (14), (15), (16), respectively, with y = Σ_{j∈J} e_{i_j}. Such rearrangements yield the formulas and constants in Corollary 2.5.
Remark 2.7. The framework presented in this paper also allows for the explicit computation of higher-order moments beyond t = 1, 2, 3. Indeed, if {(P_j, ω_j)}_{j=1}^n is a cubature for Pol^2_t(G_{k,d}), then we can compute all moments of order t by using the zonal polynomials. However, we do not have one closed formula incorporating t as a variable; instead, the expressions must be computed for each t separately. Note that the formulas in Corollary 2.5 are merely based on identities between the corresponding polynomials in the entries of a unit vector x ∈ R^d.
3. Moment fusion for X ∈ R^d

3.1. The general case for up to cubic moments. A homogeneity argument yields that (9) even holds for random X ∈ R^d. Analogously, considering (10) as a monomial identity with the constant B_2 covers the second moments, provided that {(P_j, ω_j)}_{j=1}^n is a cubature for Pol^2_2(G_{k,d}). In order to deal with X ∈ R^d for t = 3 as well, we observe that the formulas in (11), (12), and (13) hold in more generality, see [13, Theorem 7.3]. For x ∈ R^d and y ∈ S^{d−1}, using the above relation gives an expansion of ⟨x, y⟩‖x‖^2, and combined with identity (13), we obtain (17), (18), and (19) with the constant C_{3,d} = 4q_{3,d}/(4α_{(1,1,1)} + 2α_{(2,1)} + α_{(3)}); indeed, one can check that the term 4α_{(1,1,1)} + 2α_{(2,1)} + α_{(3)} is nonzero. If {(P_j, ω_j)}_{j=1}^n is a cubature for Pol^2_3(G_{k,d}), then we can apply the cubature property because the mapping P ↦ ⟨P, E_{x,y}⟩^3 is contained in Pol^2_3(G_{k,d}). In Proposition 5.1 we shall check that P ↦ ⟨P, E_{x,y}⟩⟨P, xx^*⟩ is also contained in Pol^2_3(G_{k,d}), so that (19) holds as well. The actual moments of order 3 can now be computed from (4) by observing that ⟨P_j x, y⟩ yields a linear combination of the terms (P_j x)_1, …, (P_j x)_d. We now collect the resulting expressions for all third moments:

Corollary 3.1. Let X ∈ R^d be a random vector with d ≥ 3, let the constants A_1, A_2, and B_2 be as above, and let i, h, l ∈ {1, 2, …, d} with i ≠ h, h ≠ l, and l ≠ i.
(i) If {(P_j, ω_j)}_{j=1}^n is a cubature for Pol^2_1(G_{k,d}), then (20) holds.
(ii) If {(P_j, ω_j)}_{j=1}^n is a cubature for Pol^2_2(G_{k,d}), then (20) and (21) hold.
(iii) If {(P_j, ω_j)}_{j=1}^n is a cubature for Pol^2_3(G_{k,d}), then (20), (21), and (22) hold, together with the third-moment formula (23).

Proof. The first and second moments have already been discussed prior to the statement of the corollary. For the third moments, the expression for EX^3_i results from choosing y = e_i in (17), (18), and (19), which yields (23).

3.2. All moments from projections onto one-dimensional subspaces.
In the previous section, we have outlined the recovery of the moments for t = 1, 2, 3 and general k. To address all moments t > 3, we now restrict ourselves to k = 1. Let us denote the permutation group of {1, …, t} by S_t. We say a permutation s ∈ S_t is associated to a partition π, and denote this by s ∼ π, if there is a set of cycles {c_i}_{i=1}^m such that s = (c_1) ··· (c_m) and the cardinality of c_i equals π_i, for i = 1, …, m. Note that we also use the standard notation c_i ∈ s for a cycle c_i occurring in s. For {M_i}_{i=1}^t ⊂ H_d, we use the cycle product M_{c_i} := M_{c_{i,1}} ··· M_{c_{i,ℓ_i}}, where c_i = (c_{i,1} … c_{i,ℓ_i}).
To clarify notation, we provide a simple example.
where {e_i}_{i=1}^d ⊂ R^d is the standard basis. Now, let σ ∈ S_t be another arbitrary permutation with some cycle c ∈ σ, whose cardinality we denote by l. Using these observations we obtain an expansion in which π_σ denotes the partition associated to σ, and the corresponding coefficient is a fraction of the trace moment μ_{1,d}(M_1, …, M_t). It remains to verify that the latter is positive. Together with the definition of the trace moments μ_{1,d}(M_1, …, M_t) and that of the M_i, we arrive at an integral over the orthogonal group. Since σ is a permutation, the integrand is a product of even powers of entries of O, so that ∫_{O(d)} Π_i (O_{1,i})^{2 #{i, σ(i)}} dO > 0, and the assertion follows.
Hence, the assertion follows for m = 0 and ℓ ∈ N_0. For m = 1, ℓ ∈ N_0, and d ∈ N, we find by (26) an expansion in terms of ⟨x, y⟩‖x‖^{2ℓ}. Moreover, we can check that the coefficient of ⟨x, y⟩‖x‖^{2ℓ} is nonzero by observing that the sum Σ_{π ∈ P_t} α_π #{s ∈ S_{ℓ+1} : s ∼ π} does not vanish, which concludes the proof.
Corollary 3.5. If {(P_j, ω_j)}_{j=1}^n is a cubature for Pol^3_t(G_{1,d}), then, for α ∈ N^d with |α| = t ≤ d, there are coefficients a^α_β ∈ R, such that, for any random vector X ∈ R^d,

(30) EX^α = Σ_{|β|≤t} a^α_β Σ_{j=1}^n ω_j E(P_j X)^β.

Proof. For any i = 0, …, ⌊t/2⌋, the function F : G_{1,d} → R defined by P ↦ ⟨P, E_{x,y}⟩^{t−2i} ⟨P, xx^*⟩^{2i} is contained in Pol^3_t(G_{1,d}), see part two of Proposition 4.1 in the subsequent Section 4 for details. Thus, the cubature property allows us to evaluate the corresponding trace moments by the weighted sums over the P_j. Note that ⟨P_j x, y⟩^{t−2i} and ‖P_j x‖^{2i} are linear combinations of monomials in P_j x of degree t − 2i and 2i, respectively. Hence, their product is a linear combination of monomials in P_j x of degree t. Applying (4) and invoking Proposition 3.4 for ℓ = 0 concludes the proof.

4. Constructing cubatures for Pol^2_t(G_{k,d})

In this section we shall derive a general framework for the construction of cubatures for Pol^2_t(G_{k,d}), which are needed to apply our results in Corollary 2.5. For general existence results on cubatures, we refer to [11], and explicit group-theoretical constructions are provided in [7]. In the following we shall discuss random constructions as well as deterministic constructions based on the solution of an optimization problem.

4.1. Random construction. For n, m ≥ dim(Pol^ℓ_t(G_{k,d})), it follows from classical arguments that there are {M_i}_{i=1}^m ⊂ H_d with rank(M_i) ≤ ℓ and {P_j}_{j=1}^n ⊂ G_{k,d} such that the matrix (⟨P_j, M_i⟩^t)_{i=1,…,m; j=1,…,n} has rank dim(Pol^ℓ_t(G_{k,d})). We can then compute weights ω := {ω_j}_{j=1}^n by solving the linear system of equations

Σ_{j=1}^n ω_j ⟨P_j, M_i⟩^t = μ^t_{k,d}(M_i), i = 1, …, m,

which yields a cubature {(P_j, ω_j)}_{j=1}^n. Note that the weights {ω_j}_{j=1}^n are not necessarily nonnegative.
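For t = 1, the trace moment is available in closed form, μ^1_{k,d}(M) = (k/d) tr(M), so the linear system for the weights can be set up and solved directly. The sketch below (random projectors and test matrices are illustrative choices, not from the paper) mirrors the construction; a least-squares solve stands in for the exact linear system.

```python
import numpy as np

rng = np.random.default_rng(3)
d, k, n, m = 4, 2, 20, 25

def random_projector(d, k, rng):
    # Haar-distributed rank-k orthogonal projector via QR of a Gaussian matrix
    V, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return V @ V.T

def random_symmetric(d, rng):
    B = rng.standard_normal((d, d))
    return (B + B.T) / 2

Ps = [random_projector(d, k, rng) for _ in range(n)]
Ms = [random_symmetric(d, rng) for _ in range(m)]

# For t = 1 the Haar integral is known: mu^1_{k,d}(M) = (k/d) tr(M)
G = np.array([[np.trace(P @ M) for P in Ps] for M in Ms])  # m x n system
b = np.array([k / d * np.trace(M) for M in Ms])
w, *_ = np.linalg.lstsq(G, b, rcond=None)                  # weights omega_j

# the weights reproduce the Haar integral for a fresh test matrix, i.e.,
# {(P_j, omega_j)} is a cubature for the linear functions (t = 1)
M_test = random_symmetric(d, rng)
lhs = sum(wj * np.trace(P @ M_test) for wj, P in zip(w, Ps))
assert np.isclose(lhs, k / d * np.trace(M_test))
```

As the text notes, the weights produced this way can be negative; nonnegativity is a separate concern.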
We claim that the M_i and P_j can be chosen in a random fashion. Indeed, we observe that both spaces, G_{k,d} and the matrices of rank at most ℓ in H_d, can be parametrized analytically, so that there are D > 0 and surjective analytic mappings defined on open subsets of R^D; choosing the parameters according to an absolutely continuous probability measure then yields the required rank condition almost surely.

If ℓ_1, ℓ_2 and t_1, t_2 are nonnegative integers, then products of functions in Pol^{ℓ_1}_{t_1}(G_{k,d}) and Pol^{ℓ_2}_{t_2}(G_{k,d}) span Pol^{ℓ_1+ℓ_2}_{t_1+t_2}(G_{k,d}). To verify this, we must check that the left-hand side is contained in the right-hand side; since the rank-ℓ matrices are dense in the matrices of rank at most ℓ, this reduces to verifying that the spaces Pol^ℓ_t(G_{k,d}) are an ascending sequence in t, i.e., Pol^ℓ_{t−1}(G_{k,d}) ⊂ Pol^ℓ_t(G_{k,d}). The spectral decomposition yields that the left-hand side is contained in the right-hand side; the reverse set inclusion is checked directly.

These properties lead to an optimization approach: a certain constant is a lower bound on the potential (38) built from a kernel K^ℓ_t associated with Pol^ℓ_t(G_{k,d}), and the gap is exactly the squared cubature error. Indeed, if (38) can be minimized numerically, then a proper cubature, or at least an ε-approximate cubature, can be obtained, where ε relates to machine precision, provided there exists a corresponding cubature for this choice of n. However, numerical evaluation of the kernel K^ℓ_t may be difficult in practice. In the subsequent section, we shall circumvent such difficulties by considering cubatures for larger spaces that enable us to work with a simpler kernel.

5. Construction of cubatures for Pol_t(G_{k,d})

5.1. Cubatures from optimization procedures. This section is dedicated to deriving cubatures from a numerical scheme that is indeed easy to implement. We define Pol_t(G_{k,d}), the polynomials of degree at most t on G_{k,d}, as the restrictions to G_{k,d} of polynomials of degree at most t in the matrix entries. Note that Pol_t(G_{k,d}) satisfies the product property that is usually associated with polynomial spaces, i.e., span(Pol_{t_1}(G_{k,d}) · Pol_{t_2}(G_{k,d})) = Pol_{t_1+t_2}(G_{k,d}), see, for instance, [12]. It is known that these spaces can be rewritten in terms of spans of the functions ⟨·, M⟩^s, see, for instance, [12,3]. Obviously, Pol^ℓ_t(G_{k,d}) is contained in Pol_t(G_{k,d}). In the following proposition, we explore when equality holds:

Proposition 5.1. If ℓ ≥ min{k, t}, then Pol^ℓ_t(G_{k,d}) = Pol_t(G_{k,d}).

Proof. For ℓ ≥ k, the equality is standard, cf. [12]. We derive from (32) that a representation as a product with t terms holds, and the findings in [3] yield that the right-hand side equals Pol_t(G_{k,d}). Therefore, ℓ ≥ t also yields Pol^ℓ_t(G_{k,d}) = Pol_t(G_{k,d}).
Note that {(P_j, ω_j)}_{j=1}^n being a cubature for Pol^2_3(G_{k,d}) as used in Corollary 2.5 already implies that it is also a cubature for Pol_2(G_{k,d}) = Pol^2_2(G_{k,d}) and for Pol_1(G_{k,d}) = Pol^2_1(G_{k,d}). It should also be mentioned that the space Pol^3_t(G_{1,d}) in Corollary 3.5 is the same as Pol_t(G_{1,d}). Hence, for k = 1, we were dealing with the space Pol_t(G_{1,d}) all along.
We call a probability measure ν_{k,d} on G_{k,d} a probabilistic cubature for Pol_t(G_{k,d}) if ∫_{G_{k,d}} f dν_{k,d} = ∫_{G_{k,d}} f dσ_{k,d} holds for all f ∈ Pol_t(G_{k,d}). Note that any cubature for Pol_t(G_{k,d}) can be considered as a finitely supported probabilistic cubature, provided that the weights are nonnegative. Another example, of course, is σ_{k,d} itself.
In the remainder of this section, we let each ω_j = 1/n and choose each P_j according to a probabilistic cubature ν_{k,d}. In that case, {P_j}_{j=1}^n is a collection of random matrices, and the expected value of the gap, that is, of the squared cubature error, can be computed explicitly. Denoting the expectation with respect to the random choice of {P_j}_{j=1}^n by E_P, and using that E_P K_t(P_i, P_i) = k^t and E_P K_t(P_i, P_j) = λ_t if i ≠ j, we get that the expected gap equals (k^t − λ_t)/n. Thus, letting n grow faster than k^t ensures that the expected value of the gap becomes arbitrarily small. In the following theorem, we show that this expected behavior happens with overwhelming probability.
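The 1/n decay of the expected gap can also be observed empirically. In the sketch below (all names illustrative), ν_{k,d} is taken to be σ_{k,d} itself, sampled via QR factorizations of Gaussian matrices, and λ_t is estimated by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(4)
d, k, t = 4, 2, 2

def projector_samples(n, rng):
    # n Haar-distributed rank-k projectors, flattened to rows of length d^2
    rows = []
    for _ in range(n):
        V, _ = np.linalg.qr(rng.standard_normal((d, k)))
        rows.append((V @ V.T).flatten())
    return np.array(rows)

# Monte Carlo estimate of lambda_t = E <P, P'>^t for independent P, P'
F1, F2 = projector_samples(20_000, rng), projector_samples(20_000, rng)
lam_t = np.mean(np.sum(F1 * F2, axis=1) ** t)

def gap(n, rng):
    # squared cubature error (1/n^2) sum_{i,j} <P_i, P_j>^t - lambda_t
    F = projector_samples(n, rng)
    return ((F @ F.T) ** t).mean() - lam_t

# the expected gap scales like (k^t - lambda_t)/n, so increasing n
# by a fixed factor shrinks the averaged gap by roughly that factor
small_n = np.mean([gap(10, rng) for _ in range(200)])
large_n = np.mean([gap(160, rng) for _ in range(200)])
assert large_n < small_n
```

Averaging over repetitions is only for a stable comparison; Theorem 5.3 below shows that a single draw already concentrates near the expected value for large n.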
Theorem 5.3. If {P_j}_{j=1}^n are chosen independently and identically distributed with respect to a probabilistic cubature ν_{k,d} for Pol_t(G_{k,d}) and τ > 0, then the gap exceeds (1 + τ^2)(k^t − λ_t)/n with probability at most 4 e^{−Ψ_τ(n)} r_τ(n), where Ψ_τ and r_τ are specified in the proof.
Proof. First, we note that ⟨P_i, P_j⟩^t = ⟨P_i^{⊗t}, P_j^{⊗t}⟩, where the Hilbert-Schmidt inner product on the right-hand side is taken on the Hilbert space (R^{d×d})^{⊗t} ≅ R^{d^{2t}}. Thus, Σ_{i,j} K_t(P_i, P_j) = ‖Σ_j P_j^{⊗t}‖²_HS holds. We define the averaged tensor power Λ_t := E_P P_1^{⊗t}. For P_0 ∈ G_{k,d}, we can compute ⟨P_0^{⊗t}, Λ_t⟩ = E_P ⟨P_0, P_1⟩^t. Since ν_{k,d} is a probabilistic cubature and ⟨P_0, ·⟩^t ∈ Pol_t(G_{k,d}), we obtain ⟨P_0^{⊗t}, Λ_t⟩ = ∫_{G_{k,d}} ⟨P_0, P⟩^t dσ_{k,d}(P). Let U ∈ O(d) be such that U^* D_k U = P_0, where D_k denotes the diagonal matrix with k ones and zeros elsewhere. The commutativity of the trace and the orthogonal invariance of σ_{k,d} yield that this integral equals ∫_{G_{k,d}} ⟨D_k, P⟩^t dσ_{k,d}(P). By applying the probabilistic cubature property once more, we derive that this value is λ_t. Thus, the term ⟨P_0^{⊗t}, Λ_t⟩ does not depend on the particular choice of P_0 ∈ G_{k,d}. Averaging over all P_0 with respect to σ_{k,d} then implies that, for each i,

(44) ⟨P_i^{⊗t}, Λ_t⟩ = λ_t.

Similarly to the above computations, the probabilistic cubature property also yields

(45) ‖Λ_t‖²_HS = λ_t.

By applying (44) and (45), and taking Y_j to be a suitable normalization of P_j^{⊗t} − Λ_t, we obtain ‖Y_j‖_HS ≤ 1 and E_P Y_j = 0, so that Minsker's vector-valued Bernstein inequality [23, Corollary 5.1] provides the stated bound, for all τ > 0, where Ψ_τ and r_τ are as stated. To finish the proof, we observe that the gap equals ‖(1/n) Σ_j P_j^{⊗t} − Λ_t‖²_HS. When n tends to infinity, r_τ(n) → 1 + 6(1 − λ_t/k^t)²/τ^4 and Ψ_τ(n) → (τ²/2)/(1 − λ_t/k^t). Thus, for large n, the distribution of the gap concentrates near zero at the same rate as the expected value. Since the gap is the square of the maximal cubature error, we conclude a probabilistic construction of approximate cubatures.
Corollary 5.4. If {P_j}_{j=1}^n are chosen independently from a probabilistic cubature for Pol_t(G_{k,d}) and τ > 0, then a ((1 + τ²)(k^t − λ_t)/n)-approximate cubature for Pol_t(G_{k,d}) with respect to K_t is obtained with probability bounded below by 1 − 4 e^{−Ψ_τ(n)} r_τ(n).

6. Error propagation for ε-approximate cubatures

The numerical optimization approach can, in general, provide cubatures up to machine precision only. Therefore, we are dealing with ε-approximate cubatures, and this is also what we obtain from the random constructions. In these cases, the moment reconstruction formulas in Corollary 2.5 hold up to some error term:

Theorem 6.1. Let X ∈ S^{d−1} be a random vector and let {(P_j, ω_j)}_{j=1}^n be an ε-approximate cubature for Pol_t(G_{k,d}) with respect to K_t. Then (7) in Corollary 2.3 holds up to a constant c_α times ε, i.e., for α ∈ N^d, |α| = t,

(46) |EX^α − Σ_{|β|≤t} a^α_β Σ_{j=1}^n ω_j E(P_j X)^β| ≤ ε c_α.
If k = 1 and X is a random vector in R^d, then (30) in Corollary 3.5 holds up to a constant times ε E‖X‖^t.
The above theorem verifies that the cubature error propagates in a linear fashion when it comes to the moment reconstruction formulas. It should be mentioned though that the constant c α depends on k and d.
Proof. According to Theorem 2.1, we derive, for x ∈ S^{d−1}, that x^α = ∫_{G_{k,d}} F^α_x dσ_{k,d}, where F^α_x := Σ_{s=1}^t Σ_{i=1}^m f^α_{s,i} ⟨·, E_{x,y^α_{s,i}}⟩^s. Since the function F^α_x is an element of Pol_t(G_{k,d}), the ε-approximate cubature property yields that Σ_{j=1}^n ω_j F^α_x(P_j) deviates from x^α by at most ε‖F^α_x‖_{K_t}. The coefficients a^α_β in Corollary 2.3 are then used with c_α := sup_{x∈S^{d−1}} ‖F^α_x‖_{K_t} to derive (46).
The second part of the theorem can be verified in an analogous fashion, so we omit the details.

Concluding remarks
Our results appear to match reasonable characteristics in distributed sensing. We require a rather large set of sensors (projectors), and we assume that the high-dimensional signal is modeled by means of a probability distribution. The sensors are deterministic and can even be given by the experimental setup, as long as we are able to find weights such that projectors and weights altogether form a cubature. Each sensor must reconstruct the first few moments of the projection's marginal distribution, which may allow in practice for fewer data samples than for estimating the marginal distribution itself. In the end, the first few moments of the high-dimensional random signal can be computed at low cost by a closed formula.
As far as we know, the present paper is a first attempt to address this type of moment recovery problem with tools from harmonic analysis. Further investigations are necessary to combine those ideas with proper statistical estimation techniques, in which the low-dimensional moments are estimated from acquired data. This is intended to be addressed in forthcoming work.