Abstract
We consider the problem of identifying a mixture of Gaussian distributions with the same unknown covariance matrix by their sequence of moments up to a certain order. Our approach rests on studying the moment varieties obtained by taking special secants to the Gaussian moment varieties, defined by their natural polynomial parametrization in terms of the model parameters. When the order of the moments is at most three, we prove an analogue of the Alexander–Hirschowitz theorem classifying all cases of homoscedastic Gaussian mixtures that produce defective moment varieties. As a consequence, identifiability is determined when the number of mixed distributions is smaller than the dimension of the space. In the two-component setting, we provide a closed-form solution for parameter recovery based on moments up to order four, while in the one-dimensional case we interpret the rank estimation problem in terms of secant varieties of rational normal curves.
1 Introduction
In the context of algebraic statistics [19], moments of probability distributions have recently been explored from an algebraic and geometric point of view [1, 4, 11, 13]. The key point for this connection is that in many cases the sets of moments define algebraic varieties, which are therefore called moment varieties. In the case of moments of mixture distributions, there is a natural correspondence with secant varieties of the moment varieties. Studying geometric invariants such as their dimension reveals properties such as model identifiability. One of the main applications to statistical inference is the method of moments, which matches the distribution’s moments to moment estimates obtained from a sample.
Gaussian mixtures are a prominent statistical model with multiple applications (see [3] and references therein). They are probability distributions on \(\mathbb {R}^n\) with a density that is a convex combination of Gaussian densities:
where \(\mu _1,\ldots ,\mu _k\in \mathbb {R}^n\) are the k means, \(\varSigma _1,\ldots ,\varSigma _k \in {\text {Sym}}^2(\mathbb {R}^n)\) are the covariance matrices, and the \(0\le \lambda _i \le 1\) with \(\lambda _1+\cdots +\lambda _k=1\) are the mixture weights.
The starting point is thus the Gaussian moment variety \({\mathcal {G}}_{n,d}\), as introduced in [4], whose points are the vectors of all moments of order at most d of an n-dimensional Gaussian distribution. The moments corresponding to the mixture density (1) form the secant variety \({\text {Sec}}_k({\mathcal {G}}_{n,d})\), and identifiability in this general setting was the focus of [5].
In this work, we study special families of Gaussian mixtures, called homoscedastic mixtures, where all the Gaussian components share the same covariance matrix. In other words, a homoscedastic Gaussian mixture has a density of the form
where the Gaussian probability densities \(f_{{\mathcal {N}}(\mu _i,\varSigma )}(x)\) all have different means \(\mu _i\) and the same covariance matrix \(\varSigma \). The moments, up to order d, of homoscedastic Gaussian mixtures are still polynomials in the parameters (the means and the covariance matrix), and form the moment variety \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\). This is a set of special k-secants inside the secant variety \({\text {Sec}}_k({\mathcal {G}}_{n,d})\).
The main question we are concerned with is: when can a general homoscedastic k-mixture of n-dimensional Gaussians be identified by its moments of order d? More precisely, denote by \(\varTheta ^H_{n,k}\) the parameter space of means, covariances and mixture weights for homoscedastic mixtures, and the moment map by
The mixture parameters of a point on the moment variety \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\) can be uniquely recovered if the fiber of the moment map (3) is a singleton up to natural permutations of the parameters. If this happens for a general point on the moment variety, we say that the mixture is rationally identifiable from its moments up to order d. If the fiber of a general point is finite, we say that we have algebraic identifiability. The parameters are not identifiable if the general fiber of the moment map has positive dimension.
If the dimension of the parameter space is larger than the dimension of the space of moments, then one may expect the moment variety to fill the whole moment space. Clearly, the fiber of the moment map must then have positive dimension and we cannot have identifiability. We therefore distinguish the unexpected cases: when the dimension of the moment variety is less than the dimension of both the parameter space and the moment space, we say that the moment variety \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\) is defective. In particular, defectivity implies non-identifiability.
We illustrate with an example:
Example 1
Let \(n=2\), \(k=2\) and \(d=3\). That is, we consider moments up to order three for the homoscedastic mixture of two Gaussians in \(\mathbb {R}^2\). The Gaussian moment variety \({\mathcal {G}}_{2,3}\) is 5-dimensional, with 2 parameters for the mean vector and 3 for the symmetric covariance matrix. The parameters for the homoscedastic mixture are two mean vectors \(\mu _1= \begin{pmatrix} \mu _{11} \\ \mu _{12} \\ \end{pmatrix}\) and \(\mu _2= \begin{pmatrix} \mu _{21} \\ \mu _{22} \\ \end{pmatrix}\), the common covariance \(\varSigma = \begin{pmatrix} \sigma _{11} & \sigma _{12}\\ \sigma _{12} & \sigma _{22}\\ \end{pmatrix}\) and the mixture weight \(\lambda \) of the first component, in total \(2 \times 2 + 3 + 1 = 8\) parameters. On the other hand, there are 9 bivariate moments up to order 3. Explicitly, the map is:
Since there are more moments than parameters, one would expect that the mixture parameters can be recovered. However, the dimension of \({\text {Sec}}^H_2({\mathcal {G}}_{2,3})\) equals 7, one less than the expected dimension of 8. Therefore, it is defective and there is no algebraic identifiability. This means that the method of moments is doomed to fail in this setting. However, if one measures moments up to order \(d=4\), it is possible to uniquely recover the mixture parameters.
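The defectivity claim can be checked numerically. The following sketch (an illustration, not part of the paper; the parameter values are arbitrary generic rationals chosen here) computes the rank of the Jacobian of the moment map for \(n=k=2\), \(d=3\) in exact arithmetic with sympy; at a generic point this rank equals the dimension of the moment variety.

```python
# Estimate dim Sec^H_2(G_{2,3}) as the generic rank of the Jacobian of the
# moment map, using exact rational arithmetic.
import sympy as sp

u1, u2 = sp.symbols('u1 u2')
la, m11, m12, m21, m22, s11, s12, s22 = params = sp.symbols(
    'lambda mu11 mu12 mu21 mu22 s11 s12 s22')

Q = sp.Rational(1, 2)*(s11*u1**2 + 2*s12*u1*u2 + s22*u2**2)
# Moment generating function of the homoscedastic 2-mixture in R^2.
M = la*sp.exp(m11*u1 + m12*u2 + Q) + (1 - la)*sp.exp(m21*u1 + m22*u2 + Q)

# The 9 moments m_{ij} with 1 <= i+j <= 3, as polynomials in the parameters.
moments = [sp.diff(M, u1, i, u2, j).subs({u1: 0, u2: 0})
           for i in range(4) for j in range(4) if 1 <= i + j <= 3]

J = sp.Matrix(moments).jacobian(sp.Matrix(params))
point = {la: sp.Rational(3, 7), m11: 1, m12: sp.Rational(1, 2),
         m21: sp.Rational(-1, 3), m22: 2, s11: 2,
         s12: sp.Rational(1, 5), s22: 3}
rank = J.subs(point).rank()
print(rank)  # 7 according to the paper: one less than the 8 parameters
```

The rank drop from 8 to 7 is exactly the defect exhibited in this example.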
As is often observed [1, 4, 13], a change of coordinates to cumulants tends to yield simpler representations and faster computations. This is the case here and hence we also study the cumulant varieties of the homoscedastic Gaussian mixtures. For Example 1, the moment variety in cumulant coordinates is simply the cone over a twisted cubic curve (see Example 5). This is not a coincidence, as is shown in Sect. 3.
Our main results, Theorems 2 and 3, identify the defective homoscedastic moment varieties when \(d=3\) and show that the homoscedastic moment variety is not defective when \(k\le n+1\). These are analogues of the Alexander–Hirschowitz theorem on secantdefective Veronese varieties [2].
This paper is organized as follows: In Sect. 2 we present the connection between moments and cumulants. The moment varieties corresponding to homoscedastic mixtures are defined in Sect. 3. In Sect. 4 we give general algebraic identifiability considerations and do a careful analysis of the subcases \(d=3\), \(k=2\) and \(n=1\). Finally, we conclude with a summary of results and list further research directions.
2 Moments and Cumulants
To get started, we make some remarks about moments and cumulants from an algebraic perspective. To a sufficiently integrable random variable X on \(\mathbb {R}^n\), associate its moments \(m_{a_1,\ldots ,a_n}[X]\) and cumulants \(\kappa _{a_1,\ldots ,a_n}[X]\) through the generating functions in \(\mathbb {R}[\![u_1,\ldots ,u_n]\!]\):
The information obtained from moments is equivalent to that from cumulants, since they are obtained from one another through the simple transformations
which are well defined, because the 0th moment is always one, whereas the 0th cumulant is always zero: \(m_{0}[X]=1,\kappa _0[X]=0\) for every random variable X. In particular, moments and cumulants take values in the affine hyperplanes \(\mathbb {A}^M_n\) and \(\mathbb {A}^K_n\) of \(\mathbb {R}[\![u_1,\ldots ,u_n]\!]\) defined by
We call these hyperplanes the moment space and the cumulant space.
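In one variable, the moment–cumulant transformations and the role of the normalizations \(m_0=1\), \(\kappa_0=0\) can be made concrete with a short computer algebra sketch (an illustration, not part of the paper):

```python
# Cumulants from moments in one variable: expand the logarithm of the
# truncated moment generating function M(u) = sum_i m_i u^i / i!, m_0 = 1.
import sympy as sp

u = sp.symbols('u')
m1, m2, m3, m4 = sp.symbols('m1 m2 m3 m4')
M = 1 + m1*u + m2*u**2/2 + m3*u**3/6 + m4*u**4/24

K = sp.expand(sp.series(sp.log(M), u, 0, 5).removeO())
kappa = [sp.expand(sp.factorial(i)*K.coeff(u, i)) for i in range(1, 5)]

# The classical moment-to-cumulant formulas drop out:
assert kappa[0] == m1
assert sp.expand(kappa[1] - (m2 - m1**2)) == 0            # the variance
assert sp.expand(kappa[2] - (m3 - 3*m1*m2 + 2*m1**3)) == 0
```

Since \(M(0)=m_0=1\), the logarithm has no constant term, which is precisely the statement \(\kappa_0=0\).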
To consider only moments up to order d, we replace the ring \(\mathbb {R}[\![ u_1,\ldots ,u_n ]\!]\) of power series with the truncated ring \(\mathbb {R}[\![u_1,\ldots ,u_n ]\!]/(u_1,\ldots ,u_n)^{d+1}\), and everything goes through. In particular, there is an analogous definition of the affine hyperplanes \(\mathbb {A}^M_{n,d}\) and \(\mathbb {A}^K_{n,d}\), which we also call moment space and cumulant space.
Example 2
(Dirac distribution) Let \(\mu =(\mu _1,\ldots ,\mu _n)\) in \(\mathbb {R}^n\) be a point. The Dirac distribution \(\delta _{\mu }\) with center \(\mu \) on \(\mathbb {R}^n\) is given by
If X is a random variable on \(\mathbb {R}^n\) with this distribution, its moment generating function is
The moments of X are monomials evaluated at \(\mu \). On the other hand, for the cumulant generating function
the linear cumulants coincide with the coordinates of \(\mu \), and the higher order cumulants are all zero.
This has an immediate translation into algebro-geometric terms: the parameter space for all Dirac distributions is the space \(\mathbb {R}^n\), and the image of the moment map of degree d, \(M:\mathbb {R}^n \rightarrow \mathbb {A}^M_{n,d}\), is the affine d-th Veronese variety \(V_{n,d} \subseteq \mathbb {A}^M_{n,d}\). On the other hand, the image of the cumulant map \(K:\mathbb {R}^n \rightarrow \mathbb {A}^K_{n,d}\) is the linear subspace given by \(\{ \kappa _2 = \kappa _3 = \cdots = \kappa _d = 0 \}\), where \(\kappa _i\) is the degree-i part of an element in \(\mathbb {A}^K_{n,d}\).
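A quick symbolic check of this dichotomy in one dimension (an illustration, not part of the paper):

```python
# Dirac distribution at mu: the moments are the monomials mu^i, while the
# cumulant generating function log(exp(mu*u)) = mu*u is linear, so all
# cumulants of order >= 2 vanish.
import sympy as sp

u, mu = sp.symbols('u mu', real=True)
M = sp.exp(mu*u)                       # moment generating function of delta_mu
moments = [sp.diff(M, u, i).subs(u, 0) for i in range(1, 5)]
assert moments == [mu, mu**2, mu**3, mu**4]   # Veronese parametrization
K = sp.log(M)                          # evaluates to mu*u for real exponents
assert sp.expand(K - mu*u) == 0        # linear, so kappa_2 = kappa_3 = ... = 0
```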
Example 3
(Gaussian distribution) Let \(\mu \in \mathbb {R}^n\) be a point, and \(\varSigma \in {\text {Sym}}^2\mathbb {R}^n\) an \(n\times n\) symmetric and positive-definite matrix. The Gaussian distribution on \(\mathbb {R}^n\) with mean \(\mu \) and covariance matrix \(\varSigma \) is given by the density
If \(X\sim {\mathcal {N}}(\mu ,\varSigma )\) is a Gaussian random variable with these parameters, its moment generating function and cumulant generating function are given by
The Gaussian moment variety \({\mathcal {G}}_{n,d}\subseteq \mathbb {A}^M_{n,d}\) consists of all Gaussian moments up to order d. Observe that the corresponding cumulant variety is given simply by the linear subspace \(\{ \kappa _3 = \cdots = \kappa _d = 0 \} \subseteq \mathbb {A}^K_{n,d}\).
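For instance, in one dimension (an illustration, not from the paper), expanding the moment generating function \(M(u)=\exp(\mu u + \sigma^2 u^2/2)\) recovers the familiar low-order Gaussian moments parametrizing \({\mathcal {G}}_{1,3}\):

```python
# Low-order moments of N(mu, s), where s denotes the variance sigma^2.
import sympy as sp

u, mu, s = sp.symbols('u mu s')
M = sp.exp(mu*u + s*u**2/2)
m = [sp.expand(sp.diff(M, u, i).subs(u, 0)) for i in range(1, 4)]
assert m[0] == mu                            # m_1 = mu
assert sp.expand(m[1] - (mu**2 + s)) == 0    # m_2 = mu^2 + sigma^2
assert sp.expand(m[2] - (mu**3 + 3*mu*s)) == 0   # m_3 = mu^3 + 3*mu*sigma^2
```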
While our focus is on Gaussian distributions, our approach applies to general location families that admit moment and cumulant varieties. We illustrate this with the next example.
Example 4
(Laplace distribution) The (symmetric) multivariate Laplace distribution has a location parameter \(\mu \in \mathbb {R}^n\) and a covariance parameter \(\varSigma \), a positive-definite \(n\times n\) matrix. Its density function involves the modified Bessel function of the second kind (see [12, Chapter 5]), but it can be defined via its simpler moment generating function:
which converges on the region where \(u^t \varSigma u < 2 \).
Moments and cumulants up to order \(d=3\) agree with the Gaussian case. Also note that when \(\varSigma = 0\), the Dirac moment generating function is recovered. However, when \(d \ge 4\), the Laplace cumulant variety is no longer a linear subspace of the cumulant space.
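In one dimension this comparison can be carried out directly (a sketch, not from the paper; here s plays the role of the variance \(\sigma^2\)):

```python
# Laplace cumulants agree with the Gaussian ones up to order 3,
# but kappa_4 = 3*s^2 is nonzero.
import sympy as sp

u, mu, s = sp.symbols('u mu s')
M = sp.exp(mu*u) / (1 - s*u**2/2)      # 1D Laplace moment generating function
K = sp.expand(sp.series(sp.log(M), u, 0, 5).removeO())
kappa = [sp.expand(sp.factorial(i)*K.coeff(u, i)) for i in range(1, 5)]
assert kappa[:3] == [mu, s, 0]             # same as the Gaussian N(mu, s)
assert sp.expand(kappa[3] - 3*s**2) == 0   # differs from the Gaussian at d = 4
```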
The multiplicative structure of the power series ring \(\mathbb {R}[\![ u_1,\ldots ,u_n ]\!]\) makes it particularly well suited to expressing independence in terms of moments. Indeed, if X, Y are two independent random variables on \(\mathbb {R}^n\), then
With cumulants it is even simpler: it holds that
The group of affine transformations \({\text {Aff}}(\mathbb {R}^n)\) acts naturally on both moments and cumulants: indeed, for any \(A\in GL(n,\mathbb {R})\) and \(b\in \mathbb {R}^n\) and a random variable X on \(\mathbb {R}^n\),
and
In particular, note that translations correspond simply to translations in cumulant coordinates, whereas they induce a more complicated expression in moment coordinates.
3 Homoscedastic Secants
When Karl Pearson introduced Gaussian mixtures to model subpopulations of crabs [18], he also proposed the method of moments in order to estimate the parameters. The basic idea is to compute sample moments from observed data, and match them to the distribution’s moments expressed in terms of the unknown parameters. The method of moments estimates are the parameters that solve these equations. This is a classical estimation method in statistics; a good survey is [16], and a recent ‘denoised’ version for Gaussian mixtures is [21].
The method of moments is very friendly for mixture models because computing moments of mixture densities is straightforward, since for every measurable function \(g:\mathbb {R}^n \rightarrow \mathbb {R}\)
and thus, the moments are just linear combinations of the corresponding Gaussian moments.
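As a minimal one-dimensional sketch (not from the paper), the third moment of a homoscedastic two-component mixture is the weighted average of the component third moments:

```python
# Mixture moments are weighted averages of component moments:
# here the third moment of a 1D homoscedastic 2-mixture with means a, b,
# common variance s and weight lambda.
import sympy as sp

u, la, a, b, s = sp.symbols('u lambda a b s')
M = la*sp.exp(a*u + s*u**2/2) + (1 - la)*sp.exp(b*u + s*u**2/2)
m3 = sp.diff(M, u, 3).subs(u, 0)
# Weighted average of the component third moments mu^3 + 3*mu*s:
assert sp.expand(m3 - (la*(a**3 + 3*a*s) + (1 - la)*(b**3 + 3*b*s))) == 0
```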
As hinted in the introduction, this discussion can be rephrased in geometric terms: let \({\mathcal {G}}_{n,d}\subseteq \mathbb {A}^M_{n,d}\) be the Gaussian moment variety on \(\mathbb {R}^n\) of order d. Then, the moments of mixtures of Gaussians are linear combinations of points in \({\mathcal {G}}_{n,d}\), so that their corresponding variety is the k-th secant variety \({\text {Sec}}_k({\mathcal {G}}_{n,d})\).
The densities of homoscedastic Gaussian mixtures, where the Gaussian components share a common covariance matrix, have the form:
where the \(\mu _i \in \mathbb {R}^n\) are the mean parameters, \(\varSigma \in {\text {Sym}}^2 \mathbb {R}^n\) is the common covariance parameter, and the \(\lambda _i \in \mathbb {R}\) with \(\lambda _1+\cdots +\lambda _k = 1\) are the mixture weights. Thus, the parameter space for homoscedastic mixtures is
and it has dimension
The moment map for homoscedastic mixtures is then an algebraic map
Points on the image, the moments of homoscedastic mixtures, are linear combinations of points in \({\mathcal {G}}_{n,d}\subseteq \mathbb {A}^M_{n,d}\) which share the same covariance matrix.
Definition 1
The homoscedastic k-secant variety, denoted \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\), is the image of the moment map \(M_{n,k,d}\). The fiber dimension \(\varDelta ^H_{n,k,d}\) is the general fiber dimension of the map \(M_{n,k,d}\),
We say that \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\) is algebraically identifiable if \(\varDelta ^H_{n,k,d}=0\).
The feasibility of the method of moments rests on computing points in the fibers of the moment map \(M_{n,k,d}\). Algebraic identifiability of \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\) means that a general homoscedastic Gaussian mixture in the homoscedastic k-secant variety is identifiable from its moments up to order d in the sense that only finitely many Gaussian mixture distributions share the same moments up to order d; we reserve the term rationally identifiable for the case where a general fiber consists of a single point, up to label swapping. If the general fiber is not finite, then it is positive-dimensional, the parameters are not identifiable from the moments up to order d, and a higher order is needed for identifiability (cf. Remark 4 and [4, Problem 17]).
Since the dimension of \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\) is always bounded by the dimension of the ambient space \(\mathbb {A}^M_{n,d}\), there is a simple estimate for the fiber dimension:
Lemma 1
For all n, d, k it holds that
Proof
The moment space \(\mathbb {A}^M_{n,d}\) is an affine hyperplane inside the vector space \(\mathbb {R}[\![ u_1,\ldots ,u_n ]\!]/(u_1,\ldots ,u_n)^{d+1}\); hence, it has dimension
Since \({\text {Sec}}^H_k({\mathcal {G}}_{n,d}) \subseteq \mathbb {A}^M_{n,d}\), note that
which is exactly the inequality in the statement. \(\square \)
We expect that in general situations the inequality (18) is in fact an equality. Hence, define the defect to be
We say that \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\) is defective if \(\delta ^H_{n,k,d}>0\). As observed earlier, defectivity implies non-identifiability.
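The two dimension counts entering this estimate are easy to tabulate; the helper below (an illustration, not part of the paper, with ad hoc function names) reproduces the counts of Example 1:

```python
# Dimension bookkeeping for homoscedastic mixtures.
from math import comb

def dim_params(n, k):
    """dim Theta^H_{n,k}: k means, one covariance matrix, k-1 free weights."""
    return n*k + n*(n + 1)//2 + (k - 1)

def dim_moment_space(n, d):
    """dim A^M_{n,d} = C(n+d, d) - 1."""
    return comb(n + d, d) - 1

# Example 1 revisited: n = k = 2, d = 3.
print(dim_params(2, 2))        # 8
print(dim_moment_space(2, 3))  # 9
# The naive count suggests identifiability, yet the paper shows the moment
# variety has dimension 7, i.e. the defect equals 1.
```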
3.1 Cumulant Representation
Let us explore how homoscedastic secants become simpler in cumulant coordinates, and how this representation can be used to check identifiability.
First, we rephrase the situation in terms of random variables: let \(Z=Z_{\varSigma }\) be a Gaussian random variable with mean 0 and covariance matrix \(\varSigma \), and let \(B=B_{(\mu _1,\ldots ,\mu _k),(\lambda _1,\ldots ,\lambda _k)}\) be an independent random variable with distribution given by a mixture of Dirac distributions:
Then, the random variable \(Z+B\) has density given by the homoscedastic mixture (1). Moreover, if \(m=\mu _1\lambda _1+\cdots +\mu _k\lambda _k\) is the mean of B, we write \(B=A+m\), where A is a centered mixture of Dirac distributions.
One can compute cumulants of this random variable as follows:
and this suggests parametrizing the homoscedastic secants in cumulant coordinates as follows:
where \(\varTheta _{n,k}^0\) parametrizes the centered mixtures of Dirac distributions
The cumulant homoscedastic secant variety \(\log ({\text {Sec}}^H_{k}({\mathcal {G}}_{n,d}))\) is the image of the map K. Since one can freely translate by elements of \(\mathbb {R}^n\) and \({\text {Sym}}^2(\mathbb {R}^n)\), the first- and second-order cumulants can take any value on this variety. The constraints are on the cumulants of order three and higher. We summarize this discussion in the following lemma.
Lemma 2
Let \(\mathbb {A}^{K,3}_{n,d}\) be the space of cumulants of order at least three and at most d, let
be the cumulant map and let \(C^0_{n,k,d}\) denote the closure \(\overline{\phi _{n,k,d}(\varTheta ^0_{n,k})}\). Then, the cumulant homoscedastic secant variety \(\log ({\text {Sec}}^H_k({\mathcal {G}}_{n,d}))\) is a cone over \(C_{n,k,d}^0\).
Remark 1
In particular, the equations for the cumulant homoscedastic secant variety \(\log ({\text {Sec}}^H_k({\mathcal {G}}_{n,d}))\) inside \(\mathbb {A}^{K}_{n,d}\) are exactly the same as the equations for \(C^0_{n,k,d}\) inside \(\mathbb {A}^{K,3}_{n,d}\).
The fiber dimension \(\varDelta _{n,k,d}^{H}\) can also be computed as the fiber dimension of the map \(\phi _{n,k,d}\):
Lemma 3
The fiber dimension \(\varDelta ^H_{n,k,d}\) is equal to the fiber dimension of \(\phi _{n,k,d}\). In other words
Proof
The fiber dimension is the difference \(\dim \varTheta ^H_{n,k} - \dim \log ({\text {Sec}}^H_{k}({\mathcal {G}}_{n,d}))\). We know that \(\varTheta ^H_{n,k} \cong \varTheta ^0_{n,k} \times \mathbb {R}^n \times {\text {Sym}}^2\mathbb {R}^n\). Moreover, Lemma 2 says that \(\log ({\text {Sec}}^H_{k}({\mathcal {G}}_{n,d}))\) is the cone over \(C^0_{n,k,d}\), which is precisely \(\mathbb {R}^n\times {\text {Sym}}^2\mathbb {R}^n \times C^0_{n,k,d}\), so that the first equality follows. For the second equality, the dimension of \(\varTheta ^0_{n,k}\) can be computed as \(nk+k-1-n = (n+1)(k-1)\). \(\square \)
Example 5
(\(n=k=2 \) , \( d=3\)) Revisiting Example 1 from the introduction, we concluded that \({\text {Sec}}^H_2({\mathcal {G}}_{2,3}) \subset \mathbb {A}^M_{2,3} \cong \mathbb {A}^9\) is expected to be a hypersurface but is actually of codimension 2. The ideal of \({\text {Sec}}^H_2({\mathcal {G}}_{2,3})\) is Cohen–Macaulay and determinantal (generated by the maximal minors of a \(6 \times 5\) matrix) as described in [4, Proposition 19]. The homoscedastic cumulant variety \(\log ({\text {Sec}}^H_{2}({\mathcal {G}}_{2,3}))\) is defined by the vanishing of the \(2\times 2\) minors of
Note that indeed the first- and second-order cumulants \(k_{10},k_{01},k_{20},k_{11},k_{02}\) do not appear in the equations above, so that the cumulant variety is the cone over the twisted cubic curve.
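This containment can be verified computationally (a sketch, not from the paper; the parameter values below are arbitrary generic rationals): the third-order cumulants of a homoscedastic 2-mixture, computed from the logarithm of the mixture moment generating function, satisfy the \(2\times 2\) minor equations regardless of the means, weight and covariance.

```python
# Third-order cumulants of a homoscedastic 2-mixture in R^2 lie on the
# cone over the twisted cubic: the 2x2 minors of the Hankel matrix vanish.
import sympy as sp

u1, u2 = sp.symbols('u1 u2')
la, a1, a2, b1, b2, s11, s12, s22 = sp.symbols(
    'lambda a1 a2 b1 b2 s11 s12 s22')

Q = sp.Rational(1, 2)*(s11*u1**2 + 2*s12*u1*u2 + s22*u2**2)
M = la*sp.exp(a1*u1 + a2*u2 + Q) + (1 - la)*sp.exp(b1*u1 + b2*u2 + Q)
K = sp.log(M)                      # cumulant generating function

point = {la: sp.Rational(2, 5), a1: 1, a2: -2, b1: sp.Rational(1, 3),
         b2: 4, s11: 2, s12: sp.Rational(1, 2), s22: 3}

def kappa(i, j):
    """Cumulant kappa_{ij} evaluated at the chosen parameter point."""
    return sp.diff(K, u1, i, u2, j).subs({u1: 0, u2: 0}).subs(point)

k30, k21, k12, k03 = kappa(3, 0), kappa(2, 1), kappa(1, 2), kappa(0, 3)
# 2x2 minors of [[k30, k21, k12], [k21, k12, k03]]:
minors = [k30*k12 - k21**2, k30*k03 - k21*k12, k21*k03 - k12**2]
assert all(sp.simplify(m) == 0 for m in minors)
```

The minors vanish identically because the third cumulant tensor of the mixture is a multiple of \(\delta\otimes\delta\otimes\delta\) with \(\delta=\mu_1-\mu_2\), the Gaussian part contributing nothing beyond order two.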
Remark 2
To estimate the mixture parameters from the cumulants, it is enough to consider the map \(\phi _{n,k,d}\) of Lemma 2. Indeed, suppose that we have a homoscedastic mixture with parameters \((((\lambda _1,\ldots ,\lambda _k),(\mu _1,\ldots ,\mu _k)),m,\varSigma ) \in \varTheta ^0_{n,k}\times \mathbb {R}^n \times {\text {Sym}}^2\mathbb {R}^n\) and suppose that its cumulants are known, so that in polynomial form
Then, to recover the parameters one can first try to recover the \(\lambda _i\) and the \(\mu _i\) from the cumulants of order three and higher, and then compute m and \(\varSigma \) from the cumulants of order one and two.
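As a hedged one-dimensional illustration of this two-stage recovery (not from the paper), the third cumulant of the two-point mixture \(\lambda\delta_a+(1-\lambda)\delta_b\) has the classical closed form \(\lambda(1-\lambda)(1-2\lambda)(a-b)^3\), which the computer algebra check below confirms:

```python
# Third cumulant of the two-point mixture lambda*delta_a + (1-lambda)*delta_b.
import sympy as sp

u, la, a, b = sp.symbols('u lambda a b')
K = sp.log(la*sp.exp(a*u) + (1 - la)*sp.exp(b*u))  # cumulant gen. function
kappa3 = sp.diff(K, u, 3).subs(u, 0)
assert sp.expand(kappa3 - la*(1 - la)*(1 - 2*la)*(a - b)**3) == 0
```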
3.2 Veronese Secants
We briefly observe that we can recast the above discussion in a way that makes apparent the connection to mixtures of Dirac distributions and, hence, to secants of Veronese varieties. To deal with classical secant varieties, this time we use moment coordinates. Now, every homoscedastic mixture is the distribution of a random variable of the form \(Z+B\), where B is a mixture of Dirac distributions and Z is a centered Gaussian of covariance \(\varSigma \), independent from B. Thus, the moment generating function of this variable is
Therefore, the role of the covariance parameter is decoupled from the others: in particular, for \(\varSigma =0\), one obtains the moment variety for mixtures of Dirac distributions. When restricting to moments \(M(u)_d\) of degree at most d, this is precisely the k-secant variety \({\text {Sec}}_k({\mathcal {V}}_{n,d})\) of the Veronese variety. The additive group \({\text {Sym}}^2\mathbb {R}^n\) acts on the moment space \(\mathbb {A}^M_{n,d}\) by
and so (28) says that \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\) is the union of all the orbits of the points in \({\text {Sec}}_k({\mathcal {V}}_{n,d})\) under this action.
This is useful because we can exploit well-known results on secants of Veronese varieties to address identifiability. First, let \(\varDelta ^{{\mathcal {V}}}_{n,k,d}\) denote the fiber dimension of the k-secant variety \({\text {Sec}}_k({\mathcal {V}}_{n,d}) \subseteq \mathbb {A}^M_{n,d}\) of the Veronese variety: by definition, this is
A basic estimate for the dimension of \({\text {Sec}}_k({\mathcal {V}}_{n,d})\) is given by the dimension of the ambient space \(\dim \mathbb {A}^{M}_{n,d} = \left( {\begin{array}{c}n+d\\ d\end{array}}\right) - 1\), hence
so that we can define the defect of the k-secant variety of the Veronese variety as
This number was famously computed by Alexander and Hirschowitz [2], see also [7]:
Theorem 1
(Alexander–Hirschowitz) The defect for the Veronese variety is always zero, except in the following exceptional cases
Moreover, for a general point \(M(u)\in {\text {Sec}}_k({\mathcal {V}}_{n,d})\), consider the closed subset of \({\text {Sym}}^2\mathbb {R}^n\) given by
We have the following relation between the fiber dimensions (17) and (30):
Proposition 1
It holds that
where \(M\in {\text {Sec}}_k({\mathcal {V}}_{n,d})\) is a general point.
Proof
By the previous discussion, the moment map for homoscedastic mixtures factors as a composition of two surjective maps
Hence, the fiber dimension of the composite map is the sum of the fiber dimensions of the two factors. For the first one this is \(\varDelta ^{{\mathcal {V}}}_{n,k,d}\), so it remains to consider the second. Denote the second factor by \(\rho :{\text {Sym}}^2(\mathbb {R}^n) \times {\text {Sec}}_k({\mathcal {V}}_{n,d}) \rightarrow {\text {Sec}}^H_{k}({\mathcal {G}}_{n,d})\) and let \((\varSigma _o,M_o(u)) \in {\text {Sym}}^2(\mathbb {R}^n) \times {\text {Sec}}_k({\mathcal {V}}_{n,d})\) be a general point. The fiber is
concluding the proof. \(\square \)
Remark 3
In the range \((n+1)\left( k + \frac{n}{2}\right) \le \left( {\begin{array}{c}n+d\\ d\end{array}}\right) \) where we expect identifiability for homoscedastic Gaussian mixtures, we see that \(\varDelta ^H_{n,k,d}=\delta ^H_{n,k,d}\), and Alexander–Hirschowitz says that \(\varDelta ^{{\mathcal {V}}}_{n,k,d} = \delta ^{{\mathcal {V}}}_{n,k,d} = 0\). Hence, Proposition 1 yields
4 Moment Identifiability
Now we start to determine identifiability in various cases. To do so, it is convenient to change notation slightly. Up to now, we have identified moments and cumulants with their corresponding generating functions. In the next sections, it is useful to identify the parameters with polynomials as well. We replace the location parameter \(\mu = (\mu _1,\ldots ,\mu _n)\) with the corresponding linear polynomial \(u^t\mu =\mu _1u_1+\cdots +\mu _nu_n\) and we replace the covariance parameter \(\varSigma \) with the quadric \(\frac{1}{2}u^t\varSigma u\). Of course, the two representations are equivalent, but the polynomial formalism is better suited to the cumulant space and the moment space. In particular, the linear polynomials live in the dual vector space \(V={\text {Hom}}(\mathbb {R}^n,\mathbb {R})\), whereas the quadratic polynomials live in \({\text {Sym}}^2 V\).
The next inequality reflects the fact that measuring moments (or cumulants) of higher order can only improve identifiability:
Lemma 4
The fiber dimensions of general fibers of \(M_{n,k,d}\) and \(M_{n,k,d+1}\) satisfy:
Proof
By definition, the fiber dimension \(\varDelta ^H_{n,k,d}\) is the dimension of a general nonempty fiber of the moment map \(M_{n,k,d}:\varTheta ^H_{n,k} \rightarrow \mathbb {A}^M_{n,d}\). However, this map is the composition of the map \(M_{n,k,d+1}:\varTheta ^H_{n,k} \rightarrow \mathbb {A}^M_{n,d+1}\) and the projection map \(\mathbb {A}^M_{n,d+1}\rightarrow \mathbb {A}^M_{n,d}\) that forgets the moments of order \(d+1\), so the conclusion follows. \(\square \)
Remark 4
Since Gaussian mixtures are identifiable from finitely many moments (see, e.g., [4]), the sequence
must stabilize at 0 for some large enough d.
The following observation is less trivial. It allows a reduction to the case \(n=k-1\).
Proposition 2
Suppose that \(d\ge 3\) and \(n\ge k-1\). Then
Proof
We use Lemma 3, which says that the fiber dimension \(\varDelta ^H_{n,k,d}\) is equal to the fiber dimension of the map
This dimension can be computed by looking at the differential of the map at a general point. The parameter space is defined as
Let \(p=((\lambda _1,\ldots ,\lambda _k),(L_1,\ldots ,L_k)) \in \varTheta ^0_{n,k}\) be a general point. Then, the tangent space to \(\varTheta ^{0}_{n,k}\) at the point is given by
The fiber dimension of \(\phi _{n,k,d}\) coincides with the dimension of the kernel of the differential \(d\phi _{n,k,d}\) at the general point p. In particular, since the point is general and \(n\ge k-1\), we can suppose that \(L_i=u_i\) for \(i=1,\ldots ,k-1\) and that all the \(\lambda _i\) are nonzero. In particular, \(L_k\) is a linear combination of \(u_1,\ldots ,u_{k-1}\). Now, we claim that if \(((\varepsilon _1,\ldots ,\varepsilon _k),(H_1,\ldots ,H_k))\) is in the kernel of \(d\phi _{n,k,d}\), then the only variables appearing in the \(H_i\) are \(u_1,\ldots ,u_{k-1}\). If this is true, then we are done, because the kernel of \(d\phi _{n,k,d}\) coincides with the kernel of \(d\phi _{k-1,k,d}\) at the point \(((\lambda _1,\ldots ,\lambda _k),(L_1,\ldots ,L_k)) \in \varTheta ^0_{k-1,k}\).
To prove the claim, observe that the map is given by the cumulant functions \(\phi _{n,k,d}=(\kappa _3,\kappa _4,\ldots ,\kappa _d)\), so the kernel of \(d\phi _{n,k,d}\) equals the intersection of the kernels of the \(d\kappa _i\) for \(i=3,\ldots ,d\). Therefore, it is enough to prove the analogous claim for the kernel of the differential \(d\kappa _3\) of \(\kappa _3\). Since the first moment is zero by construction, the third cumulant coincides with the third moment
Hence, the differential is the linear map
and if \(((\varepsilon _1,\ldots ,\varepsilon _k),(H_1,\ldots ,H_k))\) is in the kernel, then it must be that
Since \(\lambda _k\ne 0\), this is equivalent to \(\sum _{i=1}^k h_i(\lambda _kL_i)^2 = 0\) and since \(\lambda _1L_1+\cdots +\lambda _kL_k=0\), we see that
By assumption \(L_i=u_i\) for \(i=1,\ldots ,k-1\), so this last expression is equal to zero if and only if
If this is true, then \(h_k\) uses only the variables \(u_1,\ldots ,u_{k-1}\). Indeed, if some other variable, say y, appears in \(h_k\), then the right-hand side contains the monomial \(yu_1u_2\), while there is no such monomial on the left-hand side. Likewise, if the variable y appears in one of the \(h_i\) for \(i=1,\ldots ,k-1\), then the left-hand side would contain a monomial of the form \(yu_i^2\), while there is no such monomial on the right-hand side.
Hence, the \(h_i\) are polynomials in \(u_1,\ldots ,u_{k-1}\), and, by definition of the \(h_i\), the same holds for the \(H_i\). This proves the claim and the result follows. \(\square \)
4.1 Moments Up to Order \(d=3\)
When \(d=3\) we determine the defect \({\delta }^H_{n,k,3}\) and the fiber dimension \({\varDelta }^H_{n,k,3}\) of the map
for each n and k, and use Lemma 3. When \(d=3\), the space \(\mathbb {A}^{K,3}_{n,3}\) is identified with the space \({\text {Sym}}^3V\) of homogeneous polynomials of degree three, and as noted in the proof of Proposition 2, the third cumulants coincide with the third moments, so that:
We compute the closure \(C^0_{n,k,3}\) of the image.
Lemma 5
The set \(C^0_{n,k,3}\) is the Zariski closure of
Proof
Recall that
To compute the Zariski closure, suppose that all the \(\lambda _i\) are strictly positive, so that in particular we can write
Since cube roots are well defined over \(\mathbb {R}\),
where \(H_i := \root 3 \of {\lambda _i} L_i\) for \(i=1,\ldots ,k-1\), and \(H_k := -\sum _{i=1}^{k-1} \left( \frac{\root 3 \of {\lambda _i}}{\root 3 \of {\lambda _k}}\right) ^2 H_i\), using the equality \(\root 3 \of {\lambda _k}\frac{\lambda _i}{\lambda _k}=\left( \frac{\root 3 \of {\lambda _i}}{\root 3 \of {\lambda _k}}\right) ^2\root 3 \of {\lambda _i}\). In particular, this shows immediately that \(\lambda _1L_1^3+\cdots +\lambda _kL_k^3\) can be written as a sum of cubes of linearly dependent linear forms.
For the converse, let \(H_1,\ldots ,H_k\) be linearly dependent linear forms. For the Zariski closure, it suffices to assume that \(H_k = -\beta _1H_{1}-\cdots -\beta _{k-1}H_{k-1}\) for some general \(\beta _1,\ldots ,\beta _{k-1} \in \mathbb {R}\) strictly positive. So we want to write
for some positive \(\lambda _1,\ldots ,\lambda _{k}\in \mathbb {R}\) such that \(\lambda _1+\cdots +\lambda _k=1\). Given such \(\lambda _i\), the above computation yields
where \(L_i = \frac{1}{\root 3 \of {\lambda _i}}H_i\) for \(i=1,\ldots ,k-1\) and \(L_k = -\frac{\lambda _1}{\lambda _k}L_1-\cdots -\frac{\lambda _{k-1}}{\lambda _k}L_{k-1}\), so that \(\lambda _1L_1+\cdots +\lambda _kL_k = 0\), as wanted.
To conclude, it remains to show that Eq. (46) has a solution: these equations are equivalent to
Observe that the square roots are well defined since \(\beta _i>0\) for all \(i=1,\ldots ,k-1\). Moreover, if \((\lambda _1,\ldots ,\lambda _{k-1})\) is a solution to (48), then it is easy to see that all the \(\lambda _i\) must be strictly positive: indeed, since the \(\beta _i\) are positive, \(\lambda _i\) and \(1-\lambda _1-\cdots -\lambda _{k-1}\) have the same sign. Thus, if one of the \(\lambda _i\) is negative, then all the \(\lambda _i\) are negative, but then \(1-\lambda _1-\cdots -\lambda _{k-1}>0\), which is absurd.
Now, setting \(b_i = \sqrt{\beta _i}^3\), rewrite the equations as the linear system
The matrix determinant lemma gives that \(\det (\mathrm {I}+ b\cdot \mathbb {1}^T) = 1 + \mathbb {1}^T b = 1 + b_1+\cdots +b_{k-1}\), which is positive since the \(\beta _i\) are positive. This means that the system (49) has a unique solution. \(\square \)
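The determinant identity invoked here is easy to check numerically (an illustration, not part of the paper):

```python
# Matrix determinant lemma: det(I + b * 1^T) = 1 + 1^T b.
import numpy as np

rng = np.random.default_rng(0)
b = rng.random(4)                     # stands in for (b_1, ..., b_{k-1}), k = 5
A = np.eye(4) + np.outer(b, np.ones(4))
assert np.isclose(np.linalg.det(A), 1 + b.sum())
```

Since the \(b_i\) are positive here, the determinant is bounded below by 1, so the system is indeed uniquely solvable.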
Remark 5
The proof of Lemma 5 actually gives more: indeed, it shows that the image of the positive part
which is the one relevant in statistics, coincides with the set of sums \(\{ H_1(u)^3+\cdots +H_k(u)^3\}\), where the \(H_i\) are positively linearly dependent, meaning that there are coefficients \(\beta _1,\ldots ,\beta _k>0\) such that
Remark 6
The set of sums of cubes of k dependent linear forms has a natural interpretation in terms of the projective Veronese variety: indeed, consider the third Veronese embedding of \(\mathbb {P}(V)=\mathbb {P}^{n-1}\):
For each \((k-2)\)-dimensional linear subspace \(\varPi \subseteq \mathbb {P}^{n-1}\) let \({\text {Sec}}_k(v_3(\varPi )) \subseteq \mathbb {P}({\text {Sym}}^3V)\) be the kth secant variety of its image \(v_3(\varPi )\). Then, by Lemma 5, the variety \(C^0_{n,k,3}\) is the affine cone over the union of these secants:
We compute the dimension of this variety, dividing into the cases \(k\ge n+1\) and \(k\le n+1\):
Proposition 3

(i)
If \(k\ge n+1\), then
$$\begin{aligned} \dim C^0_{n,k,3} = \min \left\{ kn , \left( {\begin{array}{c}n+2\\ 3\end{array}}\right) \right\} \end{aligned}$$(53) except in the case \(n=5,k=7\), where \(\dim C^0_{5,7,3} = 34\).

(ii)
If \(k\le n+1\), then
$$\begin{aligned} \varDelta ^H_{n,4,3} = 2, \qquad \varDelta ^H_{n,3,3} = 2, \qquad \varDelta ^H_{n,2,3} = 1, \end{aligned}$$(54) and when \(k\ge 5\),
$$\begin{aligned} \varDelta ^H_{n,k,3} = 0. \end{aligned}$$(55)
Proof
(i) Since \(k\ge n+1\), Remark 6 shows that \(C^0_{n,k,3}\) is the cone over the kth secant variety \({\text {Sec}}_k(v_3(\mathbb {P}^{n-1}))\). The dimension of this variety is computed by the Alexander–Hirschowitz theorem, so that
with the single exception of \(n=5,k=7\), where the dimension is one less than expected, hence \(\dim C^0_{5,7,3} = 34\).
(ii) Since \(k\le n+1\), Proposition 2 shows that \(\varDelta ^H_{n,k,3} = \varDelta ^H_{k1,k,3}\). Hence, for \(k=2,3,4\) we see directly from Table 1 that
For \(k\ge 5\) instead, we follow the proof of Proposition 2 and show that the differential of \(\phi _{k-1,k,3}:\varTheta ^0_{k-1,k} \rightarrow {\text {Sym}}^3 V\) at a general point is injective. For this, consider the kernel of the differential at a point \(p=((\lambda _1,\ldots ,\lambda _k),(L_1,\ldots ,L_k))\). It consists of elements \(((\varepsilon _1,\ldots ,\varepsilon _k),(H_1,\ldots ,H_k)) \in \mathbb {R}^k\times V^k\) such that \(\varepsilon _1+\cdots +\varepsilon _k = 0\), \(\varepsilon _1L_1+\cdots +\varepsilon _kL_k + \lambda _1H_1+\cdots +\lambda _k H_k = 0\) and
where \(\ell _i = \lambda _k^2(3\lambda _iH_i+\varepsilon _iL_i)+\lambda _i^2(3\lambda _kH_k+\varepsilon _kL_k)\) and \(h=3\lambda _k H_k+\varepsilon _kL_k\). Now choose the specific point p given by \(\lambda _i = \frac{1}{k}\) for each \(i=1,\ldots ,k\), \(L_i = u_i\) for \(i=1,\ldots ,k-1\) and \(L_k = -u_1-\cdots -u_{k-1}\). Then, the above equation becomes
Let us write \(h=h_1u_1+\cdots +h_{k-1}u_{k-1}\). Then, in (59), the coefficient of \(u_au_bu_c\) is \(\frac{2}{k^2}(h_a+h_b+h_c)\) for all \(1\le a<b<c \le k-1\). Hence
Let \(1\le a< b< c< d\le k-1\) be any four distinct indices between 1 and \(k-1\). Then, the previous equations translate into the linear system
The matrix appearing in the linear system is invertible, so \(h_a=h_b=h_c=h_d=0\). Since this holds for an arbitrary choice of four distinct indices, it follows that \(h=0\). Now, relation (59) tells us that \(\sum _{i=1}^{k-1}u_i^2 \ell _i = 0\), but since \(u_1^2,\ldots ,u_{k-1}^2\) form a complete intersection of quadrics, they do not have linear syzygies, which implies that \(\ell _i=0\) for each i. From the definitions of \(\ell _i\) and h, it follows that \(3\lambda _iH_i+\varepsilon _iL_i=0\) for each i, but then the other two relations \(\sum _i \varepsilon _i=0\) and \(\sum _i (\lambda _iH_i+\varepsilon _iL_i)=0\) imply that \(H_i=0,\varepsilon _i=0\) for all i, which is what was needed. \(\square \)
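The invertibility of the matrix in question can be double-checked directly: writing one equation \(h_a+h_b+h_c=0\) per three-element subset of the four indices gives the all-ones-minus-identity coefficient matrix, whose determinant is \(-3\ne 0\). A minimal sketch:

```python
from itertools import combinations

def det(M):
    # Determinant by Laplace expansion along the first row
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

# One equation h_a + h_b + h_c = 0 per 3-element subset of the four indices
idx = range(4)
A = [[1 if i in triple else 0 for i in idx] for triple in combinations(idx, 3)]

# The coefficient matrix is J - I (ones off the diagonal, zeros on it);
# its determinant is -3, so the only solution is h_a = h_b = h_c = h_d = 0
assert det(A) == -3
```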
Now we are ready for a complete classification of defectivity when \(d=3\).
Theorem 2
For \(d=3\), the defect \(\delta ^H_{n,k,3}=0\) for any k and n, with the following exceptions:

\(n\ge k\) and \(k=2\), where \(\delta ^H_{n,2,3}=1\).

\(n\ge k\) and \(k=3,4\), where \(\delta ^H_{n,k,3}=2\).

\(n=5\) and \(k=7\), where \(\delta ^H_{5,7,3}=1\).

\(n\ge 4\) and \(n+1 < k \le \frac{n^2+2n+6}{6}\), where \(\delta ^H_{n,k,3} = k-n-1\).

\(n\ge 4\) and \(\frac{n^2+2n+6}{6} \le k < \frac{n^2+3n+2}{6}\), where \(\delta ^H_{n,k,3} = n\left( \frac{n^2+3n+2}{6} - k \right) \).
Proof
First consider the case when \(n\ge k\): then Proposition 3 applies. It is straightforward to check that \(\delta ^H_{n,k,3} = \varDelta ^H_{n,k,3}\), from which the statement of the theorem follows.
For the cases where \(k\ge n+1\), start with the exceptional case \(n=5,k=7\): Proposition 3 gives that \(\dim \overline{\phi _{n,k,3}(\varTheta ^0_{5,7})} = 34\), and Lemma 3 yields \(\varDelta ^H_{5,7,3} = 2\) and \(\delta ^H_{5,7,3} = 1\).
Now, consider the other cases: Proposition 3 gives that
and then Lemma 3 shows that
so that
Suppose first that \(n=1,2,3\): this implies \(k\ge n+1 \ge \frac{n^2+2n+6}{6} \ge \frac{n^2+3n+2}{6}\) so that
Now, suppose that \(n\ge 4\). Then \(5\le n+1\le \frac{n^2+2n+6}{6} \le \frac{n^2+3n+2}{6}\) and there are three possibilities for k: if \(k\ge \frac{n^2+3n+2}{6}\), then
If instead \(\frac{n^2+2n+6}{6}\le k < \frac{n^2+3n+2}{6}\), then
which is strictly positive. Finally, if \(n+1\le k < \frac{n^2+2n+6}{6}\), the defect is
which is positive if and only if \(k>n+1\). \(\square \)
As a consequence, identifiability can be characterized whenever \(k\le n+1\):
Theorem 3
Suppose \(k\le n+1\). If \(k\ge 5\) then a general homoscedastic mixture is algebraically identifiable from moments up to order 3. If instead \(k=2,3,4\), then a general homoscedastic mixture is algebraically identifiable from the moments up to order \(d=4\).
Proof
When \(k\ge 5\) this follows immediately from Theorem 2 and Lemma 4. If instead \(k=2,3,4\), thanks to Proposition 2, it is enough to set \(n=k-1\) and check the first d for which we have identifiability: these are finitely many cases that can be checked by direct computation (e.g., in Macaulay2 [9]), and we find that such a d is 4. \(\square \)
4.2 Mixtures with \(k=2\) Components
When \(k=2\) we characterize rational identifiability as well. Since the case \(d=3\) is already covered, we consider only \(d\ge 4\).
Theorem 4
The homoscedastic secant variety \({\text {Sec}}^H_{2}({\mathcal {G}}_{n,4})\) is algebraically identifiable. If \(d\ge 5\), the homoscedastic secant variety \({\text {Sec}}^H_2({\mathcal {G}}_{n,d})\) is also rationally identifiable.
Proof
By Lemma 3 and Remark 2, it is enough to consider the parameter space \(\varTheta ^0_{n,2} = \{ ((L_1,L_2),(\lambda _1,\lambda _2)) \mid \lambda _1+\lambda _2=1,\ \lambda _1L_1+\lambda _2L_2 = 0 \}\) and the map
In order to compute the general fiber of this map, note that since \(d\ge 4\), it follows from Theorem 3 and its proof that the map has finite fibers. Hence, it is enough to restrict a general fiber to the open subset \(\lambda _2\ne 0\). There we may assume \(L_2 = -\frac{\lambda _1}{\lambda _2}L_1 = -\frac{\lambda _1}{1-\lambda _1}L_1\). We thus compute the fibers of the induced map
In explicit terms, this map is given by the terms from degree 3 to degree d of the logarithm \(\log (\lambda {\mathrm{e}}^{L}+(1-\lambda ){\mathrm{e}}^{\frac{\lambda }{\lambda -1}L})\). A computation shows that the first terms are:
Now suppose that \(d=4\), and let \(L\in V\) and \( \lambda \in \mathbb {R}\setminus \{1\}\) be general elements. In fact, it is enough to assume \(L\ne 0\) and \(\lambda \ne 0,1,\frac{1}{2}\), so that \(\kappa _3 = f_3(\lambda )L^3 \ne 0\). In order to compute the fiber of the point \((\kappa _3,\kappa _4) = F_{n,2,4}(L,\lambda )\), first observe that \(\kappa _3=f_3(\lambda )L^3 = (\root 3 \of {f_3(\lambda )}L)^3\) and that the linear form \(L_0:=\root 3 \of {f_3(\lambda )}L\) can be computed explicitly: from the expression
then one obtains
In particular, \(L=f_3(\lambda )^{-\frac{1}{3}}L_0\), so that the equation \(\kappa _4 = f_4(\lambda )L^4\) translates into \(\frac{f_4(\lambda )}{f_3(\lambda )^{\frac{4}{3}}} = \frac{\kappa _4}{L_0^4}\). Observe that \(a := \frac{\kappa _4}{L_0^4}\) is a constant that can be computed explicitly by comparing a single nonzero coefficient of \(L_0^4\) with the corresponding coefficient of \(\kappa _4\): for example, if \(\root 3 \of {\kappa _{30\cdots 0}} \ne 0\), then
Now, the equation \(\frac{f_4(\lambda )}{f_3(\lambda )^{\frac{4}{3}}} = a\) is equivalent to \(\frac{f_4(\lambda )^3}{f_3(\lambda )^4} = a^3\), or more explicitly
Note that this expression is invariant under exchanging \(\lambda \) with \(1-\lambda \), as is expected from the symmetry of the situation. Hence, set \(\gamma := \lambda (1-\lambda )\) and rewrite this expression as
This is a cubic equation with three possible solutions for \(\gamma \), which means there is no rational identifiability. In order to obtain it, consider also the cumulants of order 5: this adds the data \(\kappa _5\) and the condition \(\kappa _5 = f_5(\lambda )L^5\). In the above notation \(L=f_3(\lambda )^{-\frac{1}{3}}L_0\), so that the condition \(\kappa _5 = f_5(\lambda )L^5\) becomes \(\frac{f_5(\lambda )}{f_3(\lambda )^{\frac{5}{3}}} = \frac{\kappa _5}{L_0^5}\). As before, we see that \(b := \frac{\kappa _5}{L_0^5}\) is a constant that can be computed explicitly by comparing a single nonzero coefficient of \(L_0^5\) with the corresponding coefficient of \(\kappa _5\): for example, if \(\root 3 \of {\kappa _{30\cdots 0}} \ne 0\), then
Now, the equation \(\frac{f_5(\lambda )}{f_3(\lambda )^{\frac{5}{3}}} = b\) is equivalent to \(\frac{f_5(\lambda )^3}{f_3(\lambda )^5} = b^3\), or more explicitly, as above, with the substitution \(\gamma = \lambda (1-\lambda )\),
Hence, rational identifiability is obtained if the two Eqs. (70) and (72) have a unique common solution \(\gamma \). This means that the map \(\mathbb {R}\dashrightarrow \mathbb {R}^2,\ \gamma \mapsto (g(\gamma ),h(\gamma ))\) is generically injective. This map extends to \(\mathbb {R} \rightarrow \mathbb {P}^2\) via
i.e., a map defined by polynomials of degree 7. It is generically injective if and only if the closure of its image is a plane curve of degree 7. This can be verified with Macaulay2 [9]: the resulting curve is given by the equation
\(\square \)
Even though there is no rational identifiability when \(d=4\), it is worth noting that in a purely statistical setting \(\gamma \) can be recovered uniquely, as seen below.
Corollary 1
For \(k=2\), the statistical mixture parameters can be recovered uniquely with moments up to order \(d=4\).
Proof
This is equivalent to saying that Eq. (70) has a unique statistically relevant solution in \(\gamma = \lambda (1-\lambda )\). Note that since \(\lambda \in (0,1)\setminus \{\frac{1}{2} \}\), we have \(\gamma \in (0, \frac{1}{4})\). Consider the real-valued function coming from (70):
Its derivative, \(a'(\gamma ) = -\frac{1}{2\root 3 \of {36}\,\gamma (1-4\gamma )\root 3 \of {4\gamma (1-4\gamma )^2}}\), is negative for \(0<\gamma <\frac{1}{4}\), so the function \(a(\gamma )\) is strictly decreasing and, in particular, injective on this statistically meaningful interval (Fig. 1).
The corresponding inverse is given by the cubic equation in \(\gamma \)
The discriminant of (74) is \(\varDelta = -3072a^6(64a^3+81)\). It is zero precisely when \(a=-\frac{3\root 3 \of {3}}{4}\), which corresponds to the horizontal asymptote of a. If \(a<-\frac{3\root 3 \of {3}}{4}\), there are 3 real solutions, but one is negative and the other is larger than \(\frac{1}{4}\). The remaining solution is also the unique real solution when \(a>-\frac{3\root 3 \of {3}}{4}\), given explicitly by
\(\square \)
This proof gives an explicit algorithm to recover the parameters of a homoscedastic mixture of two Gaussians from the cumulants up to order four.
Observe that this algorithm needs all the cumulants of order one, all the cumulants of order two, n cumulants of order three, and one cumulant of order four. Hence, it needs in total \(n+\frac{n(n+1)}{2}+n+1\) cumulants.
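As a quick sanity check of this count (a trivial sketch; the function name is ours):

```python
def cumulants_needed(n):
    # order one: n, order two: n(n+1)/2, order three: n, order four: 1
    return n + n * (n + 1) // 2 + n + 1

# For n = 1 this is the 4 univariate cumulants; for n = 3 it is 3 + 6 + 3 + 1 = 13
assert cumulants_needed(1) == 4
assert cumulants_needed(3) == 13
```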
Remark 7
We have seen in Remark 6 that \({\text {Sec}}^H_2({\mathcal {G}}_{n,d})\) in cumulant coordinates is a cone over \(C^0_{n,2,d} \subseteq \mathbb {A}^{K,3}_{n,d}\). Up to taking the Zariski closure, the proof of Theorem 4 shows that \(C^0_{n,2,d}\) is the image of the map
For \(\lambda \) constant we get a projected dth Veronese variety of V. If instead L is constant, then we get a rational curve given by a linear combination of \((f_3(\lambda ),f_4(\lambda ),\ldots ,f_d(\lambda ))\).
4.3 The Univariate Case \(n=1\)
We use the standard notation \(\sigma ^2\) for the variance \(\varSigma =( \sigma _{11})\) when \(n=1\).
For \(n=1\), the moment variety \({\text {Sec}}_k^H({\mathcal {G}}_{1,d})\) is never defective. The moment map
is finite-to-one. In the statistics literature, it is known that in the case of homoscedastic secants one may recover the mixture parameters from given moments (i.e., compute the fiber of the map above), with an algorithm closely related to the well-known Prony's method [20]. This procedure was introduced by Lindsay as an application of moment matrices [15], and we briefly recall the algorithm here.
First, how does one recover the locations \(\mu _i\) and weights \(\lambda _i\) of the k components of a Dirac mixture from \(2k-1\) moments? This is known as the quadrature rule, and it works as follows. Given the moment sequence \(m=(m_1,m_2,\ldots ,m_{2k-1})\), one considers the polynomial resulting from the following \((k+1) \times (k+1)\) determinant
The k roots \(\mu _1, \mu _2, \ldots , \mu _k\) of \(P_k(t)\) are precisely the sought locations. This follows since the equations of the secant varieties of the rational normal curve are classically known to be given by the minors of the moment matrices. For a modern reference see [14].
Once the locations are known, the weights \(\lambda _i\) are found by solving the \(k \times k\) Vandermonde linear system
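For \(k=2\) the two steps above can be carried out by hand. The sketch below (pure Python, on a hypothetical two-atom example; we assume the determinant in question is the standard moment-matrix determinant with last row \((1,t,\ldots ,t^k)\)) expands \(P_2(t)\) along its last row and then solves the Vandermonde system:

```python
import math

def quadrature_k2(m1, m2, m3):
    """Recover locations and weights of a 2-atom Dirac mixture from m_1, m_2, m_3."""
    # Coefficients of P_2(t) = det [[1, m1, m2], [m1, m2, m3], [1, t, t^2]],
    # expanded along the last row: the coefficient of t^j is the signed minor.
    c0 = m1 * m3 - m2 * m2        # minor deleting column 0, sign +
    c1 = -(m3 - m1 * m2)          # minor deleting column 1, sign -
    c2 = m2 - m1 * m1             # minor deleting column 2, sign +
    # The roots of c2 t^2 + c1 t + c0 are the locations mu_1, mu_2
    disc = c1 * c1 - 4 * c2 * c0
    mu1 = (-c1 - math.sqrt(disc)) / (2 * c2)
    mu2 = (-c1 + math.sqrt(disc)) / (2 * c2)
    # Weights from the Vandermonde system: l1 + l2 = 1 and l1*mu1 + l2*mu2 = m1
    l2 = (m1 - mu1) / (mu2 - mu1)
    return mu1, mu2, 1 - l2, l2

# Mixture 0.3*delta_0 + 0.7*delta_2, so m_j = 0.7 * 2**j
mu1, mu2, l1, l2 = quadrature_k2(1.4, 2.8, 5.6)
assert abs(mu1 - 0.0) < 1e-9 and abs(mu2 - 2.0) < 1e-9
assert abs(l1 - 0.3) < 1e-9 and abs(l2 - 0.7) < 1e-9
```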
Back to the Gaussian case: if we knew the value of the common variance \( \sigma ^2\), we could reduce to the above instance. In terms of the Gaussian moment generating function:
Hence, the Dirac moments \({\tilde{m}}\) on the right-hand side are linear combinations of the Gaussian moments m. Explicitly, for \(1 \le j \le 2k-1\),
Applying the quadrature rule to the vector \({\tilde{m}}=({\tilde{m}}_1,{\tilde{m}}_2, \ldots , {\tilde{m}}_{2k-1})\) would allow us to obtain the means \(\mu _1, \mu _2, \ldots , \mu _k\).
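The linear combinations in question can be recovered from the moment generating functions: \(M_{\text {Dirac}}(t) = M_{\text {Gauss}}(t)\,{\mathrm{e}}^{-\sigma ^2t^2/2}\), which gives \({\tilde{m}}_j = \sum _i \left( {\begin{array}{c}j\\ 2i\end{array}}\right) (2i-1)!!\,(-\sigma ^2)^i m_{j-2i}\). A sketch of this conversion (our own derivation from the MGF identity, not necessarily the paper's displayed formula):

```python
from fractions import Fraction
from math import comb

def double_factorial(n):
    # (2i-1)!! with the convention (-1)!! = 1
    r = 1
    while n > 1:
        r, n = r * n, n - 2
    return r

def dirac_from_gaussian(m, s2):
    """Given moments m[j] (with m[0] = 1) of a Gaussian mixture with common
    variance s2, return the moments of the underlying Dirac mixture."""
    return [sum(Fraction(comb(j, 2 * i)) * double_factorial(2 * i - 1)
                * (-s2) ** i * m[j - 2 * i]
                for i in range(j // 2 + 1))
            for j in range(len(m))]

# Single Gaussian N(1, 2): m = (1, 1, 3, 7); de-convolving the variance leaves
# the moments of delta_1, which are all 1
assert dirac_from_gaussian([1, 1, 3, 7], 2) == [1, 1, 1, 1]
```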
However, \(\sigma \) is unknown. To find an estimate for \(\sigma \) we consider the first 2k moments \(m = (m_1, m_2, \ldots , m_{2k})\). If \({\tilde{m}}=({\tilde{m}}_1,{\tilde{m}}_2, \ldots , {\tilde{m}}_{2k})\) comes from a mixture of k Dirac measures, then
One thus treats \(\sigma \) as a variable and substitutes expressions (79) into (80). This results in a polynomial \(D_k(\sigma )\) of degree \(\left( {\begin{array}{c}k+1\\ 2\end{array}}\right) \) in \(\sigma ^2\) and the estimator \({\hat{\sigma }}^2\) is obtained as its smallest nonnegative root [15, Theorem 5B]. So the algebraic degree for estimating \(\sigma ^2\) is \(\left( {\begin{array}{c}k+1\\ 2\end{array}}\right) \). With \(\sigma ^2\) specified, one proceeds as above.
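For \(k=2\) the whole procedure can be sketched numerically (pure Python, on a hypothetical homoscedastic mixture \(0.3\,N(0,\tfrac{1}{4})+0.7\,N(2,\tfrac{1}{4})\); rather than expanding \(D_k(\sigma )\) symbolically, we locate the smallest nonnegative root of the \(3\times 3\) moment-matrix determinant by scanning and bisection, and the helper names are ours):

```python
from math import comb

def double_factorial(n):  # (2i-1)!! with (-1)!! = 1
    r = 1
    while n > 1:
        r, n = r * n, n - 2
    return r

def dirac_from_gaussian(m, s2):
    # tilde m_j = sum_i binom(j, 2i) (2i-1)!! (-s2)^i m_{j-2i}
    return [sum(comb(j, 2 * i) * double_factorial(2 * i - 1)
                * (-s2) ** i * m[j - 2 * i]
                for i in range(j // 2 + 1))
            for j in range(len(m))]

def D2(m, s2):
    # 3x3 moment-matrix determinant of the candidate Dirac moments; it vanishes
    # exactly when they lie on the 2nd secant of the moment curve
    t = dirac_from_gaussian(m, s2)
    M = [[1, t[1], t[2]], [t[1], t[2], t[3]], [t[2], t[3], t[4]]]
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def estimate_variance(m, hi=4.0, steps=4000, tol=1e-12):
    # Smallest nonnegative root of s2 -> D2(m, s2): scan for the first sign
    # change, then bisect
    a, fa = 0.0, D2(m, 0.0)
    for i in range(1, steps + 1):
        if fa == 0.0:
            return a
        b = hi * i / steps
        fb = D2(m, b)
        if fa * fb < 0:
            while b - a > tol:
                c = (a + b) / 2
                if fa * D2(m, c) <= 0:
                    b = c
                else:
                    a, fa = c, D2(m, c)
            return (a + b) / 2
        a, fa = b, fb
    raise ValueError("no sign change found in [0, hi]")

# Moments of 0.3 N(0, 0.25) + 0.7 N(2, 0.25), obtained from the Dirac moments
# (1.4, 2.8, 5.6, 11.2) by convolving with N(0, 0.25)
m = [1, 1.4, 3.05, 6.65, 15.5875]
s2_hat = estimate_variance(m)
assert abs(s2_hat - 0.25) < 1e-6
```

With \({\hat{\sigma }}^2\) in hand, `dirac_from_gaussian(m, s2_hat)` yields \({\tilde{m}}\), which feeds into the quadrature rule to recover the means and weights.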
More generally, the discussion under (28) shows that the moment variety \({\text {Sec}}_k^H({\mathcal {G}}_{1,d})\) with \(k\le d/2\) is a union
where \(V_{1,d}^{\sigma }\) is the translation of the moment curve \(V_{1,d}\) by the variance \(\sigma ^2\) as defined by the Gaussian moments. The secant variety \({\text {Sec}}_k(V_{1,d}^{\sigma })\) is defined for each \(\sigma \) by the \((k+1)\times (k+1)\) minors of
As soon as the kth secant variety of a smooth curve is not linear, the curve can be recovered as the singular locus of highest multiplicity in the secant variety. Therefore, since the curves \(V_{1,d}^{\sigma }\) are distinct, their kth secant varieties are distinct as well, as long as the latter are not linear. In particular, since the variety \({\text {Sec}}_k(V_{1,d}^{\sigma })\) has dimension \(2k-1\), it follows that the union \({\text {Sec}}_k^H({\mathcal {G}}_{1,d})\) has dimension 2k. Given the moments \(m_i\) up to degree d of a point on a homoscedastic k-secant, the \((k+1)\times (k+1)\) minors of \(M_{k,d}\) are polynomials in \(\sigma ^2\) with a zero at the common variance. Given the variance, the means can be inferred as above.
When \(d=2k+1\), the variety \({\text {Sec}}_k^H({\mathcal {G}}_{1,d})\subset \mathbb {A}^M_{1,2k+1}\) is a hypersurface, defined by the resultant of the \((k+1)\times (k+1)\) minors of \(M_{k,d}\), i.e., the polynomial obtained by eliminating \(\sigma ^2\) from the ideal generated by these minors. Denote this polynomial by \(P_{2k+1}\). It is a polynomial in \(m_1,\ldots ,m_{2k+1}\) (or \(\kappa _3,\kappa _4,\ldots ,\kappa _{2k+1}\)). For example,
Proposition 4
The polynomial \(P_{2k+1}\) is homogeneous of total degree
in the multigraded weights \(\deg m_i=\deg \kappa _i = i\).
Proof
Let
where \(\sigma \) is the last coordinate, and consider the projective closure \(\mathbb {P}\) of \(\mathbb {A}\). Then, the matrix (81) defines a map between vector bundles E and F on \(\mathbb {A}\). The vector bundles E and F and the map extend to \(\mathbb {P}\): E extends to a sum of line bundles \({{\tilde{E}}}=\mathcal{O}_\mathbb {P}\oplus \mathcal{O}_\mathbb {P}(1)\oplus \cdots \oplus \mathcal{O}_\mathbb {P}(k)\), while F extends to a sum of line bundles \({{\tilde{F}}}=\mathcal{O}_\mathbb {P}\oplus \mathcal{O}_\mathbb {P}(1)\oplus \cdots \oplus \mathcal{O}_\mathbb {P}(k+1)\). By the Thom–Porteous formula [8, Theorem 14.4], the degree in \(\mathbb {P}\) of the rank k locus of the map is given by the Chern class
since the Chern polynomials of \({{\tilde{E}}}\) and \({{\tilde{F}}}\) are
and
This rank k locus has codimension 2, and its intersection with \(\mathbb {A}\) projects to the hypersurface defined by \(P_{2k+1}\) in \(\mathbb {A}^M_{1,2k+1}\). The coordinate \(\sigma \) appears only in even degree in the equations defining the rank k locus, so the projection to \(\mathbb {A}^M_{1,2k+1}\) is 2 : 1, and hence the degree of \(P_{2k+1}\) is half the degree of the rank k locus. \(\square \)
Question 1
It would be interesting to understand better the structure of the polynomials \(P_{2k+1}\), e.g., is there a closed form expression for all k?
If \(P_{2k+1}\) vanishes on a set \((m_1,\ldots ,m_{2k+1})\) of moments, and \(P_{2l+1}\) does not vanish on \((m_1,\ldots ,m_{2l+1})\) for any \(l<k\), then the moments lie on a homoscedastic k-secant but not on any \(l\)-secant for \(l<k\). Therefore, the polynomials \(P_{2k+1}\) may be used to estimate the number of components in a homoscedastic Gaussian mixture (compare to the rank test proposed in [15, Section 3.1] for the known-variance case).
5 Conclusion
We have completely classified all defective cases for the moment varieties associated with homoscedastic Gaussian mixtures whenever \(k<n+1\), \(d=3\), \(k=2\) or \(n=1\). The question concerning a complete classification for all n, d, k remains open, although our computations did not reveal any further defective examples.
Our identifiability results also cover special structures in the covariance matrix, by Remark 2. For example, a common mixture submodel involves isotropic Gaussians, meaning that the covariance matrix is a scalar multiple of the identity, \(\varSigma = \sigma ^2 \mathrm {I}\). The k-means algorithm used in clustering can be interpreted as parameter estimation for a homoscedastic isotropic mixture of Gaussians. In [10], Hsu and Kakade consider the learning of mixtures of isotropic Gaussians from the moments up to order \(d=3\) when \(k \le n+1\). They prove identifiability for the homoscedastic isotropic submodel (see [6, Theorem 3.2]), and in order to solve the moment equations, they find orthogonal decompositions of the second- and third-order moment tensors.
On the other hand, in [17] Lindsay and Basak proposed a ‘fast consistent’ method of moments for homoscedastic Gaussian mixtures in the multivariate case, based on a ‘primary axis’ to which the one-dimensional case presented in Sect. 4.3 is applied. This means that the method uses some moments of order 2k. Knowing that in some cases there are explicit equations for secant varieties of higher-dimensional Veronese varieties [14], an alternative method of minimal order based on these should be possible.
Finally, a similar approach can be used to study moment varieties of homoscedastic mixtures of other location families. In the case of Example 4, we saw that Gaussian moments and Laplacian moments coincide up to \(d=3\). This means that Theorem 2 applies verbatim to homoscedastic mixtures of Laplace distributions.
References
Agostini, D., Améndola, C.: Discrete Gaussian distributions via theta functions. SIAM Journal on Applied Algebra and Geometry 3(1), 1–30 (2019)
Alexander, J., Hirschowitz, A.: Polynomial interpolation in several variables. Journal of Algebraic Geometry 4(2), 201–222 (1995)
Améndola, C.: Algebraic statistics of Gaussian mixtures. Ph.D. thesis, Technische Universität Berlin (2017). https://doi.org/10.14279/depositonce-6557
Améndola, C., Faugère, J.C., Sturmfels, B.: Moment varieties of Gaussian mixtures. Journal of Algebraic Statistics 7, 14–28 (2016)
Améndola, C., Ranestad, K., Sturmfels, B.: Algebraic identifiability of Gaussian mixtures. International Mathematics Research Notices (2017)
Anandkumar, A., Ge, R., Hsu, D.J., Kakade, S.M., Telgarsky, M.: Tensor decompositions for learning latent variable models. Journal of Machine Learning Research 15(1), 2773–2832 (2014)
Brambilla, M.C., Ottaviani, G.: On the Alexander–Hirschowitz theorem. Journal of Pure and Applied Algebra 212(5), 1229–1251 (2008)
Fulton, W.: Intersection theory, vol. 2. Springer, UK (2013)
Grayson, D.R., Stillman, M.E.: Macaulay 2, a software system for research in algebraic geometry (2002)
Hsu, D., Kakade, S.M.: Learning mixtures of spherical Gaussians: moment methods and spectral decompositions. In: Proceedings of the 4th conference on Innovations in Theoretical Computer Science, pp. 11–20. ACM (2013)
Kohn, K., Shapiro, B., Sturmfels, B.: Moment varieties of measures on polytopes. Annali della Scuola Normale Superiore di Pisa (Classe di Scienze), Serie V (2019). https://doi.org/10.2422/2036-2145.201808003
Kotz, S., Kozubowski, T., Podgorski, K.: The Laplace distribution and generalizations: a revisit with applications to communications, economics, engineering, and finance. Springer, UK (2012)
Koutsoumpelias, A.G., Wageringel, M.: Moment ideals of local Dirac mixtures. SIAM Journal on Applied Algebra and Geometry 4(1), 1–27 (2020)
Landsberg, J.M., Ottaviani, G.: Equations for secant varieties of Veronese and other varieties. Annali di Matematica Pura ed Applicata 192(4), 569–606 (2013)
Lindsay, B.G.: Moment matrices: applications in mixtures. The Annals of Statistics pp. 722–740 (1989)
Lindsay, B.G.: Method of moments. Wiley StatsRef: Statistics Reference Online (2014)
Lindsay, B.G., Basak, P.: Multivariate normal mixtures: a fast consistent method of moments. Journal of the American Statistical Association 88(422), 468–476 (1993)
Pearson, K.: Contributions to the mathematical theory of evolution. Philosophical Transactions of the Royal Society of London. A 185, 71–110 (1894)
Sullivant, S.: Algebraic Statistics, Graduate Studies in Mathematics, vol. 194. American Mathematical Society, Providence, RI (2018)
Weiss, L., McDonough, R.: Prony's method, z-transforms, and Padé approximation. SIAM Review 5(2), 145–149 (1963)
Wu, Y., Yang, P.: Optimal estimation of Gaussian mixtures via denoised method of moments. The Annals of Statistics, to appear. arXiv preprint arXiv:1807.07237 (2018)
Acknowledgements
Open Access funding provided by Projekt DEAL. The authors are grateful to the Max Planck Institute for Mathematics in the Sciences, Leipzig and the Institute for Computational and Experimental Research in Mathematics in Providence, RI for facilitating discussions about this work. We thank anonymous referees for suggestions to improve the presentation.
Communicated by Peter Bürgisser.
C. Améndola was partially supported by the Deutsche Forschungsgemeinschaft (DFG) in the context of the Emmy Noether junior research group KR 4512/11.
Agostini, D., Améndola, C. & Ranestad, K. Moment Identifiability of Homoscedastic Gaussian Mixtures. Found Comput Math 21, 695–724 (2021). https://doi.org/10.1007/s10208-020-09469-6