Moment Identifiability of Homoscedastic Gaussian Mixtures

We consider the problem of identifying a mixture of Gaussian distributions with the same unknown covariance matrix by their sequence of moments up to certain order. Our approach rests on studying the moment varieties obtained by taking special secants to the Gaussian moment varieties, defined by their natural polynomial parametrization in terms of the model parameters. When the order of the moments is at most three, we prove an analogue of the Alexander–Hirschowitz theorem classifying all cases of homoscedastic Gaussian mixtures that produce defective moment varieties. As a consequence, identifiability is determined when the number of mixed distributions is smaller than the dimension of the space. In the two-component setting, we provide a closed form solution for parameter recovery based on moments up to order four, while in the one-dimensional case we interpret the rank estimation problem in terms of secant varieties of rational normal curves.


Introduction
In the context of algebraic statistics [19], moments of probability distributions have recently been explored from an algebraic and geometric point of view [1,4,11,13].
The key point for this connection is that in many cases the sets of moments define algebraic varieties, hence called moment varieties. In the case of moments of mixture distributions, there is a natural correspondence to secant varieties of the moment varieties. Studying geometric invariants such as their dimension reveals properties such as model identifiability. One of the main applications for statistical inference is in the context of the method of moments, which matches the distribution's moments to moment estimates obtained from a sample.
Gaussian mixtures are a prominent statistical model with multiple applications (see [3] and references therein). They are probability distributions on $\mathbb{R}^n$ with a density that is a convex combination of Gaussian densities:

$$f(x) = \lambda_1 f_{N(\mu_1,\Sigma_1)}(x) + \cdots + \lambda_k f_{N(\mu_k,\Sigma_k)}(x), \tag{1}$$

where $\mu_1,\dots,\mu_k \in \mathbb{R}^n$ are the $k$ means, $\Sigma_1,\dots,\Sigma_k \in \operatorname{Sym}^2(\mathbb{R}^n)$ are the covariance matrices, and the $0 \le \lambda_i \le 1$ with $\lambda_1 + \cdots + \lambda_k = 1$ are the mixture weights. The starting point is thus the Gaussian moment variety $G_{n,d}$, as introduced in [4], whose points are the vectors of all moments of order at most $d$ of an $n$-dimensional Gaussian distribution. The moments corresponding to the mixture density (1) form the secant variety $\operatorname{Sec}_k(G_{n,d})$, and identifiability in this general setting was the focus of [5].
In this work, we study special families of Gaussian mixtures, called homoscedastic mixtures, where all the Gaussian components share the same covariance matrix. In other words, a homoscedastic Gaussian mixture has a density of the form

$$f(x) = \lambda_1 f_{N(\mu_1,\Sigma)}(x) + \cdots + \lambda_k f_{N(\mu_k,\Sigma)}(x), \tag{2}$$

where the Gaussian probability densities $f_{N(\mu_i,\Sigma)}(x)$ have different means $\mu_i$ and the same covariance matrix $\Sigma$. The moments, up to order $d$, of homoscedastic Gaussian mixtures are still polynomials in the parameters (the means, the covariance matrix and the mixture weights), and form the moment variety $\operatorname{Sec}^H_k(G_{n,d})$. This is a set of special $k$-secants inside the secant variety $\operatorname{Sec}_k(G_{n,d})$.
The main question we are concerned with is: when can a general homoscedastic $k$-mixture of $n$-dimensional Gaussians be identified by its moments of order $d$? More precisely, denote by $\Theta^H_{n,k}$ the parameter space of means, covariance and mixture weights for homoscedastic mixtures, and the moment map by

$$M_{n,k,d}\colon \Theta^H_{n,k} \to \operatorname{Sec}^H_k(G_{n,d}). \tag{3}$$
The mixture parameters of a point on the moment variety Sec H k (G n,d ) can be uniquely recovered if the fiber of the moment map (3) is a singleton up to natural permutations of the parameters. If this happens for a general point on the moment variety, we say that the mixture is rationally identifiable from its moments up to order d. If the fiber of a general point is finite, we say that we have algebraic identifiability. The parameters are not identifiable if the general fiber of the moment map has positive dimension.
If the dimension of the parameter space is larger than the dimension of the space of moments, then one may expect any point of the moment space to lie on the moment variety. In that case the fiber of the moment map must have positive dimension and we cannot have identifiability. We therefore distinguish the unexpected cases: when the dimension of the moment variety is less than the dimension of both the parameter space and the moment space, we say that the moment variety $\operatorname{Sec}^H_k(G_{n,d})$ is defective. In particular, defectivity implies non-identifiability.
As is often observed [1,4,13], a change of coordinates to cumulants tends to yield simpler representations and faster computations. This is the case here and hence we also study the cumulant varieties of the homoscedastic Gaussian mixtures. For Example 1, the moment variety in cumulant coordinates is simply the cone over a twisted cubic curve (see Example 5). This is not a coincidence, as is shown in Sect. 3.
Our main results, Theorems 2 and 3, identify the defective homoscedastic moment varieties when d = 3 and show that the homoscedastic moment variety is not defective when k ≤ n + 1. These are analogues of the Alexander-Hirschowitz theorem on secant-defective Veronese varieties [2]. This paper is organized as follows: In Sect. 2 we present the connection between moments and cumulants. The moment varieties corresponding to homoscedastic mixtures are defined in Sect. 3. In Sect. 4 we give general algebraic identifiability considerations and do a careful analysis of the subcases d = 3, k = 2 and n = 1. Finally, we conclude with a summary of results and list further research directions.

Moments and Cumulants
To get started, we make some remarks about moments and cumulants from an algebraic perspective. To a sufficiently integrable random variable $X$ on $\mathbb{R}^n$, associate its moments $m_{a_1,\dots,a_n}[X]$ and cumulants $\kappa_{a_1,\dots,a_n}[X]$ through the generating functions in $\mathbb{R}[[u_1,\dots,u_n]]$:

$$M_X(u) = \sum_{(a_1,\dots,a_n)} m_{a_1,\dots,a_n}[X]\,\frac{u_1^{a_1}\cdots u_n^{a_n}}{a_1!\cdots a_n!}, \qquad K_X(u) = \sum_{(a_1,\dots,a_n)} \kappa_{a_1,\dots,a_n}[X]\,\frac{u_1^{a_1}\cdots u_n^{a_n}}{a_1!\cdots a_n!}.$$
The information obtained from moments is equivalent to that from cumulants, since they are obtained from one another through the simple transformations

$$K_X(u) = \log M_X(u), \qquad M_X(u) = \exp K_X(u),$$

which are well-defined because the 0-th moment is always one, whereas the 0-th cumulant is always zero: $m_0[X] = 1$, $\kappa_0[X] = 0$ for every random variable $X$. In particular, moments and cumulants take values in the affine hyperplanes $A^M_n$ and $A^K_n$ of $\mathbb{R}[[u_1,\dots,u_n]]$ defined by

$$A^M_n = \{m_0 = 1\}, \qquad A^K_n = \{\kappa_0 = 0\}.$$

We call these hyperplanes the moment space and the cumulant space.

Example 2 (Dirac distribution)
Let $\mu = (\mu_1,\dots,\mu_n)$ in $\mathbb{R}^n$ be a point. The Dirac distribution $\delta_\mu$ with center $\mu$ on $\mathbb{R}^n$ is the probability measure concentrated at the point $\mu$. If $X$ is a random variable on $\mathbb{R}^n$ with this distribution, its moment-generating function is

$$M_X(u) = e^{u^t\mu} = \sum_{(a_1,\dots,a_n)} \mu_1^{a_1}\cdots\mu_n^{a_n}\,\frac{u_1^{a_1}\cdots u_n^{a_n}}{a_1!\cdots a_n!}.$$
The moments of $X$ are monomials evaluated at $\mu$. On the other hand, the cumulant generating function is $K_X(u) = \log e^{u^t\mu} = u^t\mu$, so the linear cumulants coincide with the coordinates of $\mu$, and the higher order cumulants are all zero. This has an immediate translation into algebro-geometric terms: the parameter space for all Dirac distributions is the space $\mathbb{R}^n$, and the image of the moment map of degree $d$, $M\colon \mathbb{R}^n \to A^M_{n,d}$, is the Veronese variety $V_{n,d}$. On the other hand, the image of the cumulant map $K\colon \mathbb{R}^n \to A^K_{n,d}$ is the linear subspace given by $\{\kappa_2 = \cdots = \kappa_d = 0\}$.

Example 3 (Gaussian distribution)
Let $\mu \in \mathbb{R}^n$ be a point, and $\Sigma \in \operatorname{Sym}^2\mathbb{R}^n$ an $n \times n$ symmetric and positive-definite matrix. The Gaussian distribution on $\mathbb{R}^n$ with mean $\mu$ and covariance matrix $\Sigma$ is given by the density

$$f_{N(\mu,\Sigma)}(x) = \frac{1}{\sqrt{\det(2\pi\Sigma)}}\exp\Big(-\tfrac{1}{2}(x-\mu)^t\Sigma^{-1}(x-\mu)\Big).$$

If $X \sim N(\mu,\Sigma)$ is a Gaussian random variable with these parameters, its moment-generating function and cumulant-generating function are given by

$$M_X(u) = \exp\Big(u^t\mu + \tfrac{1}{2}u^t\Sigma u\Big), \qquad K_X(u) = u^t\mu + \tfrac{1}{2}u^t\Sigma u.$$

The Gaussian moment variety $G_{n,d} \subseteq A^M_{n,d}$ consists of all Gaussian moments up to order $d$. Observe that the corresponding cumulant variety is given simply by the linear subspace $\{\kappa_3 = \cdots = \kappa_d = 0\} \subseteq A^K_{n,d}$. While our focus is on Gaussian distributions, our approach applies to general location families that admit moment and cumulant varieties. We illustrate this with the next example.

Example 4 (Laplace distribution)
The (symmetric) multivariate Laplace distribution has a location parameter $\mu \in \mathbb{R}^n$ and a covariance parameter $\Sigma$, a positive-definite $n \times n$ matrix. Its density function involves the modified Bessel function of the second kind (see [12, Chapter 5]), but it can be defined via its simpler moment generating function:

$$M_X(u) = \frac{\exp(u^t\mu)}{1 - \tfrac{1}{2}u^t\Sigma u},$$

with radius of convergence such that $|u^t\Sigma u| < 2$.
Moments and cumulants up to order d = 3 match with the Gaussian case. Also note that when Σ = 0, the Dirac moment generating function is recovered. However, when d ≥ 4, the Laplace cumulants are no longer a linear space in the cumulant space.
The multiplicative structure of the power series ring $\mathbb{R}[[u_1,\dots,u_n]]$ makes it particularly suitable to independence statements with respect to moments. Indeed, if $X$, $Y$ are two independent random variables on $\mathbb{R}^n$, then

$$M_{X+Y}(u) = M_X(u)\cdot M_Y(u).$$

With cumulants it is even simpler: it holds that

$$K_{X+Y}(u) = K_X(u) + K_Y(u).$$

The group of affine transformations $\operatorname{Aff}(\mathbb{R}^n)$ acts naturally on both moments and cumulants: indeed, for any $A \in GL(n,\mathbb{R})$ and $b \in \mathbb{R}^n$ and a random variable $X$ on $\mathbb{R}^n$,

$$M_{AX+b}(u) = e^{u^tb}M_X(A^tu)$$

and

$$K_{AX+b}(u) = \log(M_{AX+b}(u)) = \log(e^{u^tb}M_X(A^tu)) = u^tb + K_X(A^tu).$$
In particular, note that translations correspond simply to translations in cumulant coordinates, whereas they induce a more complicated expression in moment coordinates.
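The passage between moments and cumulants can be made concrete in the univariate case. The following is a minimal sketch, not part of the original development: it assumes $n = 1$, exact rational arithmetic, and the recursion obtained by differentiating $M(u) = \exp K(u)$. The helper names `gaussian_moments` and `cumulants_from_moments` are hypothetical.

```python
from fractions import Fraction
from math import comb


def gaussian_moments(mu, var, d):
    """Raw moments m_0..m_d of N(mu, var), using m_n = mu*m_{n-1} + (n-1)*var*m_{n-2}."""
    m = [Fraction(1), Fraction(mu)]
    for n in range(2, d + 1):
        m.append(mu * m[n - 1] + (n - 1) * var * m[n - 2])
    return m[: d + 1]


def cumulants_from_moments(m):
    """Cumulants kappa_0..kappa_d from raw moments m_0..m_d (m_0 must be 1).

    Differentiating M(u) = exp(K(u)) gives
        m_n = sum_{j=1}^{n} C(n-1, j-1) * kappa_j * m_{n-j},
    which is solved for kappa_n one order at a time.
    """
    kappa = [Fraction(0)] * len(m)  # kappa_0 = 0
    for n in range(1, len(m)):
        lower = sum(comb(n - 1, j - 1) * kappa[j] * m[n - j] for j in range(1, n))
        kappa[n] = m[n] - lower
    return kappa


# A Gaussian with mean 3/2 and variance 2: cumulants of order >= 3 vanish.
moments = gaussian_moments(Fraction(3, 2), Fraction(2), 5)
cumulants = cumulants_from_moments(moments)
```

In this sketch the cumulant vector of a Gaussian is supported in orders one and two, which is the univariate instance of the linear subspace $\{\kappa_3 = \cdots = \kappa_d = 0\}$ of Example 3.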

Homoscedastic Secants
When Karl Pearson introduced Gaussian mixtures to model subpopulations of crabs [18], he also proposed the method of moments in order to estimate the parameters. The basic idea is to compute sample moments from observed data, and match them to the distribution's moments expressed in terms of the unknown parameters. The method of moments estimates are the parameters that solve these equations. This is a classical estimation method in statistics; a good survey is [16], and a recent 'denoised' version for Gaussian mixtures is [21].
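The method of moments is well suited to mixture models because computing moments of mixture densities is straightforward, since for every integrable measurable function $g$:

$$\int g(x)\Big(\sum_{i=1}^k \lambda_i f_{N(\mu_i,\Sigma_i)}(x)\Big)\,dx = \sum_{i=1}^k \lambda_i \int g(x)\, f_{N(\mu_i,\Sigma_i)}(x)\,dx,$$

and thus, the moments are just linear combinations of the corresponding Gaussian moments.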
As hinted in the introduction, this discussion can be rephrased in geometric terms: let G n,d ⊆ A M n,d be the Gaussian moment variety on R n of order d. Then, the moments of mixtures of Gaussians are linear combinations of points in G n,d , so that their corresponding variety is the k-th secant variety Sec k (G n,d ).
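The densities of homoscedastic Gaussian mixtures, where the Gaussian components share a common covariance matrix, have the form

$$f(x) = \lambda_1 f_{N(\mu_1,\Sigma)}(x) + \cdots + \lambda_k f_{N(\mu_k,\Sigma)}(x),$$

where the $\mu_i \in \mathbb{R}^n$ are the mean parameters, $\Sigma \in \operatorname{Sym}^2\mathbb{R}^n$ is the common covariance parameter, and the $\lambda_i \in \mathbb{R}$ with $\lambda_1 + \cdots + \lambda_k = 1$ are the mixture parameters. Thus, the parameter space for homoscedastic mixtures is

$$\Theta^H_{n,k} = \{((\lambda_1,\dots,\lambda_k),(\mu_1,\dots,\mu_k),\Sigma) \in \mathbb{R}^k \times (\mathbb{R}^n)^k \times \operatorname{Sym}^2\mathbb{R}^n \mid \lambda_1 + \cdots + \lambda_k = 1\}$$

and it has dimension

$$\dim \Theta^H_{n,k} = nk + k - 1 + \binom{n+1}{2}.$$

The moment map for homoscedastic mixtures is then an algebraic map

$$M_{n,k,d}\colon \Theta^H_{n,k} \to A^M_{n,d}.$$

Points on the image, the moments of homoscedastic mixtures, are linear combinations of points in $G_{n,d} \subseteq A^M_{n,d}$ which share the same covariance matrix. We denote the closure of the image by $\operatorname{Sec}^H_k(G_{n,d})$.

The feasibility of the method of moments is based on computing points on the fibers of the moment map $M_{n,k,d}$. Algebraic identifiability of $\operatorname{Sec}^H_k(G_{n,d})$ means that a general homoscedastic Gaussian mixture in the homoscedastic $k$-secant variety is identifiable from its moments up to order $d$ in the sense that only finitely many Gaussian mixture distributions share the same moments up to order $d$, whereas we reserve the term rationally identifiable if a general fiber consists of a single point, up to label swapping. In case the general fiber is not finite, then it is positive-dimensional, there is no identifiability of the parameters from the moments up to order $d$, and a higher order is needed for identifiability (cf. Remark …). We denote the dimension of a general fiber by

$$\Delta^H_{n,k,d} = \dim \Theta^H_{n,k} - \dim \operatorname{Sec}^H_k(G_{n,d}). \tag{17}$$

Lemma 1
The dimension of the homoscedastic moment variety satisfies

$$\dim \operatorname{Sec}^H_k(G_{n,d}) \le \min\Big\{ nk + k - 1 + \binom{n+1}{2},\ \binom{n+d}{d} - 1 \Big\}. \tag{18}$$

Proof The moment space $A^M_{n,d}$ is an affine hyperplane inside the vector space of polynomials of degree at most $d$ in $n$ variables, so it has dimension $\binom{n+d}{d} - 1$. Since $\operatorname{Sec}^H_k(G_{n,d})$ is the closure of the image of $\Theta^H_{n,k}$ inside $A^M_{n,d}$, its dimension is bounded by the dimensions of both spaces, which is exactly the inequality in the statement.

We expect that in general situations the inequality (18) is in fact an equality. Hence, define the defect to be

$$\delta^H_{n,k,d} = \min\Big\{ nk + k - 1 + \binom{n+1}{2},\ \binom{n+d}{d} - 1 \Big\} - \dim \operatorname{Sec}^H_k(G_{n,d}).$$

We say that $\operatorname{Sec}^H_k(G_{n,d})$ is defective if $\delta^H_{n,k,d} > 0$. As observed earlier, defectivity implies non-identifiability.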
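The parameter count behind the expected dimension can be tabulated directly. The sketch below is illustrative only: it counts $nk$ coordinates for the means, $k - 1$ free weights and $\binom{n+1}{2}$ entries of the symmetric covariance matrix, and compares with the number of moments of order between 1 and $d$. The function names are hypothetical.

```python
from math import comb


def parameter_dim(n, k):
    """Dimension of the homoscedastic parameter space:
    k means in R^n, k weights summing to one, one symmetric n x n covariance."""
    return n * k + (k - 1) + n * (n + 1) // 2


def moment_space_dim(n, d):
    """Number of moments of order 1..d in n variables (the 0-th moment is fixed to 1)."""
    return comb(n + d, d) - 1


def expected_dim(n, k, d):
    """Expected dimension of the homoscedastic k-secant moment variety."""
    return min(parameter_dim(n, k), moment_space_dim(n, d))


# Two components in the plane, moments up to order three:
# 8 parameters inside a 9-dimensional moment space.
print(parameter_dim(2, 2), moment_space_dim(2, 3), expected_dim(2, 2, 3))
```

For $n = k = 2$ and $d = 3$ this count predicts a hypersurface, so a variety of smaller dimension in that case is defective in the sense defined above.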

Cumulant Representation
Let us explore how homoscedastic secants become simpler in cumulant coordinates, and how this representation can be used to check identifiability.
First, rephrase the situation in terms of random variables: let $Z = Z_\Sigma$ be a Gaussian random variable with mean 0 and covariance matrix $\Sigma$, and let $B = B_{(\mu_1,\dots,\mu_k),(\lambda_1,\dots,\lambda_k)}$ be an independent random variable with distribution given by a mixture of Dirac distributions:

$$B \sim \lambda_1\delta_{\mu_1} + \cdots + \lambda_k\delta_{\mu_k}.$$

Then, the random variable $Z + B$ has density given by the homoscedastic mixture (1). Moreover, if $m = \mu_1\lambda_1 + \cdots + \mu_k\lambda_k$ is the mean of $B$, we write $B = A + m$, where $A$ is a centered mixture of Dirac distributions.
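One can compute cumulants of this random variable as follows:

$$K_{Z+B}(u) = K_Z(u) + K_{A+m}(u) = u^tm + \tfrac{1}{2}u^t\Sigma u + K_A(u),$$

and this suggests parametrizing the homoscedastic secants in cumulant coordinates as follows:

$$K\colon \Theta^0_{n,k} \times \mathbb{R}^n \times \operatorname{Sym}^2\mathbb{R}^n \to A^K_{n,d}, \qquad (A, m, \Sigma) \mapsto u^tm + \tfrac{1}{2}u^t\Sigma u + K_A(u) \ \text{(terms of degree at most } d\text{)},$$

where $\Theta^0_{n,k}$ parametrizes the centered mixtures of Dirac distributions:

$$\Theta^0_{n,k} = \{((\lambda_1,\dots,\lambda_k),(\mu_1,\dots,\mu_k)) \mid \lambda_1 + \cdots + \lambda_k = 1,\ \lambda_1\mu_1 + \cdots + \lambda_k\mu_k = 0\}.$$

The cumulant homoscedastic secant variety $\log(\operatorname{Sec}^H_k(G_{n,d}))$ is the closure of the image of the map $K$. Since in this variety one can freely translate by the elements in $\mathbb{R}^n$ and $\operatorname{Sym}^2(\mathbb{R}^n)$, the first cumulants and the second cumulants can take any value. The constraints are in the cumulants of order three and higher. We summarize this discussion in the following lemma.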

Lemma 2
Let $A^{K,3}_{n,d}$ be the space of cumulants of order at least three and at most $d$, let $\varphi_{n,k,d}\colon \Theta^0_{n,k} \to A^{K,3}_{n,d}$ be the cumulant map and let $C^0_{n,k,d}$ denote the closure $\overline{\varphi_{n,k,d}(\Theta^0_{n,k})}$. Then, the cumulant homoscedastic secant variety $\log(\operatorname{Sec}^H_k(G_{n,d}))$ is a cone over $C^0_{n,k,d}$.
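The cone structure can be observed numerically in the univariate case: translating all means and changing the common variance moves only the first and second cumulants. The following is a small sketch under that assumption ($n = 1$, exact rational arithmetic); the helper names are hypothetical and the cumulant recursion is the standard one derived from $M(u) = \exp K(u)$.

```python
from fractions import Fraction
from math import comb


def gaussian_moments(mu, var, d):
    """Raw moments m_0..m_d of N(mu, var); var = 0 gives a Dirac distribution."""
    m = [Fraction(1), Fraction(mu)]
    for n in range(2, d + 1):
        m.append(mu * m[n - 1] + (n - 1) * var * m[n - 2])
    return m[: d + 1]


def mixture_moments(weights, means, var, d):
    """Moments of a homoscedastic mixture are weighted sums of component moments."""
    comps = [gaussian_moments(mu, var, d) for mu in means]
    return [sum(w * c[j] for w, c in zip(weights, comps)) for j in range(d + 1)]


def cumulants_from_moments(m):
    """Solve m_n = sum_j C(n-1, j-1) * kappa_j * m_{n-j} for the cumulants."""
    kappa = [Fraction(0)] * len(m)
    for n in range(1, len(m)):
        lower = sum(comb(n - 1, j - 1) * kappa[j] * m[n - j] for j in range(1, n))
        kappa[n] = m[n] - lower
    return kappa


weights = [Fraction(1, 3), Fraction(2, 3)]
means = [Fraction(-1), Fraction(2)]

# Same weights and relative positions, but shifted by 7 and with another variance.
kappa_a = cumulants_from_moments(mixture_moments(weights, means, Fraction(1), 5))
kappa_b = cumulants_from_moments(
    mixture_moments(weights, [mu + 7 for mu in means], Fraction(5, 2), 5)
)
```

Here `kappa_a[3:]` and `kappa_b[3:]` agree, while the first two cumulants differ by the shift and by the change in variance, respectively.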

Remark 1
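In particular, the equations for the cumulant homoscedastic secant variety $\log(\operatorname{Sec}^H_k(G_{n,d}))$ inside $A^K_{n,d}$ are exactly the same as the equations for $C^0_{n,k,d}$ inside $A^{K,3}_{n,d}$.

Lemma 3
The fiber dimension $\Delta^H_{n,k,d}$ can also be computed as the fiber dimension of the map $\varphi_{n,k,d}$:

$$\Delta^H_{n,k,d} = \dim \Theta^0_{n,k} - \dim C^0_{n,k,d} = (n+1)(k-1) - \dim C^0_{n,k,d}.$$

Proof The parameter space decomposes as $\Theta^H_{n,k} = \Theta^0_{n,k} \times \mathbb{R}^n \times \operatorname{Sym}^2\mathbb{R}^n$. Moreover, Lemma 2 says that $\log(\operatorname{Sec}^H_k(G_{n,d}))$ is the cone over $C^0_{n,k,d}$, which is precisely $\mathbb{R}^n \times \operatorname{Sym}^2\mathbb{R}^n \times C^0_{n,k,d}$, so that the first equality follows. For the second equality, the dimension of $\Theta^0_{n,k}$ can be computed as $nk + k - 1 - n = (n+1)(k-1)$.

Example 5
The moment variety $\operatorname{Sec}^H_2(G_{2,3})$ inside the 9-dimensional moment space $A^M_{2,3}$ is expected to be a hypersurface but it is actually of codimension 2. The ideal of $\operatorname{Sec}^H_2(G_{2,3})$ is Cohen-Macaulay and determinantal (generated by the maximal minors of a $6 \times 5$-matrix) as described in [4, Proposition 19]. The homoscedastic cumulant variety $\log(\operatorname{Sec}^H_2(G_{2,3}))$ is defined by the vanishing of the $2 \times 2$ minors of

$$\begin{pmatrix} k_{30} & k_{21} & k_{12} \\ k_{21} & k_{12} & k_{03} \end{pmatrix}.$$

Note that indeed the first- and second-order cumulants $k_{10}, k_{01}, k_{20}, k_{11}, k_{02}$ do not appear in the equations above, so that the cumulant variety is the cone over the twisted cubic curve.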
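The twisted cubic structure for two components in the plane can be checked on explicit numbers. The sketch below is only an illustration under stated assumptions: it uses the fact that third cumulants equal third central moments and are unaffected by the common Gaussian covariance, and it takes the $2 \times 2$ minors of the matrix with rows $(k_{30}, k_{21}, k_{12})$ and $(k_{21}, k_{12}, k_{03})$ as the standard determinantal equations of a twisted cubic. Function names are hypothetical.

```python
from fractions import Fraction


def third_cumulants(weights, points):
    """Third-order cumulants (k30, k21, k12, k03) of a planar Dirac mixture.

    For order three, cumulants coincide with central moments, and adding an
    independent centered Gaussian does not change them.
    """
    mx = sum(w * p[0] for w, p in zip(weights, points))
    my = sum(w * p[1] for w, p in zip(weights, points))

    def k(a, b):
        return sum(w * (p[0] - mx) ** a * (p[1] - my) ** b
                   for w, p in zip(weights, points))

    return k(3, 0), k(2, 1), k(1, 2), k(0, 3)


def twisted_cubic_minors(k30, k21, k12, k03):
    """The three 2 x 2 minors of [[k30, k21, k12], [k21, k12, k03]]."""
    return (k30 * k12 - k21 ** 2, k30 * k03 - k21 * k12, k21 * k03 - k12 ** 2)


# Two components: all minors vanish.
two = twisted_cubic_minors(*third_cumulants(
    [Fraction(1, 4), Fraction(3, 4)],
    [(Fraction(1), Fraction(2)), (Fraction(-3), Fraction(5))],
))

# Three components: the minors need not vanish.
three = twisted_cubic_minors(*third_cumulants(
    [Fraction(1, 3)] * 3,
    [(0, 0), (1, 0), (0, 1)],
))
```

The vanishing for two components reflects that the two centered means are proportional, so the cubic form of third cumulants is a multiple of the cube of a single linear form.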

Remark 2
To estimate the mixture parameters from the cumulants, it is enough to consider the map φ n,k,d of Lemma 2. Indeed, suppose that we have a homoscedastic mixture with parameters (((λ 1 , . . . , λ k ), (μ 1 , . . . , μ k )), m, Σ) ∈ Θ 0 n,k × R n × Sym 2 R n and suppose that its cumulants are known, so that in polynomial form Then, to recover the parameters one can first try to recover the λ i and the μ i from the cumulants of order three and higher, and then compute m and Σ from the cumulants of order one and two.

Veronese Secants
We briefly observe that we can recast the above discussion in a way that makes apparent the connection to mixtures of Dirac distributions and, hence, to secants of Veronese varieties. To work with classical secant varieties, this time we work in moment coordinates. Now, every homoscedastic mixture is the distribution of a random variable of the form Z + B, where B is a mixture of Dirac distributions and Z is a centered Gaussian of covariance Σ, independent from B. Thus, the moment generating function of this variable is Therefore, the role of the covariance parameter is decoupled from the others: In particular, for Σ = 0, one obtains the moment variety for mixtures of Dirac distributions. When restricting to moments M(u) d of degree at most d, this is precisely the k-secants to the Veronese variety Sec k (V n,d ). The additive group Sym 2 R n acts on the moment space A M n,d by and so (28) says that Sec H k (G n,d ) is the union of all the orbits of the points in Sec k (V n,d ) under this action. This is useful because we can exploit well-known results on secants of Veronese varieties to address identifiability. First, let Δ V n,k,d denote the fiber dimension of the k-secants to the Veronese variety Sec A basic estimate for the dimension of Sec k (V n,d ) is given by the dimension of the so that we can define the defect of the k-secants to the Veronese variety as This number was famously computed by Alexander and Hirschowitz [2], see also [7]: The defect for the Veronese variety is always zero, except in the following exceptional cases Moreover, for a general point M(u) ∈ Sec k (V n,d ), consider the closed subset of Sym 2 R n given by We have the following relation between the fiber dimensions (17) and (30): where M ∈ Sec k (V n,d ) is a general point.
Proof By the previous discussion, the moment map for homoscedastic mixtures factors as a composition of two surjective maps Hence, the fiber dimension of the composite map is the sum of the fiber dimensions of the two factors. For the first one this is Δ V n,k,d , so it remains to consider the second. Denote the second factor by ρ : concluding the proof.

Moment Identifiability
Now we start to determine identifiability in various cases. To do so, it is convenient to change notation slightly. Up to now, we have identified moments and cumulants with their corresponding generating functions. In the next sections, it is useful to identify the parameters with polynomials as well. We replace the location parameter $\mu = (\mu_1,\dots,\mu_n)$ with the corresponding linear polynomial $u^t\mu = \mu_1u_1 + \cdots + \mu_nu_n$ and we replace the covariance parameter $\Sigma$ with the quadric $\tfrac{1}{2}u^t\Sigma u$. Of course, the two representations are equivalent, but the polynomial formalism is better suited to the cumulant space and the moment space. In particular, the linear polynomials live in the dual vector space $V = \operatorname{Hom}(\mathbb{R}^n,\mathbb{R})$, whereas the quadratic polynomials live in $\operatorname{Sym}^2V$. The next inequality reflects the fact that increasing the order of moments (or cumulants) measured results in better identifiability:

Lemma 4
The fiber dimensions of general fibers of $M_{n,k,d}$ and $M_{n,k,d+1}$ satisfy: $\Delta^H_{n,k,d+1} \le \Delta^H_{n,k,d}$.

Remark 4
Since Gaussian mixtures are identifiable from finitely many moments (see, e.g., [4]), the sequence must stabilize at 0 for some large enough d.
The following observation is less trivial. It allows a reduction to the case n = k − 1.

Proposition 2
Suppose that $d \ge 3$ and $n \ge k - 1$. Then $\Delta^H_{n,k,d} = \Delta^H_{k-1,k,d}$.
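Proof Use Lemma 3, which says that the fiber dimension $\Delta^H_{n,k,d}$ is equal to the fiber dimension of the map

$$\varphi_{n,k,d}\colon \Theta^0_{n,k} \to A^{K,3}_{n,d}.$$

This dimension can be computed by looking at the differential of the map at a general point. The parameter space is defined as

$$\Theta^0_{n,k} = \{((\lambda_1,\dots,\lambda_k),(L_1,\dots,L_k)) \in \mathbb{R}^k \times V^k \mid \lambda_1 + \cdots + \lambda_k = 1,\ \lambda_1L_1 + \cdots + \lambda_kL_k = 0\}.$$

Let $p = ((\lambda_1,\dots,\lambda_k),(L_1,\dots,L_k)) \in \Theta^0_{n,k}$ be a general point. Then, the tangent space to $\Theta^0_{n,k}$ at the point is given by

$$T_p\Theta^0_{n,k} = \{((\varepsilon_1,\dots,\varepsilon_k),(H_1,\dots,H_k)) \in \mathbb{R}^k \times V^k \mid \varepsilon_1 + \cdots + \varepsilon_k = 0,\ \varepsilon_1L_1 + \cdots + \varepsilon_kL_k + \lambda_1H_1 + \cdots + \lambda_kH_k = 0\}.$$

The fiber dimension of $\varphi_{n,k,d}$ coincides with the dimension of the kernel of the differential $d\varphi_{n,k,d}$ at the general point $p$. Since the point is general and $n \ge k - 1$, we can suppose that $L_i = u_i$ for $i = 1,\dots,k-1$ and that all the $\lambda_i$ are nonzero. In particular, $L_k$ is a linear combination of $u_1,\dots,u_{k-1}$. Now, we claim that if $((\varepsilon_1,\dots,\varepsilon_k),(H_1,\dots,H_k))$ is in the kernel of $d\varphi_{n,k,d}$ then the only variables appearing in the $H_i$ are $u_1,\dots,u_{k-1}$. If this is true, then we are done, because the kernel of $d\varphi_{n,k,d}$ coincides with the kernel of $d\varphi_{k-1,k,d}$ at the point $((\lambda_1,\dots,\lambda_k),(L_1,\dots,L_k))$.

To prove the claim, observe that the map is given by the cumulant functions $\varphi_{n,k,d} = (\kappa_3,\kappa_4,\dots,\kappa_d)$, so the kernel of $d\varphi_{n,k,d}$ equals the intersection of the kernels of the $d\kappa_i$ for $i = 3,\dots,d$. Therefore, it is enough to prove the analogous claim for the kernel of the differential $d\kappa_3$ of $\kappa_3$. Since the first moment is zero by construction, the third cumulant coincides with the third moment

$$\kappa_3 = \lambda_1L_1^3 + \cdots + \lambda_kL_k^3.$$

Hence, the differential is the linear map

$$d\kappa_3\colon ((\varepsilon_1,\dots,\varepsilon_k),(H_1,\dots,H_k)) \mapsto \sum_{i=1}^k \big(\varepsilon_iL_i^3 + 3\lambda_iL_i^2H_i\big) = \sum_{i=1}^k h_iL_i^2, \qquad h_i := 3\lambda_iH_i + \varepsilon_iL_i,$$

so an element of the kernel satisfies $\sum_{i=1}^k h_iL_i^2 = 0$. Since $\lambda_k \neq 0$, this is equivalent to $\sum_{i=1}^k h_i(\lambda_kL_i)^2 = 0$ and since $\lambda_1L_1 + \cdots + \lambda_kL_k = 0$, we see that

$$\lambda_k^2\sum_{i=1}^{k-1} h_iL_i^2 + h_k\Big(\sum_{i=1}^{k-1}\lambda_iL_i\Big)^2 = 0.$$

By assumption $L_i = u_i$ for $i = 1,\dots,k-1$, so this last expression is equal to zero if and only if

$$\lambda_k^2\sum_{i=1}^{k-1} h_iu_i^2 = -h_k\,(\lambda_1u_1 + \cdots + \lambda_{k-1}u_{k-1})^2.$$

If this is true, then $h_k$ uses only the variables $u_1,\dots,u_{k-1}$. Indeed, if some other variable, say $y$, appears in $h_k$, then on the right-hand side there is the monomial $yu_1u_2$, while there is no such monomial on the left-hand side.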
Likewise, if the variable y appears in one of the h i for i = 1, . . . , k − 1: then on the left-hand side there would be a monomial of the form yu 2 i , while there is no such monomial on the right hand side.
Hence, the $h_i$ are polynomials in $u_1,\dots,u_{k-1}$, and, by definition of the $h_i$, it follows that the same holds for the $H_i$. This proves the claim and the result follows.

Moments Up to Order d = 3
When d = 3 we determine the defect δ H n,k,3 and the fiber dimension Δ H n,k,3 of the map for each n and k, and use Lemma 3. When d = 3, the space A K ,3 n,3 is identified with the space Sym 3 V of homogeneous polynomials of degree three, and as noted in the proof of Proposition 2, the third cumulants coincide with the third moments, so that: We compute the closure C 0 n,k,3 of the image.

Lemma 5
The set C 0 n,k,3 is the Zariski closure of To compute the Zariski closure, suppose that all the λ i are strictly positive, so that in particular we can write Since cubic roots are well defined over R, In particular, this shows immediately that λ 1 L 3 1 + · · ·+λ k L 3 k can be written as a sum of cubic powers of linearly dependent linear forms.
For the converse, let H 1 , . . . , H k be linearly dependent linear forms. For the Zariski closure, it suffices to assume that H k = −β 1 H 1 − · · · − β k−1 H k−1 for some general β 1 , . . . , β k−1 ∈ R strictly positive. So we want to write for some positive λ 1 , . . . , λ k ∈ R such that λ 1 + · · · + λ k = 1. Given such λ i , the above computations yields where To conclude, it remains to show that Eq. (46) have a solution: these equations are equivalent to Observe that the square roots are well defined since β i > 0 for all i = 1, . . . , k − 1. Moreover, if (λ 1 , . . . , λ k−1 ) is a solution to (48), then it is easy to see that all the λ i must be strictly positive: indeed, since the β i are positive, λ i and 1 − λ 1 − · · · − λ k−1 have the same sign. Thus, if one of the λ i is negative, then all the λ i are negative, but then 1 − λ 1 − · · · − λ k−1 > 0 which is absurd. Now, setting b i = √ β i 3 , rewrite the equations as the linear system The matrix determinant lemma gives that det(I+b·1 T ) = 1+1 T b = 1+b 1 +· · ·+b k−1 , which is positive since the β i are positive. This means the system (49) has a unique solution.

Remark 6
The set of sums of cubes of k dependent linear forms has a natural interpretation in terms of the projective Veronese variety: indeed consider the third Veronese embedding of P(V ) = P n−1 : For each (k − 2)-dimensional linear subspace Π ⊆ P n−1 let Sec k (v 3 (Π )) ⊆ P(Sym 3 V ) be the k-th secant variety of its image v 3 (Π ). Then, by Lemma 5, the variety C 0 n,k,3 is the affine cone over the union of these secants: We compute the dimension of this variety, dividing it in the cases k ≤ n + 1 and k ≥ n + 1: and when k ≥ 5, Proof (i) Since k ≥ n + 1, Remark 6 shows that C 0 n,k,3 is the cone over the k-th secant variety Sec k (v 3 (P n−1 )). The dimension of this variety is computed by the Alexander-Hirschowitz theorem, so that dim C 0 n,k,3 = min kn, with the single exception of n = 5, k = 7, where the dimension is one less than the expected, hence dim C 0 5,7,3 = 34.
(ii) Since k ≤ n + 1, Proposition 2 shows that Δ H n,k,3 = Δ H k−1,k,3 . Hence, for k = 2, 3, 4 we see directly from Table 1 that For k ≥ 5 instead, we follow the proof of Proposition 2 and show that the differential of φ k−1,k,3 : Θ 0 k−1,k,3 → Sym 3 V at a general point is injective. For this, consider the kernel of the differential at a point p = ((λ 1 , . . . , λ k ), (L 1 , . . . , L k )). It consists of elements ((ε 1 , . . . , ε k ), (H 1 , . . . , H k )) ∈ R k × V k such that ε 1 + · · · + ε k = 0, ε 1 L 1 + · · · + ε k L k + λ 1 H 1 + · · · + λ k H k = 0 and and h = 3λ k H k + ε k L k . Now choose the specific point p given by λ i = 1 k for each i = 1, . . . , k, L i = u i for i = 1, . . . , k − 1 and L k = −u 1 − · · · − u k−1 . Then, the above equation becomes Let us write h = h 1 u 1 + · · · + h k−1 u k−1 . Then, in (59), the coefficient of u a u b u c is 2 Hence The matrix appearing in the linear system is invertible, so h a = h b = h c = h d = 0. Since this holds for an arbitrary choice of four distinct indices, it follows that h = 0. Now, relation (59) tells us that k−1 i=1 u 2 i i = 0, but since u 2 1 , . . . , u 2 k−1 form a complete intersection of quadrics, they do not have linear syzygies, which implies that i = 0 for each i. From the definitions of i and h, it follows that 3λ i H i + ε i L i = 0 for each i but then the other two relations i ε i = 0 and i (λ i H i + ε i L i ) = 0 imply that H i = 0, ε i = 0 for all i, which is what was needed. Now we are ready for a complete classification of defectivity when d = 3.  where δ H n,k,3 = n n 2 +3n+2 Proof First consider the case when n ≥ k: then Proposition 3 applies. It is straightforward to check that δ H n,k,3 = Δ H n,k,d , from which the statement of the theorem follows. For the cases where k ≥ n + 1, start with the exceptional case n = 5, k = 7: Proposition 3 gives that dim φ n,k,3 (Θ 0 5,7 ) = 34, and Lemma 3 yields Δ H 5,7,3 = 2 and δ H 5,7,3 = 1. 
Now, consider the other cases: Proposition 3 gives that dim φ n,k,3 (Θ n,k ) = min nk, n + 2 3 (62) and then Lemma 3 shows that − max 0, (n + 1) k − n 2 + 2n + 6 6 .
As a consequence, identifiability can be characterized whenever $k \le n + 1$:

Theorem 3
Suppose $k \le n + 1$. If $k \ge 5$ then a general homoscedastic mixture is algebraically identifiable from moments up to order 3. If instead $k = 2, 3, 4$, then a general homoscedastic mixture is algebraically identifiable from the moments up to order $d = 4$.
Proof When k ≥ 5 this follows immediately from Theorem 2 and Lemma 4. If instead k = 2, 3, 4, thanks to Proposition 2, it is enough to set n = k − 1 and check the first d for which we have identifiability: these are a finite number of cases that can be done by direct computation (e.g., in Macaulay2 [9]), and we find that such a d is 4.

Mixtures with k = 2 Components
When k = 2 we characterize the rational identifiability as well. Since the case d = 3 is already covered, consider only d ≥ 4.

Theorem 4
The homoscedastic secant variety Sec H 2 (G n,4 ) is algebraically identifiable. If d ≥ 5, the homoscedastic secant variety Sec H 2 (G n,d ) is also rationally identifiable. Proof By Lemma 3 and Remark 2, it is enough to consider the parameter space given by Θ 0 n,2 = {((L 1 , L 2 ), (λ 1 , λ 2 )) | λ 1 + λ 2 = 1, λ 1 L 1 + λ 2 L 2 = 0} and the map In order to compute the general fiber of this map, note that since d ≥ 4, it follows from Theorem 3 and its proof that the map has finite fibers. Hence, it is enough to restrict a general fiber to the open subset λ 2 = 0. There we may assume L 2 = − λ 1 λ 2 L 1 = − λ 1 1−λ 1 L 1 . We thus compute the fibers of the induced map In explicit terms, this map is given by the terms from degree 3 to degree d of the logarithm log(λe L + (1 − λ)e − λ λ−1 L ). A computation shows that the first terms are: Now suppose that d = 4, and let L ∈ V and λ ∈ R \ {1} be general elements. In fact, it is enough to assume L = 0 and λ = 0, 1, 1 2 , so that κ 3 = f 3 (λ)L 3 = 0. In order to compute the fiber of the point (κ 3 , κ 4 ) = F n,2,4 (L, λ), first observe that 3 and that the polynomial L 0 := 3 √ f 3 (λ)L can be computed explicitly: from the expression κ 3 = κ 300..0 u 3 1 + κ 030..0 u 3 2 + · · · + κ 00..03 u 3 n + ( terms with mixed monomials ) then one obtains In particular, Now, the equation f 4 (λ) f 3 (λ) 4 3 = a is equivalent to f 4 (λ) 3 f 3 (λ) 4 = a 3 , or more explicitly Note that this expression is invariant under exchanging λ with 1−λ, as is expected from the symmetry of the situation. Hence, set γ := λ(1 − λ) and rewrite this expression as This is a cubic equation with three possible solutions for γ , which means there is no rational identifiability. In order to get such, consider also the cumulants κ 5 of order 5: this adds the data κ 5 and the condition κ 5 = f 5 (λ)L 5 . 
In the above notation L = f 3 (λ) − 1 3 L 0 , so that the condition κ 5 = f 5 (λ)L 5 becomes f 5 (λ) f 3 (λ) 5 3 = κ 5 Now, the equation f 5 (λ) 3 , or more explicitly, as above, with the substitution γ = λ(1 − λ), Hence, rational identifiability is obtained if the two Eqs. (70) and (72) have a unique common solution γ . This means that the map R R 2 , γ → (g(γ ), h(γ )) is generically injective. This map extends to R → P 2 via i.e., a map defined by polynomials of degree 7. It is generically injective if and only if the closure of its image is a plane curve of degree 7. This can be verified with Macaulay2 [9]: the resulting curve is given by the equation Even though there is no rational identifiability above when d = 4, it is worth noting that in a purely statistical setting, γ can be recovered uniquely, as seen below.

Remark 7
We have seen in Remark 6 that Sec H 2 (G n,d ) in cumulant coordinates is a cone over C 0 n,2,d ⊆ A K ,3 n,d . Up to taking the Zariski closure, the proof of Theorem 4 shows that C 0 n,2,d is the image of the map For λ constant we get a projected d-th Veronese variety of V . If instead L is constant, then we get a rational curve given by a linear combination of ( f 3 (λ), f 4 (λ), . . . , f d (λ)).
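The symmetry under $\lambda \leftrightarrow 1 - \lambda$ used in the proof of Theorem 4 can be checked on the scalar factors of the third and fourth cumulants. The following sketch rests on assumptions that are not taken from the text: it works with the centered two-point distribution with mass $\lambda$ at 1 and mass $1 - \lambda$ at $-\lambda/(1-\lambda)$, and it uses unnormalized cumulants, so its values may differ from the functions $f_3$, $f_4$ of the paper by constant factors. The closed form in $\gamma = \lambda(1-\lambda)$ is derived here by hand for this normalization.

```python
from fractions import Fraction


def two_point_cumulants(lam):
    """Third and fourth cumulants of the centered two-point distribution
    with mass lam at 1 and mass 1 - lam at -lam / (1 - lam)."""
    a, b = Fraction(1), -lam / (1 - lam)

    def mu(r):
        # central moment of order r (the mean is zero by construction)
        return lam * a ** r + (1 - lam) * b ** r

    return mu(3), mu(4) - 3 * mu(2) ** 2


def invariant(lam):
    """The ratio kappa_4^3 / kappa_3^4, which does not depend on the scale of L.
    Undefined for lam in {0, 1, 1/2}, where kappa_3 vanishes or lam is degenerate."""
    k3, k4 = two_point_cumulants(lam)
    return k4 ** 3 / k3 ** 4


def invariant_in_gamma(gamma):
    """Hand-derived closed form of the same ratio in gamma = lam * (1 - lam)."""
    return (1 - 6 * gamma) ** 3 / (gamma * (1 - 4 * gamma) ** 2)


lam = Fraction(1, 5)
value = invariant(lam)
```

In this normalization, fixing the value of the ratio gives a cubic equation in $\gamma$, which is consistent with the three candidate solutions discussed in the proof.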

The Univariate Case n = 1
We use the standard notation σ 2 for the variance Σ = (σ 11 ) when n = 1. For n = 1, the moment variety Sec H k (G 1,d ) is never defective. The moment map fiber of the map above), with an algorithm closely related to the well-known Prony's method [20]. This procedure was introduced by Lindsay as an application of moment matrices [15] and we briefly recall the algorithm here. First, how does one recover the locations μ i and weights λ i of the k components of a Dirac mixture from 2k −1 moments? This is known as the quadrature rule and it works as follows. Given the moment sequence m = (m 1 , m 2 , . . . , m 2k−1 ) one considers the polynomial resulting from the following (k + 1) × (k + 1) determinant The k roots μ 1 , μ 2 , . . . , μ k of P k (t) are precisely the sought locations. This follows since the equations of the secant varieties of the rational normal curve are classically known to be given by the minors of the moment matrices. For a modern reference see [14].
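Once the locations are known, the weights $\lambda_i$ are found by solving the $k \times k$ Vandermonde linear system

$$\lambda_1\mu_1^j + \cdots + \lambda_k\mu_k^j = m_j, \qquad j = 0, 1, \dots, k-1.$$

Back to the Gaussian case: if we knew the value of the common variance $\sigma^2$, we could reduce to the above instance. In terms of the Gaussian moment generating function:

$$e^{-\frac{\sigma^2}{2}u^2}\sum_{i=1}^k \lambda_i\, e^{\mu_iu + \frac{\sigma^2}{2}u^2} = \sum_{i=1}^k \lambda_i\, e^{\mu_iu}.$$

Hence, the Dirac moments $\tilde m$ on the right hand side are linear combinations of the Gaussian moments $m$. Explicitly, for $1 \le j \le 2k - 1$,

$$\tilde m_j = \sum_{i=0}^{\lfloor j/2 \rfloor} \binom{j}{2i}(2i-1)!!\,(-\sigma^2)^i\, m_{j-2i}. \tag{79}$$

Applying the quadrature rule to the vector $\tilde m = (\tilde m_1, \tilde m_2, \dots, \tilde m_{2k-1})$ would allow us to obtain the means $\mu_1, \mu_2, \dots, \mu_k$. However, $\sigma$ is unknown. To find an estimate for $\sigma$ we consider the first $2k$ moments $m = (m_1, m_2, \dots, m_{2k})$. If $\tilde m = (\tilde m_1, \tilde m_2, \dots, \tilde m_{2k})$ comes from a mixture of $k$ Dirac measures, then

$$\det\begin{pmatrix} 1 & \tilde m_1 & \cdots & \tilde m_k \\ \tilde m_1 & \tilde m_2 & \cdots & \tilde m_{k+1} \\ \vdots & \vdots & & \vdots \\ \tilde m_k & \tilde m_{k+1} & \cdots & \tilde m_{2k} \end{pmatrix} = 0. \tag{80}$$

One thus treats $\sigma$ as a variable and substitutes expressions (79) into (80). This results in a polynomial $D_k(\sigma)$ of degree $\binom{k+1}{2}$ in $\sigma^2$ and the estimator $\hat\sigma^2$ is obtained as its smallest non-negative root [15, Theorem 5B]. So the algebraic degree for estimating $\sigma^2$ is $\binom{k+1}{2}$. With $\sigma^2$ specified, one proceeds as above.

More generally, the discussion under (28) shows that the moment variety $\operatorname{Sec}^H_k(G_{1,d})$ is the closure of the union of the secant varieties $\operatorname{Sec}_k(V^\sigma_{1,d})$ over all $\sigma$, where $V^\sigma_{1,d}$ is the translation of the moment curve $V_{1,d}$ by the variance $\sigma^2$ as defined by the Gaussian moments. The secant variety $\operatorname{Sec}_k(V^\sigma_{1,d})$ is defined for each $\sigma$ by the $(k+1) \times (k+1)$ minors of a moment matrix $M_{k,d}$ (81) whose entries are the moments $\tilde m_j$. As soon as the $k$-th secant variety of a smooth curve is not linear, the curve can be recovered as the singular locus of highest multiplicity in the secant variety. Therefore, since the curves $V^\sigma_{1,d}$ are distinct, their $k$-th secant varieties are distinct as well, as long as the latter are not linear. In particular, since the variety $\operatorname{Sec}_k(V^\sigma_{1,d})$ has dimension $2k - 1$, it follows that the union $\operatorname{Sec}^H_k(G_{1,d})$ has dimension $2k$. Given the moments $m_i$ up to degree $d$ of a point on a homoscedastic $k$-secant, the $(k+1) \times (k+1)$ minors of $M_{k,d}$ are polynomials in $\sigma^2$ with a zero at the common variance. Given the variance, the means can be inferred as above.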
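For $k = 2$ and a known common variance, the two steps described above (removing the Gaussian factor, then applying the quadrature rule) fit in a few lines. This is a minimal sketch rather than Lindsay's full procedure: it assumes $\sigma^2$ is given, treats only two components, and derives the deconvolution coefficients from the series of $e^{-\sigma^2u^2/2}$. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb, sqrt


def double_factorial_odd(i):
    """(2i - 1)!!, with the convention (-1)!! = 1 for i = 0."""
    out = 1
    for j in range(1, 2 * i, 2):
        out *= j
    return out


def dirac_moments_from_gaussian(m, var):
    """Remove the factor exp(var * u^2 / 2) from the moment generating function.

    m[j] are raw moments of the homoscedastic mixture with m[0] == 1; the result
    are the moments of the underlying mixture of Dirac distributions."""
    return [
        sum(comb(j, 2 * i) * double_factorial_odd(i) * (-var) ** i * m[j - 2 * i]
            for i in range(j // 2 + 1))
        for j in range(len(m))
    ]


def two_point_quadrature(mt):
    """Recover ((lam1, lam2), (mu1, mu2)) of a 2-component Dirac mixture
    from its moments mt[0..3], assuming two distinct real locations."""
    m1, m2, m3 = mt[1], mt[2], mt[3]
    det = m2 - m1 * m1                 # 2 x 2 moment-matrix determinant
    s = (m3 - m1 * m2) / det           # mu1 + mu2
    p = (m1 * m3 - m2 * m2) / det      # mu1 * mu2
    disc = sqrt(float(s * s - 4 * p))
    mu1, mu2 = (float(s) - disc) / 2, (float(s) + disc) / 2
    lam1 = (mu2 - float(m1)) / (mu2 - mu1)   # from lam1*mu1 + lam2*mu2 = m1
    return (lam1, 1 - lam1), (mu1, mu2)


# Moments m_0..m_3 of the mixture (1/4) N(-1, 2) + (3/4) N(3, 2).
m = [Fraction(1), Fraction(2), Fraction(9), Fraction(32)]
dirac = dirac_moments_from_gaussian(m, Fraction(2))
weights, locations = two_point_quadrature(dirac)
```

The locations are the roots of the quadratic $t^2 - st + p$, which agrees up to scaling with the $3 \times 3$ determinant whose last row is $(1, t, t^2)$.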

Proposition 4
The polynomial $P_{2k+1}$ is homogeneous of total degree $\binom{k+2}{2}\binom{k+1}{2}$ in the multigraded weights $\deg m_i = \deg \kappa_i = i$.

Proof Let $A$ be the affine space with coordinates $(m_1, \dots, m_{2k+1}, \sigma)$, where $\sigma$ is the last coordinate, and consider the projective closure $P$ of $A$. Then, the matrix (81) has rank at most $k$ along a determinantal locus in $P$. This rank $k$ locus has codimension 2 and its intersection with $A$ is projected to the hypersurface defined by $P_{2k+1}$ in $\mathbb{A}^M_{1,2k+1}$. The coordinate $\sigma$ appears only in even degree in the equations defining the rank $k$ locus, so the projection to $\mathbb{A}^M_{1,2k+1}$ is $2 : 1$, and hence the degree of $P_{2k+1}$ is half the degree of the rank $k$ locus.

Question 1 It would be interesting to better understand the structure of the polynomials $P_{2k+1}$, e.g., is there a closed form expression for all $k$?
If $P_{2k+1}$ vanishes on a vector $(m_1, \dots, m_{2k+1})$ of moments, and $P_{2l+1}$ does not vanish on $(m_1, \dots, m_{2l+1})$ for any $l < k$, then the moments lie on a homoscedastic $k$-secant but not on any $l$-secant for $l < k$. Therefore the polynomials $P_{2k+1}$ may be used to estimate the number of components in a homoscedastic Gaussian mixture (compare to the rank test proposed in [15, Section 3.1] for the known variance case).
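To make the test concrete, consider the smallest case $k = 1$. The following computation is our own illustration; it uses the Dirac moments $\tilde m_1 = m_1$, $\tilde m_2 = m_2 - \sigma^2$ and $\tilde m_3 = m_3 - 3\sigma^2 m_1$, obtained by removing the Gaussian factor $e^{\sigma^2 t^2/2}$ from the moment generating function, and it takes the relevant matrix to be the $2 \times 3$ moment matrix in the $\tilde m_j$.

```latex
\begin{align*}
\operatorname{rank}
\begin{pmatrix} 1 & \tilde m_1 & \tilde m_2 \\ \tilde m_1 & \tilde m_2 & \tilde m_3 \end{pmatrix} \le 1
&\iff \tilde m_2 = \tilde m_1^{2}, \quad \tilde m_3 = \tilde m_1^{3} \\
&\iff \sigma^2 = m_2 - m_1^{2}, \quad m_3 - 3\sigma^2 m_1 = m_1^{3}.
\end{align*}
% Eliminating \sigma^2 from the two equations gives, up to a scalar,
\[
P_3 = m_3 - 3 m_1 m_2 + 2 m_1^{3} = \kappa_3 .
\]
```

So $P_3$ is the third cumulant, of weighted degree $3 = \binom{3}{2}\binom{2}{2}$ under $\deg m_i = i$, and it vanishes exactly when the moments up to order three are those of a single Gaussian.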

Conclusion
We have completely classified all defective cases for the moment varieties associated with homoscedastic Gaussian mixtures whenever $k < n + 1$, $d = 3$, $k = 2$, or $n = 1$. A complete classification for all $n, d, k$ remains open, although our computations did not reveal any further defective examples.
Our identifiability results also cover special structures in the covariance matrix, by Remark 2. For example, a common mixture submodel involves isotropic Gaussians, which means that the covariance matrix is a scalar multiple of the identity, Σ = σ I . The k-means algorithm used in clustering can be interpreted as parameter estimation for a homoscedastic isotropic mixture of Gaussians. In [10], Hsu and Kakade consider the learning of mixtures of isotropic Gaussians from the moments up to order d = 3 when k ≤ n + 1. They prove identifiability for the homoscedastic isotropic submodel (see [6,Theorem 3.2]), and in order to solve the moment equations, they find orthogonal decompositions of the second and third order moment tensors.
On the other hand, in [17] Lindsay and Basak proposed a 'fast consistent' method of moments for homoscedastic Gaussian mixtures in the multivariate case, based on a 'primary axis' to which the one-dimensional case presented in Sect. 4.3 is applied. This means that the method uses some moments of order 2k. Knowing that in some cases there are explicit equations for secant varieties of higher dimensional Veronese varieties [14], an alternative method with minimal order based on these should be possible.
Finally, a similar approach can be taken to study moment varieties of homoscedastic mixtures of other location families. In Example 4, we saw that Gaussian moments and Laplacian moments coincide up to d = 3. This means that Theorem 2 applies verbatim to homoscedastic mixtures of Laplace distributions.