1 Introduction

In the context of algebraic statistics [19], moments of probability distributions have recently been explored from an algebraic and geometric point of view [1, 4, 11, 13]. The key point for this connection is that in many cases the sets of moments define algebraic varieties, hence the name moment varieties. In the case of moments of mixture distributions, there is a natural correspondence to secant varieties of the moment varieties. Studying geometric invariants such as their dimension reveals statistical properties such as model identifiability. One of the main applications for statistical inference is in the context of the method of moments, which matches the distribution’s moments to moment estimates obtained from a sample.

Gaussian mixtures are a prominent statistical model with multiple applications (see [3] and references therein). They are probability distributions on \(\mathbb {R}^n\) with a density that is a convex combination of Gaussian densities:

$$\begin{aligned} \lambda _1f_{{\mathcal {N}}{(\mu _1,\varSigma _1)}}(x) + \cdots + \lambda _k f_{{\mathcal {N}}{(\mu _k,\varSigma _k)}}(x) \end{aligned}$$
(1)

where \(\mu _1,\ldots ,\mu _k\in \mathbb {R}^n\) are the k means, \(\varSigma _1,\ldots ,\varSigma _k \in {\text {Sym}}^2(\mathbb {R}^n)\) are the covariance matrices, and the \(0\le \lambda _i \le 1\) with \(\lambda _1+\cdots +\lambda _k=1\) are the mixture weights.

The starting point is thus the Gaussian moment variety \({\mathcal {G}}_{n,d}\), as introduced in [4], whose points are the vectors of all moments of order at most d of an n-dimensional Gaussian distribution. The moments corresponding to the mixture density (1) form the secant variety \({\text {Sec}}_k({\mathcal {G}}_{n,d})\), and identifiability in this general setting was the focus of [5].

In this work, we study special families of Gaussian mixtures, called homoscedastic mixtures, where all the Gaussian components share the same covariance matrix. In other words, a homoscedastic Gaussian mixture has a density of the form

$$\begin{aligned} \sum _{i=1}^k \lambda _i f_{{\mathcal {N}}{(\mu _i,\varSigma )}}(x) \end{aligned}$$
(2)

where the Gaussian probability densities \(f_{{\mathcal {N}}{(\mu _i,\varSigma )}}(x)\) all have different means \(\mu _i\) and the same covariance matrix \(\varSigma \). The moments, up to order d, of homoscedastic Gaussian mixtures are still polynomials in the parameters (the means and the covariance matrix), and form the moment variety \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\). This is a set of special k-secants inside the secant variety \({\text {Sec}}_k({\mathcal {G}}_{n,d})\).

The main question we are concerned with is: when can a general homoscedastic k-mixture of n-dimensional Gaussians be identified by its moments of order d? More precisely, denote by \(\varTheta ^H_{n,k}\) the parameter space of means, common covariance matrix and mixture weights for homoscedastic mixtures, and the moment map by

$$\begin{aligned} M_{n,k,d}:\varTheta ^H_{n,k}\rightarrow {\text {Sec}}^H_k({\mathcal {G}}_{n,d}). \end{aligned}$$
(3)

The mixture parameters of a point on the moment variety \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\) can be uniquely recovered if the fiber of the moment map (3) is a singleton up to natural permutations of the parameters. If this happens for a general point on the moment variety, we say that the mixture is rationally identifiable from its moments up to order d. If the fiber of a general point is finite, we say that we have algebraic identifiability. The parameters are not identifiable if the general fiber of the moment map has positive dimension.

If the dimension of the parameter space is larger than the dimension of the space of moments, then one may expect every point of the moment space to lie on the moment variety. Clearly, the general fiber of the moment map must then have positive dimension and we cannot have identifiability. We therefore distinguish the unexpected cases: when the dimension of the moment variety is less than the dimension of both the parameter space and the moment space, we say that the moment variety \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\) is defective. In particular, defectivity implies non-identifiability.

We illustrate with an example:

Example 1

Let \(n=2\), \(k=2\) and \(d=3\). That is, we consider moments up to order three for the homoscedastic mixture of two Gaussians in \(\mathbb {R}^2\). The Gaussian moment variety \({\mathcal {G}}_{2,3}\) is 5-dimensional with 2 parameters for the mean vector and 3 for the symmetric covariance matrix. The parameters for the homoscedastic mixture are two mean vectors \(\mu _1= \begin{pmatrix} \mu _{11} \\ \mu _{12} \\ \end{pmatrix}\) and \(\mu _2= \begin{pmatrix} \mu _{21} \\ \mu _{22} \\ \end{pmatrix}\), the common covariance \(\varSigma = \begin{pmatrix} \sigma _{11} &{}\sigma _{12}\\ \sigma _{12} &{}\sigma _{22}\\ \end{pmatrix}\) and the mixture weight \(\lambda \) of the first component, in total \(2 \times 2 + 3 + 1 = 8\) parameters. On the other hand, there are 9 bivariate moments up to order 3. Explicitly, the map is:

$$\begin{aligned} \begin{array}{lll} m_{10} &{} = &{} \lambda \mu _{11} +(1-\lambda ) \mu _{21} \\ m_{01} &{} = &{} \lambda \mu _{12} +(1-\lambda ) \mu _{22} \\ m_{20} &{} = &{} \lambda (\mu _{11}^2+\sigma _{11})+(1-\lambda ) (\mu _{21}^2+\sigma _{11}) \\ m_{02} &{} = &{} \lambda (\mu _{12}^2+\sigma _{22})+(1-\lambda ) (\mu _{22}^2+\sigma _{22}) \\ m_{11} &{} = &{} \lambda (\mu _{11} \mu _{12}+\sigma _{12}) +(1-\lambda ) (\mu _{21} \mu _{22}+\sigma _{12}) \\ m_{30} &{} = &{} \lambda (\mu _{11}^3+3 \sigma _{11} \mu _{11}) +(1-\lambda ) (\mu _{21}^3+3 \sigma _{11} \mu _{21}) \\ m_{03} &{} = &{} \lambda (\mu _{12}^3+3 \sigma _{22} \mu _{12}) +(1-\lambda ) (\mu _{22}^3+3 \sigma _{22} \mu _{22}) \\ m_{21} &{} = &{} \lambda (\mu _{11}^2 \mu _{12}+\sigma _{11} \mu _{12}+2 \sigma _{12} \mu _{11}) +(1-\lambda ) (\mu _{21}^2 \mu _{22}+\sigma _{11} \mu _{22}+2 \sigma _{12} \mu _{21}) \\ m_{12} &{} = &{} \lambda (\mu _{11} \mu _{12}^2+\sigma _{22} \mu _{11}+2 \sigma _{12} \mu _{12}) +(1-\lambda ) (\mu _{21} \mu _{22}^2+\sigma _{22} \mu _{21}+2 \sigma _{12} \mu _{22}) \\ \end{array} \end{aligned}$$

Since there are more moments than parameters, one would expect that the mixture parameters can be recovered. However, the dimension of \(\mathrm{Sec}^H_2({\mathcal {G}}_{2,3})\) equals 7. This is one less than the expected dimension of 8. Therefore, it is defective and there is no algebraic identifiability. This means that the method of moments is doomed to fail in this setting. However, if one measures moments up to order \(d=4\), it is possible to uniquely recover the mixture parameters.
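A quick way to see this dimension drop is to compute the generic rank of the Jacobian of the parametrization above. The following sympy sketch (ours, not part of the original text; variable names are chosen for readability) types in the nine moment polynomials and evaluates the Jacobian at a sample parameter point.

```python
# Sketch (not from the paper): the Jacobian of the moment map of Example 1 has
# rank 7 at a generic point, one less than the number of parameters.
import sympy as sp

lam, m11, m12, m21, m22, s11, s12, s22 = sp.symbols(
    'lambda mu11 mu12 mu21 mu22 s11 s12 s22')
params = [lam, m11, m12, m21, m22, s11, s12, s22]

moments = [
    lam*m11 + (1 - lam)*m21,
    lam*m12 + (1 - lam)*m22,
    lam*(m11**2 + s11) + (1 - lam)*(m21**2 + s11),
    lam*(m12**2 + s22) + (1 - lam)*(m22**2 + s22),
    lam*(m11*m12 + s12) + (1 - lam)*(m21*m22 + s12),
    lam*(m11**3 + 3*s11*m11) + (1 - lam)*(m21**3 + 3*s11*m21),
    lam*(m12**3 + 3*s22*m12) + (1 - lam)*(m22**3 + 3*s22*m22),
    lam*(m11**2*m12 + s11*m12 + 2*s12*m11)
        + (1 - lam)*(m21**2*m22 + s11*m22 + 2*s12*m21),
    lam*(m11*m12**2 + s22*m11 + 2*s12*m12)
        + (1 - lam)*(m21*m22**2 + s22*m21 + 2*s12*m22),
]

J = sp.Matrix(moments).jacobian(params)
# evaluate at a sample (hopefully generic) rational point
point = {lam: sp.Rational(1, 3), m11: 1, m12: -2, m21: 2, m22: 5,
         s11: 3, s12: 1, s22: 7}
print(J.subs(point).rank())   # expected to print 7
```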

As is often observed [1, 4, 13], a change of coordinates to cumulants tends to yield simpler representations and faster computations. This is the case here and hence we also study the cumulant varieties of the homoscedastic Gaussian mixtures. For Example 1, the moment variety in cumulant coordinates is simply the cone over a twisted cubic curve (see Example 5). This is not a coincidence, as is shown in Sect. 3.

Our main results, Theorems 2 and 3, identify the defective homoscedastic moment varieties when \(d=3\) and show that the homoscedastic moment variety is not defective when \(k\le n+1\). These are analogues of the Alexander–Hirschowitz theorem on secant-defective Veronese varieties [2].

This paper is organized as follows: In Sect. 2 we present the connection between moments and cumulants. The moment varieties corresponding to homoscedastic mixtures are defined in Sect. 3. In Sect. 4 we give general algebraic identifiability considerations and do a careful analysis of the subcases \(d=3\), \(k=2\) and \(n=1\). Finally, we conclude with a summary of results and list further research directions.

2 Moments and Cumulants

To get started, we make some remarks about moments and cumulants from an algebraic perspective. To a sufficiently integrable random variable X on \(\mathbb {R}^n\), associate its moments \(m_{a_1,\ldots ,a_n}[X]\) and cumulants \(\kappa _{a_1,\ldots ,a_n}[X]\) through the generating functions in \(\mathbb {R}[\![u_1,\ldots ,u_n]\!]\):

$$\begin{aligned} \begin{aligned} M_X(u)&=\sum _{(a_1,\ldots ,a_n)} m_{a_1,\ldots ,a_n}[X]\frac{u_1^{a_1}\ldots u_n^{a_n}}{a_1!\ldots a_n!}\\ K_X(u)&=\sum _{(a_1,\ldots ,a_n)} \kappa _{a_1,\ldots ,a_n}[X]\frac{u_1^{a_1}\ldots u_n^{a_n}}{a_1!\ldots a_n!}. \end{aligned} \end{aligned}$$
(4)

The information obtained from moments is equivalent to that from cumulants, since they are obtained from one another through the simple transformations

$$\begin{aligned} M_X(u) = \exp (K_X(u)), \qquad K_X(u) = \log (M_X(u)) \end{aligned}$$
(5)

which are well-defined, because the 0-th moment is always one, whereas the 0-th cumulant is always zero: \(m_{0}[X]=1,\kappa _0[X]=0\) for every random variable X. In particular, moments and cumulants take values in the affine hyperplanes \(\mathbb {A}^M_n\) and \(\mathbb {A}^K_n\) of \(\mathbb {R}[\![u_1,\ldots ,u_n]\!]\) defined by

$$\begin{aligned} \mathbb {A}^M_n = \left\{ m_0 = 1 \right\} , \qquad \mathbb {A}^K_n = \left\{ \kappa _0 = 0 \right\} . \end{aligned}$$
(6)

We call these hyperplanes the moment space and the cumulant space.

Taking only moments up to order d, replace the ring \(\mathbb {R}[\![ u_1,\ldots ,u_n ]\!]\) of power series with the truncated ring \(\mathbb {R}[\![u_1,\ldots ,u_n ]\!]/(u_1,\ldots ,u_n)^{d+1}\), and everything goes through. In particular, there is an analogous definition of the affine hyperplanes \(\mathbb {A}^M_{n,d}\) and \(\mathbb {A}^K_{n,d}\) which we also call moment space and cumulant space.
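As a small illustration of this truncated correspondence (a sketch of ours, not from the original text), the following sympy snippet passes from cumulants to moments and back for a univariate Gaussian, working modulo \((u)^{d+1}\).

```python
# A minimal sympy sketch of the transformations (5), truncated at order d,
# for a univariate Gaussian N(mu, sigma^2). Variable names are ours.
import sympy as sp

u, mu, sigma = sp.symbols('u mu sigma')
d = 4

K = mu*u + sp.Rational(1, 2)*sigma**2*u**2        # cumulant generating function
M = sp.series(sp.exp(K), u, 0, d + 1).removeO()   # truncated moment g.f., M = exp(K)

# moments m_a = a! * [u^a] M(u), following the convention of (4)
moments = [sp.factorial(a)*M.coeff(u, a) for a in range(d + 1)]
print(moments)   # [1, mu, mu**2 + sigma**2, mu**3 + 3*mu*sigma**2, ...]

# going back with the logarithm recovers K modulo (u)^{d+1}
K_back = sp.series(sp.log(M), u, 0, d + 1).removeO()
print(sp.expand(K_back - K))                      # 0
```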

Example 2

(Dirac distribution) Let \(\mu =(\mu _1,\ldots ,\mu _n)\) in \(\mathbb {R}^n\) be a point. The Dirac distribution \(\delta _{\mu }\) with center \(\mu \) on \(\mathbb {R}^n\) is given by

$$\begin{aligned} \int _{\mathbb {R}^n} f(x) \delta _\mu (x)\,{\mathrm{d}}x := f(\mu ). \end{aligned}$$
(7)

If X is a random variable on \(\mathbb {R}^n\) with this distribution, its moment-generating function is

$$\begin{aligned} M_X(u) = \mathbb {E}[{\mathrm{e}}^{u^tX}] = {\mathrm{e}}^{u^t\mu } = \sum _{(a_1,\ldots ,a_n)} \mu _1^{a_1}\ldots \mu _n^{a_n} \frac{u_1^{a_1}\ldots u_n^{a_n}}{a_1! \dots a_n!}. \end{aligned}$$
(8)

The moments of X are monomials evaluated at \(\mu \). On the other hand, for the cumulant generating function

$$\begin{aligned} K_X(u) = \log M_X(u) = \log {\mathrm{e}}^{u^t\mu } = u^t\mu = \mu _1u_1+\cdots +\mu _nu_n, \end{aligned}$$
(9)

the linear cumulants coincide with the coordinates of \(\mu \), and the higher order cumulants are all zero.

This has an immediate translation into algebro-geometric terms: the parameter space for all Dirac distributions is the space \(\mathbb {R}^n\), and the image of the moment map of degree d, \(M:\mathbb {R}^n \rightarrow \mathbb {A}^M_{n,d}\), is the affine d-th Veronese variety \({\mathcal {V}}_{n,d} \subseteq \mathbb {A}^M_{n,d}\). On the other hand, the image of the cumulant map \(K:\mathbb {R}^n \rightarrow \mathbb {A}^K_{n,d}\) is the linear subspace given by \(\{ \kappa _2 = \kappa _3 = \cdots = \kappa _d = 0 \}\), where \(\kappa _i\) is the degree i-part of an element in \(\mathbb {A}^K_{n,d}\).

Example 3

(Gaussian distribution) Let \(\mu \in \mathbb {R}^n\) be a point, and \(\varSigma \in {\text {Sym}}^2\mathbb {R}^n\) an \(n\times n\) symmetric and positive-definite matrix. The Gaussian distribution on \(\mathbb {R}^n\) with mean \(\mu \) and covariance matrix \(\varSigma \) is given by the density

$$\begin{aligned} f_{(\mu ,\varSigma )}(x) := \frac{1}{\sqrt{\det (2\pi \varSigma )}} {\mathrm{e}}^{-\frac{1}{2}(x-\mu )^t \varSigma ^{-1} (x-\mu )}. \end{aligned}$$
(10)

If \(X\sim {\mathcal {N}}(\mu ,\varSigma )\) is a Gaussian random variable with these parameters, its moment-generating function and cumulant-generating function are given by

$$\begin{aligned} M_X(u) = {\mathrm{e}}^{u^t\mu + \frac{1}{2}u^t\varSigma u}, \qquad K_X(u) = u^t\mu + \frac{1}{2}u^t\varSigma u. \end{aligned}$$
(11)

The Gaussian moment variety \({\mathcal {G}}_{n,d}\subseteq \mathbb {A}^M_{n,d}\) consists of all Gaussian moments up to order d. Observe that the corresponding cumulant variety is given simply by the linear subspace \(\{ \kappa _3 = \cdots = \kappa _d = 0 \} \subseteq \mathbb {A}^K_{n,d}\).

While our focus is on Gaussian distributions, our approach applies to general location families that admit moment and cumulant varieties. We illustrate this with the next example.

Example 4

(Laplace distribution) The (symmetric) multivariate Laplace distribution has a location parameter \(\mu \in \mathbb {R}^n\) and a covariance parameter \(\varSigma \), a positive-definite \(n\times n\) matrix. Its density function involves the modified Bessel function of the second kind (see [12, Chapter 5]), but it can be defined via its simpler moment generating function:

$$\begin{aligned} M_X(u) = \frac{\exp (u^t \mu )}{1- \frac{1}{2} u^t \varSigma u}, \qquad K_X(u) = u^t\mu - \log \left( 1-\frac{1}{2}u^t\varSigma u\right) \end{aligned}$$
(12)

which is valid in the region \(|u^t \varSigma u| < 2 \).

Moments and cumulants up to order \(d=3\) match those of the Gaussian case. Also note that when \(\varSigma = 0\), the Dirac moment generating function is recovered. However, when \(d \ge 4\), the Laplace cumulant variety is no longer a linear subspace of the cumulant space.
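The following sympy sketch (ours; the scalar variable s2 stands for the variance, so n = 1) compares the cumulants obtained from (11) and (12) and makes the statement above explicit.

```python
# Univariate comparison of Gaussian and Laplace cumulants; a sketch with our
# own variable names, the covariance Sigma reduced to a scalar variance s2.
import sympy as sp

u, mu, s2 = sp.symbols('u mu s2')
K_gauss = mu*u + sp.Rational(1, 2)*s2*u**2
K_laplace = mu*u - sp.log(1 - sp.Rational(1, 2)*s2*u**2)

def cumulant(K, a):
    # kappa_a = a! * [u^a] K(u), following the convention of (4)
    return sp.factorial(a)*sp.series(K, u, 0, 7).removeO().coeff(u, a)

for a in range(1, 7):
    print(a, cumulant(K_gauss, a), cumulant(K_laplace, a))
# orders 1, 2, 3 agree (mu, s2, 0); at order 4 the Laplace cumulant is 3*s2**2
```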

The multiplicative structure of the power series ring \(\mathbb {R}[\![ u_1,\ldots ,u_n ]\!]\) makes it particularly well suited to expressing independence in terms of moments. Indeed, if X and Y are two independent random variables on \(\mathbb {R}^n\), then

$$\begin{aligned} M_{X+Y}(u) = \mathbb {E}[{\mathrm{e}}^{u^t(X+Y)}] = \mathbb {E}[{\mathrm{e}}^{u^tX}{\mathrm{e}}^{u^tY}] = \mathbb {E}[{\mathrm{e}}^{u^tX}] \cdot \mathbb {E}[{\mathrm{e}}^{u^tY}] = M_X(u)\cdot M_Y(u). \end{aligned}$$

With cumulants it is even simpler: it holds that

$$\begin{aligned} K_{X+Y}(u)= \log (M_{X+Y}) = \log (M_X) + \log (M_Y) = K_X(u) + K_Y(u). \end{aligned}$$

The group of affine transformations \({\text {Aff}}(\mathbb {R}^n)\) acts naturally on both moments and cumulants: indeed, for any \(A\in GL(n,\mathbb {R})\) and \(b\in \mathbb {R}^n\) and a random variable X on \(\mathbb {R}^n\),

$$\begin{aligned} M_{AX+b}(u) = M_{AX}(u)\cdot M_b(u) = \mathbb {E}[{\mathrm{e}}^{u^tAX}] \cdot \mathbb {E}[{\mathrm{e}}^{u^tb}] = {\mathrm{e}}^{u^tb} \cdot M_X(A^tu) \end{aligned}$$

and

$$\begin{aligned} K_{AX+b}(u) = \log (M_{AX+b}(u)) = \log ({\mathrm{e}}^{u^tb} M_X(A^tu)) = u^tb + K_X(A^tu). \end{aligned}$$

In particular, note that translations correspond simply to translations in cumulant coordinates, whereas they induce a more complicated expression in moment coordinates.

3 Homoscedastic Secants

When Karl Pearson introduced Gaussian mixtures to model subpopulations of crabs [18], he also proposed the method of moments in order to estimate the parameters. The basic idea is to compute sample moments from observed data, and match them to the distribution’s moments expressed in terms of the unknown parameters. The method of moments estimates are the parameters that solve these equations. This is a classical estimation method in statistics; a good survey is [16], and a recent ‘denoised’ version for Gaussian mixtures is [21].

The method of moments is particularly convenient for mixture models because moments of mixture densities are straightforward to compute: for every measurable function \(g:\mathbb {R}^n \rightarrow \mathbb {R}\)

$$\begin{aligned} \int _{\mathbb {R}^n} g(x)\left( \sum _{i=1}^k\lambda _i f_{(\mu _i,\varSigma _i)}(x) \right) {\mathrm{d}}x =\sum _{i=1}^k \lambda _i \int _{\mathbb {R}^n} g(x)f_{(\mu _i,\varSigma _i)}(x){\mathrm{d}}x, \end{aligned}$$
(13)

and thus, the moments are just linear combinations of the corresponding Gaussian moments.

As hinted in the introduction, this discussion can be rephrased in geometric terms: let \({\mathcal {G}}_{n,d}\subseteq \mathbb {A}^M_{n,d}\) be the Gaussian moment variety on \(\mathbb {R}^n\) of order d. Then, the moments of mixtures of Gaussians are linear combinations of points in \({\mathcal {G}}_{n,d}\), so that their corresponding variety is the k-th secant variety \({\text {Sec}}_k({\mathcal {G}}_{n,d})\).

The densities of homoscedastic Gaussian mixtures, where the Gaussian components share a common covariance matrix, have the form:

$$\begin{aligned} \lambda _1f_{(\mu _1,\varSigma )}(x) + \cdots + \lambda _k f_{(\mu _k,\varSigma )}(x) \end{aligned}$$
(14)

where the \(\mu _i \in \mathbb {R}^n\) are the mean parameters, \(\varSigma \in {\text {Sym}}^2 \mathbb {R}^n\) is the common covariance parameter, and the \(\lambda _i \in \mathbb {R}\) with \(\lambda _1+\cdots +\lambda _k = 1\) are the mixture weights. Thus, the parameter space for homoscedastic mixtures is

$$\begin{aligned} \begin{aligned} \varTheta ^H_{n,k} :=\,&(\mathbb {R}^n)^{\times k} \times \mathbb {R}^{k-1} \times {\text {Sym}}^2\mathbb {R}^n \\ =\,&\{ ((\mu _1,\ldots ,\mu _k),(\lambda _1,\ldots ,\lambda _k),\varSigma )\,|\, \lambda _1+\cdots +\lambda _k = 1 \}, \end{aligned} \end{aligned}$$
(15)

and it has dimension

$$\begin{aligned} \dim \varTheta ^H_{n,k} = nk + k-1 + \frac{n(n+1)}{2} = (n+1)\left( k + \frac{n}{2} \right) - 1. \end{aligned}$$
(16)

The moment map for homoscedastic mixtures is then an algebraic map

$$\begin{aligned} M_{n,k,d}: \varTheta ^H_{n,k}\rightarrow \mathbb {A}^M_{n,d}. \end{aligned}$$

Points on the image, the moments of homoscedastic mixtures, are linear combinations of points in \({\mathcal {G}}_{n,d}\subseteq \mathbb {A}^M_{n,d}\) which share the same covariance matrix.

Definition 1

The homoscedastic k-secant variety, denoted \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\), is the image of the moment map \(M_{n,k,d}\). The fiber dimension \(\varDelta ^H_{n,k,d}\) is the general fiber dimension of the map \(M_{n,k,d}\),

$$\begin{aligned} \varDelta ^H_{n,k,d} = \dim \varTheta ^H_{n,k} - \dim {\text {Sec}}^H_k({\mathcal {G}}_{n,d}). \end{aligned}$$
(17)

We say that \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\) is algebraically identifiable if \(\varDelta ^H_{n,k,d}=0\).

The feasibility of the method of moments is based on computing points on the fibers of the moment map \(M_{n,k,d}\). Algebraic identifiability of \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\) means that a general homoscedastic Gaussian mixture in the homoscedastic k-secant variety is identifiable from its moments up to order d in the sense that only finitely many Gaussian mixture distributions share the same moments up to order d; we reserve the term rationally identifiable for the case in which a general fiber consists of a single point, up to label swapping. If the general fiber is not finite, then it is positive-dimensional, there is no identifiability of the parameters from the moments up to order d, and a higher order is needed for identifiability (cf. Remark 4 and [4, Problem 17]).

Since the dimension of \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\) is always bounded by the dimension of the ambient space \(\mathbb {A}^M_{n,d}\), there is a simple estimate for the fiber dimension:

Lemma 1

For all ndk it holds that

$$\begin{aligned} \varDelta ^H_{n,k,d} \ge \max \left\{ (n+1)\left( k + \frac{n}{2} \right) - \left( {\begin{array}{c}n+d\\ d\end{array}}\right) , 0 \right\} . \end{aligned}$$
(18)

Proof

The moment space \(\mathbb {A}^M_{n,d}\) is an affine hyperplane inside the vector space \(\mathbb {R}[[ u_1,\ldots ,u_n ]]/(u_1,\ldots ,u_n)^{d+1}\); hence, it has dimension

$$\begin{aligned} \dim \mathbb {A}^M_{n,d} = \dim \mathbb {R}[[u_1,\ldots ,u_n]]/(u_1,\ldots ,u_n)^{d+1} - 1 = \left( {\begin{array}{c}n+d\\ d\end{array}}\right) -1. \end{aligned}$$
(19)

Since \({\text {Sec}}^H_k({\mathcal {G}}_{n,d}) \subseteq \mathbb {A}^M_{n,d}\), note that

$$\begin{aligned} \varDelta ^H_{n,k,d} = \dim \varTheta ^H_{n,k} - \dim {\text {Sec}}^H_k({\mathcal {G}}_{n,d}) \ge \dim \varTheta ^H_{n,k} - \dim \mathbb {A}^M_{n,d} \end{aligned}$$
(20)

which is exactly the inequality in the statement. \(\square \)

We expect that in general situations the inequality (18) is in fact an equality. Hence, define the defect to be

$$\begin{aligned} \delta ^H_{n,k,d} := \varDelta ^H_{n,k,d} - \max \left\{ (n+1)\left( k + \frac{n}{2} \right) - \left( {\begin{array}{c}n+d\\ d\end{array}}\right) , 0 \right\} . \end{aligned}$$
(21)

We say that \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\) is defective if \(\delta ^H_{n,k,d}>0\). As observed earlier, defectivity implies non-identifiability.
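The dimension counts entering (16), (18), (19) and (21) are easy to script. The following Python fragment (ours, for bookkeeping only) reproduces the numbers of Example 1.

```python
# Sketch (not from the paper): the dimension counts behind (16), (18) and (19).
from math import comb

def dim_theta(n, k):            # dim Theta^H_{n,k}, eq. (16)
    return n*k + (k - 1) + n*(n + 1)//2

def dim_moment_space(n, d):     # dim A^M_{n,d}, eq. (19)
    return comb(n + d, d) - 1

def expected_fiber_dim(n, k, d):
    # the right-hand side of the bound (18)
    return max(dim_theta(n, k) - dim_moment_space(n, d), 0)

# Example 1: 8 parameters, 9 moments, expected fiber dimension 0 -- yet the
# actual fiber dimension Delta^H_{2,2,3} equals 1, so the defect (21) is 1.
print(dim_theta(2, 2), dim_moment_space(2, 3), expected_fiber_dim(2, 2, 3))
```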

3.1 Cumulant Representation

Let us explore how homoscedastic secants become simpler in cumulant coordinates, and how this representation can be used to check identifiability.

First, rephrase the situation in terms of random variables: let \(Z=Z_{\varSigma }\) be a Gaussian random variable with mean 0 and covariance matrix \(\varSigma \), and let \(B=B_{(\mu _1,\ldots ,\mu _k),(\lambda _1,\ldots ,\lambda _k)}\) be an independent random variable with distribution given by a mixture of Dirac distributions:

$$\begin{aligned} \lambda _1\delta _{\mu _1}(x)+\cdots +\lambda _k\delta _{\mu _k}(x). \end{aligned}$$
(22)

Then, the random variable \(Z+B\) has density given by the homoscedastic mixture (14). Moreover, if \(m=\mu _1\lambda _1+\cdots +\mu _k\lambda _k\) is the mean of B, we write \(B=A+m\), where A is a centered mixture of Dirac distributions.

One can compute cumulants of this random variable as follows:

$$\begin{aligned} K_{B+Z}(u) = K_{B}(u) + K_Z(u) = K_{A}(u)+m^t u + \frac{1}{2}u^t\varSigma u \end{aligned}$$
(23)

and this suggests parametrizing the homoscedastic secants in cumulant coordinates as follows:

$$\begin{aligned} K:\varTheta _{n,k}^0 \times \mathbb {R}^n \times {\text {Sym}}^2\mathbb {R}^n \rightarrow \mathbb {A}_{n,d}^K, \qquad (A,m,\varSigma ) \mapsto K_A(u) + m^tu + \frac{1}{2}u^t\varSigma u \end{aligned}$$
(24)

where \(\varTheta _{n,k}^0\) parametrizes the centered mixtures of Dirac distributions

$$\begin{aligned} \varTheta _{n,k}^0 = \{ (\mu _1,\ldots ,\mu _k),(\lambda _1,\ldots ,\lambda _k) \,|\, \lambda _1\mu _1+\cdots +\lambda _k\mu _k=0, \, \lambda _1+\cdots +\lambda _k=1 \} \end{aligned}$$

The cumulant homoscedastic secant variety \(\log ({\text {Sec}}^H_{k}({\mathcal {G}}_{n,d}))\) is the image of the map K. Since one can freely translate in this variety by the elements of \(\mathbb {R}^n\) and \({\text {Sym}}^2(\mathbb {R}^n)\), the cumulants of order one and two can take any value. The constraints are in the cumulants of order three and higher. We summarize this discussion in the following lemma.

Lemma 2

Let \(\mathbb {A}^{K,3}_{n,d}\) be the space of cumulants of order at least three and at most d, let

$$\begin{aligned} \phi _{n,k,d}:\varTheta ^0_{n,k} \rightarrow \mathbb {A}^{K,3}_{n,d}, \qquad A\mapsto K_A(u)_3+K_A(u)_4+\cdots +K_A(u)_d \end{aligned}$$
(25)

be the cumulant map and let \(C^0_{n,k,d}\) denote the closure \(\overline{\phi _{n,k,d}(\varTheta ^0_{n,k})}\). Then, the cumulant homoscedastic secant variety \(\log ({\text {Sec}}^H_k({\mathcal {G}}_{n,d}))\) is a cone over \(C_{n,k,d}^0\).

Remark 1

In particular, the equations for the cumulant homoscedastic secant variety \(\log ({\text {Sec}}^H_k({\mathcal {G}}_{n,d}))\) inside \(\mathbb {A}^{K}_{n,d}\) are exactly the same as the equations for \(C^0_{n,k,d}\) inside \(\mathbb {A}^{K,3}_{n,d}\).

The fiber dimension \(\varDelta _{n,k,d}^{H}\) can also be computed as the fiber dimension of the map \(\phi _{n,k,d}\):

Lemma 3

The fiber dimension \(\varDelta ^H_{n,k,d}\) is equal to the fiber dimension of \(\phi _{n,k,d}\). In other words

$$\begin{aligned} \varDelta ^H_{n,k,d} = \dim \varTheta ^0_{n,k} - \dim C^0_{n,k,d} = (k-1)(n+1) - \dim C^0_{n,k,d}. \end{aligned}$$
(26)

Proof

The fiber dimension is the difference \(\dim \varTheta ^H_{n,k} - \dim \log ({\text {Sec}}^H_{k}({\mathcal {G}}_{n,d}))\). We know that \(\varTheta ^H_{n,k} \cong \varTheta ^0_{n,k} \times \mathbb {R}^n \times {\text {Sym}}^2\mathbb {R}^n\). Moreover, Lemma 2 says that \(\log ({\text {Sec}}^H_{k}({\mathcal {G}}_{n,d}))\) is the cone over \(C^0_{n,k,d}\), which is precisely \(\mathbb {R}^n\times {\text {Sym}}^2\mathbb {R}^n \times C^0_{n,k,d}\), so that the first equality follows. For the second equality, the dimension of \(\varTheta ^0_{n,k}\) can be computed as \(nk+k-1-n = (n+1)(k-1)\). \(\square \)

Example 5

(\(n=k=2 \) , \( d=3\)) Revisiting Example 1 from the introduction, we concluded that \(\mathrm{Sec}^H_2({\mathcal {G}}_{2,3}) \subset \mathbb {A}^M_{2,3} \cong \mathbb {A}^9\) is expected to be a hypersurface but it is actually of codimension 2. The ideal of \(\mathrm{Sec}^H_2({\mathcal {G}}_{2,3})\) is Cohen–Macaulay and determinantal (generated by the maximal minors of a \(6 \times 5\)-matrix) as described in [4, Proposition 19]. The homoscedastic cumulant variety \(\log ({\text {Sec}}^H_{2}({\mathcal {G}}_{2,3}))\) is defined by the vanishing of the \(2\times 2\) minors of

$$\begin{aligned} \begin{pmatrix}k_{30}&{}\quad k_{21}&{}\quad k_{12}\\ k_{21}&{}\quad k_{12}&{}\quad k_{03}\\ \end{pmatrix}. \end{aligned}$$

Note that indeed the first- and second-order cumulants \(k_{10},k_{01},k_{20},k_{11},k_{02}\) do not appear in the equations above, so that the cumulant variety is the cone over the twisted cubic curve.
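The rank-one structure of this matrix can be verified directly: the order-three cumulants of the mixture are those of its Dirac part, since the common Gaussian contributes nothing beyond order two, and for two components they are the coefficients of the cube of a single linear form. A sympy sketch of ours:

```python
# Sketch (not from the paper): the third cumulants of a 2-component Dirac
# mixture in R^2 are the coefficients of the cube of a linear form, so all
# 2x2 minors of the matrix in Example 5 vanish identically.
import sympy as sp

lam, a1, a2, b1, b2 = sp.symbols('lambda a1 a2 b1 b2')   # weight and means
e1 = lam*a1 + (1 - lam)*b1                               # mean of the mixture
e2 = lam*a2 + (1 - lam)*b2

def kappa(p, q):
    # for p + q = 3, the cumulant k_{pq} equals the central moment of order (p, q)
    return sp.expand(lam*(a1 - e1)**p*(a2 - e2)**q
                     + (1 - lam)*(b1 - e1)**p*(b2 - e2)**q)

k30, k21, k12, k03 = kappa(3, 0), kappa(2, 1), kappa(1, 2), kappa(0, 3)
M = sp.Matrix([[k30, k21, k12], [k21, k12, k03]])
minors = [sp.expand(M[:, cols].det()) for cols in ([0, 1], [0, 2], [1, 2])]
print(minors)                                            # [0, 0, 0]
```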

Remark 2

To estimate the mixture parameters from the cumulants, it is enough to consider the map \(\phi _{n,k,d}\) of Lemma 2. Indeed, suppose that we have a homoscedastic mixture with parameters \((((\lambda _1,\ldots ,\lambda _k),(\mu _1,\ldots ,\mu _k)),m,\varSigma ) \in \varTheta ^0_{n,k}\times \mathbb {R}^n \times {\text {Sym}}^2\mathbb {R}^n\) and suppose that its cumulants are known, so that in polynomial form

$$\begin{aligned} \begin{aligned} \kappa _1(u)&= m^t u \\ \kappa _2(u)&= K_A(u)_2 + \frac{1}{2}u^t\varSigma u \\ \kappa _3(u)&= K_A(u)_3 \\ \kappa _4(u)&= K_A(u)_4 \\ \vdots \end{aligned}. \end{aligned}$$
(27)

Then, to recover the parameters one can first try to recover the \(\lambda _i\) and the \(\mu _i\) from the cumulants of order three and higher, and then compute m and \(\varSigma \) from the cumulants of order one and two.

3.2 Veronese Secants

We briefly observe that we can recast the above discussion in a way that makes apparent the connection to mixtures of Dirac distributions and, hence, to secants of Veronese varieties. To work with classical secant varieties, we now return to moment coordinates. Every homoscedastic mixture is the distribution of a random variable of the form \(Z+B\), where B is a mixture of Dirac distributions and Z is a centered Gaussian of covariance \(\varSigma \), independent of B. Thus, the moment generating function of this variable is

$$\begin{aligned} M_{Z+B}(u) = M_Z(u)M_B(u) = {\mathrm{e}}^{\frac{1}{2}u^t\varSigma u} \cdot M_B(u). \end{aligned}$$
(28)

Therefore, the role of the covariance parameter is decoupled from the others: In particular, for \(\varSigma =0\), one obtains the moment variety for mixtures of Dirac distributions. When restricting to moments \(M(u)_d\) of degree at most d, this is precisely the k-th secant variety \({\text {Sec}}_k({\mathcal {V}}_{n,d})\) of the Veronese variety. The additive group \({\text {Sym}}^2\mathbb {R}^n\) acts on the moment space \(\mathbb {A}^M_{n,d}\) by

$$\begin{aligned} {\text {Sym}}^2\mathbb {R}^n \times \mathbb {A}^M_{n,d} \rightarrow \mathbb {A}^M_{n,d} , \qquad (\varSigma ,M(u)_d) \mapsto {\mathrm{e}}^{\frac{1}{2}u^t\varSigma u} \cdot M(u)_d \end{aligned}$$
(29)

and so (28) says that \({\text {Sec}}^H_k({\mathcal {G}}_{n,d})\) is the union of all the orbits of the points in \({\text {Sec}}_k({\mathcal {V}}_{n,d})\) under this action.
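The following univariate sympy sketch (ours, with d = 4) checks the action (29) on a concrete example: multiplying the truncated moment generating function of a Dirac mixture by the Gaussian factor reproduces the truncated moments of the corresponding homoscedastic mixture.

```python
# Sketch (not from the paper): the Sym^2-action (29) in one variable, d = 4.
import sympy as sp

u, lam, mu1, mu2, sigma = sp.symbols('u lambda mu1 mu2 sigma')
d = 4

def trunc(f):
    # image in R[[u]]/(u)^{d+1}
    return sp.series(f, u, 0, d + 1).removeO()

M_dirac = trunc(lam*sp.exp(mu1*u) + (1 - lam)*sp.exp(mu2*u))
M_mixture = trunc(lam*sp.exp(mu1*u + sigma*u**2/2)
                  + (1 - lam)*sp.exp(mu2*u + sigma*u**2/2))
acted = trunc(sp.exp(sigma*u**2/2)*M_dirac)

print(sp.expand(acted - M_mixture))   # 0: acting by sigma lands on the mixture moments
```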

This is useful because we can exploit well-known results on secants of Veronese varieties to address identifiability. First, let \(\varDelta ^{{\mathcal {V}}}_{n,k,d}\) denote the fiber dimension of the k-secants to the Veronese variety \({\text {Sec}}_k({\mathcal {V}}_{n,d}) \subseteq \mathbb {A}^M_{n,d}\): by definition, this is

$$\begin{aligned} \varDelta ^{{\mathcal {V}}}_{n,k,d} := nk+k-1 - \dim {\text {Sec}}_k({\mathcal {V}}_{n,d}). \end{aligned}$$
(30)

A basic estimate for the dimension of \({\text {Sec}}_k({\mathcal {V}}_{n,d})\) is given by the dimension of the ambient space \(\dim \mathbb {A}^{M}_{n,d} = \left( {\begin{array}{c}n+d\\ d\end{array}}\right) -1\), hence

$$\begin{aligned} \varDelta ^{{\mathcal {V}}}_{n,k,d} \ge \max \left\{ (n+1)k - \left( {\begin{array}{c}n+d\\ d\end{array}}\right) , 0 \right\} \end{aligned}$$
(31)

so that we can define the defect of the k-secants to the Veronese variety as

$$\begin{aligned} \delta ^{{\mathcal {V}}}_{n,k,d} := \varDelta ^{{\mathcal {V}}}_{n,k,d} - \max \left\{ (n+1)k - \left( {\begin{array}{c}n+d\\ d\end{array}}\right) , 0 \right\} . \end{aligned}$$
(32)

This number was famously computed by Alexander and Hirschowitz [2], see also [7]:

Theorem 1

(Alexander–Hirschowitz) The defect for the Veronese variety is always zero, except in the following exceptional cases

$$\begin{aligned}&d = 2, 2\le k\le n&\varDelta ^{{\mathcal {V}}}_{n,k,2} = \frac{k(k-1)}{2} \nonumber \\&n=2, d=4, k=5&\delta ^{{\mathcal {V}}}_{2,5,4} = 1 \nonumber \\&n=3, d=4, k=9&\delta ^{{\mathcal {V}}}_{3,9,4} = 1 \nonumber \\&n=4, d=3, k=7&\delta ^{{\mathcal {V}}}_{4,7,3} = 1 \nonumber \\&n=4, d=4, k=14&\delta ^{{\mathcal {V}}}_{4,14,4} = 1 \end{aligned}$$
(33)

Moreover, for a general point \(M(u)\in {\text {Sec}}_k({\mathcal {V}}_{n,d})\), consider the closed subset of \({\text {Sym}}^2\mathbb {R}^n\) given by

$$\begin{aligned} D(M) := \{ \varSigma \in {\text {Sym}}^2\mathbb {R}^n \,|\, {\mathrm{e}}^{\frac{1}{2}u^t\varSigma u} \cdot M(u) \in {\text {Sec}}_k({\mathcal {V}}_{n,d}) \}. \end{aligned}$$
(34)

We have the following relation between the fiber dimensions (17) and (30):

Proposition 1

It holds that

$$\begin{aligned} \varDelta ^H_{n,k,d} = \varDelta ^{{\mathcal {V}}}_{n,k,d} + \dim D(M) \end{aligned}$$
(35)

where \(M\in {\text {Sec}}_k({\mathcal {V}}_{n,d})\) is a general point.

Proof

By the previous discussion, the moment map for homoscedastic mixtures factors as a composition of two surjective maps

$$\begin{aligned} \varTheta ^H_{n,k} \rightarrow {\text {Sym}}^2(\mathbb {R}^n) \times {\text {Sec}}_k({\mathcal {V}}_{n,d}) \rightarrow {\text {Sec}}^H_{k}({\mathcal {G}}_{n,d}). \end{aligned}$$
(36)

Hence, the fiber dimension of the composite map is the sum of the fiber dimensions of the two factors. For the first one this is \(\varDelta ^{{\mathcal {V}}}_{n,k,d}\), so it remains to consider the second. Denote the second factor by \(\rho :{\text {Sym}}^2(\mathbb {R}^n) \times {\text {Sec}}_k({\mathcal {V}}_{n,d}) \rightarrow {\text {Sec}}^H_{k}({\mathcal {G}}_{n,d})\) and let \((\varSigma _o,M_o(u)) \in {\text {Sym}}^2(\mathbb {R}^n) \times {\text {Sec}}_k({\mathcal {V}}_{n,d})\) be a general point. The fiber is

$$\begin{aligned} \begin{aligned} \rho ^{-1}(\rho (\varSigma _o,M_o(u))) = \,&\left\{ (\varSigma ,M(u)) \,|\, {\mathrm{e}}^{\frac{1}{2}u^t\varSigma u}\cdot M(u) = {\mathrm{e}}^{\frac{1}{2}u^t\varSigma _o u} \cdot M_o(u) \right\} \\ =\,&\{ (\varSigma ,M(u)) \,|\, M(u) = {\mathrm{e}}^{\frac{1}{2}u^t(\varSigma _o-\varSigma )u} \cdot M_o(u) \} \\ \cong&\{ \varSigma \in {\text {Sym}}^2(\mathbb {R}^n) \,|\, {\mathrm{e}}^{\frac{1}{2}u^t(\varSigma _o-\varSigma )u} \cdot M_o(u) \in {\text {Sec}}_k({\mathcal {V}}_{n,d}) \} \\ =\,&\varSigma _o - \{ \varSigma ' \in {\text {Sym}}^2(\mathbb {R}^n) \,|\, {\mathrm{e}}^{\frac{1}{2}u^t\varSigma 'u} \cdot M_o(u) \in {\text {Sec}}_k({\mathcal {V}}_{n,d}) \} \\ \cong&D(M_o), \end{aligned} \end{aligned}$$

concluding the proof. \(\square \)

Remark 3

In the range \((n+1)\left( k + \frac{n}{2}\right) \le \left( {\begin{array}{c}n+d\\ d\end{array}}\right) \) where we expect identifiability for homoscedastic Gaussian mixtures, we see that \(\varDelta ^H_{n,k,d}=\delta ^H_{n,k,d}\), and Alexander–Hirschowitz says that \(\varDelta ^{{\mathcal {V}}}_{n,k,d} = \delta ^{{\mathcal {V}}}_{n,k,d} = 0\) (none of the exceptional cases of Theorem 1 falls in this range). Hence, Proposition 1 yields

$$\begin{aligned} \delta ^{H}_{n,k,d} = \dim D(M) \end{aligned}$$
(37)

4 Moment Identifiability

Now we start to determine identifiability in various cases. To do so, it is convenient to change notation slightly. Up to now, we have identified moments and cumulants with their corresponding generating functions. In the next sections, it is useful to identify the parameters with polynomials as well. We replace the location parameter \(\mu = (\mu _1,\ldots ,\mu _n)\) with the corresponding linear polynomial \(u^t\mu =\mu _1u_1+\cdots +\mu _nu_n\) and we replace the covariance parameter \(\varSigma \) with the quadric \(\frac{1}{2}u^t\varSigma u\). Of course, the two representations are equivalent, but the polynomial formalism is better suited to the cumulant space and the moment space. In particular, the linear polynomials live in the dual vector space \(V={\text {Hom}}(\mathbb {R}^n,\mathbb {R})\), whereas the quadratic polynomials live in \({\text {Sym}}^2 V\).

The next inequality reflects the fact that measuring moments (or cumulants) of higher order can only improve identifiability:

Lemma 4

The fiber dimensions of general fibers of \(M_{n,k,d}\) and \(M_{n,k,d+1}\) satisfy:

$$\begin{aligned} \varDelta ^H_{n,k,d} \ge \varDelta ^H_{n,k,d+1}. \end{aligned}$$
(38)

Proof

By definition, the fiber dimension \(\varDelta ^H_{n,k,d}\) is the dimension of a general nonempty fiber of the moment map \(M_{n,k,d}:\varTheta ^H_{n,k} \rightarrow \mathbb {A}^M_{n,d}\). However, this map is the composition of the map \(M_{n,k,d+1}:\varTheta ^H_{n,k} \rightarrow \mathbb {A}^M_{n,d+1}\) and the projection map \(\mathbb {A}^M_{n,d+1}\rightarrow \mathbb {A}^M_{n,d}\), which forgets the moments of order \(d+1\), so the conclusion follows. \(\square \)

Remark 4

Since Gaussian mixtures are identifiable from finitely many moments (see, e.g., [4]), the sequence

$$\begin{aligned} \varDelta ^H_{n,k,1} \ge \varDelta ^H_{n,k,2} \ge \cdots \ge \varDelta ^H_{n,k,d} \ge \varDelta ^H_{n,k,d+1} \ge \cdots \end{aligned}$$

must stabilize at 0 for some large enough d.

The following observation is less trivial. It allows a reduction to the case \(n=k-1\).

Proposition 2

Suppose that \(d\ge 3\) and \(n\ge k-1\). Then

$$\begin{aligned} \varDelta ^H_{n,k,d} = \varDelta ^H_{k-1,k,d}. \end{aligned}$$
(39)

Proof

Use Lemma 3, which says that the fiber dimension \(\varDelta ^H_{n,k,d}\) is equal to the fiber dimension of the map

$$\begin{aligned} \phi _{n,k,d}:\varTheta ^0_{n,k} \rightarrow \mathbb {A}_{n,d}^{K,3}. \end{aligned}$$
(40)

This dimension can be computed by looking at the differential of the map at a general point. The parameter space is defined as

$$\begin{aligned} \varTheta ^0_{n,k}=\left\{ ((\lambda _1,\ldots ,\lambda _k),(L_1,\ldots ,L_k)) \in \mathbb {R}^{k}\times V^k \,|\, \sum _{i=1}^k \lambda _i=1, \, \sum _{i=1}^k \lambda _iL_i = 0 \right\} . \end{aligned}$$

Let \(p=((\lambda _1,\ldots ,\lambda _k),(L_1,\ldots ,L_k)) \in \varTheta ^0_{n,k}\) be a general point. Then, the tangent space to \(\varTheta ^{0}_{n,k}\) at the point is given by

$$\begin{aligned} T_p\varTheta ^0_{n,k} = \left\{ ((\varepsilon _i)_{i=1}^k,(H_i)_{i=1}^k) \in \mathbb {R}^k\times V^k \,|\, \sum _{i=1}^k\varepsilon _i = 0, \, \sum _{i=1}^k (\varepsilon _iL_i + \lambda _iH_i) = 0 \right\} . \end{aligned}$$

The fiber dimension of \(\phi _{n,k,d}\) coincides with the dimension of the kernel of the differential \(d\phi _{n,k,d}\) at the general point p. In particular, since the point is general and \(n\ge k-1\), we can suppose that \(L_i=u_i\) for \(i=1,\ldots ,k-1\) and that all the \(\lambda _i\) are nonzero. Then \(L_k\) is a linear combination of \(u_1,\ldots ,u_{k-1}\). Now, we claim that if \(((\varepsilon _1,\ldots ,\varepsilon _k),(H_1,\ldots ,H_k))\) is in the kernel of \(d\phi _{n,k,d}\), then the only variables appearing in the \(H_i\) are \(u_1,\ldots ,u_{k-1}\). If this is true, then we are done, because the kernel of \(d\phi _{n,k,d}\) coincides with the kernel of \(d\phi _{k-1,k,d}\) at the point \(((\lambda _1,\ldots ,\lambda _k),(L_1,\ldots ,L_k)) \in \varTheta ^0_{k-1,k}\).

To prove the claim, observe that the map is given by the cumulant functions \(\phi _{n,k,d}=(\kappa _3,\kappa _4,\ldots ,\kappa _d)\), so the kernel of \(d\phi _{n,k,d}\) equals the intersection of the kernels of the \(d\kappa _i\) for \(i=3,\ldots ,d\). Therefore, it is enough to prove the analogous claim for the kernel of the differential \(d\kappa _3\) of \(\kappa _3\). Since the first moment is zero by construction, the third cumulant coincides with the third moment

$$\begin{aligned} \kappa _3 = \lambda _1L_1^3+\cdots +\lambda _kL_k^3. \end{aligned}$$
(41)

Hence, the differential is the linear map

$$\begin{aligned} d\kappa _{3,p} :T_p\varTheta ^0_{n,k} \rightarrow \mathbb {A}^{K,3}_{n,d}, \quad ((\varepsilon _1,\ldots ,\varepsilon _k),(H_1,\ldots ,H_k)) \mapsto \sum _{i=1}^k(3\lambda _iH_i+\varepsilon _iL_i)L_i^2 \end{aligned}$$

and if \(((\varepsilon _1,\ldots ,\varepsilon _k),(H_1,\ldots ,H_k))\) is in the kernel, then it must be that

$$\begin{aligned} \sum _{i=1}^k h_i L_i^2=0, \qquad \text {where } h_i = 3\lambda _iH_i+\varepsilon _iL_i. \end{aligned}$$
(42)

Since \(\lambda _k\ne 0\), this is equivalent to \(\sum _{i=1}^k h_i(\lambda _kL_i)^2 = 0\) and since \(\lambda _1L_1+\cdots +\lambda _kL_k=0\), we see that

$$\begin{aligned} \sum _{i=1}^k h_i(\lambda _kL_i)^2&= \sum _{i=1}^{k-1}h_i(\lambda _kL_i)^2 + h_k(\lambda _kL_k)^2 = \sum _{i=1}^{k-1}h_i(\lambda _kL_i)^2 + h_k\left( \sum _{i=1}^{k-1}\lambda _iL_i\right) ^{2} \\&= \sum _{i=1}^{k-1}(\lambda _k^2h_i+\lambda _i^2h_k)L_i^2 + 2h_k \left( \sum _{1 \le i < j \le k-1} \lambda _i\lambda _j L_iL_j \right) . \end{aligned}$$

By assumption \(L_i=u_i\) for \(i=1,\ldots ,k-1\), so this last expression is equal to zero if and only if

$$\begin{aligned} \sum _{i=1}^{k-1}(\lambda _k^2h_i + \lambda _i^2h_k) u_i^2 =- 2h_k \left( \sum _{1 \le i < j \le k-1} \lambda _i\lambda _j u_iu_j \right) . \end{aligned}$$
(43)

If this is true, then \(h_k\) uses only the variables \(u_1,\ldots ,u_{k-1}\). Indeed, if some other variable, say y, appears in \(h_k\), then on the right-hand side there is the monomial \(yu_1u_2\), while there is no such monomial on the left-hand side. Likewise, if the variable y appears in one of the \(h_i\) for \(i=1,\ldots ,k-1\), then on the left-hand side there would be a monomial of the form \(yu_i^2\), while there is no such monomial on the right-hand side.

Hence, the \(h_i\) are linear forms in \(u_1,\ldots ,u_{k-1}\), and, by definition of the \(h_i\), it follows that the same holds for the \(H_i\). This proves the claim and the result follows. \(\square \)

4.1 Moments Up to Order \(d=3\)

When \(d=3\) we determine the defect \({\delta }^H_{n,k,3}\) and the fiber dimension \({\varDelta }^H_{n,k,3}\) of the map

$$\begin{aligned} {\phi }_{n,k,3}:\varTheta ^0_{n,k}\rightarrow \mathbb {A}^{K,3}_{n,3} \end{aligned}$$

for each n and k, and use Lemma 3. When \(d=3\), the space \(\mathbb {A}^{K,3}_{n,3}\) is identified with the space \({\text {Sym}}^3V\) of homogeneous polynomials of degree three, and as noted in the proof of Proposition 2, the third cumulants coincide with the third moments, so that:

$$\begin{aligned} \phi _{n,k,3}:\varTheta ^0_{n,k} \rightarrow {\text {Sym}}^3V \qquad ((L_1,\ldots ,L_k),(\lambda _1,\ldots ,\lambda _k)) \mapsto \lambda _1L_1^3+\cdots +\lambda _kL_k^3. \end{aligned}$$

We compute the closure \(C^0_{n,k,3}\) of the image.

Lemma 5

The set \(C^0_{n,k,3}\) is the Zariski closure of

$$\begin{aligned} \{ H_1(u)^3 + \cdots + H_k(u)^3 \,|\, H_1(u),\ldots ,H_k(u) \in V \text { linearly dependent } \}. \end{aligned}$$
(44)

Proof

Recall that

$$\begin{aligned} \varTheta ^0_{n,k} = \{ ((L_i)_{i=1}^k,(\lambda _i)_{i=1}^k) \in V^{k} \times \mathbb {R}^{k} | \lambda _1+\cdots +\lambda _k =1, \lambda _1L_1+\cdots +\lambda _kL_k = 0 \}. \end{aligned}$$

To compute the Zariski closure, we may assume that all the \(\lambda _i\) are strictly positive, so that in particular we can write

$$\begin{aligned} L_k = -\frac{\lambda _1}{\lambda _k}L_1 - \cdots -\frac{\lambda _{k-1}}{\lambda _k}L_{k-1}. \end{aligned}$$
(45)

Since cubic roots are well defined over \(\mathbb {R}\),

$$\begin{aligned} \lambda _1&L_1^3 + \cdots +\lambda _kL_k^3 = \lambda _1L_1^3 + \cdots +\lambda _{k-1}L_{k-1}^3 - \lambda _k\left( \frac{\lambda _1}{\lambda _k}L_1 + \cdots +\frac{\lambda _{k-1}}{\lambda _k}L_{k-1}\right) ^3 \\&= H_1^3 + \cdots + H_{k-1}^3 + H_k^3 \end{aligned}$$

where \(H_i := \root 3 \of {\lambda _i} L_i\) for \(i=1,\ldots ,k-1\), and \(H_k := -\sum _{i=1}^{k-1} \left( \frac{\root 3 \of {\lambda _i}}{\root 3 \of {\lambda _k}}\right) ^2 H_i\), using the equality \(\root 3 \of {\lambda _k}\frac{\lambda _i}{\lambda _k}=\left( \frac{\root 3 \of {\lambda _i}}{\root 3 \of {\lambda _k}}\right) ^2\root 3 \of {\lambda _i}\). In particular, this shows immediately that \(\lambda _1L_1^3+\cdots +\lambda _kL_k^3\) can be written as a sum of cubic powers of linearly dependent linear forms.

For the converse, let \(H_1,\ldots ,H_k\) be linearly dependent linear forms. For the Zariski closure, it suffices to assume that \(H_k = -\beta _1H_{1}-\cdots -\beta _{k-1}H_{k-1}\) for some general \(\beta _1,\ldots ,\beta _{k-1} \in \mathbb {R}\) strictly positive. So we want to write

$$\begin{aligned} \beta _ i = \left( \frac{\root 3 \of {\lambda _i}}{\root 3 \of {\lambda _k}}\right) ^2 \end{aligned}$$
(46)

for some positive \(\lambda _1,\ldots ,\lambda _{k}\in \mathbb {R}\) such that \(\lambda _1+\cdots +\lambda _k=1\). Given such \(\lambda _i\), the above computation yields

$$\begin{aligned} H_1^3+\cdots +H_k^3 = \lambda _1L_1^3+\cdots +\lambda _kL_k^3, \end{aligned}$$
(47)

where \(L_i = \frac{1}{\root 3 \of {\lambda _i}}H_i\) for \(i=1,\ldots ,k-1\) and \(L_k = -\frac{\lambda _1}{\lambda _k}L_1-\cdots -\frac{\lambda _{k-1}}{\lambda _k}L_{k-1}\), so that \(\lambda _1L_1+\cdots +\lambda _kL_k = 0\), as wanted.

To conclude, it remains to show that the Eqs. (46) have a solution: these equations are equivalent to

$$\begin{aligned} (\sqrt{\beta _i})^3 = \frac{\lambda _i}{1-\lambda _1-\cdots -\lambda _{k-1}} \qquad \text { for } i=1,\ldots ,k-1. \end{aligned}$$
(48)

Observe that the square roots are well defined since \(\beta _i>0\) for all \(i=1,\ldots ,k-1\). Moreover, if \((\lambda _1,\ldots ,\lambda _{k-1})\) is a solution to (48), then it is easy to see that all the \(\lambda _i\) must be strictly positive: indeed, since the \(\beta _i\) are positive, \(\lambda _i\) and \(1-\lambda _1-\cdots -\lambda _{k-1}\) have the same sign. Thus, if one of the \(\lambda _i\) is negative, then all the \(\lambda _i\) are negative, but then \(1-\lambda _1-\cdots -\lambda _{k-1}>0\) which is absurd.

Now, setting \(b_i = \sqrt{\beta _i}^3\), rewrite the equations as the linear system

$$\begin{aligned} \begin{pmatrix} 1+b_1 &{} b_1 &{} b_1 &{} \ldots &{} b_1 \\ b_2 &{} 1+b_2 &{} b_2 &{} \ldots &{} b_2 \\ b_3 &{} b_3 &{} 1+b_3 &{} \ldots &{} b_3 \\ \vdots &{} \vdots &{} \vdots &{} \ddots &{} \vdots \\ b_{k-1} &{} b_{k-1} &{} b_{k-1} &{} \ldots &{} 1+b_{k-1} \end{pmatrix} \begin{pmatrix} \lambda _1 \\ \lambda _2 \\ \lambda _3 \\ \vdots \\ \lambda _{k-1} \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_{k-1} \end{pmatrix}. \end{aligned}$$
(49)

The matrix determinant lemma gives that \(\det (\mathrm {I}+ b\cdot \mathbb {1}^T) = 1 + \mathbb {1}^T b = 1 + b_1+\cdots +b_{k-1}\), which is positive since the \(\beta _i\) are positive. This means the system (49) has a unique solution. \(\square \)
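As a sanity check (ours, not part of the proof), the following numpy sketch picks arbitrary positive \(\beta _i\), solves the linear system (49) for the weights and verifies the identity (47) numerically.

```python
# Sketch (not from the paper): solve (49) for the weights and check (47).
import numpy as np

rng = np.random.default_rng(1)
k, n = 3, 4
beta = rng.uniform(0.5, 2.0, size=k - 1)          # H_k = -sum_i beta_i * H_i
b = np.sqrt(beta)**3                               # b_i = beta_i^(3/2), cf. (48)

A = np.eye(k - 1) + np.outer(b, np.ones(k - 1))    # coefficient matrix of (49)
lam = np.linalg.solve(A, b)
lam = np.append(lam, 1 - lam.sum())                # lambda_k
assert np.all(lam > 0)                             # as argued in the proof

H = rng.standard_normal((k - 1, n))                # rows = coefficients of H_1..H_{k-1}
H = np.vstack([H, -beta @ H])                      # the dependent form H_k

L = H[:k - 1] / np.cbrt(lam[:k - 1])[:, None]      # L_i = H_i / lambda_i^(1/3)
L_k = -(lam[:k - 1] @ L) / lam[-1]                 # forces sum_i lambda_i L_i = 0
L = np.vstack([L, L_k])

u = rng.standard_normal(n)                         # evaluate both sides at a random u
lhs = np.sum((H @ u)**3)                           # H_1(u)^3 + ... + H_k(u)^3
rhs = np.sum(lam*(L @ u)**3)                       # lambda_1 L_1(u)^3 + ...
print(np.isclose(lhs, rhs))                        # True
```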

Remark 5

The proof of Lemma 5 actually gives more: indeed, it shows that the image of the positive part

$$\begin{aligned} \varTheta ^{0,+}_{n,k} = \{ ((L_1,\ldots ,L_k),(\lambda _1,\ldots ,\lambda _k)) \in \varTheta _{n,k}^0 \,|\, \lambda _i > 0 \text { for all } i=1,\ldots ,k \}, \end{aligned}$$

which is the one relevant in statistics, coincides with the set of sums \(\{ H_1(u)^3+\cdots +H_k(u)^3\}\), where the \(H_i\) are positively linearly dependent, meaning that there are coefficients \(\beta _1,\ldots ,\beta _k>0\) such that

$$\begin{aligned} \beta _1H_1+ \cdots + \beta _{k}H_{k} = 0. \end{aligned}$$
(50)

Remark 6

The set of sums of cubes of k dependent linear forms has a natural interpretation in terms of the projective Veronese variety: indeed consider the third Veronese embedding of \(\mathbb {P}(V)=\mathbb {P}^{n-1}\):

$$\begin{aligned} v_{3}:\mathbb {P}(V) \hookrightarrow \mathbb {P}({\text {Sym}}^3V), \qquad [L] \mapsto [L^3]. \end{aligned}$$
(51)

For each \((k-2)\)-dimensional linear subspace \(\varPi \subseteq \mathbb {P}^{n-1}\) let \({\text {Sec}}_k(v_3(\varPi )) \subseteq \mathbb {P}({\text {Sym}}^3V)\) be the k-th secant variety of its image \(v_3(\varPi )\). Then, by Lemma 5, the variety \(C^0_{n,k,3}\) is the affine cone over the union of these secants:

$$\begin{aligned} C^0_{n,k,3} = {\text {Cone}}\left( \overline{\bigcup _{\varPi \subseteq \mathbb {P}^{n-1}} {\text {Sec}}_k(v_3(\varPi ))}\right) . \end{aligned}$$
(52)

We compute the dimension of this variety, dividing it into the cases \(k\le n+1\) and \(k\ge n+1\):

Proposition 3

  1. (i)

    If \(k\ge n+1\), then

    $$\begin{aligned} \dim C^0_{n,k,3} = \min \left\{ kn , \left( {\begin{array}{c}n+2\\ 3\end{array}}\right) \right\} \end{aligned}$$
    (53)

    except in the case \(n=5,k=7\), where \(\dim C^0_{5,7,3} = 34\).

  2. (ii)

    If \(k\le n+1\), then

    $$\begin{aligned} \varDelta ^H_{n,4,3} = 2, \qquad \varDelta ^H_{n,3,3} = 2, \qquad \varDelta ^H_{n,2,3} = 1. \end{aligned}$$
    (54)

    and when \(k\ge 5\),

    $$\begin{aligned} \varDelta ^H_{n,k,3} = 0. \end{aligned}$$
    (55)

Proof

(i) Since \(k\ge n+1\), Remark 6 shows that \(C^0_{n,k,3}\) is the cone over the k-th secant variety \({\text {Sec}}_k(v_3(\mathbb {P}^{n-1}))\). The dimension of this variety is computed by the Alexander–Hirschowitz theorem, so that

$$\begin{aligned} \dim C^0_{n,k,3} = \min \left\{ kn,\left( {\begin{array}{c}n+2\\ 3\end{array}}\right) \right\} \end{aligned}$$
(56)

with the single exception of \(n=5,k=7\), where the dimension is one less than the expected, hence \(\dim C^0_{5,7,3} = 34\).

(ii) Since \(k\le n+1\), Proposition 2 shows that \(\varDelta ^H_{n,k,3} = \varDelta ^H_{k-1,k,3}\). Hence, for \(k=2,3,4\) we see directly from Table 1 that

$$\begin{aligned} \varDelta ^H_{3,4,3}=2, \qquad \varDelta ^H_{2,3,3} =2 ,\qquad \varDelta ^H_{1,2,3}=1. \end{aligned}$$
(57)

For \(k\ge 5\) instead, we follow the proof of Proposition 2 and show that the differential of \(\phi _{k-1,k,3}:\varTheta ^0_{k-1,k} \rightarrow {\text {Sym}}^3 V\) at a general point is injective. For this, consider the kernel of the differential at a point \(p=((\lambda _1,\ldots ,\lambda _k),(L_1,\ldots ,L_k))\). It consists of elements \(((\varepsilon _1,\ldots ,\varepsilon _k),(H_1,\ldots ,H_k)) \in \mathbb {R}^k\times V^k\) such that \(\varepsilon _1+\cdots +\varepsilon _k = 0, \varepsilon _1L_1+\cdots +\varepsilon _kL_k + \lambda _1H_1+\cdots +\lambda _k H_k = 0\) and

$$\begin{aligned} \sum _{i=1}^{k-1}\ell _i L_i^2 + 2h\left( \sum _{1\le i<j\le k-1} \lambda _i\lambda _j L_i L_j \right) = 0, \end{aligned}$$
(58)

where \(\ell _i = \lambda _k^2(3\lambda _iH_i+\varepsilon _iL_i)+\lambda _i^2(3\lambda _kH_k+\varepsilon _kL_k)\) and \(h=3\lambda _k H_k+\varepsilon _kL_k\). Now choose the specific point p given by \(\lambda _i = \frac{1}{k}\) for each \(i=1,\ldots ,k\), \(L_i = u_i\) for \(i=1,\ldots ,k-1\) and \(L_k = -u_1-\cdots -u_{k-1}\). Then, the above equation becomes

$$\begin{aligned} \sum _{i=1}^{k-1}\ell _i u_i^2 +\frac{2}{k^2} \cdot h\cdot \left( \sum _{1\le i<j\le k-1} u_i u_j \right) = 0. \end{aligned}$$
(59)

Let us write \(h=h_1u_1+\cdots +h_{k-1}u_{k-1}\). Then, in (59), the coefficient of \(u_au_bu_c\) is \(\frac{2}{k^2}(h_a+h_b+h_c)\) for all \(1\le a<b<c \le k-1\). Hence

$$\begin{aligned} h_a+h_b+h_c = 0, \qquad \text { for all } 1\le a<b<c \le k-1. \end{aligned}$$
(60)

Let \(1\le a< b< c< d\le k-1\) be any four distinct indices between 1 and \(k-1\). Then, the previous equations translate into the linear system

$$\begin{aligned} \begin{pmatrix} 1 &{} \quad 1 &{}\quad 1 &{}\quad 0 \\ 1 &{}\quad 1 &{}\quad 0 &{}\quad 1 \\ 1 &{} \quad 0 &{}\quad 1 &{}\quad 1 \\ 0 &{} \quad 1 &{}\quad 1 &{}\quad 1 \end{pmatrix} \begin{pmatrix} h_a \\ h_b \\ h_c \\ h_d \end{pmatrix} = 0. \end{aligned}$$
(61)

The matrix appearing in the linear system is invertible, so \(h_a=h_b=h_c=h_d=0\). Since this holds for an arbitrary choice of four distinct indices, it follows that \(h=0\). Now, relation (59) tells us that \(\sum _{i=1}^{k-1}u_i^2 \ell _i = 0\), but since \(u_1^2,\ldots ,u_{k-1}^2\) form a complete intersection of quadrics, they do not have linear syzygies, which implies that \(\ell _i=0\) for each i. From the definitions of \(\ell _i\) and h, it follows that \(3\lambda _iH_i+\varepsilon _iL_i=0\) for each i but then the other two relations \(\sum _i \varepsilon _i=0\) and \(\sum _i (\lambda _iH_i+\varepsilon _iL_i)=0\) imply that \(H_i=0,\varepsilon _i=0\) for all i, which is what was needed. \(\square \)

Now we are ready for a complete classification of defectivity when \(d=3\).

Table 1 All instances of defective varieties \({\text {Sec}}^H_{k}({\mathcal {G}}_{n,3})\) for \(n=1,\ldots , 7\) with \(d=3\)

Theorem 2

For \(d=3\), the defect \(\delta ^H_{n,k,3}=0\) for any k and n, with the following exceptions:

  • \(n\ge k\) and \(k=2\), where \(\delta ^H_{n,2,3}=1\).

  • \(n\ge k\) and \(k=3,4\), where \(\delta ^H_{n,k,3}=2\).

  • \(n=5\) and \(k=7\), where \(\delta ^H_{5,7,3}=1\).

  • \(n\ge 4\) and \(n+1 < k \le \frac{n^2+2n+6}{6}\) where \(\delta ^H_{n,k,3} = k-n-1\).

  • \(n\ge 4\) and \(\frac{n^2+2n+6}{6} \le k < \frac{n^2+3n+2}{6}\) where \(\delta ^H_{n,k,3} = n\left( \frac{n^2+3n+2}{6} -k \right) \).

Proof

First consider the case when \(n\ge k\): then Proposition 3 applies. It is straightforward to check that \(\delta ^H_{n,k,3} = \varDelta ^H_{n,k,3}\) (for \(n\ge k\) the lower bound in (18) vanishes), from which the statement of the theorem follows.

For the cases where \(k\ge n+1\), start with the exceptional case \(n=5,k=7\): Proposition 3 gives that \(\dim \overline{\phi _{5,7,3}(\varTheta ^0_{5,7})} = 34\), and Lemma 3 yields \(\varDelta ^H_{5,7,3} = 2\) and \(\delta ^H_{5,7,3} = 1\).

Now, consider the other cases: Proposition 3 gives that

$$\begin{aligned} \dim \overline{\phi _{n,k,3}(\varTheta ^0_{n,k})} = \min \left\{ nk , \left( {\begin{array}{c}n+2\\ 3\end{array}}\right) \right\} \end{aligned}$$
(62)

and then Lemma 3 shows that

$$\begin{aligned} \varDelta ^H_{n,k,3} =\,&(k-1)(n+1) - \min \left\{ nk , \left( {\begin{array}{c}n+2\\ 3\end{array}}\right) \right\} \\ =\,&\max \left\{ k-n-1, k-n-1 + n\left( k- \frac{n^2+3n+2}{6}\right) \right\} \end{aligned}$$

so that

$$\begin{aligned} \delta ^H_{n,k,3} = \max&\left\{ k-n-1, k-n-1 + n\left( k- \frac{n^2+3n+2}{6}\right) \right\} \\&- \max \left\{ 0, (n+1)\left( k - \frac{n^2+2n+6}{6} \right) \right\} . \end{aligned}$$

Suppose first that \(n=1,2,3\): this implies \(k\ge n+1 \ge \frac{n^2+2n+6}{6} \ge \frac{n^2+3n+2}{6}\) so that

$$\begin{aligned} \delta ^H_{n,k,3} = k-n-1 + n\left( k- \frac{n^2+3n+2}{6}\right) - (n+1)\left( k - \frac{n^2+2n+6}{6} \right) = 0. \end{aligned}$$

Now, suppose that \(n\ge 4\). Then \(5\le n+1\le \frac{n^2+2n+6}{6} \le \frac{n^2+3n+2}{6}\) and there are three possibilities for k: if \(k\ge \frac{n^2+3n+2}{6}\), then

$$\begin{aligned} \delta ^H_{n,k,3} = k-n-1 + n\left( k- \frac{n^2+3n+2}{6}\right) - (n+1)\left( k - \frac{n^2+2n+6}{6} \right) = 0. \end{aligned}$$

If instead \(\frac{n^2+2n+6}{6}\le k < \frac{n^2+3n+2}{6}\), then

$$\begin{aligned} \delta ^H_{n,k,3} = k-n-1-(n+1)\left( k-\frac{n^2+2n+6}{6} \right) = n\left( \frac{n^2+3n+2}{6} -k \right) \end{aligned}$$
(63)

which is strictly positive. Finally, if \(n+1\le k < \frac{n^2+2n+6}{6}\), the defect is

$$\begin{aligned} \delta ^H_{n,k,3} = k-n-1, \end{aligned}$$
(64)

which is positive if and only if \(k>n+1\). \(\square \)
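For convenience, the case distinction of Theorem 2 can be packaged into a small function. The following Python fragment (ours, mirroring the bullet list in the statement) returns the defect \(\delta ^H_{n,k,3}\).

```python
# Sketch (not part of the paper): the defect delta^H_{n,k,3} as classified by
# Theorem 2. All branches mirror the bullet list in the statement.
from fractions import Fraction

def defect_d3(n, k):
    if n >= k:                               # first two bullets
        return {2: 1, 3: 2, 4: 2}.get(k, 0)
    if (n, k) == (5, 7):                     # the exceptional case
        return 1
    t1 = Fraction(n*n + 2*n + 6, 6)
    t2 = Fraction(n*n + 3*n + 2, 6)
    if n >= 4 and n + 1 < k <= t1:
        return k - n - 1
    if n >= 4 and t1 <= k < t2:
        return int(n*(t2 - k))               # always an integer
    return 0

# Example 1 (n = k = 2), the exceptional case, and two cases from the last ranges
print(defect_d3(2, 2), defect_d3(5, 7), defect_d3(6, 8), defect_d3(6, 9))
```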

As a consequence, identifiability can be characterized whenever \(k\le n+1\):

Theorem 3

Suppose \(k\le n+1\). If \(k\ge 5\) then a general homoscedastic mixture is algebraically identifiable from moments up to order 3. If instead \(k=2,3,4\), then a general homoscedastic mixture is algebraically identifiable from the moments up to order \(d=4\).

Proof

When \(k\ge 5\) this follows immediately from Theorem 2 and Lemma 4. If instead \(k=2,3,4\), thanks to Proposition 2, it is enough to set \(n=k-1\) and check the first d for which we have identifiability: this is a finite number of cases that can be settled by direct computation (e.g., in Macaulay2 [9]), and we find that \(d=4\) suffices. \(\square \)

4.2 Mixtures with \(k=2\) Components

When \(k=2\) we characterize the rational identifiability as well. Since the case \(d=3\) is already covered, consider only \(d\ge 4\).

Theorem 4

The homoscedastic secant variety \({\text {Sec}}^H_{2}({\mathcal {G}}_{n,4})\) is algebraically identifiable. If \(d\ge 5\), the homoscedastic secant variety \({\text {Sec}}^H_2({\mathcal {G}}_{n,d})\) is also rationally identifiable.

Proof

By Lemma 3 and Remark 2, it is enough to consider the parameter space given by \(\varTheta ^0_{n,2} = \{ ((L_1,L_2),(\lambda _1,\lambda _2)) \,|\, \lambda _1+\lambda _2=1, \lambda _1L_1+\lambda _2L_2 = 0 \}\) and the map

$$\begin{aligned} \phi _{n,2,d}:\varTheta ^0_{n,2} \rightarrow C^0_{n,2,d} \subseteq \mathbb {A}^{K,3}_{n,d}. \end{aligned}$$
(65)

In order to compute the general fiber of this map, note that since \(d\ge 4\), it follows from Theorem 3 and its proof that the map has finite fibers. Hence, it is enough to restrict a general fiber to the open subset \(\lambda _2\ne 0\). There we may assume \(L_2 = -\frac{\lambda _1}{\lambda _2}L_1 = -\frac{\lambda _1}{1-\lambda _1}L_1\). We thus compute the fibers of the induced map

$$\begin{aligned} F_{n,2,d}:V \times (\mathbb {R}\setminus \{1\}) \rightarrow \mathbb {A}^{K,3}_{n,d}, \qquad (L,\lambda ) \mapsto \phi _{n,2,d}\left( (\lambda ,1-\lambda ),\left( L, -\frac{\lambda }{1-\lambda }L\right) \right) . \end{aligned}$$

In explicit terms, this map is given by the terms from degree 3 to degree d of the logarithm \(\log (\lambda {\mathrm{e}}^{L}+(1-\lambda ){\mathrm{e}}^{-\frac{\lambda }{1-\lambda }L})\). A computation shows that the first terms are:

$$\begin{aligned}&\log (\lambda {\mathrm{e}}^{L}+(1-\lambda ){\mathrm{e}}^{-\frac{\lambda }{1-\lambda }L}) = f_3(\lambda )L^3 + f_4(\lambda )L^4 + f_5(\lambda )L^5 + \ldots \nonumber \\&f_3(\lambda ) = \frac{\lambda (1-\lambda )(1-2\lambda )}{6(1-\lambda )^3}, \qquad f_4(\lambda ) = \frac{\lambda (1-\lambda )(1-6\lambda (1-\lambda ))}{24(1-\lambda )^4},\nonumber \\&f_5(\lambda ) = \frac{\lambda (1-\lambda )(1-2\lambda )(1-12\lambda (1-\lambda ))}{120(1-\lambda )^5}. \end{aligned}$$
(66)
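The coefficients in (66) are quick to double-check with a computer algebra system; the following sympy sketch (ours) does so in the univariate case \(L=u\), which is enough since the \(f_i(\lambda )\) do not depend on L.

```python
# Sketch (not from the paper): verifying the coefficients f_3, f_4, f_5 of (66).
import sympy as sp

u, lam = sp.symbols('u lambda')
K = sp.log(lam*sp.exp(u) + (1 - lam)*sp.exp(-lam/(1 - lam)*u))
expansion = sp.series(K, u, 0, 6).removeO()

f3 = lam*(1 - lam)*(1 - 2*lam)/(6*(1 - lam)**3)
f4 = lam*(1 - lam)*(1 - 6*lam*(1 - lam))/(24*(1 - lam)**4)
f5 = lam*(1 - lam)*(1 - 2*lam)*(1 - 12*lam*(1 - lam))/(120*(1 - lam)**5)

for a, f in [(3, f3), (4, f4), (5, f5)]:
    print(a, sp.simplify(expansion.coeff(u, a) - f))   # 0 in each case
```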

Now suppose that \(d=4\), and let \(L\in V\) and \( \lambda \in \mathbb {R}\setminus \{1\}\) be general elements. In fact, it is enough to assume \(L\ne 0\) and \(\lambda \ne 0,1,\frac{1}{2}\), so that \(\kappa _3 = f_3(\lambda )L^3 \ne 0\). In order to compute the fiber of the point \((\kappa _3,\kappa _4) = F_{n,2,4}(L,\lambda )\), first observe that \(\kappa _3=f_3(\lambda )L^3 = (\root 3 \of {f_3(\lambda )}L)^3\) and that the linear form \(L_0:=\root 3 \of {f_3(\lambda )}L\) can be computed explicitly: from the expression

$$\begin{aligned} \kappa _3 = \kappa _{300..0}u_1^3+\kappa _{030..0}u_2^3+\cdots +\kappa _{00..03}u_n^3 + (\text { terms with mixed monomials }) \end{aligned}$$

then one obtains

$$\begin{aligned} L_0 = \root 3 \of {\kappa _{300..0}}\cdot u_1+\root 3 \of {\kappa _{030..0}}\cdot u_2+\ldots +\root 3 \of {\kappa _{00..03}}\cdot u_n. \end{aligned}$$
(67)

In particular, \(L=f_3(\lambda )^{-\frac{1}{3}}L_0\), so that the equation \(\kappa _4 = f_4(\lambda )L^4\) translates into \(\frac{f_4(\lambda )}{f_3(\lambda )^{\frac{4}{3}}} = \frac{\kappa _4}{L_0^4}\). Observe that \(a := \frac{\kappa _4}{L_0^4}\) is a constant that can be computed explicitly by comparing a single nonzero coefficient of \(L_0^4\) with the corresponding coefficient of \(\kappa _4\): for example, if \(\root 3 \of {\kappa _{300..0}} \ne 0\), then

$$\begin{aligned} a = \frac{\kappa _{400..0}}{(\root 3 \of {\kappa _{300..0}})^4}. \end{aligned}$$
(68)

Now, the equation \(\frac{f_4(\lambda )}{f_3(\lambda )^{\frac{4}{3}}} = a\) is equivalent to \(\frac{f_4(\lambda )^3}{f_3(\lambda )^4} = a^3\), or more explicitly

$$\begin{aligned} \frac{3}{32} \cdot \frac{(1-6\lambda (1-\lambda ))^3}{\lambda (1-\lambda )(1-4\lambda (1-\lambda ))^2} = a^3. \end{aligned}$$
(69)

Note that this expression is invariant under exchanging \(\lambda \) with \(1-\lambda \), as is expected from the symmetry of the situation. Hence, set \(\gamma := \lambda (1-\lambda )\) and rewrite this expression as

$$\begin{aligned} \frac{3}{32}\cdot \frac{(1-6\gamma )^3}{\gamma (1-4\gamma )^2} = a^3. \end{aligned}$$
(70)
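The passage from (69) to (70) is a direct rewriting in terms of \(\gamma \); as a sanity check (a sketch, not part of the argument), it can be verified symbolically from the coefficients in (66):

```python
import sympy as sp

lam, gamma = sp.symbols('lambda gamma')

f3 = lam*(1 - lam)*(1 - 2*lam)/(6*(1 - lam)**3)
f4 = lam*(1 - lam)*(1 - 6*lam*(1 - lam))/(24*(1 - lam)**4)

lhs = f4**3/f3**4                                                    # f_4^3/f_3^4, cf. (69)
rhs = sp.Rational(3, 32)*(1 - 6*gamma)**3/(gamma*(1 - 4*gamma)**2)   # rational function in (70)

print(sp.simplify(lhs - rhs.subs(gamma, lam*(1 - lam))))             # 0
```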

Equation (70) is a cubic equation in \(\gamma \) with up to three solutions, so rational identifiability fails for \(d=4\). To obtain it for \(d\ge 5\), consider also the cumulant \(\kappa _5\) of order 5: this adds the data \(\kappa _5\) and the condition \(\kappa _5 = f_5(\lambda )L^5\). In the notation above, \(L=f_3(\lambda )^{-\frac{1}{3}}L_0\), so that the condition \(\kappa _5 = f_5(\lambda )L^5\) becomes \(\frac{f_5(\lambda )}{f_3(\lambda )^{\frac{5}{3}}} = \frac{\kappa _5}{L_0^5}\). As before, we see that \(b := \frac{\kappa _5}{L_0^5}\) is a constant that can be computed explicitly by comparing a single nonzero coefficient of \(L_0^5\) with the corresponding coefficient of \(\kappa _5\): for example, if \(\root 3 \of {\kappa _{300..0}} \ne 0\), then

$$\begin{aligned} b = \frac{\kappa _{500..0}}{(\root 3 \of {\kappa _{300..0}})^5}. \end{aligned}$$
(71)

Now, the equation \(\frac{f_5(\lambda )}{f_3(\lambda )^{\frac{5}{3}}} = b\) is equivalent to \(\frac{f_5(\lambda )^3}{f_3(\lambda )^5} = b^3\), or more explicitly, as above, with the substitution \(\gamma = \lambda (1-\lambda )\),

$$\begin{aligned} \frac{15}{128}\cdot \frac{(1-6\gamma )^5}{\gamma (1-\gamma )^3(1-12\gamma )} = b^3. \end{aligned}$$
(72)

Hence, rational identifiability is obtained if the two Eqs. (70) and (72) have a unique common solution \(\gamma \). This means that the map \(\mathbb {R}\dashrightarrow \mathbb {R}^2, \gamma \mapsto (g(\gamma ),h(\gamma ))\), where \(g\) and \(h\) denote the left-hand sides of (70) and (72), is generically injective. This map extends to \(\mathbb {R} \rightarrow \mathbb {P}^2\) via

$$\begin{aligned} \left[ \frac{3}{32}(1-6\gamma )^3(1-\gamma )^3(1-12\gamma ), \frac{15}{128}(1-6\gamma )^5(1-4\gamma )^2,\gamma (1-4\gamma )^2(1-\gamma )^3(1-12\gamma ) \right] , \end{aligned}$$

i.e., a map defined by polynomials of degree 7. It is generically injective if and only if the closure of its image is a plane curve of degree 7. This can be verified with Macaulay2 [9]: the resulting curve is given by the equation

$$\begin{aligned}&849346560x^5y^2-679477248x^4y^3-29491200x^5yz+2674483200x^4y^2z-2439217152x^3y^3z\\&\quad +256000x^5z^2 +79744000x^4yz^2+2415168000x^3y^2z^2-2616192000x^2y^3z^2\\&\quad +499500000x^2y^2z^3-406500000xy^3z^3+474609375y^3z^4 = 0. \end{aligned}$$

\(\square \)

Fig. 1: Plot of the real-valued function \(a(\gamma )\) in (73)

Even though there is no rational identifiability above when \(d=4\), it is worth noting that in a purely statistical setting, \(\gamma \) can be recovered uniquely, as seen below.

Corollary 1

For \(k=2\), the statistical mixture parameters can be recovered uniquely with moments up to order \(d=4\).

Proof

This is equivalent to saying that Eq. (70) has a unique statistically relevant solution in \(\gamma = \lambda (1- \lambda )\). Note that since \(\lambda \in (0,1)\setminus \{\frac{1}{2} \}\), we have \(\gamma \in (0, \frac{1}{4})\). Consider the real-valued function coming from (70):

$$\begin{aligned} a(\gamma )= \frac{\root 3 \of {3}(1-6\gamma )}{2\root 3 \of {4\gamma (1-4\gamma )^2}}. \end{aligned}$$
(73)

Its derivative, \(a'(\gamma ) = -\frac{1}{2\root 3 \of {36}\,\gamma (1-4\gamma )\root 3 \of {\gamma (1-4\gamma )^2}}\), is negative for \(0<\gamma <\frac{1}{4}\), so the function \(a(\gamma )\) is strictly decreasing and, in particular, injective on this statistically meaningful interval (Fig. 1).

The corresponding inverse is given by the cubic equation in \(\gamma \)

$$\begin{aligned} (256a^3+324)\gamma ^3-(128a^3+162)\gamma ^2+(16a^3+27)\gamma - \frac{3}{2}\, = \, 0. \end{aligned}$$
(74)

The discriminant of (74) is \(\varDelta = -3072a^6(64a^3+81)\). It is zero precisely when \(a=-\frac{3\root 3 \of {3}}{4}\), which corresponds to the horizontal asymptote of \(a(\gamma )\). If \(a<-\frac{3\root 3 \of {3}}{4}\), there are three real solutions, but one of them is negative and another is larger than \(\frac{1}{4}\). The remaining solution is also the unique real solution when \(a>-\frac{3\root 3 \of {3}}{4}\), given explicitly by

$$\begin{aligned} \gamma = \frac{4a^3}{3 \eta } + \frac{\eta }{3 (64 a^3 + 81)}+\frac{1}{6}, \end{aligned}$$
(75)
$$\begin{aligned} \eta =(-4096 a^9 - 10368 a^6 - 6561 a^3 + 9 \sqrt{262144 a^{15} + 995328 a^{12} + 1259712 a^9 + 531441 a^6} )^{\frac{1}{3}}. \end{aligned}$$

\(\square \)

This proof gives an explicit algorithm to recover the parameters of a homoscedastic mixture of two Gaussians from the cumulants up to order four.


Observe that this algorithm needs all the cumulants of order one, all the cumulants of order two, n cumulants of order three, and one cumulant of order four. Hence, it needs in total \(n+\frac{n(n+1)}{2}+n+1\) cumulants.
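The following is a minimal numerical sketch of this recovery procedure (the function name and input layout are ours, not from the text). It assumes exact population quantities: the mixture mean and covariance, the n pure third-order cumulant coordinates \(\kappa _{30\ldots 0},\ldots ,\kappa _{0\ldots 03}\) and the single fourth-order coordinate \(\kappa _{40\ldots 0}\), normalized as in (67)–(68), with \(\kappa _{30\ldots 0}\ne 0\); for simplicity the cubic (74) is solved numerically rather than via the closed form (75).

```python
import numpy as np

def recover_homoscedastic_2mixture(mean, cov, k3_pure, k4_first):
    """Sketch: parameters of a homoscedastic 2-mixture from cumulants up to order 4.

    mean     -- mixture mean vector, shape (n,)
    cov      -- mixture covariance matrix, shape (n, n)
    k3_pure  -- the n pure third-order cumulant coordinates kappa_{3e_i}, shape (n,)
    k4_first -- the single fourth-order coordinate kappa_{40...0}
    """
    # (67): coefficient vector of L_0 = f_3(lambda)^(1/3) * L
    L0 = np.cbrt(np.asarray(k3_pure, dtype=float))

    # (68): the constant a (requires kappa_{30...0} != 0)
    a = k4_first / L0[0]**4

    # (74): cubic in gamma = lambda*(1-lambda); by Corollary 1 exactly one root
    # lies in the statistically meaningful interval (0, 1/4)
    cubic = [256*a**3 + 324, -(128*a**3 + 162), 16*a**3 + 27, -1.5]
    roots = np.roots(cubic)
    real = roots[np.abs(roots.imag) < 1e-8].real
    gamma = real[(real > 0) & (real < 0.25)][0]

    # choose the component labelling with lambda < 1/2 (the other choice swaps components)
    lam = (1 - np.sqrt(1 - 4*gamma)) / 2

    # L = f_3(lambda)^(-1/3) * L_0 gives the centered means, cf. the proof of Theorem 4
    f3 = lam*(1 - 2*lam) / (6*(1 - lam)**2)
    L = L0 / np.cbrt(f3)
    d1, d2 = L, -lam/(1 - lam)*L          # centered means: lam*d1 + (1-lam)*d2 = 0

    mu1, mu2 = mean + d1, mean + d2
    # remove the between-component part from the mixture covariance
    Sigma = cov - lam*np.outer(d1, d1) - (1 - lam)*np.outer(d2, d2)
    return (lam, 1 - lam), (mu1, mu2), Sigma
```

On exact inputs from a general homoscedastic 2-mixture this returns the parameters up to swapping the two components; with sample cumulants it is a method-of-moments estimator.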

Remark 7

We have seen in Remark 6 that \({\text {Sec}}^H_2({\mathcal {G}}_{n,d})\) in cumulant coordinates is a cone over \(C^0_{n,2,d} \subseteq \mathbb {A}^{K,3}_{n,d}\). Up to taking the Zariski closure, the proof of Theorem 4 shows that \(C^0_{n,2,d}\) is the image of the map

$$\begin{aligned} F_{n,2,d}:V\times \mathbb {R}\setminus \{1\} \rightarrow \mathbb {A}^{K,3}_{n,d}, \, \, (L,\lambda ) \mapsto f_3(\lambda )L^3 + f_4(\lambda )L^4 + f_5(\lambda )L^5 + f_6(\lambda )L^6+ \ldots \end{aligned}$$

For \(\lambda \) constant we get a projected d-th Veronese variety of V. If instead L is constant, we get a rational curve, parametrized by \(\lambda \mapsto (f_3(\lambda ),f_4(\lambda ),\ldots ,f_d(\lambda ))\) in the coordinates given by \(L^3,L^4,\ldots ,L^d\).

4.3 The Univariate Case \(n=1\)

We use the standard notation \(\sigma ^2\) for the variance \(\varSigma =( \sigma _{11})\) when \(n=1\).

For \(n=1\), the moment variety \({\text {Sec}}_k^H({\mathcal {G}}_{1,d})\) is never defective. The moment map

$$\begin{aligned} M_{1,k,2k}:\varTheta ^H_{1,k} \rightarrow \mathbb {A}^M_{1,2k} \end{aligned}$$

is finite to one. In the statistics literature, it is known that in the case of homoscedastic secants, one may recover mixture parameters from given moments (i.e., compute the fiber of the map above), with an algorithm closely related to the well-known Prony’s method [20]. This procedure was introduced by Lindsay as an application of moment matrices [15] and we briefly recall the algorithm here.

First, how does one recover the locations \(\mu _i\) and weights \(\lambda _i\) of the k components of a Dirac mixture from \(2k-1\) moments? This is known as the quadrature rule and it works as follows. Given the moment sequence \(m=(m_1,m_2,\ldots ,m_{2k-1})\) one considers the polynomial resulting from the following \((k+1) \times (k+1)\) determinant

$$\begin{aligned} P_k(t) = \det \begin{pmatrix} 1 &{} m_1 &{} \ldots &{} m_{k-1} &{} 1 \\ m_1 &{} m_2 &{} \ldots &{} m_k &{} t \\ \vdots &{} &{} &{} \vdots &{} \vdots \\ m_k &{} m_{k+1} &{} \ldots &{} m_{2k-1} &{} t^k \\ \end{pmatrix}. \end{aligned}$$
(76)

The k roots \(\mu _1, \mu _2, \ldots , \mu _k\) of \(P_k(t)\) are precisely the sought locations. This follows since the equations of the secant varieties of the rational normal curve are classically known to be given by the minors of the moment matrices. For a modern reference see [14].

Once the locations are known, the weights \(\lambda _i\) are found by solving the \(k \times k\) Vandermonde linear system

$$\begin{aligned} \begin{pmatrix} 1 &{} 1 &{} \ldots &{} 1 \\ \mu _1 &{} \mu _2 &{} \ldots &{} \mu _k \\ \vdots &{} &{} \vdots &{} \\ \mu _1^{k-1} &{} \mu _2^{k-1} &{} \ldots &{} \mu _k^{k-1} \\ \end{pmatrix} \begin{pmatrix} \lambda _1 \\ \lambda _2 \\ \vdots \\ \lambda _k \end{pmatrix}= \begin{pmatrix} 1 \\ m_1 \\ \vdots \\ m_{k-1} \end{pmatrix}. \end{aligned}$$
(77)
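A minimal numerical sketch of this quadrature rule (the function name is ours; exact moments and pairwise distinct locations are assumed):

```python
import numpy as np

def dirac_mixture_from_moments(m, k):
    """Sketch of (76)-(77): locations and weights of a k-component Dirac mixture
    from its first 2k-1 moments m = (m_1, ..., m_{2k-1})."""
    mm = np.concatenate(([1.0], np.asarray(m, dtype=float)))      # mm[j] = m_j, m_0 = 1

    # Hankel block of (76): entry (i, j) = m_{i+j}, i = 0..k, j = 0..k-1
    H = np.array([[mm[i + j] for j in range(k)] for i in range(k + 1)])

    # Expand the determinant (76) along its last column (1, t, ..., t^k):
    # the coefficient of t^i is the signed minor with row i deleted.
    coeffs = [(-1)**(i + k) * np.linalg.det(np.delete(H, i, axis=0))
              for i in range(k + 1)]
    locations = np.roots(coeffs[::-1]).real        # roots of P_k(t); real for exact data

    # (77): Vandermonde system for the weights
    V = np.vander(locations, k, increasing=True).T  # rows (mu_1^r, ..., mu_k^r), r = 0..k-1
    weights = np.linalg.solve(V, mm[:k])
    return locations, weights
```

For instance, for \(k=2\) and the moments \(m=(0,1,0)\) of \(\tfrac{1}{2}\delta _{-1}+\tfrac{1}{2}\delta _{1}\), the sketch returns the locations \(\pm 1\) and the weights \(\tfrac{1}{2},\tfrac{1}{2}\).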

Back to the Gaussian case, if we knew the value of the common variance \( \sigma ^2\), we could reduce to the above instance. In terms of the moment generating function of a Gaussian component:

$$\begin{aligned} {\mathrm{e}}^{-\frac{1}{2} \sigma ^2 u^2} M_X(u) = {\mathrm{e}}^{\mu u}. \end{aligned}$$
(78)

Hence, the Dirac moments \({\tilde{m}}\) on the right hand side are linear combinations of the Gaussian moments m. Explicitly, for \(1 \le j \le 2k-1\)

$$\begin{aligned} {\tilde{m}}_j(\sigma ) = \sum _{i=0}^{\lfloor j/2 \rfloor } \frac{j!}{(-2)^i i! (j-2i)!} m_{j-2i} \sigma ^{2i}. \end{aligned}$$
(79)

Applying the quadrature rule to the vector \({\tilde{m}}=({\tilde{m}}_1,{\tilde{m}}_2, \ldots , {\tilde{m}}_{2k-1})\) would allow us to obtain the means \(\mu _1, \mu _2, \ldots , \mu _k\).

However, \(\sigma \) is unknown. To find an estimate for \(\sigma \) we consider the first 2k moments \(m = (m_1, m_2, \ldots , m_{2k})\). If \({\tilde{m}}=({\tilde{m}}_1,{\tilde{m}}_2, \ldots , {\tilde{m}}_{2k})\) comes from a mixture of k Dirac measures, then

$$\begin{aligned} D_k = \det \begin{pmatrix} 1 &{} {\tilde{m}}_1 &{} \ldots &{} {\tilde{m}}_{k-1} &{} {\tilde{m}}_k \\ {\tilde{m}}_1 &{} {\tilde{m}}_2 &{} \ldots &{} {\tilde{m}}_k &{} {\tilde{m}}_{k+1} \\ \vdots &{} &{} &{} \vdots &{} \vdots \\ {\tilde{m}}_k &{} {\tilde{m}}_{k+1} &{} \ldots &{} {\tilde{m}}_{2k-1} &{} {\tilde{m}}_{2k} \\ \end{pmatrix} = 0. \end{aligned}$$
(80)

One thus treats \(\sigma \) as a variable and substitutes expressions (79) into (80). This results in a polynomial \(D_k(\sigma )\) of degree \(\left( {\begin{array}{c}k+1\\ 2\end{array}}\right) \) in \(\sigma ^2\) and the estimator \({\hat{\sigma }}^2\) is obtained as its smallest non-negative root [15, Theorem 5B]. So the algebraic degree for estimating \(\sigma ^2\) is \(\left( {\begin{array}{c}k+1\\ 2\end{array}}\right) \). With \(\sigma ^2\) specified, one proceeds as above.
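A small symbolic sketch of this variance estimation step (names are ours; exact moments assumed), which substitutes (79) into (80) and takes the smallest non-negative root in \(\sigma ^2\):

```python
import sympy as sp

def sigma2_estimate(m, k):
    """Sketch: Lindsay's estimate of the common variance from m = (m_1, ..., m_{2k})."""
    s2 = sp.symbols('s2')                                # stands for sigma^2
    mm = [sp.Integer(1)] + [sp.nsimplify(x) for x in m]  # mm[j] = m_j, m_0 = 1

    def m_tilde(j):                                      # adjusted moments (79)
        return sum(sp.factorial(j)/((-2)**i*sp.factorial(i)*sp.factorial(j - 2*i))
                   * mm[j - 2*i]*s2**i for i in range(j//2 + 1))

    tilde = [sp.Integer(1)] + [m_tilde(j) for j in range(1, 2*k + 1)]
    Dk = sp.Matrix(k + 1, k + 1, lambda r, c: tilde[r + c]).det()   # determinant (80)

    roots = sp.Poly(sp.expand(Dk), s2).nroots()
    real = [sp.re(r) for r in roots if abs(sp.im(r)) < 1e-9 and sp.re(r) >= 0]
    return min(real)          # smallest non-negative root, cf. [15, Theorem 5B]
```

With this value of \(\sigma ^2\), the adjusted moments (79) are computed and the quadrature-rule sketch above recovers the means and weights.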

More generally, the discussion under (28) shows that the moment variety \({\text {Sec}}_k^H({\mathcal {G}}_{1,d})\) with \(k\le d/2\) is a union

$$\begin{aligned} {\text {Sec}}_k^H({\mathcal {G}}_{1,d})=\bigcup _{\sigma } {\text {Sec}}_k(V_{1,d}^{\sigma }), \end{aligned}$$

where \(V_{1,d}^{\sigma }\) is the translation of the moment curve \(V_{1,d}\) by the variance \(\sigma ^2\) as defined by the Gaussian moments. The secant variety \({\text {Sec}}_k(V_{1,d}^{\sigma })\) is defined for each \(\sigma \) by the \((k+1)\times (k+1)\) minors of

$$\begin{aligned} M_{k,d} = \begin{pmatrix} 1 &{} {\tilde{m}}_1 &{} \ldots &{} {\tilde{m}}_{d-k-1} &{} {\tilde{m}}_{d-k} \\ {\tilde{m}}_1 &{} {\tilde{m}}_2 &{} \ldots &{} {\tilde{m}}_{d-k} &{} {\tilde{m}}_{d-k+1} \\ \vdots &{} &{} &{} \vdots &{} \vdots \\ {\tilde{m}}_k &{} {\tilde{m}}_{k+1} &{} \ldots &{} {\tilde{m}}_{d-1} &{} {\tilde{m}}_{d} \\ \end{pmatrix}. \end{aligned}$$
(81)

As soon as the k-th secant variety of a smooth curve is not linear, the curve can be recovered as the singular locus of highest multiplicity in the secant variety. Therefore, since the curves \(V_{1,d}^{\sigma }\) are distinct, their k-th secant varieties are distinct as well, as long as the latter are not linear. In particular, since the variety \({\text {Sec}}_k(V_{1,d}^{\sigma })\) has dimension \(2k-1\), it follows that the union \({\text {Sec}}_k^H({\mathcal {G}}_{1,d})\) has dimension 2k. Given the moments \(m_i\) up to degree d of a point on a homoscedastic k-secant, the \((k+1)\times (k+1)\) minors of \(M_{k,d}\) are polynomials in \(\sigma ^2\) that all vanish at the common variance. Given the variance, the means can be inferred as above.

When \(d=2k+1\), the variety \({\text {Sec}}_k^H({\mathcal {G}}_{1,d})\subset \mathbb {A}^M_{1,2k+1}\) is a hypersurface, defined by the resultant of the \((k+1)\times (k+1)\) minors of \(M_{k,d}\), that is, the polynomial obtained by eliminating \(\sigma ^2\) from the ideal generated by the \((k+1)\times (k+1)\) minors. Denote this polynomial by \(P_{2k+1}.\) It is a polynomial in \(m_1,\ldots ,m_{2k+1}\) (or in \(\kappa _3,\kappa _4,\ldots ,\kappa _{2k+1}\)). For example,

$$\begin{aligned} P_3= & {} \kappa _3 = 2m_1^3-3m_1m_2+m_3,\\ P_5= & {} 108\kappa _3^6 - 32\kappa _3^2\kappa _4^3 + 36\kappa _3^3\kappa _4\kappa _5 - \kappa _4^2\kappa _5^2 + \kappa _3\kappa _5^3 \end{aligned}$$
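The polynomials \(P_{2k+1}\) are weighted homogeneous, as stated in Proposition 4 below; for \(P_5\) this is quickly checked symbolically (a sketch):

```python
import sympy as sp

k3, k4, k5 = sp.symbols('kappa3 kappa4 kappa5')
P5 = 108*k3**6 - 32*k3**2*k4**3 + 36*k3**3*k4*k5 - k4**2*k5**2 + k3*k5**3

# weighted degree of each term, with deg kappa_i = i
for term in sp.Add.make_args(P5):
    degs = sp.Poly(term, k3, k4, k5).degree_list()
    print(term, 3*degs[0] + 4*degs[1] + 5*degs[2])   # every term has weight 18
```

Indeed, \(\left( {\begin{array}{c}k+2\\ 2\end{array}}\right) \left( {\begin{array}{c}k+1\\ 2\end{array}}\right) =18\) for \(k=2\).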

Proposition 4

The polynomial \(P_{2k+1}\) is homogeneous of total degree

$$\begin{aligned} \left( {\begin{array}{c}k+2\\ 2\end{array}}\right) \left( {\begin{array}{c}k+1\\ 2\end{array}}\right) \end{aligned}$$

with respect to the grading \(\deg m_i=\deg \kappa _i = i\).

Proof

Let

$$\begin{aligned} \mathbb {A}=\mathbb {A}^M_{1,2k+1}\times \mathbb {A}^1 \end{aligned}$$

where \(\sigma \) is the last coordinate, and consider the projective closure \(\mathbb {P}\) of \(\mathbb {A}\). Then the matrix (81) defines a map between vector bundles E and F on \(\mathbb {A}\). The vector bundles E and F and the map extend to \(\mathbb {P}\); E extends to a sum of line bundles \({{\tilde{E}}}=\mathcal{O}_\mathbb {P}\oplus \mathcal{O}_\mathbb {P}(-1)\oplus \cdots \oplus \mathcal{O}_\mathbb {P}(-k)\), while F extends to a sum of line bundles \({{\tilde{F}}}=\mathcal{O}_\mathbb {P}\oplus \mathcal{O}_\mathbb {P}(1)\oplus \cdots \oplus \mathcal{O}_\mathbb {P}(k+1)\). By the Thom–Porteous formula [8, Theorem 14.4], the degree in \(\mathbb {P}\) of the rank k locus of the map is given by the Chern class

$$\begin{aligned} c_2({{\tilde{F}}}-{{\tilde{E}}})= 2\left( {\begin{array}{c}k+2\\ 2\end{array}}\right) \left( {\begin{array}{c}k+1\\ 2\end{array}}\right) \end{aligned}$$

since the Chern polynomials of \({{\tilde{E}}}\) and \({{\tilde{F}}}\) are

$$\begin{aligned} c({{\tilde{E}}})=(1-t)(1-2t)\ldots (1-kt) \end{aligned}$$

and

$$\begin{aligned} c({{\tilde{F}}})=(1+t)(1+2t)\ldots (1+(k+1)t). \end{aligned}$$

This rank k locus has codimension 2 and its intersection with \(\mathbb {A}\) projects to the hypersurface defined by \(P_{2k+1}\) in \(\mathbb {A}^M_{1,2k+1}\). The coordinate \(\sigma \) appears only in even degree in the equations defining the rank k locus, so the projection to \(\mathbb {A}^M_{1,2k+1}\) is 2 : 1, and hence the degree of \(P_{2k+1}\) is half the degree of the rank k locus. \(\square \)

Question 1

It would be interesting to understand better the structure of the polynomials \(P_{2k+1}\), e.g., is there a closed form expression for all k?

If \(P_{2k+1}\) vanishes at a vector \((m_1,\ldots ,m_{2k+1})\) of moments, and \(P_{2l+1}\) does not vanish at \((m_1,\ldots ,m_{2l+1})\) for any \(l<k\), then the moments lie on a homoscedastic k-secant but not on any \(l\)-secant for \(l<k\). Therefore, the polynomials \(P_{2k+1}\) may be used to estimate the number of components in a homoscedastic Gaussian mixture (compare to the rank test proposed in [15, Section 3.1] for the known variance case).

5 Conclusion

We have completely classified all defective cases for the moment varieties associated with homoscedastic Gaussian mixtures whenever \(k<n+1\), \(d=3\), \(k=2\) or \(n=1\). The question concerning a complete classification for all \(n, d, k\) remains open, although our computations did not reveal any further defective examples.

Our identifiability results also cover special structures in the covariance matrix, by Remark 2. For example, a common mixture submodel involves isotropic Gaussians, which means that the covariance matrix is a scalar multiple of the identity, \(\varSigma = \sigma ^2 I\). The k-means algorithm used in clustering can be interpreted as parameter estimation for a homoscedastic isotropic mixture of Gaussians. In [10], Hsu and Kakade consider the learning of mixtures of isotropic Gaussians from the moments up to order \(d=3\) when \(k \le n+1\). They prove identifiability for the homoscedastic isotropic submodel (see [6, Theorem 3.2]), and, in order to solve the moment equations, they find orthogonal decompositions of the second- and third-order moment tensors.

On the other hand, in [17] Lindsay and Basak proposed a ‘fast consistent’ method of moments for homoscedastic Gaussian mixtures in the multivariate case, based on a ‘primary axis’ to which the one-dimensional case presented in Sect. 4.3 is applied. This means that the method uses some moments of order 2k. Knowing that in some cases there are explicit equations for secant varieties of higher dimensional Veronese varieties [14], an alternative method with minimal order based on these should be possible.

Finally, a similar approach can be taken to study moment varieties of homoscedastic mixtures of other location families. In the case of Example 4, we saw that Gaussian moments and Laplacian moments coincide up to \(d=3\). This means that Theorem 2 applies verbatim to homoscedastic mixtures of Laplace distributions.