
Robust estimation of mixtures of regressions with random covariates, via trimming and constraints


A robust estimator for a wide family of mixtures of linear regressions is presented. Robustness is based on the joint adoption of the cluster weighted model and of an estimator based on trimming and restrictions. The selected model provides the conditional distribution of the response for each group, as in mixtures of regression, and further supplies local distributions for the explanatory variables. A novel version of the restrictions has been devised, under this model, for separately controlling the two sources of variability identified in it. This proposal avoids singularities in the log-likelihood, caused by approximate local collinearity in the explanatory variables or local exact fits in regressions, and reduces the occurrence of spurious local maximizers. In a natural way, due to the interaction between the model and the estimator, the procedure is able to resist the harmful influence of bad leverage points during the estimation of the mixture of regressions, which is still an open issue in the literature. The given methodology defines a well-posed statistical problem, whose estimator exists and is consistent for the corresponding population optimum, under widely general conditions. A feasible EM algorithm is also provided to compute the estimator. Many simulated examples and two real datasets have been chosen to show the ability of the procedure, on the one hand, to detect anomalous data, and, on the other hand, to identify the real cluster regressions without the influence of contamination.





This research is partially supported by the Spanish Ministerio de Economía y Competitividad and FEDER, Grant MTM2014-56235-C2-1-P, by Consejería de Educación de la Junta de Castilla y León, Grant VA212U13, by Grant FAR 2014 from the University of Milano-Bicocca, and by Grant FIR 2014 from the University of Catania. The authors also thank the associate editor and the anonymous referees for many valuable suggestions that greatly improved the article.

Author information



Corresponding author

Correspondence to F. Greselin.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (txt 1 KB)

Supplementary material 2 (pdf 8030 KB)



The following section is organized into four parts: part A contains technical lemmas useful for the proof of the existence of the maximizer \({\varvec{\theta }}\) for \(L({\varvec{\theta }},\,P)\) (Proposition 3.1) which is established in part B; part C shows preliminary results needed to show the consistency of \(\hat{{\varvec{\theta }}}\) as an estimator for \({\varvec{\theta }}\) (Proposition 3.2), which is then proved in part D.

1.1 Part A: preliminary results in view of Proposition 3.1

Four technical lemmas will be needed before attacking the proof of Proposition 3.1.

First of all, let us remark that, given the definition of \(L({\varvec{\theta }},\,P),\) there exist sequences \(\{{\varvec{\theta }}_n\}_{n=1}^{\infty }\) with

$$\begin{aligned} {\varvec{\theta }}_n= & {} \left( \pi _1^n,\ldots ,\pi _G^n,\,{\varvec{\mu }}_1^n,\ldots ,{\varvec{\mu }}_G^n,\, {\varvec{\varSigma }}_1^n,\ldots ,{\varvec{\varSigma }}_G^n,\,b_1^{0,n},\ldots ,\right. \nonumber \\&\left. b_G^{0,n},{{\mathbf {b}}}_1^n,\ldots ,{{\mathbf {b}}}_G^n,\,\sigma _1^{2,n},\ldots , \sigma _G^{2,n}\right) , \end{aligned}$$

where \({\varvec{\theta }}_n \in \varTheta _{c_X,c_{\varepsilon }},\) such that

$$\begin{aligned} \lim _{n\rightarrow \infty }L\left( {\varvec{\theta }}_n,\,P\right) =\sup _{{\varvec{\theta }}\in \varTheta _{c_X,c_{\varepsilon }}} L({\varvec{\theta }},\,P)>{-}\infty \qquad (12) \end{aligned}$$

(the boundedness from below is obtained just by considering the set A as being a ball centered at \((\mathbf {0},\,0)\) with \(P[A] \ge 1-\alpha ,\,\pi _1=1,\,{\varvec{\mu }}_1=\mathbf {0},\,{\varvec{\varSigma }}_1=I_d,\, b_1^0=0\) and \({{\mathbf {b}}}_1=\mathbf {0}\)).

Existence will be proved by showing that a convergent subsequence can be extracted from \(\{{\varvec{\theta }}_n\}_{n=1}^{\infty }\) satisfying (12), and that its limit \({\varvec{\theta }}_0\) is optimal for P.

Let us begin with Lemma 1, which provides a uniformly bounded representation of the regression coefficients, even in case of local collinearity, without losing their properties in the evaluation of the target function.

Lemma 1

Let \(\{b_n^0\}_{n=1}^{\infty }\) be a sequence in \(\mathbb {R},\, \{{{\mathbf {b}}}_n\}_{n=1}^{\infty }\) be a sequence in \(\mathbb {R}^d\) and \(\{A_n\}_{n=1}^{\infty }\) be a sequence of sets in \(\mathbb {R}^{d+1}\) verifying

$$\begin{aligned} \lim \sup _n P\left[ A_n\right] >0, \end{aligned}$$

and such that

$$\begin{aligned} \lim \sup _n E_P \left[ \left| b_n^0+{{\mathbf {b}}}_n^{\prime } {{\mathbf {X}}}- Y\right| ^2 I_{A_n}({{\mathbf {X}}},\,Y) \right] <\infty . \qquad (14) \end{aligned}$$

Then, we can extract subsequences \(\{b_{n_k}^0\}_{k=1}^{\infty },\, \{{{\mathbf {b}}}_{n_k}\}_{k=1}^{\infty }\) and \(\{A_{n_k}\}_{k=1}^{\infty }\) from them and define new sequences \(\{d_k^0\}_{k=1}^{\infty },\, \{{{\mathbf {d}}}_k\}_{k=1}^{\infty }\) and \(\{D_k\}_{k=1}^{\infty }\) which satisfy \(D_k \subseteq A_{n_k},\,P[A_{n_k} {\setminus } D_k]\rightarrow 0,\,d_k^0 \rightarrow d^0\in \mathbb {R},\) \({{\mathbf {d}}}_k \rightarrow {{\mathbf {d}}}\in \mathbb {R}^d\) and such that

$$\begin{aligned}&\left( b_{n_k}^0+{{\mathbf {b}}}_{n_k}^{\prime } {{\mathbf {X}}}- Y\right) I_{D_k}({{\mathbf {X}}},\,Y)\nonumber \\&\quad = \left( d_k^0+{{\mathbf {d}}}_k^{\prime } {{\mathbf {X}}}- Y\right) I_{D_k}({{\mathbf {X}}},\,Y),\quad P\text {-a.s.}, \qquad (15) \end{aligned}$$

for every \(k\ge 1.\)


Proof

To simplify the notation, w.l.o.g., we will denote the subsequences in the same way as the original sequences. If the sequences \(\{b_n^0\}_{n=1}^{\infty }\) and \(\{{{\mathbf {b}}}_n\}_{n=1}^{\infty }\) are bounded, then we just need to extract convergent subsequences and set \(D_n=A_n.\) So, let us assume that either one or both sequences are unbounded, and consider a sequence of compact sets \(\{K_n\}_{n=1}^{\infty }\) such that \(K_n\uparrow \mathbb {R}^{d+1}.\) Let \(\{{{\mathbf {v}}}_{n_l}\}_{l=1}^{d}\) be the normalized eigenvectors obtained from the spectral decomposition of the matrices \(\{\text {Var}_P[{{\mathbf {X}}}/A_n\cap K_n]\}_{n=1}^{\infty }\) (we use \(E_P[\cdot /A]\) and \(\text {Var}_P[\cdot /A]\) for denoting \(E_P[\cdot /({{\mathbf {X}}},\,Y)\in A]\) and \(\text {Var}_P[\cdot /({{\mathbf {X}}},\,Y)\in A]\)).

Now, let us suppose that there exists a direction \({{\mathbf {v}}}_{n_l}\) such that \(\text {Var}_P[{{\mathbf {v}}}_{n_l}^{\prime }{{\mathbf {X}}}/A_n\cap K_n]\rightarrow 0.\) Then take H with \(0\le H < d\) such that \(\text {Var}_P[{{\mathbf {v}}}_{n_l}^{\prime }{{\mathbf {X}}}/A_n\cap K_n]\rightarrow 0\) for every \(l\ge H+1,\) after a possible reordering of the directions. In this case, there also exist points \(\{{{\mathbf {u}}}_{n_l}\}_{l=H+1}^{d}\) in \(\mathbb {R}^d\) and a sequence \(\varepsilon _n\downarrow 0\) satisfying \(P[|{{\mathbf {v}}}_{n_l}^{\prime }({{\mathbf {X}}}-{{\mathbf {u}}}_{n_l})|>\varepsilon _n/A_n\cap K_n]\rightarrow 0\) for every \(l\ge H+1.\) The \({{\mathbf {v}}}_{n_l}\) are bounded (unit vectors) and the \({{\mathbf {u}}}_{n_l}\) must be bounded too (because, otherwise, \({{\mathbf {X}}}\) would not be tight). Therefore, there exist subsequences, denoted as the original ones, such that \({{\mathbf {v}}}_{n_l}\rightarrow {{\mathbf {v}}}_l\in \mathbb {R}^d,\, {{\mathbf {u}}}_{n_l}\rightarrow {{\mathbf {u}}}_l\in \mathbb {R}^d\) and \(P[|{{\mathbf {v}}}_l^{\prime }({{\mathbf {X}}}-{{\mathbf {u}}}_l)|>0/A_n\cap K_n]\rightarrow 0\) for every \(l\ge H+1.\)

Let us now define \(D_n=A_n \cap K_n \cap \bigcap _{l=H+1}^d \{{{\mathbf {v}}}_l^{\prime }({{\mathbf {X}}}-{{\mathbf {u}}}_l)=0\},\) which trivially satisfies \(D_n \subset A_n\) and \(P[A_n {\setminus } D_n]\rightarrow 0.\) We can rewrite

$$\begin{aligned} b_n^0+{{\mathbf {b}}}_n^{\prime } {{\mathbf {x}}}= b_n^0+\sum _{l=1}^H{{\mathbf {b}}}_n^{\prime } {{\mathbf {v}}}_l {{\mathbf {v}}}_l^{\prime }{{\mathbf {x}}}+\sum _{l=H+1}^d{{\mathbf {b}}}_n^{\prime } {{\mathbf {v}}}_l {{\mathbf {v}}}_l^{\prime } {{\mathbf {x}}}, \end{aligned}$$

and set \(d_n^0=b_n^0+\sum _{l=H+1}^d{{\mathbf {b}}}_n^{\prime } {{\mathbf {u}}}_l \) and \({{\mathbf {d}}}_n=\sum _{l=1}^H {{\mathbf {b}}}_n^{\prime }{{\mathbf {v}}}_l {{\mathbf {v}}}_l'\) for \(H>0\) (while we set \({{\mathbf {d}}}_n=\mathbf {0}\) when \(H=0\)). Then (15) trivially holds and it can be shown that \(\{d_n^0\}_{n=1}^{\infty }\) and \(\{{{\mathbf {d}}}_n\}_{n=1}^{\infty }\) are bounded sequences. This follows from the fact that (14) guarantees that \(\{(b_n^0+{{\mathbf {b}}}_n^{\prime } {{\mathbf {X}}}- Y)I_{D_n}({{\mathbf {X}}},\,Y)\}_{n=1}^{\infty }\) is a tight sequence. Indeed, this tightness would be contradicted if either \(\{d_n^0\}_{n=1}^{\infty }\) or \(\{{{\mathbf {d}}}_n\}_{n=1}^{\infty }\) were unbounded, because \({{\mathbf {Z}}}=(Z_1,\ldots ,Z_H)\) with \(Z_l={{\mathbf {v}}}_l^{\prime } {{\mathbf {x}}}\) satisfies \(\det (\text {Var}_P[{{\mathbf {Z}}}/A_n\cap K_n])>0\) and \({{\mathbf {d}}}_n^{\prime }{{\mathbf {x}}}=\sum _{l=1}^H {{\mathbf {b}}}_n^{\prime }{{\mathbf {v}}}_l Z_l.\)

Finally, whenever none of the sequences \(\text {Var}_P[{{\mathbf {v}}}_{n_l}^{\prime }{{\mathbf {X}}}/A_n\cap K_n]\) converges to 0, we can consider the full representation \(b_n^0+{{\mathbf {b}}}_n^{\prime } {{\mathbf {x}}}= b_n^0+\sum _{l=1}^d{{\mathbf {b}}}_n^{\prime } {{\mathbf {v}}}_l {{\mathbf {v}}}_l^{\prime } {{\mathbf {x}}}\) (that is, \(H=d\)), and the result follows in this case too, by arguments similar to those above. \(\square \)
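The reparameterization used in this proof can be checked numerically. The following is a minimal sketch under simplifying assumptions that are not part of the article: \(d=2,\) a single degenerate unit direction \({{\mathbf {v}}}\) (so \(H=1\)), and retained points lying exactly on the line \({{\mathbf {v}}}^{\prime }({{\mathbf {x}}}-{{\mathbf {u}}})=0.\) The helper names `dot`, `bounded_representation` and `coefficients` are hypothetical. The intercept is computed as \(b^0+({{\mathbf {b}}}^{\prime }{{\mathbf {v}}})({{\mathbf {v}}}^{\prime }{{\mathbf {u}}}),\) which agrees with the expression in the proof when \({{\mathbf {u}}}\) is taken parallel to \({{\mathbf {v}}}.\)

```python
def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def bounded_representation(b0, b, v, u):
    """Return (d0, d) giving the same fitted values as (b0, b) on v'(x - u) = 0.

    v must be a unit vector; the component of b along v is moved into the intercept.
    """
    bv = dot(b, v)
    d0 = b0 + bv * dot(v, u)
    d = [bi - bv * vi for bi, vi in zip(b, v)]
    return d0, d


# Hypothetical setting: every retained x lies on the line through u orthogonal to v.
v = [0.6, 0.8]    # degenerate direction (unit norm)
u = [1.0, 2.0]    # a point on the line
w = [-0.8, 0.6]   # direction spanning the line (orthogonal to v)
points = [[ui + t * wi for ui, wi in zip(u, w)] for t in (-2.0, 0.0, 1.5)]


def coefficients(n):
    """Coefficients that grow with n but give the same fitted values on the line."""
    b0 = 3.0 - n * dot(v, u)
    b = [1.0 + n * v[0], -2.0 + n * v[1]]
    return b0, b


for n in (1, 10, 1000):
    b0, b = coefficients(n)
    d0, d = bounded_representation(b0, b, v, u)
    gap = max(abs((b0 + dot(b, x)) - (d0 + dot(d, x))) for x in points)
    print(n, round(d0, 6), [round(di, 6) for di in d], gap < 1e-8)
```

In this toy setting the original coefficients are unbounded in n, while the replacement pair stays at roughly 0.8 and (1.6, -1.2) for every n and reproduces the same fitted values on the retained points, which is the property Lemma 1 exploits.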

The following Lemma 2 ensures that, under the usual assumption on P, the associated fitted trimmed CWMs cannot be arbitrarily close to a degenerate model concentrated on G points, nor on G regression hyperplanes.

Lemma 2

Let P be a distribution in \(\mathbb {R}^{d+1}\) satisfying (PR). Then:

  (a) There exists \(\delta >0\) such that, for every choice of \(b_g^0 \in \mathbb {R}\) and \({{\mathbf {b}}}_g \in \mathbb {R}^d,\) \(g=1,\ldots ,G,\) and every \(A \subseteq \mathbb {R}^{d+1}\) with \(P[A]=1-\alpha ,\)

    $$\begin{aligned} E_P\left[ \min _{g=1,\ldots ,G}\left| b_g^0+{{\mathbf {b}}}_g^{\prime } {{\mathbf {X}}}- Y\right| ^2 I_A({{\mathbf {X}}},\,Y)\right] \ge \delta . \end{aligned}$$

  (b) There exists \(\delta >0\) such that, for every set of G points \(\{{{\varvec{\mu }}}_1,\ldots ,{{\varvec{\mu }}}_G\}\subset \mathbb {R}^d\) and every \(A \subseteq \mathbb {R}^{d+1}\) with \(P[A]=1-\alpha ,\)

    $$\begin{aligned} E_P\left[ \min _{g=1,\ldots ,G}\left\| {{\mathbf {X}}}- {{\varvec{\mu }}}_g \right\| ^2 I_A({{\mathbf {X}}},\,Y)\right] \ge \delta . \end{aligned}$$

Proof of (a)

Let us suppose that \(\delta \) does not exist. Then, we can choose sequences \(\{A_n\}_{n=1}^{\infty },\,\{b_g^{0,n}\}_{n=1}^{\infty }\) and \(\{{{\mathbf {b}}}_g^n\}_{n=1}^{\infty }\) such that

$$\begin{aligned}&E_P\left[ \min _{g=1,\ldots ,G}\left| b_g^{0,n}+\left( {{\mathbf {b}}}_g^n\right) ^{\prime } {{\mathbf {X}}}- Y\right| ^2 I_{A_n}({{\mathbf {X}}},\,Y)\right] \rightarrow 0\nonumber \\&\quad \text {with}\,P\left[ A_n\right] \rightarrow 1-\alpha . \qquad (16) \end{aligned}$$

Moreover, we can replace the sets \(A_n\) in (16) by the sets

$$\begin{aligned} A_n^{*}= \left\{ ({{\mathbf {x}}},\,y){\text {:}}\,\min _{g=1,\ldots ,G}\left| b_g^{0,n}+\left( {{\mathbf {b}}}_g^n\right) ^{\prime }{{\mathbf {x}}}- y\right| ^2\le \min \left\{ r_{\alpha }^n,\,\varepsilon \right\} \right\} , \end{aligned}$$

where \(r_{\alpha }^n=\inf \{u{\text {:}}\,P[({{\mathbf {x}}},\,y){\text {:}}\,\min _{g=1,\ldots ,G} |b_g^{0,n}+({{\mathbf {b}}}_g^n)^{\prime } {{\mathbf {x}}}- y|^2\le u]\ge 1- \alpha \}\) and we also have the same convergence as in (16), with \(P[A_n^{*}] \rightarrow 1-\alpha \) for any fixed choice of \(\varepsilon >0.\) Then, take

$$\begin{aligned} A_g^n= & {} \left\{ ({{\mathbf {x}}},\,y)\in A_n^{*}{\text {:}}\,\left| b_g^{0,n}+\left( {{\mathbf {b}}}_g^n\right) ^{\prime } {{\mathbf {x}}}-y\right| \right. \\= & {} \left. \min _{j=1,\ldots ,G} \left| b_j^{0,n}+\left( {{\mathbf {b}}}_j^n\right) ^{\prime } {{\mathbf {x}}}- y\right| \right\} , \end{aligned}$$

and we can see that there exists at least one g such that \(P[A_g^n]\rightarrow p_g >0\) through a subsequence (because \(P[A_n^{*}]=\sum _{g=1,\ldots ,G} P[A_g^n]\rightarrow 1-\alpha \)). Thus, consider a reordering of \(\{1,\ldots ,G\}\) such that \(P[A_g^n]\rightarrow p_g >0\) for every \(g\in \{1,\ldots ,H\}\) (for an appropriate subsequence, if needed). If \(A_n^{**}=\cup _{g=1}^H A_g^n,\) then

$$\begin{aligned}&E_P\left[ \min _{g=1,\ldots ,G}\left| b_g^{0,n}+\left( {{\mathbf {b}}}_g^n\right) ^{\prime } {{\mathbf {X}}}- Y\right| ^2 I_{A_n^{**}}({{\mathbf {X}}},\,Y)\right] \\&\quad =\sum _{g=1}^H E_P\left[ \left| b_g^{0,n}+\left( {{\mathbf {b}}}_g^n\right) ^{\prime } {{\mathbf {X}}}- Y\right| ^2 I_{A^n_g}({{\mathbf {X}}},\,Y)\right] , \end{aligned}$$

and \(P[A_n^{**}]\rightarrow 1-\alpha .\) For every \(g\in \{1,\ldots ,H\},\) the \(A_g^n,\,b_g^{0,n}\) and \({{\mathbf {b}}}_g^n\) satisfy the conditions needed to apply Lemma 1 and, therefore, we can replace them by \(D_g^n,\,d_g^{0,n}\) and \({{\mathbf {d}}}_g^n\) satisfying \(D_g^n\subset A_g^n,\, P[A_g^n{\setminus } D_g^n]\rightarrow 0,\,d_g^{0,n}\rightarrow d_g^0\in \mathbb {R}\) and \({{\mathbf {d}}}_g^n\rightarrow {{\mathbf {d}}}_g^0\in \mathbb {R}^d\) and (15).

Now, take \(B_n=\cup _{g=1,\ldots ,H}D_g^n\cap \{({{\mathbf {x}}},\,y){\text {:}}\,\min _{g=1,\ldots ,G}|d_g^{0,n}+({{\mathbf {d}}}_g^n)^{\prime } {{\mathbf {x}}}- y|^2\le \varepsilon \}\) for a fixed \(\varepsilon ,\) with \(P[B_n]\rightarrow 1-\alpha .\) We thus have the pointwise convergence

$$\begin{aligned}&\min _{g=1,\ldots ,H}\left| d_g^{0,n}+\left( {{\mathbf {d}}}_g^n\right) ^{\prime } {{\mathbf {x}}}- y\right| ^2 I_{B_n}({{\mathbf {x}}},\,y)\\&\quad \rightarrow \min _{g=1,\ldots ,H}\left| d_g^{0}+\left( {{\mathbf {d}}}_g^0\right) ^{\prime } {{\mathbf {x}}}- y\right| ^2 I_{B_0}({{\mathbf {x}}},\,y), \end{aligned}$$

for any \(B_0\subset \mathbb {R}^{d+1}\) with \(P[B_0]=1- \alpha ,\) and the uniform bound \( \min _{g=1,\ldots ,H}|d_g^{0,n}+({{\mathbf {d}}}_g^n)^{\prime } {{\mathbf {x}}}- y|^2 I_{B_n}({{\mathbf {x}}},\,y) \le \varepsilon .\) Then, the dominated convergence theorem implies

$$\begin{aligned}&E_P \left[ \min _{g=1,\ldots ,H}\left| d_g^{0,n}+\left( {{\mathbf {d}}}_g^n\right) ^{\prime } {{\mathbf {X}}}- Y\right| ^2 I_{B_n}({{\mathbf {X}}},\,Y)\right] \\&\quad \rightarrow E_P \left[ \min _{g=1,\ldots ,H}\left| d_g^{0}+\left( {{\mathbf {d}}}_g^0\right) ^{\prime } {{\mathbf {X}}}- Y\right| ^2 I_{B_0}({{\mathbf {X}}},\,Y)\right] . \end{aligned}$$

The latter convergence and (16) would prove that

$$\begin{aligned} E_P \left[ \min _{g=1,\ldots ,H}\left| d_g^{0}+\left( {{\mathbf {d}}}_g^0\right) ^{\prime } {{\mathbf {X}}}- Y\right| ^2 I_{B_0}({{\mathbf {X}}},\,Y)\right] =0, \end{aligned}$$

implying that the distribution P is concentrated on G regression hyperplanes after removing a proportion \(\alpha \) of the probability mass and this would contradict (PR). \(\square \)

Proof of (b)

The proof of this result mimics the steps followed in the proof of (a). We start by assuming the existence of sequences \(\{A_n\}_{n=1}^{\infty }\) and \(\{{\varvec{\mu }}_g^n\}_{n=1}^{\infty }\) such that

$$\begin{aligned}&E_P\left[ \min _{g=1,\ldots ,G}\left\| {{\mathbf {X}}}- {\varvec{\mu }}_g^n\right\| ^2 I_{A_n}({{\mathbf {X}}},\,Y)\right] \rightarrow 0 ~\text {with}\,P\left[ A_n\right] \rightarrow 1-\alpha , \end{aligned}$$

and we would end up seeing that the support of \({{\mathbf {X}}}\) is concentrated on G points in \(\mathbb {R}^d.\) In fact, the proof is easier because only the tightness of P is needed (Lemma 1 is no longer required here). \(\square \)
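The quantity bounded in part (a) of Lemma 2 has a direct sample analogue, which may help to see what the non-degeneracy requirement rules out. The sketch below is only an illustration under stated assumptions: a scalar covariate, given (not optimized) regression lines, and a hypothetical helper `trimmed_min_sq_residual` that takes A to be the set of the best-fitted points. It is not the estimator of the article.

```python
import math


def trimmed_min_sq_residual(sample, lines, alpha):
    """Sample analogue of E_P[min_g |b_g0 + b_g x - y|^2 I_A(x, y)].

    A keeps the ceil(n * (1 - alpha)) points with the smallest minimal squared
    residual, which is the choice of A that makes the expectation smallest.
    """
    residuals = sorted(
        min((b0 + b1 * x - y) ** 2 for b0, b1 in lines) for x, y in sample
    )
    keep = math.ceil(len(sample) * (1 - alpha))
    return sum(residuals[:keep]) / len(sample)


lines = [(0.0, 1.0), (0.0, -1.0)]   # hypothetical fits y = x and y = -x

# Degenerate sample: 9 of the 12 points lie exactly on the two lines.
degenerate = [(t, t) for t in (1, 2, 3, 4, 5)] + [(t, -t) for t in (1, 2, 3, 4)]
degenerate += [(0, 50), (1, 80), (2, -70)]

# Small generic sample, evaluated against the single line y = x.
generic = [(0, 0), (1, 2), (2, 1), (3, 5)]

print(trimmed_min_sq_residual(degenerate, lines, alpha=0.25))
print(trimmed_min_sq_residual(generic, [(0.0, 1.0)], alpha=0.25))
```

With a trimming level of 0.25 the first sample yields a value of zero, which is the degenerate situation excluded by (PR), whereas the second stays strictly positive for the line considered.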

Now, since \([0,\,1]^G\) is a compact set, we can trivially choose a subsequence of \(\{{\varvec{\theta }}_n\}_{n=1}^{\infty }\) such that \( \pi _g^n \rightarrow \pi _g \in [0,\,1]\,\text {for}\,1\le g \le G.\) With respect to the scatter matrices and the variances of the error terms, we have the following possibilities:

$$\begin{aligned}&(\text {S1}) {\varvec{\varSigma }}_g^n \rightarrow {\varvec{\varSigma }}_g\,\text {for}\,1\le g \le G\,\text {with}\,{\varvec{\varSigma }}_g\,\text {being p.s.d. matrices},\\&(\text {S2}) \min \nolimits _{g=1,\ldots ,G} \min \nolimits _{l=1,\ldots ,d} \lambda _l({\varvec{\varSigma }}_g^n) \rightarrow \infty ,\\&(\text {S3}) \max \nolimits _{g=1,\ldots ,G} \max \nolimits _{l=1,\ldots ,d} \lambda _l({\varvec{\varSigma }}_g^n) \rightarrow 0, \\&(\text {V1}) \sigma _g^{2,n} \rightarrow \sigma _g^2\,\text {for}\,1\le g \le G\, \text {with}\, \sigma _g >0,\\&(\text {V2}) \min \nolimits _{g=1,\ldots ,G} \sigma _g^{2,n} \rightarrow \infty , \\&(\text {V3}) \max \nolimits _{g=1,\ldots ,G} \sigma _g^{2,n} \rightarrow 0. \end{aligned}$$

Given that \({\varvec{\theta }}_n\in \varTheta _{c_X,c_{\varepsilon }},\) only one of the convergences in S1–3 and only one in V1–3 is possible, and the following Lemma 3 further restricts these to the bounded cases (S1) and (V1), based on constraints (5) and (6).
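Constraints (5) and (6) are not reproduced in this appendix. The sketch below assumes they take the eigenvalue-ratio form suggested by the terms \(\log (\lambda _n/c_X)\) and \(\log (\sigma _n^2/c_{\varepsilon })\) used in the proof of Lemma 3, namely that the largest and smallest eigenvalues of the scatter matrices differ by at most a factor \(c_X,\) and the largest and smallest error variances by at most a factor \(c_{\varepsilon }.\) The function names are hypothetical.

```python
def satisfies_restrictions(eigenvalues, error_variances, c_x, c_eps):
    """Check the assumed ratio constraints.

    eigenvalues: one list per group with the eigenvalues of Sigma_g.
    error_variances: list of sigma_g^2, one per group.
    """
    all_eigs = [lam for group in eigenvalues for lam in group]
    ratio_x = max(all_eigs) / min(all_eigs)
    ratio_eps = max(error_variances) / min(error_variances)
    return ratio_x <= c_x and ratio_eps <= c_eps


def smallest_allowed(largest, c):
    """Lower bound on every eigenvalue (or variance) implied by the largest one."""
    return largest / c


eigs = [[1.0, 2.0], [0.5, 4.0]]   # hypothetical eigenvalues for G = 2, d = 2
variances = [1.0, 3.0]            # hypothetical error variances

print(satisfies_restrictions(eigs, variances, c_x=10.0, c_eps=5.0))
print(satisfies_restrictions(eigs, variances, c_x=5.0, c_eps=5.0))
print(smallest_allowed(1e6, 10.0))
```

Under this assumed form, a largest eigenvalue that diverges forces the smallest one to diverge as well, and a largest eigenvalue that vanishes forces all of them to vanish, which is why the cases S1 to S3 (and V1 to V3) exclude one another.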

Lemma 3

If \(\{{\varvec{\theta }}_n\}_{n=1}^{\infty }\subset \varTheta _{c_X,c_{\varepsilon }}\) converges toward the supremum of \(L({\varvec{\theta }},\,P),\) and (PR) holds for P,  then only convergences (S1) and (V1) are possible.


Proof

We have that \(L({\varvec{\theta }}_n;\,P)\) can be bounded from above by

$$\begin{aligned}&- \frac{1}{2}\left[ \log \left( \min _g \sigma _g^{2,n} \right) P\left[ A\left( {\varvec{\theta }}_n\right) \right] \right. \\&\quad \left. +\frac{E_P[\min _g|b_g^{0,n}+({{\mathbf {b}}}_g^n)^{\prime }{{\mathbf {X}}}-Y|^2 I_{A({\varvec{\theta }}_n)}({{\mathbf {X}}},\,Y)]}{\max _g \sigma _g^{2,n}} \right] \\&- \frac{1}{2}\left[ \log \left( \min _g \min _l \lambda _l\left( {\varvec{\varSigma }}_g^n\right) \right) P\left[ A\left( {\varvec{\theta }}_n\right) \right] d\right. \\&\quad \left. +\frac{E_P[\min _g\Vert {{\mathbf {X}}}-{\varvec{\mu }}_g^n\Vert ^2I_{A({\varvec{\theta }}_n)}({{\mathbf {X}}},\,Y)]}{\max _g \max _l \lambda _l({\varvec{\varSigma }}_g^n) } \right] +C, \end{aligned}$$

where C is a constant value, not depending on \({\varvec{\theta }}_n.\)

Therefore, given that \({\varvec{\theta }}_n\in \varTheta _{c_X,c_{\varepsilon }},\) the possible convergence of \(L({\varvec{\theta }}_n;\,P)\) is determined by that of the sequences

$$\begin{aligned}&\log \left( \frac{\sigma _n^2}{c_{\varepsilon }} \right) P\left[ A\left( {\varvec{\theta }}_n\right) \right] \nonumber \\&\quad +\,E_P\left[ \min _g \left| b_g^{0,n}+\left( {{\mathbf {b}}}_g^n\right) ^{\prime } {{\mathbf {X}}}-Y\right| ^2 I_{A({\varvec{\theta }}_n)}({{\mathbf {X}}},\,Y)\right] \frac{1}{\sigma _n^2}, \qquad (17) \end{aligned}$$


$$\begin{aligned}&\log \left( \frac{\lambda _n}{c_X} \right) P\left[ A\left( {\varvec{\theta }}_n\right) \right] d\nonumber \\&\quad +\,E_P\left[ \min _g \left\| {{\mathbf {X}}}-{\varvec{\mu }}_g^n\right\| ^2 I_{A({\varvec{\theta }}_n)}({{\mathbf {X}}},\,Y)\right] \frac{1}{\lambda _n}, \qquad (18) \end{aligned}$$

where \(\lambda _n=\max _{g=1,\ldots ,G} \max _{l=1,\ldots ,d} \lambda _l({\varvec{\varSigma }}_g^n)\) and \(\sigma _n^2=\max _{g=1,\ldots ,G} \sigma _g^{2,n}.\)

On the other hand, Lemma 2 implies that a constant \(\delta >0\) can be chosen such that \(E_P[\min _g |b_g^{0,n}+({{\mathbf {b}}}_g^n)^{\prime } {{\mathbf {X}}}-Y|^2 I_{A({\varvec{\theta }}_n)}({{\mathbf {X}}},\,Y)]\) and \(E_P[\min _g \Vert {{\mathbf {X}}}-{\varvec{\mu }}_g^n\Vert ^2 I_{A({\varvec{\theta }}_n)}({{\mathbf {X}}},\,Y)]\) in (17) and (18) are uniformly bounded from below by \(\delta .\) Therefore, any convergence other than (S1) and (V1) would imply that \(\lim _{n\rightarrow \infty }L({\varvec{\theta }}_n,\,P)={-}\infty \) and this would contradict (12). \(\square \)
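The divergence argument can be made concrete with a small numerical sketch. Both sequences (17) and (18) have, up to the factor d, the shape \(p\log (s/c)+\delta /s,\) where s stands for \(\sigma _n^2\) or \(\lambda _n,\) p for \(P[A({\varvec{\theta }}_n)]\) and \(\delta \) for the lower bound supplied by Lemma 2. The values of p, \(\delta \) and c below are arbitrary illustrative choices, and `penalty` is a hypothetical name.

```python
import math


def penalty(s, p=0.9, delta=0.05, c=10.0):
    """p * log(s / c) + delta / s, the shape shared by sequences (17) and (18)."""
    return p * math.log(s / c) + delta / s


s_star = 0.05 / 0.9   # stationary point delta / p of the default penalty
for s in (1e-8, 1e-4, s_star, 1e4, 1e8):
    print(f"s = {s:.3g}  penalty = {penalty(s):.4g}")
```

The function has a single minimum at \(s=\delta /p\) and grows without bound both as \(s\rightarrow 0\) (through the term \(\delta /s\)) and as \(s\rightarrow \infty \) (through the logarithm). Since the upper bound on \(L({\varvec{\theta }}_n;\,P)\) is minus one half of such terms plus a constant, it tends to \(-\infty \) in both cases.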

Lemma 4, stated below, shows that we can always find a subsequence \(\{{\varvec{\theta }}_n\}_{n=1}^\infty \) with converging parameters for at least one mixture component, with weight \(\pi _g^n\) converging toward a strictly positive value.

Lemma 4

There exists a sequence \(\{{\varvec{\theta }}_n\}_{n=1}^{\infty }\) converging toward the supremum of \(L({\varvec{\theta }},\,P)\) and there exists H with \(1\le H\le G\) such that

$$\begin{aligned}&{\varvec{\mu }}_g^n \rightarrow {\varvec{\mu }}_g, \quad b_g^{0,n}\rightarrow b_g^0, \quad {{\mathbf {b}}}_g^n \rightarrow {{\mathbf {b}}}_g ~{{\text {and}}}\quad \pi _g^n \rightarrow \pi _g >0 \\&\text {for every}\,\quad g\le H, \end{aligned}$$

and such that the corresponding \(\{A({\varvec{\theta }}_n)\}_{n=1}^{\infty }\) sets are uniformly bounded.


Proof

Let us start from any \(\{{\varvec{\theta }}_n\}_{n=1}^{\infty }\) converging toward the supremum of \(L({\varvec{\theta }},\,P),\) and take \(A_n=A({\varvec{\theta }}_n)\) and

$$\begin{aligned} A_g^n=\left\{ ({{\mathbf {x}}},\,y)\in A_n{\text {:}}\, D_g({{\mathbf {x}}},\,y;\,{\varvec{\theta }}_n)=\max _{j=1,\ldots ,G} D_j({{\mathbf {x}}},\,y;\,{\varvec{\theta }}_n)\right\} , \end{aligned}$$

for \(1\le g \le G.\) Since \(P[A_g^n]\in [0,\,1],\) there exists a subsequence, denoted as the original one, such that each \(P[A_g^n]\) converges for \(1\le g \le G.\) Moreover, after a proper reordering of the components of \({\varvec{\theta }}_n,\) there exists \(H^{*}\ge 1\) such that \(P[A_g^n]\rightarrow p_g>0\) for \(1\le g \le H^{*}.\) Note that this \(H^{*}\) does exist because otherwise we would have \(P[A_n]=\sum _{g=1}^GP[A_g^n]\rightarrow 0.\)

We can also find a convergent subsequence of \({\varvec{\mu }}_g^n\) for every \(g\le H^{*}.\) Otherwise, for every \(\eta \) with \(0<\eta <p_g,\) we could take a ball \(B_g\) centered at \((\mathbf {0},\,0)\) with \(P[B_g]>1-p_g+\eta \) and such that there exists \(n_0\) with \(P[B_g\cap A_g^n]>\eta /2\) when \(n\ge n_0.\) Consequently, we would have \(E_P[ \Vert {{\mathbf {X}}}-{\varvec{\mu }}_g^n \Vert ^2 I_{A_g^n}]\ge E_P[ \Vert {{\mathbf {X}}}-{\varvec{\mu }}_g^n \Vert ^2 I_{B_g\cap A_g^n}] \rightarrow \infty \) which contradicts (12). Note that the contributions of the other terms to \(L({\varvec{\theta }}_n,\,P)\) are controlled, because of Lemma 3.

From (12), we have \(\lim \sup _n E_P[ |b_g^{0,n}+({{\mathbf {b}}}_g^n)^{\prime } {{\mathbf {X}}}-Y|^2 I_{A_g^n}({{\mathbf {X}}},\,Y)]<\infty .\) This, together with the fact that \(\lim \sup _n P[A_g^n]=p_g>0\) for \(g \le H^{*},\) allows us to apply Lemma 1 again to replace the \(\{b_g^{0,n}\},\,\{{{\mathbf {b}}}_g^n\}\) and \(\{A_g^n\}\) sequences by appropriate convergent sequences \(\{d_g^{0,n}\},\,\{{{\mathbf {d}}}_g^n\}\) and \(\{D_g^n\}.\) These convergences also trivially imply that \(\pi _g^n \rightarrow \pi _g >0\) for \(g\le H^{*}.\)

Other g values could also satisfy these convergences (through subsequences and possible alternative representations). In this case, we consider \(H\ge H^{*}\) such that all the convergences in the statement of this lemma hold for \(g\le H.\)

To see that the \(\{A({\varvec{\theta }}_n)\}_{n=1}^{\infty }\) are uniformly bounded, recall that \(A({\varvec{\theta }}_n)=\{({{\mathbf {x}}},\,y){\text {:}}\,D({{\mathbf {x}}},\,y;\,{\varvec{\theta }}_n)\ge R({\varvec{\theta }}_n,\,P)\}\) and let us introduce

$$\begin{aligned}&\widetilde{R}\left( {\varvec{\theta }}_n,\,P\right) \\&\quad =\sup \left\{ u{\text {:}}\,P\left[ \max _{1\le g\le H} D_g\left( {{\mathbf {X}}},\,Y;\,{\varvec{\theta }}_n\right) \ge u\right] \ge 1-\alpha \right\} . \end{aligned}$$

Given that \(D({{\mathbf {x}}},\,y;\,{\varvec{\theta }}_n)\ge \max _g D_g({{\mathbf {x}}},\,y;\,{\varvec{\theta }}_n),\) we trivially have the bound \(\widetilde{R}({\varvec{\theta }}_n,\,P) \le R({\varvec{\theta }}_n,\,P).\) Moreover, \(\pi _g^n,\,{\varvec{\mu }}_g^n,\,{\varvec{\varSigma }}_g^n,\,b_g^{0,n},\, {{\mathbf {b}}}_g^n,\,\sigma _g^{2,n}\) are convergent sequences when \(g\le H\) and, then, we can also find a strictly positive constant \(R_H\) satisfying \( 0< R_H \le \widetilde{R}({\varvec{\theta }}_n,\,P) \le R({\varvec{\theta }}_n,\,P).\) The sets \(B_n=\{({{\mathbf {x}}},\,y){\text {:}}\,\max _{g\le H} D_g({{\mathbf {x}}},\,y;\,{\varvec{\theta }}_n)\ge R_H\}\) satisfy \(A_n \subseteq B_n,\) and all these \(B_n\) sets are uniformly bounded due to the uniform continuity of the functions \(\{({{\mathbf {x}}},\,y) \mapsto \max _{g\le H} D_g({{\mathbf {x}}},\,y;\,{\varvec{\theta }}_n)\}_{n=1}^{\infty }\) and because the parameters corresponding to the first H groups in \(\{{\varvec{\theta }}_n\}_{n=1}^{\infty }\) are uniformly bounded. \(\square \)
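A sample version of the trimming levels \(R\) and \(\widetilde{R}\) may help to follow the last step. The sketch below rests on assumptions that are not stated in this appendix: a scalar covariate, and \(D_g\) taken to be the usual cluster weighted component (weight times Gaussian density of x times Gaussian density of y given x), with \(D=\sum _g D_g.\) All names, parameter values and data points are hypothetical.

```python
import math


def normal_pdf(z, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def component_density(x, y, comp):
    """Assumed D_g: weight * density of x * conditional density of y given x."""
    weight, mu, var_x, b0, b1, var_eps = comp
    return weight * normal_pdf(x, mu, var_x) * normal_pdf(y, b0 + b1 * x, var_eps)


def trimming_level(values, alpha):
    """Largest u such that at least ceil(n * (1 - alpha)) values are >= u."""
    keep = math.ceil(len(values) * (1 - alpha))
    return sorted(values, reverse=True)[keep - 1]


# Hypothetical two-component model and a 12-point sample with two gross outliers.
components = [(0.6, 0.0, 1.0, 1.0, 2.0, 0.25), (0.4, 5.0, 1.0, 0.0, -1.0, 0.25)]
sample = [(-1.0, -1.2), (0.0, 0.9), (0.5, 2.1), (1.0, 3.2), (-0.5, 0.1),
          (1.5, 3.8), (4.0, -4.1), (5.0, -5.0), (6.0, -5.8), (5.5, -5.3),
          (20.0, 40.0), (-15.0, 3.0)]
alpha, H = 0.25, 1

total = [sum(component_density(x, y, c) for c in components) for x, y in sample]
head_max = [max(component_density(x, y, c) for c in components[:H])
            for x, y in sample]

R = trimming_level(total, alpha)            # sample version of R(theta, P)
R_tilde = trimming_level(head_max, alpha)   # sample version of R-tilde(theta, P)
retained = [pt for pt, value in zip(sample, total) if value >= R]
print(R_tilde <= R, len(retained))
```

Because the maximum over the first H components never exceeds the full mixture density at any point, the corresponding order statistics satisfy the same inequality, which is the sample counterpart of \(\widetilde{R}({\varvec{\theta }}_n,\,P) \le R({\varvec{\theta }}_n,\,P).\)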

With these results established, we are ready to prove the existence result.

1.2 Part B: proof of Proposition 3.1

Let us start from a sequence \(\{{\varvec{\theta }}_n\}_{n=1}^{\infty }\) converging toward the supremum of \(L({\varvec{\theta }},\,P).\) Thanks to Lemma 3, we know that there exists a subsequence of \(\{{\varvec{\theta }}_n\}_{n=1}^{\infty }\) with \({\varvec{\varSigma }}_g^n \rightarrow {\varvec{\varSigma }}_g\) and \(\sigma _g^{2,n} \rightarrow \sigma _g^2\) for \(1\le g \le G.\) Moreover, by applying Lemma 4, a further subsequence (with a proper modification, if needed) can be obtained that also verifies \( {\varvec{\mu }}_g^n \rightarrow {\varvec{\mu }}_g,\, b_g^{0,n}\rightarrow b_g^0,\, {{\mathbf {b}}}_g^n \rightarrow {{\mathbf {b}}}_g\) and \(\pi _g^n \rightarrow \pi _g \) with \(\pi _g>0\) for every \(g\le H,\) where \(1\le H \le G.\) Let us assume that there exists some g such that \({\varvec{\mu }}_g^n\) is not bounded, or such that a bounded representation for \(b_g^{0,n}\) and \({{\mathbf {b}}}_g^n\) does not exist (in the sense that \(\lim \sup _n E_P[ |b_g^{0,n}+({{\mathbf {b}}}_g^n)^{\prime } {{\mathbf {X}}}-Y|^2 I_{A_n}({{\mathbf {X}}},\,Y)]=\infty \)). We will see that we necessarily have \(\pi _g^n \rightarrow 0\) and, consequently, the role played by \({\varvec{\mu }}_g^n,\, b_g^{0,n}\) and \({{\mathbf {b}}}_g^n\) is irrelevant, given that they do not modify the value taken by the target function. Therefore, we could replace them by other arbitrary convergent parameter values (of course, satisfying the desired constraints) and the proof would be done.

To prove that, let us consider

$$\begin{aligned} M_n= & {} E_P \left[ \left( \log \left( \sum _{g=1}^G D_g\left( {{\mathbf {X}}},\,Y;\,{\varvec{\theta }}_n\right) \right) \right. \right. \\&\left. \left. - \log \left( \sum _{g=1}^{H} D_g\left( {{\mathbf {X}}},\,Y;\,{\varvec{\theta }}_n\right) \right) \right) I_{A_n}({{\mathbf {X}}},\,Y)\right] . \end{aligned}$$

By considering the same \(R_H>0\) used in the proof of Lemma 4 and the fact that \(\log (1+x)\le x,\) we can see that

$$\begin{aligned} M_n \le \sum _{g=H+1}^{G}E_P \left[ \frac{D_g({{\mathbf {X}}},\,Y;\,{\varvec{\theta }}_n)}{R_H} I_{A_n}({{\mathbf {X}}},\,Y)\right] . \end{aligned}$$
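The step from the definition of \(M_n\) to this bound can be spelled out as follows. The sketch assumes, as in the proof of Lemma 4, that \(A_n \subseteq B_n,\) so that \(\sum _{g=1}^{H} D_g \ge \max _{g\le H} D_g \ge R_H\) on \(A_n,\) and that every \(D_g\) is nonnegative and evaluated at \(({{\mathbf {X}}},\,Y;\,{\varvec{\theta }}_n){\text {:}}\)

$$\begin{aligned} \log \left( \sum _{g=1}^{G} D_g\right) - \log \left( \sum _{g=1}^{H} D_g\right)&= \log \left( 1+\frac{\sum _{g=H+1}^{G} D_g}{\sum _{g=1}^{H} D_g}\right) \\&\le \frac{\sum _{g=H+1}^{G} D_g}{\sum _{g=1}^{H} D_g} \le \sum _{g=H+1}^{G}\frac{D_g}{R_H}. \end{aligned}$$

Multiplying by \(I_{A_n}({{\mathbf {X}}},\,Y)\) and taking expectations under P gives the bound on \(M_n\) stated above.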

Then, it is trivial to see that \(M_n \rightarrow 0\) when \({\varvec{\mu }}_g^n\) is not bounded or when no bounded representation for \(b_g^{0,n}\) and \({{\mathbf {b}}}_g^n\) exists for any \(g>H.\) Consequently, if \(\pi _g^n \rightarrow \pi _g >0\) for any \(g>H\) and \({\varvec{\theta }}^{*}\) is the limit of the subsequence \(\{\pi _1^n,\ldots ,\pi _H^n,\,{\varvec{\mu }}_1^n,\ldots ,{\varvec{\mu }}_H^n,\, {\varvec{\varSigma }}_1^n,\ldots ,{\varvec{\varSigma }}_H^n,\, b_1^{0,n},\ldots ,b_H^{0,n},\, {{\mathbf {b}}}_1^n,\ldots , {{\mathbf {b}}}_H^n,\,\sigma _1^{2,n},\ldots ,\sigma _H^{2,n}\}_{n=1}^{\infty },\) we would have that \(\lim _{n\rightarrow \infty } \sup L({\varvec{\theta }}_n;\,P) = L({\varvec{\theta }}^{*};\,P)\) (because \(M_n \rightarrow 0\)) with \(\sum _{j=1}^{H}\pi _j<1.\) Then, we could define a new subsequence \(\{\widetilde{{\varvec{\theta }}}_n\}_{n=1}^{\infty }=\{\tilde{\pi }_1^n,\ldots ,\tilde{\pi }_G^n,\,\tilde{{\varvec{\mu }}}_1^n,\ldots ,\tilde{{\varvec{\mu }}}_G^n, \,\tilde{{\varvec{\varSigma }}}_1^n,\ldots , \tilde{{\varvec{\varSigma }}}_G^n,\, \tilde{b}_1^{0,n},\ldots ,\tilde{b}_G^{0,n},\,\tilde{{{\mathbf {b}}}}_1^n,\ldots , \tilde{{{\mathbf {b}}}}_G^n,\,\tilde{\sigma }_1^{2,n},\ldots ,\tilde{\sigma }_G^{2,n}\}_{n=1}^{\infty }\) with

$$\begin{aligned}&\widetilde{\pi }^n_{g}=\frac{\pi _{g}^n}{\sum _{j=1}^{H}\pi _{j}^n}\quad \text {for}\,1\le g\le H\quad \\&\quad \text {and}\quad \widetilde{\pi }_{H+1}^n=\cdots =\widetilde{\pi }_{G}^n=0, \end{aligned}$$

with \( \widetilde{{\varvec{\mu }}}_g^n = {\varvec{\mu }}_g^n,\,\widetilde{b}_g^{0,n} = b_g^{0,n},\,\widetilde{{{\mathbf {b}}}}_g^n = {{\mathbf {b}}}_g^n,\,\widetilde{{\varvec{\varSigma }}}_g^n = {\varvec{\varSigma }}_g^n\) and \(\widetilde{\sigma }_g^{2,n} = \sigma _g^{2,n}\,\text {for}\,1\le g\le H \) and parameters arbitrarily chosen when \(g>H\) (only satisfying the required constraints). Because the renormalized weights of the first H groups are strictly larger than the original ones, we would finally have that \( \lim _{n\rightarrow \infty } \sup L(\widetilde{{\varvec{\theta }}}_n;\,P)> \lim _{n\rightarrow \infty } \sup L({\varvec{\theta }}_n;\,P), \) and this would contradict the optimality of the sequence \(\{{\varvec{\theta }}_n\}_{n=1}^{\infty },\) which was chosen to converge toward the supremum of the target function. \(\square \)
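The gain obtained by renormalizing the weights can be made explicit. What follows is only a sketch, under assumptions not restated in this appendix: that each \(D_g(\cdot ;\,{\varvec{\theta }})\) is proportional to \(\pi _g,\) that \(L({\varvec{\theta }}_n;\,P)=E_P[\log (\sum _{g=1}^{G} D_g({{\mathbf {X}}},\,Y;\,{\varvec{\theta }}_n)) I_{A_n}({{\mathbf {X}}},\,Y)],\) and that \(L(\widetilde{{\varvec{\theta }}}_n;\,P)\) is at least the value obtained when the trimming set \(A_n\) is kept. Writing \(s_n=\sum _{j=1}^{H}\pi _j^n\) (a shorthand introduced here), and noting that the groups with \(g>H\) do not contribute under \(\widetilde{{\varvec{\theta }}}_n,\)

$$\begin{aligned} L\left( \widetilde{{\varvec{\theta }}}_n;\,P\right)&\ge E_P \left[ \log \left( \sum _{g=1}^{H} D_g\left( {{\mathbf {X}}},\,Y;\,\widetilde{{\varvec{\theta }}}_n\right) \right) I_{A_n}({{\mathbf {X}}},\,Y)\right] \\&= E_P \left[ \log \left( \sum _{g=1}^{H} D_g\left( {{\mathbf {X}}},\,Y;\,{\varvec{\theta }}_n\right) \right) I_{A_n}({{\mathbf {X}}},\,Y)\right] - P(A_n)\log s_n \\&= L\left( {\varvec{\theta }}_n;\,P\right) - M_n - P(A_n)\log s_n. \end{aligned}$$

Since \(M_n \rightarrow 0\) and \(s_n \rightarrow \sum _{j=1}^{H}\pi _j<1,\) the term \(-P(A_n)\log s_n\) stays strictly positive in the limit whenever \(P(A_n)\) is bounded away from zero, so the renormalized sequence attains a strictly larger limiting value of the target function.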

1.3 Part C: preliminary results in view of Proposition 3.2

Before proving that the solution of the sample problem is consistent for the population solution, we introduce some notation and state some useful results. Let \(\{\hat{{\varvec{\theta }}}_n\}_{n=1}^{\infty }=\{\hat{\pi }_1^n,\ldots ,\hat{\pi }_G^n\), \(\hat{{\varvec{\mu }}}_1^n,\ldots ,\hat{{\varvec{\mu }}}_G^n\), \(\hat{{\varvec{\varSigma }}}_1^n,\ldots ,\hat{{\varvec{\varSigma }}}_G^n\), \(\hat{b}_1^{0,n}, \ldots ,\hat{b}_G^{0,n}\), \(\hat{{{\mathbf {b}}}}_1^n,\ldots , \hat{{{\mathbf {b}}}}_G^n\), \(\hat{\sigma }_1^{2,n},\ldots ,\hat{\sigma }_G^{2,n}\}_{n=1}^{\infty }\subset \varTheta _{c_X,c_{\varepsilon }}\) denote a sequence of empirical estimators obtained by solving the empirical problems defined from the sequence of empirical measures \(\{P_n\}_{n=1}^{\infty }.\)

First, we prove that there exists a compact set \(K\subset \varTheta _{c_X,c_{\varepsilon }}\) such that \(\hat{{\varvec{\theta }}}_n \in K\) with probability 1. This is done through Lemmas 5 and 6, whose proofs are straightforward adaptations of the previously given proofs of Lemmas 1–4. In those adaptations, appropriate Glivenko–Cantelli classes of functions must be considered, and the class of balls in \(\mathbb {R}^{d+1}\) (which is also a Glivenko–Cantelli class) is used to provide bounding compact sets when needed.

Lemma 5

If P satisfies (PR), then only convergences (S1) and (V1) are possible for the \(\hat{{\varvec{\varSigma }}}_g^n\)’s and \(\hat{\sigma }_g^{2,n}\)’s.

Lemma 6

If (PR) holds, then we can choose a sequence \(\{\hat{{\varvec{\theta }}}_n\}_{n=1}^{\infty }\) solving the empirical problem with components \(\hat{{\varvec{\mu }}}_g^n,\,\hat{b}_g^{0,n}\) and \(\hat{{{\mathbf {b}}}}_g^n\) such that their norms are uniformly bounded.

The following two lemmas are analogous to Lemmas 5 and 6 in García-Escudero et al. (2014). Their proofs follow the same steps; the only change is the reformulation of the \(D(\cdot ;\,{\varvec{\theta }})\) functions, which here take into account the conditional distribution of the Y variable.

Lemma 7

Given a compact set \(K\subset \varTheta _{c_X,c_{\varepsilon }},\, B\subset \mathbb {R}^{d+1}\) and \([a,\,b]\subset \mathbb {R},\) the class of functions

$$\begin{aligned} \mathcal {H}:=\left\{ I_B(\cdot )I_{[u,\infty )}(D(\cdot ;\,{\varvec{\theta }}))\log ( D(\cdot ;\,{\varvec{\theta }})){\text {:}}\,{\varvec{\theta }}\in K,\,u\in [a,\,b] \right\} , \end{aligned}$$

is a Glivenko–Cantelli class.
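For reference, the Glivenko–Cantelli property invoked here is the standard one: the empirical averages converge to their expectations uniformly over the class. In the sketch below, \(({{\mathbf {x}}}_i,\,y_i),\) \(i=1,\ldots ,n,\) denote independent draws from P; this sample notation is assumed for illustration and is not taken from the text above.

$$\begin{aligned} \sup _{h\in \mathcal {H}}\left| \frac{1}{n}\sum _{i=1}^{n} h\left( {{\mathbf {x}}}_i,\,y_i\right) - E_P\left[ h({{\mathbf {X}}},\,Y)\right] \right| \rightarrow 0,\quad \text {P-a.e.} \end{aligned}$$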

Lemma 8

Let P be an absolutely continuous distribution with strictly positive density function. Then, for every compact set K,  we have that

$$\begin{aligned} \sup _{{\varvec{\theta }}\in K}\left| R\left( {\varvec{\theta }},\,P_n\right) - R({\varvec{\theta }},\,P)\right| \rightarrow 0,\quad \text {P-a.e.} \end{aligned}$$

In fact, the condition on the existence of a strictly positive density function for P can be removed, but this would imply the use of trimming functions as those introduced in Cuesta-Albertos et al. (1997).

1.4 Part D: proof of Proposition 3.2

Taking into account Lemma 7, the consistency follows from Corollary 3.2.3 in van der Vaart and Wellner (1996), exactly as in García-Escudero et al. (2008, 2014). Note that Lemmas 5 and 6 guarantee the existence of a compact set K such that \(\{\hat{{\varvec{\theta }}}_n\}_{n=1}^{\infty }\) is included in K with probability 1 and that, by Lemma 8, \(R(\hat{{\varvec{\theta }}}_n,\,P_n)\) also lies within an interval \([a,\,b]\) with probability 1. This has also been used to simplify the target function needed to apply the aforementioned result in van der Vaart and Wellner (1996). \(\square \)
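The mechanism behind consistency results of this kind can be sketched with a standard inequality. The notation is generic and assumed only for illustration: \(Q_n\) and Q stand for the empirical and population target functions, \({\varvec{\theta }}_0\in K\) for a maximizer of Q over K, and \(\hat{{\varvec{\theta }}}_n\in K\) for a maximizer of \(Q_n\) over K.

$$\begin{aligned} 0 \le Q\left( {\varvec{\theta }}_0\right) - Q\left( \hat{{\varvec{\theta }}}_n\right)&= \left[ Q\left( {\varvec{\theta }}_0\right) - Q_n\left( {\varvec{\theta }}_0\right) \right] + \left[ Q_n\left( {\varvec{\theta }}_0\right) - Q_n\left( \hat{{\varvec{\theta }}}_n\right) \right] + \left[ Q_n\left( \hat{{\varvec{\theta }}}_n\right) - Q\left( \hat{{\varvec{\theta }}}_n\right) \right] \\&\le 2\sup _{{\varvec{\theta }}\in K}\left| Q_n({\varvec{\theta }}) - Q({\varvec{\theta }})\right| , \end{aligned}$$

because the middle bracket is nonpositive. Uniform convergence over K therefore forces \(Q(\hat{{\varvec{\theta }}}_n)\rightarrow Q({\varvec{\theta }}_0),\) and consistency follows when \({\varvec{\theta }}_0\) is a well-separated maximizer. This is the generic argument and not necessarily the exact statement of the cited corollary.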

Cite this article

García-Escudero, L.A., Gordaliza, A., Greselin, F. et al. Robust estimation of mixtures of regressions with random covariates, via trimming and constraints. Stat Comput 27, 377–402 (2017).


  • Cluster weighted modeling
  • Mixture of regressions
  • Robustness
  • Trimming
  • Constrained estimation