
New Estimation Method for Mixture of Normal Distributions

Robustness in Econometrics

Part of the book series: Studies in Computational Intelligence (SCI, volume 692)


Abstract

Normal mixture models are widely used for statistical modeling of data, including classification and cluster analysis. However, the popular EM algorithm for normal mixtures may give imprecise estimates because of singularities or degeneracies. To avoid this, we propose a new two-step estimation method: first, truncate the whole data set to tail data sets whose points belong to a single component normal distribution with very high probability, and obtain initial estimates of the parameters; then upgrade the estimates to better estimates recursively. In this paper the initial estimates are simply method-of-moments estimates. Empirical results show that the parameter estimates are more accurate than those obtained with the traditional EM and SEM algorithms.
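To make the two-step scheme concrete, the following is a minimal Python sketch for a two-component mixture. The cut-off choice (the 25th percentile), the loop length, and the coupled updates are our own illustrative assumptions built from the identities in the appendix, not the authors' implementation (the appendix analyzes each parameter with the other held fixed).

```python
# Illustrative sketch of the two-step method (not the authors' code):
# step 1 truncates to a tail dominated by one component and takes crude
# method-of-moments starts; step 2 upgrades the estimates recursively
# using truncated-normal moment identities from the appendix.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 5000), rng.normal(5.0, 1.5, 5000)])

b = np.quantile(x, 0.25)      # assumed cut-off: this left tail is almost
tail = x[x < b]               # purely the first component
mu_hat, var_hat = tail.mean(), tail.var()   # method-of-moments starts

for _ in range(200):
    sd = np.sqrt(var_hat)
    s = (b - mu_hat) / sd
    p_prev = norm.cdf(s)      # p_{t-1} = F(b | current fit)
    # observed-part moments implied by the current fit (see appendix):
    imu_x = mu_hat * p_prev - var_hat * norm.pdf(b, mu_hat, sd)
    isig_x = var_hat * (p_prev - s * norm.pdf(s))
    # replace the observed part with its empirical counterpart and keep
    # the "missing" (above-b) part implied by the current fit:
    mu_hat = p_prev * tail.mean() + (mu_hat - imu_x)
    var_hat = p_prev * np.mean((tail - mu_hat) ** 2) + (var_hat - isig_x)

print(mu_hat, var_hat)        # roughly (0, 1) for the first component
```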



Author information

Correspondence to Baokun Li or Tonghui Wang.

Appendix

Proof of Theorem 2.1

For convenience, we set \(a=-\infty \). The results also hold for \(b=\infty \) and for the two-sided case \(a<X<b\).

I: Since the variance of a normal distribution always exists, by the law of large numbers,

$$\begin{aligned} \frac{1}{m_1}\sum \limits _{i=1}^{m_1}x_i\overset{p}{\rightarrow } \frac{1}{p}I\mu _x=\frac{1}{p}\int _{-\infty }^{b}xf(x|\mu ,\sigma ^2)dx. \end{aligned}$$

Then

$$\begin{aligned} \hat{\mu }_t\approx p_{t-1}\frac{I\mu _x}{p}+\hat{I\mu }_{_{t-1,missing}}, \end{aligned}$$

where

$$\begin{aligned} I\mu _x= & {} \int _{-\infty }^{b}xf(x|\mu ,\sigma ^2)dx\\= & {} \int _{-\infty }^{b}(x-\mu )f(x|\mu ,\sigma ^2)dx+\mu \int _{-\infty }^{b}f(x|\mu ,\sigma ^2)dx\\= & {} \mu F(b|\mu ,\sigma ^2)-\sigma ^2f(b|\mu ,\sigma ^2), \end{aligned}$$

since \(\int _{-\infty }^{b}(x-\mu )f(x|\mu ,\sigma ^2)dx=\left[ -\sigma ^2f(x|\mu ,\sigma ^2)\right] _{-\infty }^{b}=-\sigma ^2f(b|\mu ,\sigma ^2)\).
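This closed form can be checked numerically against direct integration (our illustration, not part of the chapter):

```python
# Check: integral of x f(x|mu,sigma^2) over (-inf, b) equals
# mu*F(b|mu,sigma^2) - sigma^2*f(b|mu,sigma^2).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma, b = 1.0, 2.0, 2.5
lhs, _ = quad(lambda x: x * norm.pdf(x, mu, sigma), -np.inf, b)
rhs = mu * norm.cdf(b, mu, sigma) - sigma**2 * norm.pdf(b, mu, sigma)
print(lhs, rhs)   # both approx 0.1711
```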

Similarly,

$$\begin{aligned} \hat{I\mu }_{_{t-1,x}}= & {} \int _{-\infty }^{b}xf(x|\hat{\mu }_{t-1},\hat{\sigma }^2_{t-1})dx\\= & {} \hat{\mu }_{t-1} F(b|\hat{\mu }_{t-1},\hat{\sigma }^2_{t-1})-\hat{\sigma }^2_{t-1}f(b|\hat{\mu }_{t-1},\hat{\sigma }^2_{t-1}). \end{aligned}$$

Because the identity

$$\begin{aligned} \hat{\mu }_{t-1}= & {} \int _{-\infty }^{b}xf(x|\hat{\mu }_{t-1},\hat{\sigma }^2_{t-1})dx+\int _{b}^{+\infty }xf(x|\hat{\mu }_{t-1},\hat{\sigma }^2_{t-1})dx\\= & {} \hat{I\mu }_{_{t-1,x}}+\hat{I\mu }_{_{t-1,missing}}\\= & {} p_{t-1}\frac{\hat{I\mu }_{_{t-1,x}}}{p_{t-1}}+\hat{I\mu }_{_{t-1,missing}} \end{aligned}$$

always holds, we obtain

$$\begin{aligned} \hat{\mu }_t-\hat{\mu }_{t-1}= & {} \left( p_{t-1}\frac{I\mu _x}{p}+\hat{I\mu }_{_{t-1,missing}}\right) -\left( p_{t-1}\frac{\hat{I\mu }_{_{t-1,x}}}{p_{t-1}}+\hat{I\mu }_{_{t-1,missing}}\right) \\= & {} p_{t-1}\left( \left( \mu -\sigma ^2\frac{f(b|\mu ,\sigma ^2)}{F(b|\mu ,\sigma ^2)}\right) -\left( \hat{\mu }_{t-1}-\hat{\sigma }^2_{t-1}\frac{f(b|\hat{\mu }_{t-1},\hat{\sigma }_{t-1}^2)}{F(b|\hat{\mu }_{t-1},\hat{\sigma }_{t-1}^2)}\right) \right) \\= & {} p_{t-1}\left( \mu -\hat{\mu }_{t-1}-\sigma ^2\frac{f(b|\mu ,\sigma ^2)}{F(b|\mu ,\sigma ^2)}+\hat{\sigma }^2_{t-1}\frac{f(b|\hat{\mu }_{t-1},\hat{\sigma }_{t-1}^2)}{F(b|\hat{\mu }_{t-1},\hat{\sigma }_{t-1}^2)}\right) . \end{aligned}$$

Suppose \(\hat{\sigma }_{0}^2=\sigma ^2\) and let \(g(\mu )=\sigma ^2\frac{f(b|\mu ,\sigma ^2)}{F(b|\mu ,\sigma ^2)}\). Then

$$\begin{aligned} g'(\mu )=\sigma ^2\left( \frac{f(b|\mu ,\sigma ^2)}{F(b|\mu ,\sigma ^2)}\right) '=\frac{sf(s)F(s)+f^2(s)}{F^2(s)}, \end{aligned}$$

where \(s=\frac{b-\mu }{\sigma }\), and f(s) and F(s) are the density and cumulative distribution functions of the standard normal distribution. Then

$$\begin{aligned} \hat{\mu }_t-\hat{\mu }_{t-1}=p_{t-1}\left( \mu -\hat{\mu }_{t-1}-g(\mu )+ g(\hat{\mu }_{t-1})\right) . \end{aligned}$$
(3)
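Before applying the mean value theorem, here is a finite-difference check of the expression for \(g'(\mu )\) above (our illustration, not part of the chapter):

```python
# Finite-difference check of g'(mu) = (s f(s) F(s) + f(s)^2) / F(s)^2,
# where s = (b - mu)/sigma and f, F are the standard normal pdf/cdf.
import numpy as np
from scipy.stats import norm

mu, sigma, b, h = 0.3, 1.2, 1.0, 1e-6

def g(m):
    return sigma**2 * norm.pdf(b, m, sigma) / norm.cdf(b, m, sigma)

s = (b - mu) / sigma
analytic = (s * norm.pdf(s) * norm.cdf(s) + norm.pdf(s)**2) / norm.cdf(s)**2
numeric = (g(mu + h) - g(mu - h)) / (2 * h)
print(analytic, numeric)   # agree to about 1e-8
```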

By Lagrange's mean value theorem we have

$$\begin{aligned} g(\mu )- g(\hat{\mu }_{t-1})=\left( \mu -\hat{\mu }_{t-1} \right) g'(c), \end{aligned}$$

where c is between \(\mu \) and \(\hat{\mu }_{t-1}\). So Eq. (3) becomes

$$\begin{aligned} \hat{\mu }_t-\hat{\mu }_{t-1}=p_{t-1}(1-g'(c))\left( \mu -\hat{\mu }_{t-1}\right) . \end{aligned}$$

Denote \(s_1=\frac{b-c}{\sigma }\). Then \(1-g'(c)>0\) and \(0<F(s_1)(1-g'(c))<1\) always hold. Moreover, when \(0.2<F(s_1)<0.8\), we have \(0<(F(s_1)+0.3)(1-g'(c))<2\).

If \(\mu <\hat{\mu }_{t-1}\), then \(p_{t-1}<F(s_1)<p\), and therefore \(0<p_{t-1}(1-g'(c))<F(s_1)(1-g'(c))<1\) always holds. So \(\mu<\hat{\mu }_t<\hat{\mu }_{t-1}\), and the sequence of upgraded estimates, being monotone and bounded, converges.

If \(\mu >\hat{\mu }_{t-1}\), then \(\hat{\mu }_t>\hat{\mu }_{t-1}\) and \(p<F(s_1)<p_{t-1}\). When \(p>0.05\), \(0<p_{t-1}(1-g'(c))<1\), and then \(\hat{\mu }_{t-1}<\hat{\mu }_t<\mu \). When \(0.2<p<0.5\) and \(|p_{t-1}-p|<0.3\), we have \(0<p_{t-1}(1-g'(c))<(F(s_1)+0.3)(1-g'(c))<2\), which implies \(\hat{\mu }_{t-1}<\hat{\mu }_t<\mu +(\mu -\hat{\mu }_{t-1})\).

If \(\hat{\mu }_t>\mu \), that is, \(\mu<\hat{\mu }_t<\mu +(\mu -\hat{\mu }_{t-1})\), then by the preceding paragraph the subsequent upgraded estimates converge. If \(\hat{\mu }_t<\mu \), that is, \(\hat{\mu }_{t-1}<\hat{\mu }_t<\mu \), then the sequence of upgraded estimates again converges.

From the above we conclude that when \(\hat{\sigma }_{0}^2=\sigma ^2\), the sequence of upgraded estimates converges. The results also hold for left-truncated and two-sided truncated normal distributions.
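As an illustration of this convergence (ours, not part of the chapter), the recursion can be simulated at the population level with \(\hat{\sigma }_0^2=\sigma ^2\):

```python
# Mean recursion of Part I at the population level: each step moves
# mu_hat by p_{t-1} * ((mu - g(mu)) - (mu_hat - g(mu_hat))), i.e. by
# p_{t-1}(1 - g'(c))(mu - mu_hat) for some c in between.
import numpy as np
from scipy.stats import norm

mu_true, sigma, b = 0.0, 1.0, 1.0

def trunc_mean(m):
    s = (b - m) / sigma
    return m - sigma * norm.pdf(s) / norm.cdf(s)   # E[X | X < b]

mu_hat = 2.0                                       # bad starting value
for t in range(20):
    p_prev = norm.cdf((b - mu_hat) / sigma)        # p_{t-1}
    mu_hat += p_prev * (trunc_mean(mu_true) - trunc_mean(mu_hat))
    print(t, mu_hat)   # decreases monotonically toward mu_true = 0
```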

II: Since the variance of a normal distribution always exists, by the law of large numbers,

$$\begin{aligned} \frac{1}{m_1}\sum \limits _{i=1}^{m_1}(x_i-\mu )^2\overset{p}{\rightarrow } \frac{1}{p}I\sigma _x^2=\frac{1}{p}\int _{-\infty }^{b}(x-\mu )^2 f(x|\mu ,\sigma ^2)dx. \end{aligned}$$

And

$$\begin{aligned} I\sigma _x^2= & {} \int _{-\infty }^{b}(x-\mu )^2 f(x|\mu ,\sigma ^2)dx\\= & {} \sigma ^2\int _{-\infty }^{s}z^2\frac{1}{\sqrt{2\pi }}\exp \left\{ -\frac{z^2}{2}\right\} dz,\qquad z=\frac{x-\mu }{\sigma }\\= & {} \sigma ^2[F(s)-sf(s)], \end{aligned}$$
$$ \frac{I\sigma _x^2}{p}=\sigma ^2\left( 1-\frac{sf(s)}{F(s)} \right) , $$

where \(s=\frac{b-\mu }{\sigma }\), and f(s) and F(s) are the density and cumulative distribution functions of the standard normal distribution.
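This second-moment identity can likewise be verified numerically (our illustration, not part of the chapter):

```python
# Check: (1/p) * integral of (x-mu)^2 f(x|mu,sigma^2) over (-inf, b)
# equals sigma^2 * (1 - s f(s)/F(s)) with s = (b - mu)/sigma.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma, b = 1.0, 2.0, 2.5
s = (b - mu) / sigma
lhs, _ = quad(lambda x: (x - mu)**2 * norm.pdf(x, mu, sigma), -np.inf, b)
lhs /= norm.cdf(s)                 # p = F(b|mu,sigma^2) = F(s)
rhs = sigma**2 * (1 - s * norm.pdf(s) / norm.cdf(s))
print(lhs, rhs)   # both approx 2.832
```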

Similarly:

$$\begin{aligned} \frac{\hat{I\sigma }_{t-1,x}^2}{p_{t-1}}=\hat{\sigma }_{t-1}^2\left( 1-\frac{\hat{s}f(\hat{s})}{F(\hat{s})} \right) , \end{aligned}$$

where \(\hat{s}=\frac{b-\mu }{\hat{\sigma }_{t-1}}\), and \(f(\cdot )\) and \(F(\cdot )\) are the density and cumulative distribution functions of the standard normal distribution.

Assume \(\hat{\mu }_{0}=\mu \) and let \(g(\sigma ^2)=\sigma ^2\left( 1-\frac{sf(s)}{F(s)} \right) \), regarded as a function of \(\sigma ^2\) through \(s=\frac{b-\mu }{\sigma }\). Then

$$ g'(\sigma ^2)=1-\frac{sf(s)}{2F(s)}-\frac{s^3f(s)}{2F(s)}-\frac{s^2f^2(s)}{2F^2(s)}. $$
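A finite-difference check of this derivative (our illustration, not part of the chapter), treating \(g\) as a function of \(\sigma ^2\) with \(b\) and \(\mu \) fixed:

```python
# Finite-difference check of g'(sigma^2) for
# g(v) = v * (1 - s f(s)/F(s)),  s = (b - mu)/sqrt(v).
import numpy as np
from scipy.stats import norm

mu, b = 0.0, 1.0

def g(v):
    s = (b - mu) / np.sqrt(v)
    return v * (1 - s * norm.pdf(s) / norm.cdf(s))

v, h = 1.5, 1e-6
s = (b - mu) / np.sqrt(v)
lam = norm.pdf(s) / norm.cdf(s)
analytic = 1 - 0.5 * s * lam - 0.5 * s**3 * lam - 0.5 * (s * lam)**2
numeric = (g(v + h) - g(v - h)) / (2 * h)
print(analytic, numeric)   # agree to about 1e-8
```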

By Lagrange's mean value theorem we have

$$ g(\sigma ^2)-g(\hat{\sigma }_{t-1}^2)=(\sigma ^2-\hat{\sigma }_{t-1}^2)g'(d^2), $$

where \(d^2\) is between \(\sigma ^2\) and \(\hat{\sigma }_{t-1}^2\).

We also have

$$ \hat{\sigma }_{t-1}^2=\hat{I\sigma }^2_{_{t-1,x}}+\hat{I\sigma }^2_{_{t-1,missing}}, $$

So

$$\begin{aligned} \hat{\sigma }_t^2-\hat{\sigma }_{t-1}^2= & {} \left( p_{t-1}\frac{I\sigma ^2_x}{p}+\hat{I\sigma }^2_{_{t-1,missing}}\right) -\left( \hat{I\sigma }^2_{_{t-1,x}}+\hat{I\sigma }^2_{_{t-1,missing}}\right) \\= & {} p_{t-1}\left( \frac{I\sigma ^2_x}{p}-\frac{\hat{I\sigma }^2_{_{t-1,x}}}{p_{t-1}}\right) \\= & {} p_{t-1}g'(d^2)(\sigma ^2-\hat{\sigma }_{t-1}^2). \end{aligned}$$

Denote \(s_2=\frac{b-\mu }{d}\). Numerical evaluation in R shows that \(g'(d^2)>0\) and \(0<F(s_2)g'(d^2)<1\) always hold, and that when \(0<F(s_2)<0.5\), \(0<(F(s_2)+0.5)g'(d^2)<1\).

If \(\hat{\sigma }_{t-1}^2<\sigma ^2\), then \(p_{t-1}<F(s_2)<p\), so \(0<p_{t-1}g'(d^2)<F(s_2)g'(d^2)<1\) always holds and therefore \(\hat{\sigma }_{t-1}^2<\hat{\sigma }_t^2<\sigma ^2\). The sequence of upgraded estimates, being monotone and bounded, converges.

If \(\hat{\sigma }_{t-1}^2>\sigma ^2\), then \(\hat{\sigma }_t^2<\hat{\sigma }_{t-1}^2\) and \(p<F(s_2)<p_{t-1}\). Since \(|p_{t-1}-p|<0.5\), when \(F(s_2)>0.5\) we have \(0<p_{t-1}g'(d^2)<1\), and when \(F(s_2)<0.5\) we have \(0<p_{t-1}g'(d^2)<(F(s_2)+0.5)g'(d^2)<1\). That is, \(\sigma ^2<\hat{\sigma }_t^2<\hat{\sigma }_{t-1}^2\), and the sequence of upgraded estimates again converges.

From the above we conclude that when \(\hat{\mu }_{0}=\mu \), the sequence of upgraded estimates converges to \(\sigma ^2\).
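The analogous population-level simulation for the variance recursion (again our illustration, with the mean held at \(\mu \)):

```python
# Variance recursion of Part II at the population level: each step moves
# var_hat by p_{t-1} * (g(sigma^2) - g(var_hat)) = p_{t-1} g'(d^2)
# (sigma^2 - var_hat) for some d^2 in between.
import numpy as np
from scipy.stats import norm

mu, b, var_true = 0.0, 1.0, 1.0

def trunc_second_moment(v):
    s = (b - mu) / np.sqrt(v)
    return v * (1 - s * norm.pdf(s) / norm.cdf(s))   # E[(X-mu)^2 | X < b]

var_hat = 4.0                                        # bad starting value
for t in range(20):
    p_prev = norm.cdf((b - mu) / np.sqrt(var_hat))   # p_{t-1}
    var_hat += p_prev * (trunc_second_moment(var_true)
                         - trunc_second_moment(var_hat))
    print(t, var_hat)   # decreases monotonically toward var_true = 1
```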


Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Hu, Q., Wei, Z., Li, B., Wang, T. (2017). New Estimation Method for Mixture of Normal Distributions. In: Kreinovich, V., Sriboonchitta, S., Huynh, VN. (eds) Robustness in Econometrics. Studies in Computational Intelligence, vol 692. Springer, Cham. https://doi.org/10.1007/978-3-319-50742-2_13


  • DOI: https://doi.org/10.1007/978-3-319-50742-2_13


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50741-5

  • Online ISBN: 978-3-319-50742-2

  • eBook Packages: Engineering, Engineering (R0)
