
New Estimation Method for Mixture of Normal Distributions

Robustness in Econometrics

Part of the book series: Studies in Computational Intelligence (SCI, volume 692)


Abstract

Normal mixture models are widely used for statistical modeling of data, including classification and cluster analysis. However, the popular EM algorithm for normal mixtures may give imprecise estimates because of singularities or degeneracies. To avoid this, we propose a new two-step estimation method: first, truncate the whole data set to tail data sets whose points belong to a single component normal distribution with very high probability, and obtain initial estimates of the parameters; then upgrade the estimates to better estimates recursively. In this paper the initial estimates are simply method-of-moments estimates. Empirical results show that the parameter estimates are more accurate than those obtained with the traditional EM and SEM algorithms.
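To make the two-step scheme concrete, the following is a minimal Python sketch for a two-component mixture. The cut-off choice (the 25th percentile), the loop length, and the coupled updates are our own illustrative assumptions built from the identities in the appendix, not the authors' implementation (the appendix analyzes each parameter with the other held fixed).

```python
# Illustrative sketch of the two-step method (not the authors' code):
# step 1 truncates to a tail dominated by one component and takes crude
# method-of-moments starts; step 2 upgrades the estimates recursively
# using truncated-normal moment identities from the appendix.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 5000), rng.normal(5.0, 1.5, 5000)])

b = np.quantile(x, 0.25)      # assumed cut-off: this left tail is almost
tail = x[x < b]               # purely the first component
mu_hat, var_hat = tail.mean(), tail.var()   # method-of-moments starts

for _ in range(200):
    sd = np.sqrt(var_hat)
    s = (b - mu_hat) / sd
    p_prev = norm.cdf(s)      # p_{t-1} = F(b | current fit)
    # observed-part moments implied by the current fit (see appendix):
    imu_x = mu_hat * p_prev - var_hat * norm.pdf(b, mu_hat, sd)
    isig_x = var_hat * (p_prev - s * norm.pdf(s))
    # replace the observed part with its empirical counterpart and keep
    # the "missing" (above-b) part implied by the current fit:
    mu_hat = p_prev * tail.mean() + (mu_hat - imu_x)
    var_hat = p_prev * np.mean((tail - mu_hat) ** 2) + (var_hat - isig_x)

print(mu_hat, var_hat)        # roughly (0, 1) for the first component
```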



Author information

Correspondence to Baokun Li or Tonghui Wang.

Appendix

Proof of Theorem 2.1

For convenience, we set \(a=-\infty \). The results also hold for \(b=\infty \) and for the two-sided case \(a<X<b\).

I: Since the variance of a normal distribution always exists, by the law of large numbers,

$$\begin{aligned} \frac{1}{m_1}\sum \limits _{i=1}^{m_1}x_i\overset{p}{\rightarrow } \frac{1}{p}I\mu _x=\frac{1}{p}\int _{-\infty }^{b}xf(x|\mu ,\sigma ^2)dx. \end{aligned}$$

Then

$$\begin{aligned} \hat{\mu }_t\approx p_{t-1}\frac{I\mu _x}{p}+\hat{I\mu }_{_{t-1,missing}}, \end{aligned}$$

where

$$\begin{aligned} I\mu _x= & {} \int _{-\infty }^{b}xf(x|\mu ,\sigma ^2)dx\\= & {} \int _{-\infty }^{b}(x-\mu )f(x|\mu ,\sigma ^2)dx+\mu \int _{-\infty }^{b}f(x|\mu ,\sigma ^2)dx\\= & {} \mu F(b|\mu ,\sigma ^2)-\sigma ^2f(b|\mu ,\sigma ^2), \end{aligned}$$

since \(\int _{-\infty }^{b}(x-\mu )f(x|\mu ,\sigma ^2)dx=\left[ -\sigma ^2f(x|\mu ,\sigma ^2)\right] _{-\infty }^{b}=-\sigma ^2f(b|\mu ,\sigma ^2)\).
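This closed form can be checked numerically against direct integration (our illustration, not part of the chapter):

```python
# Check: integral of x f(x|mu,sigma^2) over (-inf, b) equals
# mu*F(b|mu,sigma^2) - sigma^2*f(b|mu,sigma^2).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma, b = 1.0, 2.0, 2.5
lhs, _ = quad(lambda x: x * norm.pdf(x, mu, sigma), -np.inf, b)
rhs = mu * norm.cdf(b, mu, sigma) - sigma**2 * norm.pdf(b, mu, sigma)
print(lhs, rhs)   # both approx 0.1711
```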

Similarly,

$$\begin{aligned} \hat{I\mu }_{_{t-1,x}}= & {} \int _{-\infty }^{b}xf(x|\hat{\mu }_{t-1},\hat{\sigma }^2_{t-1})dx\\= & {} \hat{\mu }_{t-1} F(b|\hat{\mu }_{t-1},\hat{\sigma }^2_{t-1})-\hat{\sigma }^2_{t-1}f(b|\hat{\mu }_{t-1},\hat{\sigma }^2_{t-1}). \end{aligned}$$

Because the identity

$$\begin{aligned} \hat{\mu }_{t-1}= & {} \int _{-\infty }^{b}xf(x|\hat{\mu }_{t-1},\hat{\sigma }^2_{t-1})dx+\int _{b}^{+\infty }xf(x|\hat{\mu }_{t-1},\hat{\sigma }^2_{t-1})dx\\= & {} \hat{I\mu }_{_{t-1,x}}+\hat{I\mu }_{_{t-1,missing}}\\= & {} p_{t-1}\frac{\hat{I\mu }_{_{t-1,x}}}{p_{t-1}}+\hat{I\mu }_{_{t-1,missing}} \end{aligned}$$

always holds, we obtain

$$\begin{aligned} \hat{\mu }_t-\hat{\mu }_{t-1}= & {} \left( p_{t-1}\frac{I\mu _x}{p}+\hat{I\mu }_{_{t-1,missing}}\right) -\left( p_{t-1}\frac{\hat{I\mu }_{_{t-1,x}}}{p_{t-1}}+\hat{I\mu }_{_{t-1,missing}}\right) \\= & {} p_{t-1}\left( \left( \mu -\sigma ^2\frac{f(b|\mu ,\sigma ^2)}{F(b|\mu ,\sigma ^2)}\right) -\left( \hat{\mu }_{t-1}-\hat{\sigma }^2_{t-1}\frac{f(b|\hat{\mu }_{t-1},\hat{\sigma }_{t-1}^2)}{F(b|\hat{\mu }_{t-1},\hat{\sigma }_{t-1}^2)}\right) \right) \\= & {} p_{t-1}\left( \mu -\hat{\mu }_{t-1}-\sigma ^2\frac{f(b|\mu ,\sigma ^2)}{F(b|\mu ,\sigma ^2)}+\hat{\sigma }^2_{t-1}\frac{f(b|\hat{\mu }_{t-1},\hat{\sigma }_{t-1}^2)}{F(b|\hat{\mu }_{t-1},\hat{\sigma }_{t-1}^2)}\right) . \end{aligned}$$

Suppose \(\hat{\sigma }_{0}^2=\sigma ^2\) and let \(g(\mu )=\sigma ^2\frac{f(b|\mu ,\sigma ^2)}{F(b|\mu ,\sigma ^2)}\). Then

$$\begin{aligned} g'(\mu )=\sigma ^2\left( \frac{f(b|\mu ,\sigma ^2)}{F(b|\mu ,\sigma ^2)}\right) '=\frac{sf(s)F(s)+f^2(s)}{F^2(s)}, \end{aligned}$$

where \(s=\frac{b-\mu }{\sigma }\), and f(s) and F(s) are the density and cumulative distribution functions of the standard normal distribution. Then

$$\begin{aligned} \hat{\mu }_t-\hat{\mu }_{t-1}=p_{t-1}\left( \mu -\hat{\mu }_{t-1}-g(\mu )+ g(\hat{\mu }_{t-1})\right) . \end{aligned}$$
(3)
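Before applying the mean value theorem, here is a finite-difference check of the expression for \(g'(\mu )\) above (our illustration, not part of the chapter):

```python
# Finite-difference check of g'(mu) = (s f(s) F(s) + f(s)^2) / F(s)^2,
# where s = (b - mu)/sigma and f, F are the standard normal pdf/cdf.
import numpy as np
from scipy.stats import norm

mu, sigma, b, h = 0.3, 1.2, 1.0, 1e-6

def g(m):
    return sigma**2 * norm.pdf(b, m, sigma) / norm.cdf(b, m, sigma)

s = (b - mu) / sigma
analytic = (s * norm.pdf(s) * norm.cdf(s) + norm.pdf(s)**2) / norm.cdf(s)**2
numeric = (g(mu + h) - g(mu - h)) / (2 * h)
print(analytic, numeric)   # agree to about 1e-8
```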

By Lagrange's mean value theorem we have

$$\begin{aligned} g(\mu )- g(\hat{\mu }_{t-1})=\left( \mu -\hat{\mu }_{t-1} \right) g'(c), \end{aligned}$$

where c is between \(\mu \) and \(\hat{\mu }_{t-1}\). So Eq. (3) becomes

$$\begin{aligned} \hat{\mu }_t-\hat{\mu }_{t-1}=p_{t-1}(1-g'(c))\left( \mu -\hat{\mu }_{t-1}\right) . \end{aligned}$$

Denote \(s_1=\frac{b-c}{\sigma }\). Then \(1-g'(c)>0\) and \(0<F(s_1)(1-g'(c))<1\) always hold. Moreover, when \(0.2<F(s_1)<0.8\), we have \(0<(F(s_1)+0.3)(1-g'(c))<2\).

If \(\mu <\hat{\mu }_{t-1}\), then \(p_{t-1}<F(s_1)<p\), and therefore \(0<p_{t-1}(1-g'(c))<F(s_1)(1-g'(c))<1\) always holds. So \(\mu<\hat{\mu }_t<\hat{\mu }_{t-1}\), and the sequence of upgraded estimates, being monotone and bounded, converges.

If \(\mu >\hat{\mu }_{t-1}\), then \(\hat{\mu }_t>\hat{\mu }_{t-1}\) and \(p<F(s_1)<p_{t-1}\). When \(p>0.05\), \(0<p_{t-1}(1-g'(c))<1\), and then \(\hat{\mu }_{t-1}<\hat{\mu }_t<\mu \). When \(0.2<p<0.5\) and \(|p_{t-1}-p|<0.3\), we have \(0<p_{t-1}(1-g'(c))<(F(s_1)+0.3)(1-g'(c))<2\), which implies \(\hat{\mu }_{t-1}<\hat{\mu }_t<\mu +(\mu -\hat{\mu }_{t-1})\).

If \(\hat{\mu }_t>\mu \), that is, \(\mu<\hat{\mu }_t<\mu +(\mu -\hat{\mu }_{t-1})\), then by the preceding paragraph the subsequent upgraded estimates converge. If \(\hat{\mu }_t<\mu \), that is, \(\hat{\mu }_{t-1}<\hat{\mu }_t<\mu \), then the sequence of upgraded estimates again converges.

From the above we conclude that when \(\hat{\sigma }_{0}^2=\sigma ^2\), the sequence of upgraded estimates converges. The results also hold for left-truncated and two-sided truncated normal distributions.
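As an illustration of this convergence (ours, not part of the chapter), the recursion can be simulated at the population level with \(\hat{\sigma }_0^2=\sigma ^2\):

```python
# Mean recursion of Part I at the population level: each step moves
# mu_hat by p_{t-1} * ((mu - g(mu)) - (mu_hat - g(mu_hat))), i.e. by
# p_{t-1}(1 - g'(c))(mu - mu_hat) for some c in between.
import numpy as np
from scipy.stats import norm

mu_true, sigma, b = 0.0, 1.0, 1.0

def trunc_mean(m):
    s = (b - m) / sigma
    return m - sigma * norm.pdf(s) / norm.cdf(s)   # E[X | X < b]

mu_hat = 2.0                                       # bad starting value
for t in range(20):
    p_prev = norm.cdf((b - mu_hat) / sigma)        # p_{t-1}
    mu_hat += p_prev * (trunc_mean(mu_true) - trunc_mean(mu_hat))
    print(t, mu_hat)   # decreases monotonically toward mu_true = 0
```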

II: Since the variance of a normal distribution always exists, by the law of large numbers,

$$\begin{aligned} \frac{1}{m_1}\sum \limits _{i=1}^{m_1}(x_i-\mu )^2\overset{p}{\rightarrow } \frac{1}{p}I\sigma _x^2=\frac{1}{p}\int _{-\infty }^{b}(x-\mu )^2 f(x|\mu ,\sigma ^2)dx. \end{aligned}$$

And

$$\begin{aligned} I\sigma _x^2= & {} \int _{-\infty }^{b}(x-\mu )^2 f(x|\mu ,\sigma ^2)dx\\= & {} \sigma ^2\int _{-\infty }^{s}z^2\frac{1}{\sqrt{2\pi }}\exp \left\{ -\frac{z^2}{2}\right\} dz,\qquad z=\frac{x-\mu }{\sigma }\\= & {} \sigma ^2[F(s)-sf(s)], \end{aligned}$$
$$ \frac{I\sigma _x^2}{p}=\sigma ^2\left( 1-\frac{sf(s)}{F(s)} \right) , $$

where \(s=\frac{b-\mu }{\sigma }\), and f(s) and F(s) are the density and cumulative distribution functions of the standard normal distribution.
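This second-moment identity can likewise be verified numerically (our illustration, not part of the chapter):

```python
# Check: (1/p) * integral of (x-mu)^2 f(x|mu,sigma^2) over (-inf, b)
# equals sigma^2 * (1 - s f(s)/F(s)) with s = (b - mu)/sigma.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma, b = 1.0, 2.0, 2.5
s = (b - mu) / sigma
lhs, _ = quad(lambda x: (x - mu)**2 * norm.pdf(x, mu, sigma), -np.inf, b)
lhs /= norm.cdf(s)                 # p = F(b|mu,sigma^2) = F(s)
rhs = sigma**2 * (1 - s * norm.pdf(s) / norm.cdf(s))
print(lhs, rhs)   # both approx 2.832
```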

Similarly:

$$\begin{aligned} \frac{\hat{I\sigma }_{t-1,x}^2}{p_{t-1}}=\hat{\sigma }_{t-1}^2\left( 1-\frac{\hat{s}f(\hat{s})}{F(\hat{s})} \right) , \end{aligned}$$

where \(\hat{s}=\frac{b-\mu }{\hat{\sigma }_{t-1}}\), and \(f(\cdot )\) and \(F(\cdot )\) are the density and cumulative distribution functions of the standard normal distribution.

Assume \(\hat{\mu }_{0}=\mu \) and let \(g(\sigma ^2)=\sigma ^2\left( 1-\frac{sf(s)}{F(s)} \right) \), regarded as a function of \(\sigma ^2\) through \(s=\frac{b-\mu }{\sigma }\). Then

$$ g'(\sigma ^2)=1-\frac{sf(s)}{2F(s)}-\frac{s^3f(s)}{2F(s)}-\frac{s^2f^2(s)}{2F^2(s)}. $$
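A finite-difference check of this derivative (our illustration, not part of the chapter), treating \(g\) as a function of \(\sigma ^2\) with \(b\) and \(\mu \) fixed:

```python
# Finite-difference check of g'(sigma^2) for
# g(v) = v * (1 - s f(s)/F(s)),  s = (b - mu)/sqrt(v).
import numpy as np
from scipy.stats import norm

mu, b = 0.0, 1.0

def g(v):
    s = (b - mu) / np.sqrt(v)
    return v * (1 - s * norm.pdf(s) / norm.cdf(s))

v, h = 1.5, 1e-6
s = (b - mu) / np.sqrt(v)
lam = norm.pdf(s) / norm.cdf(s)
analytic = 1 - 0.5 * s * lam - 0.5 * s**3 * lam - 0.5 * (s * lam)**2
numeric = (g(v + h) - g(v - h)) / (2 * h)
print(analytic, numeric)   # agree to about 1e-8
```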

By Lagrange's mean value theorem we have

$$ g(\sigma ^2)-g(\hat{\sigma }_{t-1}^2)=(\sigma ^2-\hat{\sigma }_{t-1}^2)g'(d^2), $$

where \(d^2\) is between \(\sigma ^2\) and \(\hat{\sigma }_{t-1}^2\).

We also have

$$ \hat{\sigma }_{t-1}^2=\hat{I\sigma }^2_{_{t-1,x}}+\hat{I\sigma }^2_{_{t-1,missing}}, $$

So

$$\begin{aligned} \hat{\sigma }_t^2-\hat{\sigma }_{t-1}^2= & {} \left( p_{t-1}\frac{I\sigma ^2_x}{p}+\hat{I\sigma }^2_{_{t-1,missing}}\right) -\left( \hat{I\sigma }^2_{_{t-1,x}}+\hat{I\sigma }^2_{_{t-1,missing}}\right) \\= & {} p_{t-1}\left( \frac{I\sigma ^2_x}{p}-\frac{\hat{I\sigma }^2_{_{t-1,x}}}{p_{t-1}}\right) \\= & {} p_{t-1}g'(d^2)(\sigma ^2-\hat{\sigma }_{t-1}^2). \end{aligned}$$

Denote \(s_2=\frac{b-\mu }{d}\). Numerical evaluation in R shows that \(g'(d^2)>0\) and \(0<F(s_2)g'(d^2)<1\) always hold, and that when \(0<F(s_2)<0.5\), \(0<(F(s_2)+0.5)g'(d^2)<1\).

If \(\hat{\sigma }_{t-1}^2<\sigma ^2\), then \(p_{t-1}<F(s_2)<p\), so \(0<p_{t-1}g'(d^2)<F(s_2)g'(d^2)<1\) always holds and therefore \(\hat{\sigma }_{t-1}^2<\hat{\sigma }_t^2<\sigma ^2\). The sequence of upgraded estimates, being monotone and bounded, converges.

If \(\hat{\sigma }_{t-1}^2>\sigma ^2\), then \(\hat{\sigma }_t^2<\hat{\sigma }_{t-1}^2\) and \(p<F(s_2)<p_{t-1}\). Since \(|p_{t-1}-p|<0.5\), when \(F(s_2)>0.5\) we have \(0<p_{t-1}g'(d^2)<1\), and when \(F(s_2)<0.5\) we have \(0<p_{t-1}g'(d^2)<(F(s_2)+0.5)g'(d^2)<1\). That is, \(\sigma ^2<\hat{\sigma }_t^2<\hat{\sigma }_{t-1}^2\), and the sequence of upgraded estimates again converges.

From the above we conclude that when \(\hat{\mu }_{0}=\mu \), the sequence of upgraded estimates converges to \(\sigma ^2\).
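The analogous population-level simulation for the variance recursion (again our illustration, with the mean held at \(\mu \)):

```python
# Variance recursion of Part II at the population level: each step moves
# var_hat by p_{t-1} * (g(sigma^2) - g(var_hat)) = p_{t-1} g'(d^2)
# (sigma^2 - var_hat) for some d^2 in between.
import numpy as np
from scipy.stats import norm

mu, b, var_true = 0.0, 1.0, 1.0

def trunc_second_moment(v):
    s = (b - mu) / np.sqrt(v)
    return v * (1 - s * norm.pdf(s) / norm.cdf(s))   # E[(X-mu)^2 | X < b]

var_hat = 4.0                                        # bad starting value
for t in range(20):
    p_prev = norm.cdf((b - mu) / np.sqrt(var_hat))   # p_{t-1}
    var_hat += p_prev * (trunc_second_moment(var_true)
                         - trunc_second_moment(var_hat))
    print(t, var_hat)   # decreases monotonically toward var_true = 1
```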


Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Hu, Q., Wei, Z., Li, B., Wang, T. (2017). New Estimation Method for Mixture of Normal Distributions. In: Kreinovich, V., Sriboonchitta, S., Huynh, VN. (eds) Robustness in Econometrics. Studies in Computational Intelligence, vol 692. Springer, Cham. https://doi.org/10.1007/978-3-319-50742-2_13


  • DOI: https://doi.org/10.1007/978-3-319-50742-2_13


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50741-5

  • Online ISBN: 978-3-319-50742-2

  • eBook Packages: Engineering, Engineering (R0)
