Robust mixture regression modeling based on scale mixtures of skew-normal distributions

  • Original Paper
  • Published in TEST

Abstract

Mixture regression models are traditionally estimated under the assumption that the component errors are normal (and hence symmetric), which makes the estimates sensitive to outliers, heavy-tailed errors and asymmetric errors. In this work we propose to address these issues simultaneously by extending the classic normal mixture regression model, assuming that the random errors follow scale mixtures of skew-normal distributions. This approach allows us to model data with great flexibility, accommodating both skewness and heavy tails. The main virtue of considering mixture regression models under this class is that its members have a convenient hierarchical representation, which allows easy implementation of inference. We develop a simple EM-type algorithm to perform maximum likelihood inference of the parameters of the proposed model. To examine the robustness of this flexible model against outlying observations, some simulation studies are also presented. Finally, a real data set is analyzed, illustrating the usefulness of the proposed method.
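
To make the hierarchical representation concrete, the following is a minimal sketch, not the authors' implementation, of how data from a mixture regression with skew-t errors (one member of the scale mixtures of skew-normal class) could be simulated. It relies on the standard stochastic representation: with δ = λ/√(1+λ²), Δ = σδ and Γ = σ²(1−δ²), an error can be written as ε | T = t, U = u ~ N(Δt, Γ/u), where T | U = u is half-normal with scale u^(−1/2) and U ~ Gamma(ν/2, rate ν/2). The function names, the single covariate drawn uniformly on (−1, 1) and the parameter values are all illustrative assumptions. The errors are not centred here, so their mean is nonzero whenever λ ≠ 0.

```python
import math
import random


def sample_skew_t_error(sigma2, lam, nu, rng):
    """Draw one skew-t error through the hierarchical representation.

    Hypothetical helper: sigma2 is the scale, lam the skewness parameter,
    nu the degrees of freedom, rng a random.Random instance.
    """
    delta = lam / math.sqrt(1.0 + lam * lam)
    big_delta = math.sqrt(sigma2) * delta          # Delta = sigma * delta
    big_gamma = sigma2 * (1.0 - delta * delta)     # Gamma = sigma^2 (1 - delta^2)
    # Mixing variable U ~ Gamma(shape nu/2, scale 2/nu), i.e. rate nu/2.
    u = rng.gammavariate(nu / 2.0, 2.0 / nu)
    # T | U = u is half-normal with scale u^(-1/2).
    t = abs(rng.gauss(0.0, 1.0)) / math.sqrt(u)
    # eps | T = t, U = u is normal with mean Delta * t and variance Gamma / u.
    return big_delta * t + math.sqrt(big_gamma / u) * rng.gauss(0.0, 1.0)


def simulate_mixture_regression(n, betas, sigma2s, lams, weights, nu, seed=0):
    """Simulate (x, y, component) triples from a mixture of linear regressions.

    betas[j] = (intercept, slope) for component j; the covariate design
    (uniform on (-1, 1)) is an assumption made only for this illustration.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Latent component label drawn with the mixing weights.
        j = rng.choices(range(len(weights)), weights=weights)[0]
        x = rng.uniform(-1.0, 1.0)
        eps = sample_skew_t_error(sigma2s[j], lams[j], nu, rng)
        y = betas[j][0] + betas[j][1] * x + eps
        data.append((x, y, j))
    return data


if __name__ == "__main__":
    sample = simulate_mixture_regression(
        n=5,
        betas=[(0.0, 1.0), (3.0, -1.0)],
        sigma2s=[1.0, 0.5],
        lams=[3.0, -3.0],
        weights=[0.6, 0.4],
        nu=4.0,
        seed=42,
    )
    for x, y, j in sample:
        print(f"component={j} x={x:.3f} y={y:.3f}")
```

Conditioning on the latent pair (T, U) reduces each component to an ordinary normal linear regression, which is why this representation lends itself to EM-type algorithms. Setting `lam` to zero gives symmetric Student-t errors in this sketch.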

Acknowledgments

We would like to thank the Associate Editor and two referees for their helpful comments and suggestions, leading to improvement of the paper. Víctor H. Lachos was supported by CNPq-Brazil (BPPesq) and São Paulo State Research Foundation (FAPESP-2014/02938-9). Celso Rômulo Barbosa Cabral was supported by CNPq (via BPPesq, Universal Project and Grant 167731/2013-0), and FAPEAM (via Universal Amazonas Project). Camila Borelli Zeller was supported by CNPq (BPPesq) and Minas Gerais State Research Foundation (FAPEMIG, universal project).

Author information

Corresponding author

Correspondence to Víctor H. Lachos.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 52 KB)

Cite this article

Zeller, C.B., Cabral, C.R.B. & Lachos, V.H. Robust mixture regression modeling based on scale mixtures of skew-normal distributions. TEST 25, 375–396 (2016). https://doi.org/10.1007/s11749-015-0460-4

  • DOI: https://doi.org/10.1007/s11749-015-0460-4
