Abstract
Traditional estimation of mixture regression models assumes normal (and hence symmetric) component errors, which makes it sensitive to outliers, heavy-tailed errors and asymmetric errors. In this work we propose to address these issues simultaneously in the mixture regression setting by extending the classic normal model: we assume that the random errors follow a scale mixture of skew-normal distributions. This approach allows us to model data with great flexibility, accommodating both skewness and heavy tails. The main virtue of considering mixture regression models under the class of scale mixtures of skew-normal distributions is that these distributions have a convenient hierarchical representation, which makes inference easy to implement. We develop a simple EM-type algorithm for maximum likelihood estimation of the parameters of the proposed model. To examine the robustness of this flexible model against outlying observations, we present simulation studies. Finally, we analyze a real data set to illustrate the usefulness of the proposed method.
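As a rough illustration of the hierarchical representation mentioned above, the following sketch simulates data from a two-component mixture regression whose errors follow a skew-t distribution, one member of the scale mixtures of skew-normal class. It relies on the standard stochastic representation of that class (a half-normal variable plus a normal variable, both scaled by a gamma mixing variable). The function names, parameter values and the uncentered error location are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def sample_skew_t_errors(n, sigma, lam, nu, rng):
    """Draw n skew-t errors via the hierarchical representation (illustrative)."""
    delta = lam / np.sqrt(1.0 + lam**2)
    # Mixing variable: U ~ Gamma(shape=nu/2, rate=nu/2), which gives the skew-t case.
    u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    # Latent half-normal: T | U = u ~ HN(0, 1/u).
    t = np.abs(rng.standard_normal(n)) / np.sqrt(u)
    # Conditionally normal part with variance 1/u.
    z = rng.standard_normal(n) / np.sqrt(u)
    # Error | T, U ~ N(sigma*delta*t, sigma^2 * (1 - delta^2) / u).
    return sigma * (delta * t + np.sqrt(1.0 - delta**2) * z)


def sample_mixture_regression(n, weights, betas, sigmas, lams, nu, rng):
    """Simulate (x, y, labels) from a finite mixture of linear regressions."""
    x = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, size=n)])
    labels = rng.choice(len(weights), size=n, p=weights)
    y = np.empty(n)
    for j in range(len(weights)):
        idx = labels == j
        n_j = int(idx.sum())
        errors = sample_skew_t_errors(n_j, sigmas[j], lams[j], nu, rng)
        y[idx] = x[idx] @ betas[j] + errors
    return x, y, labels


# Hypothetical parameter values chosen only for the demonstration.
rng = np.random.default_rng(0)
betas = np.array([[0.0, 2.0], [4.0, -2.0]])
x, y, labels = sample_mixture_regression(
    n=2000, weights=[0.4, 0.6], betas=betas,
    sigmas=[1.0, 0.5], lams=[3.0, -3.0], nu=5.0, rng=rng,
)
errors = sample_skew_t_errors(20000, sigma=1.0, lam=3.0, nu=5.0, rng=rng)
```

Because the latent variables `u` and `t` enter the conditional distribution of the error linearly and through a normal law, an EM-type algorithm can treat them (together with the component labels) as missing data; this sketch covers only the simulation side and does not implement the estimation step.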
Acknowledgments
We would like to thank the Associate Editor and two referees for their helpful comments and suggestions, leading to improvement of the paper. Víctor H. Lachos was supported by CNPq-Brazil (BPPesq) and São Paulo State Research Foundation (FAPESP-2014/02938-9). Celso Rômulo Barbosa Cabral was supported by CNPq (via BPPesq, Universal Project and Grant 167731/2013-0), and FAPEAM (via Universal Amazonas Project). Camila Borelli Zeller was supported by CNPq (BPPesq) and Minas Gerais State Research Foundation (FAPEMIG, universal project).
Cite this article
Zeller, C.B., Cabral, C.R.B. & Lachos, V.H. Robust mixture regression modeling based on scale mixtures of skew-normal distributions. TEST 25, 375–396 (2016). https://doi.org/10.1007/s11749-015-0460-4
DOI: https://doi.org/10.1007/s11749-015-0460-4