On the Use of the Sub-Gaussian $\alpha$-Stable Distribution in the Cluster-Weighted Model

Zarei, Shaho; Mohammadpour, Adel; Ingrassia, Salvatore; Punzo, Antonio

doi:10.1007/s40995-018-0526-8

Research Paper
Published: 26 February 2018

Volume 43, pages 1059–1069, (2019)

Iranian Journal of Science and Technology, Transactions A: Science

Abstract

The Gaussian cluster-weighted model (CWM) is a mixture of regression models with random covariates that allows for flexible clustering of a random vector composed of a response variable and some covariates. In each mixture component, a Gaussian distribution is adopted for both the covariates and the response given the covariates. To make the approach robust to atypical observations, we propose replacing the Gaussian distribution with the sub-Gaussian $\alpha$-stable (SG$\alpha$S) distribution, an elliptical generalization of the Gaussian distribution with one additional parameter, $\alpha$, that governs the weight of the tails. The resulting SG$\alpha$S CWM can accommodate outliers and leverage points, concepts of primary importance in robust regression analysis. An advantage over the $t$-distribution is that the tails of the SG$\alpha$S distribution can be heavier, which gives robustness to gross atypical observations as well. A new algorithm, combining stochastic and conditional expectation-maximization steps, is used to obtain maximum likelihood estimates of the model parameters. Simulated and real data are used to illustrate the proposal and to compare it with CWMs based on Gaussian and $t$ distributions.

Notes

Available at http://www.robustanalysis.com.

Acknowledgements

The authors would like to thank the anonymous referees for their helpful comments and for careful reading that greatly improved the article.

Author information

Authors and Affiliations

Department of Statistics, Faculty of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
Shaho Zarei & Adel Mohammadpour
Department of Economics and Business, University of Catania, Catania, Italy
Salvatore Ingrassia & Antonio Punzo

Corresponding author

Correspondence to Adel Mohammadpour.

Appendix

Suppose $\varvec{W} \sim S_{d}(\alpha ,\varvec{\Sigma },\varvec{\mu })$. Then $\varvec{W}\mathop {=}\limits ^d\varvec{\mu }+\sqrt{P}\varvec{Z},$ where $\varvec{Z}$ is a $d$-variate zero-mean Gaussian random vector with variance–covariance matrix $\varvec{\Sigma }$ and, independently of $\varvec{Z}$, $P\sim S(\frac{\alpha }{2},1,(\cos (\frac{\pi \alpha }{4}))^{\frac{2}{\alpha }},0)$ is a positive stable random variable. To compute $E_{1}=E\Big (P^{-1}|\varvec{w},\alpha ,\varvec{\Sigma },\varvec{\mu }\Big )$, we need the conditional density of $P$ given $\varvec{W}=\varvec{w}$:

$$\begin{aligned} f(p|\varvec{w})=\frac{f(\varvec{w},p)}{f(\varvec{w})}=\frac{f(p)f(\varvec{w}|p)}{\int _{0}^{\infty }f(p)f(\varvec{w}|p)\mathrm{d}p}. \end{aligned}$$
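The scale-mixture representation $\varvec{W}\mathop {=}\limits ^d\varvec{\mu }+\sqrt{P}\varvec{Z}$ can be simulated directly. The following Python sketch is one possible way to do so, not the authors' implementation. It assumes that $S(\cdot )$ denotes the S1 parameterization, in which the stated scale gives $P$ the Laplace transform $\exp (-s^{\alpha /2})$, and it draws $P$ with Kanter's representation of such a variable. The case $\alpha =2$ is treated as the Gaussian limit $P=1$. The function names `sample_positive_stable` and `sample_sgas`, and the convention of passing a lower-triangular Cholesky factor of $\varvec{\Sigma }$, are illustrative choices.

```python
import math
import random


def sample_positive_stable(a, rng):
    """Draw P > 0 with Laplace transform E[exp(-s*P)] = exp(-s**a), 0 < a <= 1.

    Kanter's representation; a plays the role of alpha/2 in the text.
    """
    if a >= 1.0:
        return 1.0  # assumed Gaussian limit (alpha = 2): P is degenerate at 1
    u = 0.0
    while u == 0.0:  # uniform on the open interval (0, pi)
        u = math.pi * rng.random()
    e = 0.0
    while e == 0.0:  # standard exponential, strictly positive
        e = rng.expovariate(1.0)
    return (math.sin(a * u) / math.sin(u) ** (1.0 / a)
            * (math.sin((1.0 - a) * u) / e) ** ((1.0 - a) / a))


def sample_sgas(alpha, mu, chol, rng):
    """Draw W = mu + sqrt(P) * Z, with Z ~ N(0, Sigma) and Sigma = chol chol^T.

    chol is the lower-triangular Cholesky factor of Sigma (list of rows).
    """
    d = len(mu)
    p = sample_positive_stable(alpha / 2.0, rng)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # Z = chol @ g gives a zero-mean Gaussian vector with covariance Sigma
    z = [sum(chol[r][c] * g[c] for c in range(r + 1)) for r in range(d)]
    return [mu[r] + math.sqrt(p) * z[r] for r in range(d)]


if __name__ == "__main__":
    rng = random.Random(1)
    chol = [[1.0, 0.0], [0.5, 1.0]]  # hypothetical Cholesky factor
    for _ in range(3):
        print(sample_sgas(1.5, [0.0, 0.0], chol, rng))
```

Smaller values of $\alpha $ make large draws of $P$ more frequent, which is what produces the heavy tails of $\varvec{W}$.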

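Since $\varvec{W}|P=p\sim N(\varvec{\mu },p\varvec{\Sigma })$, the conditional density $f(\varvec{w}|p)$ is, as a function of $p$, proportional to $p^{-d/2}\exp \left\{ -(\varvec{w}-\varvec{\mu })^{'}\varvec{\Sigma }^{-1}(\varvec{w}-\varvec{\mu })/(2p)\right\} $. Substituting into $E_{1}=\int _{0}^{\infty }p^{-1}f(p|\varvec{w})\mathrm{d}p$ and cancelling the factors that do not depend on $p$, we have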

$$\begin{aligned} E_{1}=\frac{\displaystyle \int _{0}^{\infty }p^{-d/2-1}f_{P}(p|\alpha )\exp \left\{ \frac{-(\varvec{w}-\varvec{\mu })^{'}\varvec{\Sigma }^{-1}(\varvec{w}-\varvec{\mu })}{2p}\right\} {\mathrm{d}p}}{\displaystyle \int _{0}^{\infty }p^{-d/2}f_{P}(p|\alpha )\exp \left\{ \frac{-(\varvec{w}-\varvec{\mu })^{'}\varvec{\Sigma }^{-1}(\varvec{w}-\varvec{\mu })}{2p}\right\} {\mathrm{d}p}}, \end{aligned}$$

where $f_{P}(\cdot |\alpha )$ is the density function of $P$. To approximate ${E}_{1}$, we use a Monte Carlo method: we generate $M$ draws from $P$ and average the remaining factors of the integrands over them. Both integrals are expectations with respect to $f_{P}$, so the density weight is carried by the sample itself. If $p_{1},\ldots ,p_{M}$ is a random sample from $P$, then the approximate value of ${E}_{1}$ is

$$\begin{aligned} \frac{\sum _{i=1}^{M}p_{i}^{-d/2-1}\exp \left\{ \frac{-(\varvec{w}-\varvec{\mu })^{'}\varvec{\Sigma }^{-1}(\varvec{w}-\varvec{\mu })}{2p_{i}}\right\} }{\sum _{i=1}^{M} {p_{i}}^{-d/2}\exp \left\{ \frac{-(\varvec{w}-\varvec{\mu })^{'}\varvec{\Sigma }^{-1}(\varvec{w}-\varvec{\mu })}{2p_{i}}\right\} }. \end{aligned}$$
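The Monte Carlo approximation of ${E}_{1}$ can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. It uses the plain Monte Carlo form in which sampling from $f_{P}$ supplies the density weight, so $f_{P}(p_{i}|\alpha )$ is never evaluated. The draws of $P$ come from Kanter's representation, assuming $P$ has Laplace transform $\exp (-s^{\alpha /2})$. The argument `delta` stands for $(\varvec{w}-\varvec{\mu })^{'}\varvec{\Sigma }^{-1}(\varvec{w}-\varvec{\mu })$, and the function names are hypothetical. The sums are accumulated on the log scale to avoid overflow.

```python
import math
import random


def sample_positive_stable(a, rng):
    """Draw P > 0 with Laplace transform exp(-s**a), 0 < a <= 1 (Kanter)."""
    if a >= 1.0:
        return 1.0  # assumed Gaussian limit (alpha = 2)
    u = 0.0
    while u == 0.0:
        u = math.pi * rng.random()
    e = 0.0
    while e == 0.0:
        e = rng.expovariate(1.0)
    return (math.sin(a * u) / math.sin(u) ** (1.0 / a)
            * (math.sin((1.0 - a) * u) / e) ** ((1.0 - a) / a))


def e1_monte_carlo(delta, d, alpha, M=2000, rng=None):
    """Approximate E_1 = E[P^{-1} | w] from M draws of P.

    delta is the squared Mahalanobis distance of w from mu; d is the dimension.
    """
    rng = rng or random.Random()
    log_num, log_den = [], []
    for _ in range(M):
        p = sample_positive_stable(alpha / 2.0, rng)
        # log of p^{-d/2} * exp(-delta / (2p)): denominator summand
        log_w = -0.5 * d * math.log(p) - delta / (2.0 * p)
        log_den.append(log_w)
        # the numerator summand carries one extra factor p^{-1}
        log_num.append(log_w - math.log(p))
    shift = max(log_den)  # common shift; it cancels in the ratio
    num = sum(math.exp(v - shift) for v in log_num)
    den = sum(math.exp(v - shift) for v in log_den)
    return num / den


if __name__ == "__main__":
    rng = random.Random(7)
    for delta in (0.0, 1.0, 10.0):
        print(delta, e1_monte_carlo(delta, d=2, alpha=1.5, M=2000, rng=rng))
```

In an EM-type algorithm such a function would be called once per observation and component, with `delta` computed from the current parameter values.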

We take $M=2000$ and update both $e_{2ig}^{(t)}$ and $e_{3ig}^{(t)}$, $i=1,\ldots ,N$ and $g=1,\ldots ,G$.
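
As a check on the expression for ${E}_{1}$, consider the special case $\alpha =1$, where $P$ has a closed-form (Lévy) density. The short derivation below is an illustrative worked example rather than part of the original appendix. It assumes that $P$ has Laplace transform $\exp (-s^{\alpha /2})$, writes $\delta =(\varvec{w}-\varvec{\mu })^{'}\varvec{\Sigma }^{-1}(\varvec{w}-\varvec{\mu })$ as a shorthand introduced here, and uses the identity $\int _{0}^{\infty }p^{-k-1}e^{-b/p}\mathrm{d}p=\Gamma (k)b^{-k}$ for $k,b>0$.

$$\begin{aligned} f_{P}(p|1)&=\frac{1}{2\sqrt{\pi }}\,p^{-3/2}\exp \left\{ -\frac{1}{4p}\right\} ,\\ E_{1}&=\frac{\displaystyle \int _{0}^{\infty }p^{-d/2-5/2}\exp \left\{ -\frac{1+2\delta }{4p}\right\} \mathrm{d}p}{\displaystyle \int _{0}^{\infty }p^{-d/2-3/2}\exp \left\{ -\frac{1+2\delta }{4p}\right\} \mathrm{d}p}=\frac{\Gamma \left( \frac{d+3}{2}\right) }{\Gamma \left( \frac{d+1}{2}\right) }\cdot \frac{4}{1+2\delta }=\frac{2(d+1)}{1+2\delta }. \end{aligned}$$

The value decreases as $\delta $ grows, so observations far from $\varvec{\mu }$ receive smaller weights. For $d=1$ and $\delta =1$ it equals $4/3$, which gives a simple reference value for a Monte Carlo approximation.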

About this article

Cite this article

Zarei, S., Mohammadpour, A., Ingrassia, S. et al. On the Use of the Sub-Gaussian $\alpha $-Stable Distribution in the Cluster-Weighted Model. Iran J Sci Technol Trans Sci 43, 1059–1069 (2019). https://doi.org/10.1007/s40995-018-0526-8

Received: 04 October 2017
Accepted: 05 February 2018
Published: 26 February 2018
Issue Date: 01 June 2019
DOI: https://doi.org/10.1007/s40995-018-0526-8
