Abstract
The traditional factor analysis model, which rests on the assumption of multivariate normality, has been extended by adopting the restricted multivariate skew-t (rMST) distribution jointly for the unobserved factors and errors. However, the rMST distribution is of limited use because it can only characterise skewness concentrated in a single direction. This paper introduces a more flexible robust factor analysis model based on the broader canonical fundamental skew-t (CFUST) distribution, called the CFUSTFA model. The proposed model can account for more complex features of skewness in multiple directions. An efficient alternating expectation conditional maximization (AECM) algorithm, constructed under several reduced complete-data spaces, is developed to compute the maximum likelihood (ML) estimates of the parameters. To assess the variability of the parameter estimates, we present an information-based approach to approximating the asymptotic covariance matrix of the ML estimators. The effectiveness and applicability of the proposed techniques are demonstrated through analyses of simulated and real datasets.
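To make "skewness in multiple directions" concrete, the following is a minimal NumPy sketch (not the authors' AECM code) that simulates from a CFUST distribution through its stochastic representation Y = μ + Δ|U0| + U1. Here (U0, U1) is jointly multivariate t with ν degrees of freedom and block-diagonal scale diag(I_q, Σ), and |U0| is taken componentwise. Each of the q columns of the p × q matrix Δ is a separate skewing direction; passing a single column (q = 1) gives the restricted (rMST) case. The function names `rcfust` and `cfust_mean`, and the choice to plug a factor-analytic matrix Σ = BBᵀ + D into the scale, are illustrative assumptions rather than the exact CFUSTFA parameterisation of the paper.

```python
import math
import numpy as np


def rcfust(n, mu, Sigma, Delta, nu, seed=None):
    """Draw n samples from a p-variate CFUST distribution with q skewing
    directions via Y = mu + Delta |U0| + U1, where (U0, U1) is jointly t
    with nu d.o.f. and scale diag(I_q, Sigma). Hypothetical helper."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    Delta = np.asarray(Delta, dtype=float)
    if Delta.ndim == 1:                  # a single skew direction (q = 1)
        Delta = Delta[:, None]
    p, q = Delta.shape
    # Shared gamma mixing weight W ~ Gamma(shape=nu/2, rate=nu/2)
    w = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    scale = 1.0 / np.sqrt(w)[:, None]
    # Given W = w: U0 ~ N_q(0, I_q / w) and U1 ~ N_p(0, Sigma / w)
    L = np.linalg.cholesky(np.asarray(Sigma, dtype=float))
    u0 = rng.standard_normal((n, q)) * scale
    u1 = (rng.standard_normal((n, p)) @ L.T) * scale
    # Componentwise |U0| pushes mass along each column of Delta
    return mu + np.abs(u0) @ Delta.T + u1


def cfust_mean(mu, Delta, nu):
    """E(Y) = mu + c * Delta 1_q with c = E|U0j|; finite only for nu > 1."""
    Delta = np.asarray(Delta, dtype=float)
    if Delta.ndim == 1:
        Delta = Delta[:, None]
    c = math.sqrt(nu / math.pi) * math.exp(
        math.lgamma((nu - 1) / 2) - math.lgamma(nu / 2))
    return np.asarray(mu, dtype=float) + c * Delta.sum(axis=1)


# Hypothetical example: p = 4 variables, 2 factors, q = 2 skew directions
B = np.array([[0.9, 0.0], [0.8, 0.3], [0.0, 0.7], [0.2, 0.6]])  # loadings
D = np.diag([0.3, 0.4, 0.5, 0.3])                             # uniquenesses
Sigma = B @ B.T + D   # assumed factor-analytic scale matrix
Delta = np.array([[2.0, 0.0], [1.5, 0.0], [0.0, -2.0], [0.0, -1.0]])
Y = rcfust(200_000, np.zeros(4), Sigma, Delta, nu=5, seed=1)
print(Y.mean(axis=0))
print(cfust_mean(np.zeros(4), Delta, nu=5))
```

The two printed vectors should roughly agree, since E|U0j| = sqrt(ν/π) Γ((ν−1)/2)/Γ(ν/2). In this example the first two variables are skewed to the right along one direction and the last two to the left along another, a pattern a single-column (rMST-type) Δ cannot produce.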
Acknowledgements
The authors are grateful to the Co-Editors, the Associate Editor and two anonymous referees for their valuable comments and constructive suggestions, which greatly improved the content of this paper. This project was partially supported by the Ministry of Science and Technology of Taiwan under Grant Nos. 109-2118-M-005-005-MY3 and 110-2118-M-006-006-MY3.
Cite this article
Lin, TI., Chen, IA. & Wang, WL. A robust factor analysis model based on the canonical fundamental skew-t distribution. Stat Papers 64, 367–393 (2023). https://doi.org/10.1007/s00362-022-01318-8
DOI: https://doi.org/10.1007/s00362-022-01318-8