
Missing Values and Directional Outlier Detection in Model-Based Clustering


Abstract

Model-based clustering tackles the task of uncovering heterogeneity in a data set to extract valuable insights. Given the common presence of outliers in practice, robust methods for model-based clustering have been proposed. However, many methods in this area are of limited use in applications where partially observed records are common, since their frameworks assume complete data. Here, a mixture of multiple scaled contaminated normal (MSCN) distributions is extended, via the expectation-conditional maximization (ECM) algorithm, to accommodate data sets with values missing at random. The proposed extension preserves the mixture's ability to yield robust parameter estimates and to perform automatic outlier detection separately for each principal component. In this fitting framework, the MSCN marginal density is approximated using the inversion formula for the characteristic function. Extensive simulation studies involving incomplete data sets with outliers are conducted to evaluate the parameter estimates and to compare the clustering and outlier-detection performance of our model with that of other mixtures.


Data Availability

The data that support the findings of this study are available from the corresponding author upon request.

Code Availability

The code can be found on GitHub at https://github.com/cristinatortora/MSCN_missing.


Funding

This material is based upon work supported by the National Science Foundation under Grant No. 2209974.

Author information


Corresponding author

Correspondence to Cristina Tortora.

Ethics declarations

Ethical Approval

The authors agree to follow the Springer guidelines on ethical conduct.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 218 KB)

Appendices

Appendix A: Characteristic Functions and the Inversion Formula

In statistics, characteristic functions provide a powerful tool for deriving probability density functions by means of Fourier transformation. A major advantage of this approach is that every probability distribution has a unique characteristic function.

Definition A.1

Let \(\varvec{X} = (X_1, \dots, X_p)^\top \in \mathbb{R}^p\) be a p-variate random vector, let \(\varvec{t} = (t_1, \dots, t_p)^\top \in \mathbb{R}^p\), and let \(i\) denote the imaginary unit. The function

$$\begin{aligned} \phi_{\varvec{X}}(\varvec{t}) = E\left( \exp\left( i \varvec{t}^\top \varvec{X} \right) \right) \end{aligned}$$
(39)

is called the characteristic function of \(\varvec{X}\).

From a characteristic function, the associated probability density function can be obtained using the inversion formula.

Theorem A.1

(Inversion Formula) Let \(\varvec{X} = (X_1, \dots, X_p)^\top \in \mathbb{R}^p\) be a p-variate random vector, let \(\phi_{\varvec{X}}(\varvec{t})\) be the characteristic function of \(\varvec{X}\) with \(\varvec{t} = (t_1, \dots, t_p)^\top \in \mathbb{R}^p\), and let \(i\) denote the imaginary unit. The probability density function of \(\varvec{X}\) can be obtained as

$$\begin{aligned} f_{\varvec{X}}(\varvec{x})&= (2\pi)^{-p} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left( -i \varvec{t}^\top \varvec{x} \right) \phi_{\varvec{X}}(\varvec{t}) \, d\varvec{t} \nonumber \\&= (2\pi)^{-p} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left( -i \sum_{j=1}^p t_j x_j \right) \phi_{\varvec{X}}(t_1, \dots, t_p) \, dt_1 \cdots dt_p. \end{aligned}$$
(40)
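
To make the use of Theorem A.1 concrete, the following R sketch recovers a univariate contaminated normal (CN) density from its characteristic function by numerical quadrature. It is an illustration only, not the authors' implementation; the function names (cn_cf, cn_pdf_inv) and parameter values are ours.

```r
## CN characteristic function: a two-component scale mixture of normal CFs.
cn_cf <- function(t, mu, s2, alpha, eta) {
  alpha * exp(1i * t * mu - 0.5 * t^2 * s2) +
    (1 - alpha) * exp(1i * t * mu - 0.5 * eta * t^2 * s2)
}

## Inversion formula (40) with p = 1; the imaginary part of the integrand
## integrates to zero, so only the real part is passed to integrate().
cn_pdf_inv <- function(x, mu = 0, s2 = 1, alpha = 0.9, eta = 10) {
  integrand <- function(t) Re(exp(-1i * t * x) * cn_cf(t, mu, s2, alpha, eta))
  integrate(integrand, -Inf, Inf)$value / (2 * pi)
}

## The result matches the closed-form CN density (a two-component normal
## mixture) up to quadrature error:
x <- 1.5
cn_pdf_inv(x)
0.9 * dnorm(x, 0, 1) + 0.1 * dnorm(x, 0, sqrt(10))
```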

To obtain the marginals of the MSCN distribution, the propositions describing the characteristic functions of the MN and MCN distributions are needed. The marginals of the MCN and MSCN distributions are outlined in the methodology in Sect. 3.

Proposition A.1

The characteristic function of a p-variate random vector \(\varvec{X} = (X_1, \dots, X_p)^\top \in \mathbb{R}^p\) that follows a multivariate normal distribution with mean vector \(\varvec{\mu}\) and covariance matrix \(\varvec{\Sigma}\) is

$$\begin{aligned} \phi_{\varvec{X}}(\varvec{t}) = \exp\left( i \varvec{t}^\top \varvec{\mu} - \frac{1}{2} \varvec{t}^\top \varvec{\Sigma} \varvec{t} \right), \end{aligned}$$
(41)

where \(\varvec{t} = (t_1, \dots, t_p)^\top \in \mathbb{R}^p\) and \(i\) denotes the imaginary unit.
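
As a quick sanity check of Proposition A.1, one can compare the empirical characteristic function of a simulated MN sample with the closed form in Eq. (41). The sketch below uses mvtnorm::rmvnorm only to draw the sample; the parameter values are arbitrary illustrations.

```r
library(mvtnorm)
set.seed(1)
mu <- c(1, -1)
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
X <- rmvnorm(1e5, mean = mu, sigma = Sigma)
t0 <- c(0.3, -0.2)
mean(exp(1i * drop(X %*% t0)))                            # empirical CF
exp(1i * sum(t0 * mu) - 0.5 * drop(t0 %*% Sigma %*% t0))  # Eq. (41)
```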

Appendix B: Proofs

Proposition 3.1

Proof

For data generation purposes, the MCN random variable \(\varvec{X}\) can be represented as

$$\begin{aligned} \varvec{X} = \varvec{\mu} + \left( V + \frac{1 - V}{\eta} \right)^{-1/2} \varvec{Y}, \end{aligned}$$

where V follows a Bernoulli distribution such that \(V = 1\) with probability \(\alpha \in (0.5, 1)\) and \(V = 0\) with probability \(1 - \alpha\), and \(\varvec{Y}\) follows an MN distribution with mean vector \(\varvec{0}\) and covariance matrix \(\varvec{\Sigma}\); centering on \(\varvec{\mu}\) ensures that contamination inflates the dispersion without shifting the location. By Definition A.1 and the law of total expectation, we can establish the following:

$$\begin{aligned} \phi_{\varvec{X}}(\varvec{t}) = E\left( \exp( i \varvec{t}^\top \varvec{X}) \right)&= \sum_{v = 0}^1 E\left( \exp( i \varvec{t}^\top \varvec{X}) \mid V = v \right) P(V = v) \\&= \exp\left( i \varvec{t}^\top \varvec{\mu} \right) \left[ \alpha\, E\left( \exp( i \varvec{t}^\top \varvec{Y}) \right) + (1 - \alpha)\, E\left( \exp( i \eta^{1/2} \varvec{t}^\top \varvec{Y}) \right) \right] \\&= \exp\left( i \varvec{t}^\top \varvec{\mu} \right) \left[ \alpha\, \phi_{\varvec{Y}}(\varvec{t}) + (1 - \alpha)\, \phi_{\varvec{Y}}(\eta^{1/2} \varvec{t}) \right] \\&= \alpha \exp\left( i \varvec{t}^\top \varvec{\mu} - \frac{1}{2} \varvec{t}^\top \varvec{\Sigma} \varvec{t} \right) + (1 - \alpha) \exp\left( i \varvec{t}^\top \varvec{\mu} - \frac{1}{2} \eta\, \varvec{t}^\top \varvec{\Sigma} \varvec{t} \right). \end{aligned}$$

\(\square \)
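
The stochastic representation above lends itself to a direct Monte Carlo check of Proposition 3.1. The following univariate sketch (with illustrative parameter values of our choosing, not values from the paper) compares the empirical characteristic function of simulated MCN draws against the derived closed form.

```r
set.seed(1)
n <- 1e5
mu <- 2; s2 <- 1; alpha <- 0.9; eta <- 9
v <- rbinom(n, 1, alpha)                     # V = 1 w.p. alpha (good points)
y <- rnorm(n, 0, sqrt(s2))                   # Y ~ N(0, s2)
x <- mu + (v + (1 - v) / eta)^(-1/2) * y     # stochastic representation
t0 <- 0.7
mean(exp(1i * t0 * x))                       # empirical CF at t0
alpha * exp(1i * t0 * mu - 0.5 * t0^2 * s2) +
  (1 - alpha) * exp(1i * t0 * mu - 0.5 * eta * t0^2 * s2)  # Proposition 3.1
```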

Proposition 3.2

Proof

From Definition A.1 and the fact that \(\tilde{\varvec{Y}}\) contains p independent univariate contaminated normal random variables, the characteristic function of the marginal variable \(\varvec{X}_1\) is given by

$$\begin{aligned} \phi_{\varvec{X}_1}(\varvec{t}) = E\left( \exp( i \varvec{t}^\top \varvec{X}_1 ) \right) = \prod_{j = 1}^q \exp( i t_j \mu_j ) \prod_{h = 1}^p \phi_{\tilde{Y}_h} \left( \sum_{j = 1}^q t_j [\varvec{\Gamma} \varvec{\Lambda}^{1/2}]_{jh} \right), \end{aligned}$$

where, from Proposition 3.1, for \(h = 1, \dots, p\),

$$\begin{aligned} \phi_{\tilde{Y}_h} \left( \sum_{j = 1}^q t_j [\varvec{\Gamma} \varvec{\Lambda}^{1/2}]_{jh} \right)&= \alpha_h \exp\left[ -\frac{1}{2} \left( \sum_{j = 1}^q t_j [\varvec{\Gamma} \varvec{\Lambda}^{1/2}]_{jh} \right)^{2} \right] \\&\quad + (1 - \alpha_h) \exp\left[ -\frac{1}{2} \eta_h \left( \sum_{j = 1}^q t_j [\varvec{\Gamma} \varvec{\Lambda}^{1/2}]_{jh} \right)^{2} \right]. \end{aligned}$$

\(\square \)
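
A Monte Carlo check of Proposition 3.2 is also straightforward. Consistent with the proof above, the sketch below simulates draws via \(\varvec{X} = \varvec{\mu} + \varvec{\Gamma} \varvec{\Lambda}^{1/2} \tilde{\varvec{Y}}\), with independent univariate contaminated normal coordinates \(\tilde{Y}_h\), and compares the empirical characteristic function of the first q coordinates against the product formula; all parameter values are illustrative.

```r
set.seed(1)
p <- 3; q <- 2; n <- 1e5
mu <- c(0, 1, -1)
Gamma <- qr.Q(qr(matrix(rnorm(p * p), p, p)))   # random orthogonal matrix
Lambda <- diag(c(2, 1, 0.5))
alpha <- c(0.90, 0.80, 0.95)
eta <- c(10, 5, 20)
A <- Gamma %*% sqrt(Lambda)                     # Gamma Lambda^{1/2}
V <- sapply(seq_len(p), function(h) rbinom(n, 1, alpha[h]))
Ytil <- sapply(seq_len(p), function(h)          # CN(0, 1, alpha_h, eta_h)
  rnorm(n) / sqrt(V[, h] + (1 - V[, h]) / eta[h]))
X <- sweep(Ytil %*% t(A), 2, mu, "+")           # rows: mu + A %*% Ytilde_i
t0 <- c(0.4, -0.3)                              # test point in R^q
s <- drop(t0 %*% A[1:q, , drop = FALSE])        # s_h = sum_j t0_j [A]_{jh}
mean(exp(1i * drop(X[, 1:q] %*% t0)))           # empirical marginal CF
prod(exp(1i * t0 * mu[1:q])) *                  # Proposition 3.2
  prod(alpha * exp(-0.5 * s^2) + (1 - alpha) * exp(-0.5 * eta * s^2))
```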

Proposition 3.6

Proof

From Definition A.1 and the fact that we are dealing with linear combinations of independent random variables, the characteristic function of \(\varvec{X}_1 \mid V_r = v_r, r \in \mathcal{A}\) is

$$\begin{aligned} \phi_{\varvec{X}_1 \mid V_r, r \in \mathcal{A}}(\varvec{t}) = \prod_{j = 1}^q \exp( i t_j \mu_j ) \prod_{r \in \mathcal{A}} \phi_{\tilde{Y}_r} \left( \sum_{j = 1}^q t_j [\varvec{\Gamma} \varvec{\Lambda}^{1/2}]_{jr} \right) \prod_{s \in \mathcal{B}} \phi_{\tilde{Y}_s} \left( \sum_{j = 1}^q t_j [\varvec{\Gamma} \varvec{\Lambda}^{1/2}]_{js} \right). \end{aligned}$$

Herein, for \(r \in \mathcal{A}\), \(\tilde{Y}_r\) follows a univariate normal distribution with mean 0 and variance 1 if \(v_r = 1\), or variance \(\eta_r\) if \(v_r = 0\). Thus,

$$\begin{aligned} \phi_{\tilde{Y}_r} \left( \sum_{j = 1}^q t_j [\varvec{\Gamma} \varvec{\Lambda}^{1/2}]_{jr} \right) = \left\{ \exp\left[ -\frac{1}{2} \left( \sum_{j = 1}^q t_j [\varvec{\Gamma} \varvec{\Lambda}^{1/2}]_{jr} \right)^{2} \right] \right\}^{v_r} \left\{ \exp\left[ -\frac{1}{2} \eta_r \left( \sum_{j = 1}^q t_j [\varvec{\Gamma} \varvec{\Lambda}^{1/2}]_{jr} \right)^{2} \right] \right\}^{1 - v_r}. \end{aligned}$$

On the other hand, for \(s \in \mathcal{B}\), \(\tilde{Y}_s\) follows a univariate contaminated normal distribution with mean 0, variance 1, proportion of good observations \(\alpha_s\), and degree of contamination \(\eta_s\). As a result,

$$\begin{aligned} \phi_{\tilde{Y}_s} \left( \sum_{j = 1}^q t_j [\varvec{\Gamma} \varvec{\Lambda}^{1/2}]_{js} \right) = \alpha_s \exp\left[ -\frac{1}{2} \left( \sum_{j = 1}^q t_j [\varvec{\Gamma} \varvec{\Lambda}^{1/2}]_{js} \right)^{2} \right] + (1 - \alpha_s) \exp\left[ -\frac{1}{2} \eta_s \left( \sum_{j = 1}^q t_j [\varvec{\Gamma} \varvec{\Lambda}^{1/2}]_{js} \right)^{2} \right]. \end{aligned}$$

\(\square \)
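
For computation, the conditional characteristic function in Proposition 3.6 reduces to a product of coordinate-wise factors. The short helper below evaluates it; the function name and argument layout are ours, offered as a sketch rather than the package implementation.

```r
mscn_cond_cf <- function(t, mu1, A_mat, in_A, v, alpha, eta) {
  ## t: point in R^q; mu1: first q entries of mu;
  ## A_mat: the q x p block of Gamma %*% Lambda^{1/2};
  ## in_A: logical length-p indicator of coordinates with V_r fixed;
  ## v: 0/1 values of V_r for the coordinates in A (length sum(in_A));
  ## alpha, eta: length-p contamination parameters
  s <- drop(t %*% A_mat)                       # s_h = sum_j t_j [A]_{jh}
  g1 <- exp(-0.5 * s^2)                        # Gaussian factor, variance 1
  g2 <- exp(-0.5 * eta * s^2)                  # Gaussian factor, variance eta_h
  fac <- alpha * g1 + (1 - alpha) * g2         # CN factors (coords in B)
  fac[in_A] <- g1[in_A]^v * g2[in_A]^(1 - v)   # fixed-V factors (coords in A)
  prod(exp(1i * t * mu1)) * prod(fac)
}
```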

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Tong, H., Tortora, C. Missing Values and Directional Outlier Detection in Model-Based Clustering. J Classif (2023). https://doi.org/10.1007/s00357-023-09450-2


  • DOI: https://doi.org/10.1007/s00357-023-09450-2
