Abstract
Model-based clustering tackles the task of uncovering heterogeneity in a data set to extract valuable insights. Given the common presence of outliers in practice, robust methods for model-based clustering have been proposed. However, many methods in this area are severely limited in applications where partially observed records are common, since their existing frameworks often assume complete data. Here, a mixture of multiple scaled contaminated normal (MSCN) distributions is extended using the expectation-conditional maximization (ECM) algorithm to accommodate data sets with values missing at random. The newly proposed extension preserves the mixture’s capability of yielding robust parameter estimates and performing automatic outlier detection separately for each principal component. In this fitting framework, the MSCN marginal density is approximated using the inversion formula for the characteristic function. Extensive simulation studies involving incomplete data sets with outliers are conducted to evaluate parameter estimates and to compare the clustering and outlier-detection performance of our model to those of other mixtures.
Data Availability
The data that support the findings of this study are available from the corresponding author upon request.
Code Availability
The code can be found on GitHub at https://github.com/cristinatortora/MSCN_missing.
References
Aitken, A. (1926). A series formula for the roots of algebraic and transcendental equations. Proceedings of the Royal Society of Edinburgh, 45(1), 14–22.
Aitkin, M., & Wilson, G. T. (1980). Mixture models, outliers, and the EM algorithm. Technometrics, 22(3), 325–331.
Akaike, H. (1998). Information theory and an extension of the maximum likelihood principle. In: E. Parzen, K. Tanabe, & G. Kitagawa (Eds.), Selected Papers of Hirotugu Akaike (pp. 199–213). Springer New York, New York, NY
Akogul, S., & Erisoglu, M. (2016). A comparison of information criteria in clustering based on mixture of multivariate normal distributions. Mathematical and Computational Applications, 21(3), 34.
Andrews, J. L., & McNicholas, P. D. (2012). Model-based clustering, classification, and discriminant analysis via mixtures of multivariate t-distributions. Statistics and Computing, 22(5), 1021–1029.
Bagnato, L., & Punzo, A. (2021). Unconstrained representation of orthogonal matrices with application to common principal components. Computational Statistics, 36(2), 1177–1195.
Bagnato, L., Punzo, A., & Zoia, M. G. (2017). The multivariate leptokurtic-normal distribution and its application in model-based clustering. Canadian Journal of Statistics, 45(1), 95–119.
Banfield, J. D., & Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49(3), 803.
Berntsen, J., Espelid, T. O., & Genz, A. (1991). An adaptive algorithm for the approximate calculation of multiple integrals. ACM Transactions on Mathematical Software, 17(4), 437–451.
Biernacki, C., & Govaert, G. (1997). Using the classification likelihood to choose the number of clusters. Computing Science and Statistics, (pp. 451–457)
Biernacki, C., Celeux, G., & Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7), 719–725.
Bozdogan, H. (1993). Choosing the number of component clusters in the mixture-model using a new informational complexity criterion of the inverse-Fisher information matrix. In: O. Opitz, B. Lausen, & R. Klar (Eds.), Information and Classification (pp. 40–54). Berlin, Heidelberg. Springer Berlin Heidelberg
Bozdogan, H. (1987). Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52(3), 345–370.
Browne, R. P., & McNicholas, P. D. (2015). A mixture of generalized hyperbolic distributions. Canadian Journal of Statistics, 43(2), 176–198.
Broyden, C. (1970). The convergence of a class of double-rank minimization algorithms. Journal of the Institute of Mathematics and its Applications, 6(2), 76–90.
Buck, S. F. (1960). A method of estimation of missing values in multivariate data suitable for use with an electronic computer. Journal of the Royal Statistical Society. Series B (Methodological), 22(2), 302–306.
van Buuren, S. (2021). Flexible imputation of missing data. Chapman & Hall/CRC Interdisciplinary Statistics Series. Chapman & Hall/CRC, Boca Raton, 2nd ed.
van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67.
Cavanaugh, J. E. (1999). A large-sample model selection criterion based on Kullback’s symmetric divergence. Statistics & Probability Letters, 42(4), 333–343.
Celeux, G., & Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition, 28(5), 781–793.
Coretto, P., & Hennig, C. (2016). Robust improper maximum likelihood: Tuning, computation, and a comparison with other methods for robust Gaussian clustering. Journal of the American Statistical Association, 111(516), 1648–1659.
Cuesta-Albertos, J., Matrán, C., & Mayo-Iscar, A. (2008). Robust estimation in the normal mixture model based on robust clustering. Journal of the Royal Statistical Society Series B: Statistical Methodology, 70(4), 779–802.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1–22.
Dooren, P. V., & Ridder, L. D. (1976). An adaptive algorithm for numerical integration over an n-dimensional cube. Journal of Computational and Applied Mathematics, 2(3), 207–217.
Fletcher, R. (1970). A new approach to variable metric algorithms. The Computer Journal, 13(3), 317–322.
Forbes, F., & Wraith, D. (2014). A new family of multivariate heavy-tailed distributions with variable marginal amounts of tailweight: Application to robust clustering. Statistics and Computing, 24(6), 971–984.
Fraley, C., & Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458), 611–631.
Franczak, B. C., Browne, R. P., & McNicholas, P. D. (2014). Mixtures of shifted asymmetric Laplace distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6), 1149–1157.
Franczak, B. C., Tortora, C., Browne, R. P., & McNicholas, P. D. (2015). Unsupervised learning via mixtures of skewed distributions with hypercube contours. Pattern Recognition Letters, 58, 69–76.
Frühwirth-Schnatter, S. (2006). Finite mixture and Markov switching models. Springer Series in Statistics. Springer New York
Gallegos, M. T., & Ritter, G. (2005). A robust method for cluster analysis. The Annals of Statistics, 33(1), 347–380.
Gallegos, M. T., & Ritter, G. (2009). Trimmed ML estimation of contaminated mixtures. Sankhyā: The Indian Journal of Statistics, Series A (2008-), 71(2), 164–220.
Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., & Hothorn, T. (2021). mvtnorm: Multivariate normal and t distributions. R package version 1.1-3.
Ghahramani, Z., & Jordan, M. I. (1994). Learning from incomplete data. Technical report, Defense Technical Information Center, Fort Belvoir, VA
Goldfarb, D. (1970). A family of variable metric methods derived by variational means. Mathematics of Computation, 24(109), 23–26.
Goren, E. M., & Maitra, R. (2022). Fast model-based clustering of partial records. Stat, 11(1), e416.
Greco, L., & Agostinelli, C. (2020). Weighted likelihood mixture modeling and model-based clustering. Statistics and Computing, 30(2), 255–277.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.
Hurvich, C. M., & Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika, 76(2), 297–307.
Johnson, R. A., & Wichern, D. W. (2007). Applied multivariate statistical analysis. Pearson Prentice Hall, Upper Saddle River, NJ, 6th ed.
Karlis, D., & Santourian, A. (2009). Model-based clustering with non-elliptically contoured distributions. Statistics and Computing, 19(1), 73–83.
Karlis, D., & Xekalaki, E. (2003). Choosing initial values for the EM algorithm for finite mixtures. Computational Statistics & Data Analysis, 41, 577–590.
Kaufman, L., & Rousseeuw, P. J. (Eds.). (1990). Finding groups in data: An introduction to cluster analysis. Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken, NJ, USA.
Lin, T. I. (2009). Maximum likelihood estimation for multivariate skew normal mixture models. Journal of Multivariate Analysis, 100(2), 257–265.
Lin, T.-I. (2014). Learning from incomplete data via parameterized t mixture models through eigenvalue decomposition. Computational Statistics & Data Analysis, 71, 183–195.
Little, R. J. A., & Rubin, D. B. (2020). Statistical analysis with missing data. Wiley Series in Probability and Statistics. Wiley, Hoboken, NJ, 3rd ed.
Liu, C., & Rubin, D. B. (1994). The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence. Biometrika, 81(4), 633–648.
Maitra, R., & Melnykov, V. (2010). Simulating data to study performance of finite mixture modeling and clustering algorithms. Journal of Computational and Graphical Statistics, 19(2), 354–376.
McLachlan, G. J., & Krishnan, T. (2008). The EM algorithm and extensions. Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken, N.J.
McLachlan, G., & Peel, D. (2000). Finite mixture models. Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken, NJ, USA.
McNicholas, P. D. (2016). Model-based clustering. Journal of Classification, 33(3), 331–373.
McNicholas, P., Murphy, T., McDaid, A., & Frost, D. (2010). Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Second Special Issue on Statistical Algorithms and Software, 54(3), 711–723.
Melnykov, V., Chen, W.-C., & Maitra, R. (2012). MixSim: An R package for simulating data to study performance of clustering algorithms. Journal of Statistical Software, 51(12).
Melnykov, V. (2013). Challenges in model-based clustering. Wiley Interdisciplinary Reviews: Computational Statistics, 5(2), 135–148.
Melnykov, V., & Maitra, R. (2010). Finite mixture models and model-based clustering. Statistics Surveys, 4, 80–116.
Meng, X.-L., & Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika, 80(2), 267–278.
Michael, S., & Melnykov, V. (2016). An effective strategy for initializing the EM algorithm in finite mixture models. Advances in Data Analysis and Classification, 10, 563–583.
Morris, K., Punzo, A., Blostein, M., & McNicholas, P. D. (2019). Asymmetric clusters and outliers: Mixtures of multivariate contaminated shifted asymmetric laplace distributions. Computational Statistics and Data Analysis, 132, 145–166.
Narasimhan, B., Johnson, S. G., Hahn, T., Bouvier, A., & Kiêu, K. (2022). cubature: Adaptive multivariate integration over hypercubes.
Novi Inverardi, P. L., & Taufer, E. (2020). Outlier detection through mixtures with an improper component. Electronic Journal of Applied Statistical Analysis, 13(1), 146–163.
Peel, D., & McLachlan, G. J. (2000). Robust mixture modelling using the t distribution. Statistics and Computing, 10(4), 339–348.
Punzo, A., Mazza, A., & McNicholas, P. D. (2018). ContaminatedMixt: An R package for fitting parsimonious mixtures of multivariate contaminated normal distributions. Journal of Statistical Software, 85(10), 1–25.
Punzo, A., & McNicholas, P. D. (2016). Parsimonious mixtures of multivariate contaminated normal distributions. Biometrical Journal, 58(6), 1506–1537.
Punzo, A., & Tortora, C. (2021). Multiple scaled contaminated normal distribution and its application in clustering. Statistical Modelling, 21(4), 332–358.
R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336), 846–850.
Ritter, G. (2014). Robust cluster analysis and variable selection. Chapman and Hall/CRC, 1st ed.
Rubin, D. B. (Ed.). (1987). Multiple imputation for nonresponse in surveys. Wiley Series in Probability and Statistics. John Wiley & Sons Inc., Hoboken, NJ, USA
Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91(434), 473–489.
Sachs, J. D., Layard, R., Helliwell, J. F., et al. (2018). World happiness report 2018. Technical report.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147–177.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2).
Seghouane, A., & Bekara, M. (2004). A small sample model selection criterion based on Kullback’s symmetric divergence. IEEE Transactions on Signal Processing, 52(12), 3314–3323.
Serafini, A., Murphy, T. B., & Scrucca, L. (2020). Handling missing data in model-based clustering. arXiv preprint arXiv:2006.02954
Shanno, D. (1970). Conditioning of quasi-newton methods for function minimization. Mathematics of Computation, 24(111), 647–656.
Shireman, E., Steinley, D., & Brusco, M. J. (2017). Examining the effect of initialization strategies on the performance of Gaussian mixture modeling. Behavior Research Methods, 49(1), 282–293.
Soetaert, K. (2009). rootSolve: Nonlinear root finding, equilibrium and steady-state analysis of ordinary differential equations. R package 1.6.
Soetaert, K., & Herman, P. M. (2009). A practical guide to ecological modelling. Using R as a Simulation Platform. Springer. ISBN 978-1-4020-8623-6
Steinley, D. (2004). Properties of the Hubert-Arabie adjusted Rand index. Psychological Methods, 9(3), 386–396.
Sugasawa, S., & Kobayashi, G. (2022). Robust fitting of mixture models using weighted complete estimating equations. Computational Statistics & Data Analysis, 174, 107526.
Tong, H., & Tortora, C. (2022). MixtureMissing: Robust model-based clustering for data sets with missing values at random. R package version 1.0.2.
Tong, H., & Tortora, C. (2022). Model-based clustering and outlier detection with missing data. Advances in Data Analysis and Classification, 16(1), 5–30.
Tortora, C., Punzo, A., & Tran, L. (2023). MSclust: Multiple-scaled clustering. R package version 1.0.3.
Tortora, C., Franczak, B. C., Browne, R. P., & McNicholas, P. D. (2019). A mixture of coalesced generalized hyperbolic distributions. Journal of Classification, 36(1), 26–57.
Tran, L., & Tortora, C. (2021). How many clusters are best? Investigating model selection in robust clustering. In JSM Proceedings, Statistical Learning and Data Science Section (pp. 1159–1180). Alexandria, VA: American Statistical Association.
Tukey, J. W. (1960). A survey of sampling from contaminated distributions. In: I. Olkin, S. G. Ghurye, W. Hoeffding, W. G. Madow, & H. B. Mann (Eds.), Contributions to probability and statistics: Essays in Honor of Harold Hotelling (pp. 448–485). Stanford University Press, Stanford, CA
Wang, W.-L., & Lin, T.-I. (2015). Robust model-based clustering via mixtures of skew-t distributions with missing information. Advances in Data Analysis and Classification, 9(4), 423–445.
Wang, H., Zhang, Q., Luo, B., & Wei, S. (2004). Robust mixture modelling using multivariate t-distribution with missing information. Pattern Recognition Letters, 25(6), 701–710.
Wei, Y., Tang, Y., & McNicholas, P. D. (2019). Mixtures of generalized hyperbolic distributions and mixtures of skew-t distributions for model-based clustering with incomplete data. Computational Statistics & Data Analysis, 130, 18–41.
Wilks, S. S. (1932). Moments and distributions of estimates of population parameters from fragmentary samples. Annals of Mathematical Statistics, 3, 163–195.
Wolfe, J. H. (1965). A computer program for the maximum likelihood analysis of types. USNPRA Technical Bulletin 65-15, U.S. Naval Personnel Research Activity, San Diego, USA.
You, J., Li, Z., & Du, J. (2023). A new iterative initialization of EM algorithm for Gaussian mixture models. PLoS ONE, 18(4), e0284114.
Funding
This material is based upon work supported by the National Science Foundation under Grant No. 2209974
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Ethical Approval
The authors agree to follow Springer ethical conduct.
Conflict of Interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Appendices
Appendix A: Characteristic Functions and the Inversion Formula
In statistics, characteristic functions provide a powerful tool for deriving probability density functions by means of Fourier transformations. One major advantage of this approach is that a characteristic function always exists and uniquely determines its probability distribution.
Definition A.1
Let \(\varvec{X}= \left( X_1, \dots , X_p \right) ^\top \in \mathbb {R}^p\) be a p-variate random vector, \(\varvec{t}= \left( t_1, \dots , t_p \right) ^\top \in \mathbb {R}^p\), and i be an imaginary unit. The function
$$\phi _{\varvec{X}} (\varvec{t}) = \mathbb {E} \left[ \exp \left( i \varvec{t}^\top \varvec{X}\right) \right] $$
is called the characteristic function of \(\varvec{X}\).
From a characteristic function, the associated probability density function can be obtained using the inversion formula.
Theorem A.1
(Inversion Formula) Let \(\varvec{X}= \left( X_1, \dots , X_p \right) ^\top \in \mathbb {R}^p\) be a p-variate random vector, \(\phi _{\varvec{X}} (\varvec{t})\) be the characteristic function of \(\varvec{X}\) with \(\varvec{t}= \left( t_1, \dots , t_p \right) ^\top \in \mathbb {R}^p\), and i be an imaginary unit. The probability density function of \(\varvec{X}\) can be obtained by
$$f_{\varvec{X}} (\varvec{x}) = \frac{1}{(2 \pi )^p} \int _{\mathbb {R}^p} \exp \left( - i \varvec{t}^\top \varvec{x}\right) \phi _{\varvec{X}} (\varvec{t}) \, d \varvec{t}.$$
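The inversion formula can be verified numerically in the univariate case. The sketch below (in Python, though the authors' own implementation is in R) recovers the standard normal density at a point by discretizing the inversion integral of \(\phi (t) = \exp (-t^2/2)\) with the trapezoidal rule; since the characteristic function decays rapidly, truncating the integral to a finite interval introduces negligible error.

```python
import numpy as np

def phi_std_normal(t):
    """Characteristic function of N(0, 1): phi(t) = exp(-t^2 / 2)."""
    return np.exp(-0.5 * t**2)

def density_via_inversion(x, t_max=30.0, n=20001):
    """Approximate f(x) = (1 / 2pi) * int exp(-i t x) phi(t) dt on [-t_max, t_max]."""
    t = np.linspace(-t_max, t_max, n)
    integrand = np.exp(-1j * t * x) * phi_std_normal(t)
    dt = t[1] - t[0]
    # Trapezoidal rule; the imaginary part vanishes up to round-off.
    val = (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * dt
    return val.real / (2 * np.pi)

x = 1.0
exact = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)  # closed-form N(0, 1) density
print(abs(density_via_inversion(x) - exact) < 1e-6)
```

The same discretization idea underlies the paper's approximation of the MSCN marginal density, where no closed form is available.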
To obtain the marginals of the MSCN distribution, the propositions describing the characteristic functions of the MN and MCN distributions are needed. The marginals of the MCN and MSCN distributions are outlined in the methodology under Sect. 3.
Proposition A.1
The characteristic function of a p-variate random vector \(\varvec{X}\!=\! \left( X_1, \dots , X_p \right) ^\top \) \(\in \mathbb {R}^p\) that follows a multivariate normal distribution with mean vector \(\varvec{\mu }\) and covariance matrix \(\varvec{\Sigma }\) is
$$\phi _{\varvec{X}} (\varvec{t}) = \exp \left( i \varvec{t}^\top \varvec{\mu }- \frac{1}{2} \varvec{t}^\top \varvec{\Sigma }\varvec{t}\right) ,$$
where \(\varvec{t}= \left( t_1, \dots , t_p \right) ^\top \in \mathbb {R}^p\) and i is an imaginary unit.
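As a quick sanity check of Proposition A.1 (a Python sketch outside the paper; the parameter values below are arbitrary), the closed-form MN characteristic function can be compared against a Monte Carlo estimate of \(\mathbb {E} [ \exp ( i \varvec{t}^\top \varvec{X}) ]\):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])                    # arbitrary mean vector
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])    # arbitrary covariance matrix
t = np.array([0.3, -0.2])                     # evaluation point

# Closed form from Proposition A.1: phi(t) = exp(i t'mu - 0.5 t' Sigma t).
phi_exact = np.exp(1j * t @ mu - 0.5 * t @ Sigma @ t)

# Monte Carlo estimate: average exp(i t'X) over draws X ~ N(mu, Sigma).
X = rng.multivariate_normal(mu, Sigma, size=200_000)
phi_mc = np.exp(1j * (X @ t)).mean()

print(abs(phi_exact - phi_mc) < 0.02)
```

The agreement improves at the usual \(O(n^{-1/2})\) Monte Carlo rate, since \(|\exp (i \varvec{t}^\top \varvec{X})| = 1\) bounds the variance of each draw.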
Appendix B: Proofs
Proposition 3.1
Proof
For data generation purposes, the MCN random variable \(\varvec{X}\) can be represented as
$$\varvec{X}= \varvec{\mu }+ \eta ^{(1 - V)/2} \left( \varvec{Y}- \varvec{\mu }\right) ,$$
where V follows a Bernoulli distribution such that \(V = 1\) with probability \(\alpha \in (0.5, 1)\) and \(V = 0\) with probability \(1 - \alpha \); \(\eta > 1\) is the degree of contamination; and \(\varvec{Y}\) follows an MN distribution with mean vector \(\varvec{\mu }\) and covariance matrix \(\varvec{\Sigma }\). By Definition A.1 and the law of total expectation, we can establish the following:
$$\phi _{\varvec{X}} (\varvec{t}) = \mathbb {E} \left[ \mathbb {E} \left[ \exp \left( i \varvec{t}^\top \varvec{X}\right) { \; \mid \; }V \right] \right] = \alpha \exp \left( i \varvec{t}^\top \varvec{\mu }- \frac{1}{2} \varvec{t}^\top \varvec{\Sigma }\varvec{t}\right) + (1 - \alpha ) \exp \left( i \varvec{t}^\top \varvec{\mu }- \frac{1}{2} \eta \, \varvec{t}^\top \varvec{\Sigma }\varvec{t}\right) .$$
\(\square \)
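The two-component structure used in this proof — a Bernoulli indicator V selecting between a good normal draw and one with inflated variance — is easy to check by simulation. The univariate Python sketch below (illustrative only; all parameter values are arbitrary) generates contaminated normal data and compares its empirical variance to the mixture variance \(\alpha \sigma ^2 + (1 - \alpha ) \eta \sigma ^2\):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, eta = 0.9, 9.0    # proportion of good points; variance inflation of bad points
mu, sigma2 = 0.0, 1.0

n = 500_000
V = rng.random(n) < alpha                   # V = 1: good point, V = 0: outlier
scale = np.where(V, sigma2, eta * sigma2)   # outliers get eta-inflated variance
X = mu + rng.standard_normal(n) * np.sqrt(scale)

# Mixture variance: alpha * sigma2 + (1 - alpha) * eta * sigma2 = 1.8 here.
var_theory = alpha * sigma2 + (1 - alpha) * eta * sigma2
print(abs(X.var() - var_theory) < 0.05)
```

The requirement \(\alpha > 0.5\) in the proposition ensures the good component dominates, which is what makes the V indicators usable for automatic outlier detection.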
Proposition 3.2
Proof
From Definition A.1 and the fact that \(\tilde{\varvec{Y}}\) contains p independent univariate contaminated normal random variables, the characteristic function of the marginal variable \(\varvec{X}_1\) is given by
where, from Proposition 3.1, for \(h = 1, \dots , p\), we know that
\(\square \)
Proposition 3.6
Proof
From Definition A.1 and the fact that we are dealing with linear combinations of independent random variables, the characteristic function of \(\varvec{X}_1 { \; \mid \; }V_r = v_r, r \in \mathcal {A}\) is
Herein, for \(r \in \mathcal {A}\), \(\tilde{Y}_r\) follows a univariate normal distribution with mean 0 and variance 1 if \(v_r = 1\) or variance \(\eta _r\) if \(v_r = 0\). Thus,
On the other hand, for \(s \in \mathcal {B}\), \(\tilde{Y}_s\) follows a univariate contaminated normal distribution with mean 0, variance 1, proportion of good observations \(\alpha _s\), and degree of contamination \(\eta _s\). As a result,
\(\square \)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tong, H., Tortora, C. Missing Values and Directional Outlier Detection in Model-Based Clustering. J Classif (2023). https://doi.org/10.1007/s00357-023-09450-2
DOI: https://doi.org/10.1007/s00357-023-09450-2