Abstract
Unless data are missing completely at random (MCAR), proper methodology is crucial for the analysis of incomplete data. Consequently, methods for effectively testing the MCAR mechanism become important, and procedures have been developed that test the homogeneity of means and variances–covariances across the observed patterns (e.g., Kim & Bentler in Psychometrika 67:609–624, 2002; Little in J Am Stat Assoc 83:1198–1202, 1988). The current article shows that the population counterparts of the sample means and covariances of a given pattern of the observed data depend on the underlying structure that generates the data. It also shows that the normal-distribution-based maximum likelihood estimates for different patterns of the observed sample can converge to the same values even when data are missing at random or missing not at random, although these values may not equal those of the underlying population distribution. The results imply that statistics developed for testing the homogeneity of means and covariances cannot be safely used for testing the MCAR mechanism even when the population distribution is multivariate normal.
Notes
In the concluding section on page 23, Thoemmes and Enders (2007) stated, “One possible explanation for these results is that the sample size conditions we investigated (\(N = 100\) and 500) were not large enough for the test’s asymptotic properties to be evidenced.”
The formulations in Eq. (10) are for simplicity in presentation. They can be replaced by \(x_1=\mu _1+\sigma _1z_1\) and \(x_2=\sigma _2[\rho z_1+(1-\rho ^2)^{1/2}z_2]+\mu _2\), and the results in the stated theorems still hold, because maximum likelihood estimates as well as sample means and covariances are equivariant.
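As a rough numerical check of the general form in this note, the following sketch generates \(x_1\) and \(x_2\) from the two expressions above and compares the sample moments with \(\mu _1\), \(\mu _2\), \(\sigma _1\), \(\sigma _2\), and \(\rho \). It assumes that \(z_1\) and \(z_2\) are independent with mean 0 and variance 1; drawing them as standard normal, the function name `generate_bivariate`, and the parameter values are choices made for this illustration and are not taken from the article.

```python
import numpy as np

def generate_bivariate(n, mu1, mu2, sigma1, sigma2, rho, seed=0):
    """Generate (x1, x2) from the location-scale form given in the note.

    Assumption of this sketch: z1 and z2 are independent standard normal.
    """
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)
    x1 = mu1 + sigma1 * z1
    x2 = sigma2 * (rho * z1 + np.sqrt(1.0 - rho**2) * z2) + mu2
    return x1, x2

# Hypothetical parameter values chosen only for illustration.
x1, x2 = generate_bivariate(200_000, mu1=1.0, mu2=-2.0,
                            sigma1=2.0, sigma2=0.5, rho=0.6)

sample_corr = np.corrcoef(x1, x2)[0, 1]
print("means:", x1.mean(), x2.mean())
print("standard deviations:", x1.std(ddof=1), x2.std(ddof=1))
print("correlation:", sample_corr)
```

With a large sample, the printed means, standard deviations, and correlation should be close to the parameter values passed in, which is what the construction implies: \(x_2\) has variance \(\sigma _2^2[\rho ^2+(1-\rho ^2)]=\sigma _2^2\) and covariance \(\rho \sigma _1\sigma _2\) with \(x_1\).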
Thoemmes and Enders (2007) excluded the jth variable \(x_{ij}\) in predicting \(r_{ij}\), i.e., only let \(x_{i1}\), \(x_{i2}\), \(\ldots \), \(x_{i(j-1)}\), \(x_{i(j+1)}\), \(\ldots \), \(x_{ip}\) be the covariates.
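The covariate set described in this note can be sketched as follows. This only illustrates how the jth column is dropped when forming predictors of \(r_{ij}\); it is not the model of Thoemmes and Enders (2007). The function name `missingness_design`, the coding of \(r_{ij}\) as 1 for a missing value, and the toy data are assumptions of this sketch.

```python
import numpy as np

def missingness_design(X, j):
    """Return the indicator for column j and the remaining columns as covariates.

    Assumed coding (not from the article): r_ij = 1 if x_ij is missing, else 0.
    """
    r_j = np.isnan(X[:, j]).astype(int)
    covariates = np.delete(X, j, axis=1)  # drop the jth variable itself
    return r_j, covariates

# Toy data with one missing value in the second variable (index 1).
X = np.array([[1.0, 2.0, 3.0],
              [4.0, np.nan, 6.0],
              [7.0, 8.0, 9.0]])

r, Z = missingness_design(X, 1)
print(r)  # indicator for the second variable
print(Z)  # first and third variables only
```

In practice the remaining covariates may themselves contain missing values, which this sketch does not address.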
References
Anderson, T. W. (1957). Maximum likelihood estimates for the multivariate normal distribution when some observations are missing. Journal of the American Statistical Association, 52, 200–203.
Bentler, P. M. (2006). EQS 6 structural equations program manual. Encino, CA: Multivariate Software.
Blanca, M. J., Arnau, J., López-Montiel, D., Bono, R., & Bendayan, R. (2015). Skewness and kurtosis in real data samples. Methodology, 9, 78–84.
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144–152.
Chen, H. Y., & Little, R. (1999). A test of missing completely at random for generalised estimating equations with missing data. Biometrika, 86, 1–13.
Enders, C. K. (2010). Applied missing data analysis. New York: Guilford.
Galati, J. C., & Seaton, K. A. (2016). MCAR is not necessary for the complete cases to constitute a simple random subsample of the target sample. Statistical Methods in Medical Research, 25, 1527–1534.
Hawkins, D. M. (1981). A new test for multivariate normality and homoscedasticity. Technometrics, 23, 105–110.
Jamshidian, M., & Jalal, S. (2010). Tests of homoscedasticity, normality and missing completely at random for incomplete multivariate data. Psychometrika, 75, 649–674.
Jamshidian, M., Jalal, S., & Jansen, C. (2014). MissMech: An R Package for testing homoscedasticity, multivariate normality, and missing completely at random (MCAR). Journal of Statistical Software, 56, 1–31.
Jöreskog, K. G. (1971). Simultaneous factor analysis in several populations. Psychometrika, 36, 409–426.
Kano, Y., & Takai, K. (2011). Analysis of NMAR missing data without specifying missing-data mechanisms in a linear latent variate model. Journal of Multivariate Analysis, 102, 1241–1255.
Kim, K. H., & Bentler, P. M. (2002). Tests of homogeneity of means and covariance matrices for multivariate incomplete data. Psychometrika, 67, 609–624.
Li, J., & Yu, Y. (2015). A nonparametric test of missing completely at random for incomplete multivariate data. Psychometrika, 80, 707–726.
Little, R. J. A. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83, 1198–1202.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.
Park, T., & Davis, C. S. (1993). A test of the missing data mechanism for repeated categorical data. Biometrics, 49, 631–638.
Park, T., & Lee, S.-Y. (1997). A test of missing completely at random for longitudinal data with missing observations. Statistics in Medicine, 16, 1859–1871.
Qu, A., & Song, P. X. K. (2002). Testing ignorable missingness in estimating equation approaches for longitudinal data. Biometrika, 89, 841–850.
Rubin, D. B. (1976). Inference and missing data (with discussions). Biometrika, 63, 581–592.
Sörbom, D. (1974). A general method for studying differences in factor means and factor structures between groups. British Journal of Mathematical and Statistical Psychology, 27, 229–239.
Tang, M., & Bentler, P. M. (1998). Theory and method for constrained estimation in structural equation models with incomplete data. Computational Statistics and Data Analysis, 27, 257–270.
Thoemmes, F., & Enders, C. K. (2007). A structural equation model for testing whether data are missing completely at random. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago, IL.
Yuan, K.-H. (2009). Normal distribution based pseudo ML for missing data: With applications to mean and covariance structure analysis. Journal of Multivariate Analysis, 100, 1900–1918.
Yuan, K.-H., Chan, W., & Tian, Y. (2016). Expectation-robust algorithm and estimating equations for means and dispersion matrix with missing data. Annals of the Institute of Statistical Mathematics, 68, 329–351.
Additional information
The research was supported by the National Science Foundation under Grant No. SES-1461355.
Cite this article
Yuan, KH., Jamshidian, M. & Kano, Y. Missing Data Mechanisms and Homogeneity of Means and Variances–Covariances. Psychometrika 83, 425–442 (2018). https://doi.org/10.1007/s11336-018-9609-x