Missing Data Mechanisms and Homogeneity of Means and Variances–Covariances

Psychometrika

Abstract

Unless data are missing completely at random (MCAR), proper methodology is crucial for the analysis of incomplete data. Consequently, methods for effectively testing the MCAR mechanism become important, and procedures have been developed that test the homogeneity of means and variances–covariances across the observed patterns (e.g., Kim & Bentler, 2002; Little, 1988). The current article shows that the population counterparts of the sample means and covariances of a given pattern of the observed data depend on the underlying structure that generates the data, and that the normal-distribution-based maximum likelihood estimates for different patterns of the observed sample can converge to the same values even when data are missing at random or missing not at random, although the values may not equal those of the underlying population distribution. The results imply that statistics developed for testing the homogeneity of means and covariances cannot be safely used for testing the MCAR mechanism even when the population distribution is multivariate normal.
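As a rough illustration of the abstract's point, and not the article's own construction, the following sketch simulates a standard bivariate normal pair and deletes \(x_2\) whenever \(|x_1|\) exceeds a cutoff. The missingness depends only on the always-observed \(x_1\), so the data are missing at random rather than MCAR, yet by symmetry the mean of \(x_1\) is the same in both patterns. The function name, cutoff, sample size, and \(\rho\) are assumptions chosen for illustration; in this particular sketch the variances of \(x_1\) do differ across patterns, so it speaks only to a test based on means.

```python
import random
import statistics


def simulate_patterns(n=20000, rho=0.5, cutoff=1.0, seed=0):
    """Draw (x1, x2) pairs and delete x2 whenever |x1| > cutoff.

    Hypothetical illustration: missingness depends only on the observed x1,
    so the mechanism is MAR but not MCAR.
    """
    rng = random.Random(seed)
    complete = []    # pattern 1: both x1 and x2 observed
    x1_only = []     # pattern 2: x2 missing, only x1 observed
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = z1
        x2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2
        if abs(x1) > cutoff:
            x1_only.append(x1)
        else:
            complete.append((x1, x2))
    return complete, x1_only


complete, x1_only = simulate_patterns()

# Sample mean and variance of x1 within each missing-data pattern.
x1_complete = [x1 for x1, _ in complete]
mean_complete = statistics.fmean(x1_complete)
mean_incomplete = statistics.fmean(x1_only)
var_complete = statistics.pvariance(x1_complete)
var_incomplete = statistics.pvariance(x1_only)

print("mean of x1, complete pattern:  ", round(mean_complete, 3))
print("mean of x1, incomplete pattern:", round(mean_incomplete, 3))
print("var of x1, complete pattern:   ", round(var_complete, 3))
print("var of x1, incomplete pattern: ", round(var_incomplete, 3))
```

The two pattern means agree up to sampling error even though the mechanism is not MCAR, which is the kind of situation in which a homogeneity-of-means statistic has no information about the mechanism.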

Notes

  1. In the concluding section on page 23, Thoemmes and Enders (2007) stated: “One possible explanation for these results is that the sample size conditions we investigated (\(N = 100\) and 500) were not large enough for the test’s asymptotic properties to be evidenced.”

  2. The formulations in Eq. (10) are chosen for simplicity of presentation. They can be replaced by \(x_1=\mu_1+\sigma_1 z_1\) and \(x_2=\sigma_2[\rho z_1+(1-\rho^2)^{1/2}z_2]+\mu_2\), and the results in the stated theorems still hold, because maximum likelihood estimates as well as sample means and covariances are equivariant.

  3. Thoemmes and Enders (2007) excluded the \(j\)th variable \(x_{ij}\) when predicting \(r_{ij}\); that is, only \(x_{i1}, x_{i2}, \ldots, x_{i(j-1)}, x_{i(j+1)}, \ldots, x_{ip}\) serve as covariates.
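The general construction mentioned in Note 2 can be sketched in a few lines. The code below is a minimal illustration under assumed parameter values (the names `MU1`, `SIGMA1`, `RHO`, and the helper functions are hypothetical, not taken from the article): it builds \(x_1\) and \(x_2\) from independent standard normal draws \(z_1\) and \(z_2\), then checks that the sample mean and standard deviation of \(x_1\) are the affine images of those of \(z_1\), which is the equivariance property the note relies on, and that the sample correlation of \(x_1\) and \(x_2\) is close to \(\rho\).

```python
import random
import statistics

# Assumed illustrative parameter values; not taken from the article.
MU1, MU2 = 2.0, -1.0
SIGMA1, SIGMA2 = 3.0, 0.5
RHO = 0.5


def generate(n, seed=0):
    """Generate x1 = mu1 + sigma1*z1 and x2 = sigma2*[rho*z1 + sqrt(1-rho^2)*z2] + mu2."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x1 = [MU1 + SIGMA1 * a for a in z1]
    x2 = [SIGMA2 * (RHO * a + (1.0 - RHO ** 2) ** 0.5 * b) + MU2
          for a, b in zip(z1, z2)]
    return z1, z2, x1, x2


def sample_corr(u, v):
    """Pearson correlation computed from centered sums of squares and cross-products."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


z1, z2, x1, x2 = generate(20000)

# Equivariance: statistics of x1 are the affine images of statistics of z1.
mean_gap = statistics.fmean(x1) - (MU1 + SIGMA1 * statistics.fmean(z1))
sd_gap = statistics.pstdev(x1) - SIGMA1 * statistics.pstdev(z1)
corr_x = sample_corr(x1, x2)

print("mean gap:", mean_gap)
print("sd gap:", sd_gap)
print("sample correlation:", round(corr_x, 3))
```

Because the correlation is invariant to location and scale changes, results stated for the standardized pair carry over to any choice of \(\mu_1, \mu_2, \sigma_1, \sigma_2\).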

References

  • Anderson, T. W. (1957). Maximum likelihood estimates for the multivariate normal distribution when some observations are missing. Journal of the American Statistical Association, 52, 200–203.

  • Bentler, P. M. (2006). EQS 6 structural equations program manual. Encino, CA: Multivariate Software.

  • Blanca, M. J., Arnau, J., López-Montiel, D., Bono, R., & Bendayan, R. (2015). Skewness and kurtosis in real data samples. Methodology, 9, 78–84.

  • Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144–152.

  • Chen, H. Y., & Little, R. (1999). A test of missing completely at random for generalised estimating equations with missing data. Biometrika, 86, 1–13.

  • Enders, C. K. (2010). Applied missing data analysis. New York: Guilford.

  • Galati, J. C., & Seaton, K. A. (2016). MCAR is not necessary for the complete cases to constitute a simple random subsample of the target sample. Statistical Methods in Medical Research, 25, 1527–1534.

  • Hawkins, D. M. (1981). A new test for multivariate normality and homoscedasticity. Technometrics, 23, 105–110.

  • Jamshidian, M., & Jalal, S. (2010). Tests of homoscedasticity, normality and missing completely at random for incomplete multivariate data. Psychometrika, 75, 649–674.

  • Jamshidian, M., Jalal, S., & Jansen, C. (2014). MissMech: An R Package for testing homoscedasticity, multivariate normality, and missing completely at random (MCAR). Journal of Statistical Software, 56, 1–31.

  • Jöreskog, K. G. (1971). Simultaneous factor analysis in several populations. Psychometrika, 36, 409–426.

  • Kano, Y., & Takai, K. (2011). Analysis of NMAR missing data without specifying missing-data mechanisms in a linear latent variate model. Journal of Multivariate Analysis, 102, 1241–1255.

  • Kim, K. H., & Bentler, P. M. (2002). Tests of homogeneity of means and covariance matrices for multivariate incomplete data. Psychometrika, 67, 609–624.

  • Li, J., & Yu, Y. (2015). A nonparametric test of missing completely at random for incomplete multivariate data. Psychometrika, 80(3), 707–726.

  • Little, R. J. A. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83, 1198–1202.

  • Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.

  • Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.

  • Park, T., & Davis, C. S. (1993). A test of the missing data mechanism for repeated categorical data. Biometrics, 49, 631–638.

  • Park, T., & Lee, S.-Y. (1997). A test of missing completely at random for longitudinal data with missing observations. Statistics in Medicine, 16, 1859–1871.

  • Qu, A., & Song, P. X. K. (2002). Testing ignorable missingness in estimating equation approaches for longitudinal data. Biometrika, 89, 841–850.

  • Rubin, D. B. (1976). Inference and missing data (with discussions). Biometrika, 63, 581–592.

  • Sörbom, D. (1974). A general method for studying differences in factor means and factor structures between groups. British Journal of Mathematical and Statistical Psychology, 27, 229–239.

  • Tang, M., & Bentler, P. M. (1998). Theory and method for constrained estimation in structural equation models with incomplete data. Computational Statistics and Data Analysis, 27, 257–270.

  • Thoemmes, F., & Enders, C. K. (2007). A structural equation model for testing whether data are missing completely at random. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago, IL.

  • Yuan, K.-H. (2009). Normal distribution based pseudo ML for missing data: With applications to mean and covariance structure analysis. Journal of Multivariate Analysis, 100, 1900–1918.

  • Yuan, K.-H., Chan, W., & Tian, Y. (2016). Expectation-robust algorithm and estimating equations for means and dispersion matrix with missing data. Annals of the Institute of Statistical Mathematics, 68, 329–351.

Author information

Correspondence to Ke-Hai Yuan.

Additional information

The research was supported by the National Science Foundation under Grant No. SES-1461355.

Cite this article

Yuan, KH., Jamshidian, M. & Kano, Y. Missing Data Mechanisms and Homogeneity of Means and Variances–Covariances. Psychometrika 83, 425–442 (2018). https://doi.org/10.1007/s11336-018-9609-x

  • DOI: https://doi.org/10.1007/s11336-018-9609-x
