An evaluation of methods to handle missing data in the context of latent variable interaction analysis: multiple imputation, maximum likelihood, and random forest algorithm

Shin, Tacksoo; Long, Jeffrey D.; Davison, Mark L.

doi:10.1007/s42081-022-00176-w

Original Paper
Published: 11 August 2022

Volume 5, pages 629–659 (2022)

Japanese Journal of Statistics and Data Science



Abstract

The objective of this study was to examine the performance of the two most popular missing data methods, multiple imputation (MI) and maximum likelihood (ML), as well as a newly developed machine learning framework based on the random forest algorithm (MF), under various research conditions. The design of the simulation study included random and non-random missingness (i.e., MCAR, MAR, and MNAR), small samples, and different missing rates. All statistical inferences were investigated using latent variable interaction modeling. Consistent with the missing data literature, the combined effects of small sample sizes, higher missing rates, and non-ignorable missingness, along with a complicated modeling structure, adversely affected the accuracy of statistical inferences. Although overparameterization is possible, MI is a good choice when convergence is a concern. If the primary goal of the research is to investigate relationships between variables, as in many studies, ML would be attractive. MF performed similarly to MI and ML across all research conditions and outperformed them in estimating the variability of parameter estimates. Other practical issues pertaining to the missing data methods are also discussed.

Notes

Denote the complete data as \({Y}_{\mathrm{com}}\), partitioned into observed and missing parts, \({Y}_{\mathrm{obs}}\) and \({Y}_{\mathrm{mis}}\), and let R indicate whether an observation is missing. Under missing completely at random (MCAR), the probability of missingness depends on neither the observed nor the unobserved measurements (\(P\left(R|{Y}_{\mathrm{com}}\right)=P(R)\)), whereas under missing at random (MAR), missingness depends on observed characteristics of the individuals but not on the missing values (\(P\left(R|{Y}_{\mathrm{com}}\right)=P(R|{Y}_{\mathrm{obs}})\)) (Enders, 2010; Rubin, 1996; Shin et al., 2017).
Data are missing not at random (MNAR) when the probability of missing data on a variable Y depends on the underlying values of Y itself (i.e., \({Y}_{\mathrm{mis}}\)), possibly in addition to other variables (i.e., \({Y}_{\mathrm{obs}}\)): \(P\left(R|{Y}_{\mathrm{com}}\right)=P(R|{Y}_{\mathrm{obs}}, {Y}_{\mathrm{mis}})\).
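
The three mechanisms defined in these notes can be illustrated with a small simulation. The Python sketch below is not the authors' simulation design; it is a minimal illustration under stated assumptions. The function name missing_mask, the rank-based missingness probability, the sample size, and the correlation of 0.5 between x and y are all hypothetical choices made for this example. Here x plays the role of a fully observed variable (part of Y_obs) and y is the variable that receives missing values.

```python
import numpy as np


def missing_mask(x, y, mechanism, rate=0.3, rng=None):
    """Return a boolean mask R (True = y is missing) for one mechanism.

    Hypothetical helper for illustration only. `x` is fully observed,
    `y` is the variable to be made incomplete, and `rate` is the target
    marginal missing rate (restricted to [0, 0.5] in this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    if not 0.0 <= rate <= 0.5:
        raise ValueError("rate must lie in [0, 0.5] for this sketch")

    if mechanism == "MCAR":
        # P(R | Y_com) = P(R): the same probability for every case.
        prob = np.full(n, rate)
    elif mechanism in ("MAR", "MNAR"):
        # MAR:  P(R | Y_com) = P(R | Y_obs), driven by the observed x.
        # MNAR: missingness is driven by the value of y itself.
        driver = x if mechanism == "MAR" else y
        # Ranks scaled to (0, 1) have mean 0.5, so the average
        # probability equals `rate` (an assumed, arbitrary functional form).
        ranks = (np.argsort(np.argsort(driver)) + 0.5) / n
        prob = 2.0 * rate * ranks
    else:
        raise ValueError(f"unknown mechanism: {mechanism!r}")

    return rng.random(n) < prob


# Demo: compare the mean of y between missing and observed cases.
rng = np.random.default_rng(2022)
n = 20_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=np.sqrt(0.75), size=n)  # corr(x, y) = 0.5

for mech in ("MCAR", "MAR", "MNAR"):
    r = missing_mask(x, y, mech, rate=0.3, rng=rng)
    gap = y[r].mean() - y[~r].mean()
    print(f"{mech}: missing rate = {r.mean():.3f}, mean(y) gap = {gap:.3f}")
```

Under MCAR the gap in mean(y) between missing and observed cases should be close to zero, whereas under MAR and MNAR the would-be-missing values of y are systematically larger. In real data this gap cannot be computed, because the missing values of y are unobserved; under MAR it can in principle be accounted for through x, while under MNAR it cannot be recovered from the observed data alone.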

References

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Sage.
Aittokallio, T. (2009). Dealing with missing values in large-scale studies: Microarray data imputation and beyond. Briefings in Bioinformatics, 2(2), 253–264.
Ajzen, I. (1987). Attitudes, traits, and actions: Dispositional prediction of behavior in personality and social psychology. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 20, pp. 1–63). Academic Press.
Algina, J., & Moulder, B. C. (2001). A note on estimating the Jöreskog-Yang model for latent variable interaction using LISREL 8.3. Structural Equation Modeling, 8, 40–52.
Alkasawneh, Pan, & Green. (2007). Multiple imputation for missing data: A cautionary tale. Sociological Methods and Research, 28(3), 301–309.
Allison, P. D. (2003). Missing data techniques for structural equation models. Journal of Abnormal Psychology, 112, 545–557.
Allison, P. D. (2006). Multiple imputation of categorical variables under the multivariate normal model. In Paper presented at the annual meeting of the American Sociological Association, Montreal Convention Center, Montreal, Quebec, Canada, Aug. 11, 2006.
Allison, P. D. (2010). Missing data. In J. D. Wright & P. V. Marsden (Eds.), Handbook of survey research (pp. 631–657). Emerald Group Publishing Ltd.
Anderson, J. C., & Gerbing, D. W. (1984). The effect of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49(2), 155–173.
Anderson, T. W. (1957). Maximum likelihood estimates for the multivariate normal distribution when some observations are missing. Journal of the American Statistical Association, 52, 200–203.
Arbuckle, J. (1996). AMOS-Analysis of moment structures. Small Waters Corporation.
Arminger, G., & Sobel, M. E. (1990). Pseudo-maximum likelihood estimation of mean and covariance structures with missing data. Journal of the American Statistical Association, 85, 195–203.
Asparouhov, T. & Muthén, B. (2010). Bayesian analysis using Mplus: Technical implementation. http://statmodel.com/download/Bayes3.pdf
Asparouhov, T., & Muthén, B. (2008). Auxiliary variables predicting missing data. Technical appendix. Muthén & Muthén.
Baraldi, A. N., & Enders, C. K. (2010). An introduction to modern missing data analyses. Journal of School Psychology, 48, 5–37.
Baraldi, A. N., & Enders, C. K. (2013). Missing data methods. In T. D. Little (Ed.), Oxford library of psychology. The Oxford handbook of quantitative methods: Statistical analysis (pp. 635–664). Oxford University Press.
Black, A. C., Harel, O., & McCoach, D. B. (2011). Missing data techniques for multilevel data: Implications of model misspecification. Journal of Applied Statistics, 38(9), 1845–1865.
Boomsma, A. (1985). Nonconvergence, improper solutions, and starting values in LISREL maximum likelihood estimation. Psychometrika, 50, 229–242.
Brand, J. (1999). Development, implementation and evaluation of multiple imputation strategies for the statistical analysis of incomplete data sets. University of Medical Center, Rotterdam.
Breiman, L. (2003). Manual for setting up, using, and understanding random forest V4.0. https://www.stat.berkeley.edu/~breiman/Using_random_forests_v4.0.pdf
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Breiman, L., & Cutler, A. (2002). Manual on setting up, using, and understanding random forests V3.1. Berkeley: University of California, Berkeley. http://oz.berkeley.edu/users/breiman/Using_random_forests_V3.1.pdf
Cham, H., Baraldi, A. N., & Enders, C. K. (2013). Applying maximum likelihood estimation and multiple imputation to moderated regression models with incomplete predictor variables. Multivariate Behavioral Research, 45, 153–154.
Chiarella, C., Kang, B., Meyer, G., & Ziogas, A. (2014). Computational methods for derivatives with early exercise features. In K. Schmedders & K. L. Judd (Eds.), Handbook of computational economics (3rd ed., chap. 5). Elsevier.
Cho, S. J., & Rabe-Hesketh, S. (2011). Alternating imputation posteriors estimation of models with crossed random effects. Computational Statistics & Data Analysis, 55, 12–25.
Coenders, G., Batista-Foguet, J. M., & Saris, W. E. (2008). Simple, efficient and distribution-free approach to interaction effects in complex structural equation models. Quality & Quantity, 42, 369–396.
Cohen, J., & Cohen, P. (1975). Applied multiple regression/correlation analyses for the behavioral sciences. Erlbaum.
Collins, L. M., Schafer, J. L., & Kam, C. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6(4), 330–351.
Copas, J. B., & Li, H. G. (1997). Inference for non-random samples (with discussion). Journal of the Royal Statistical Society, Series B, 59, 55–96.
Cronbach, L. J., & Snow, R. E. (1977). Aptitudes and instructional methods: A handbook for research on interactions. Irvington.
Croy, C. D., & Novins, D. K. (2005). Methods for addressing missing data in psychiatric and developmental research. Journal of the American Academy of Child and Adolescent Psychiatry, 44, 1230–1240.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39, 1–38.
Díaz-Uriarte, R & de Andrés, A. (2006). Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7(1), 3. Retrieved from http://www.biomedcentral.com/1471-2105/7/3
Didelez, V. (2002). ML- and semiparametric estimation in logistic models with incomplete covariate data. Statistica Neerlandica, 56, 330–345.
Dong, F., & Yin, G. (2017). Maximum likelihood estimation for incomplete multinomial data via the weaver algorithm. Statistics and Computing (published on-line).
Dong, Y., & Peng, C.-Y. J. (2013). Principled missing data methods for researchers. SpringerPlus, 2, 222. https://doi.org/10.1186/2193-1801-2-222
Doove, L., Van Buuren, S., & Dusseldorp, E. (2014). Recursive partitioning for missing data imputation in the presence of interaction effects. Computational Statistics & Data Analysis, 72, 92–104.
Dubow, E. F., & Ullman, D. G. (1989). Assessing social support in elementary school children: The survey of children’s social support. Journal of Clinical Child Psychology, 18(1), 52–64.
Duncan, S., & Duncan, T. (1994). Modeling incomplete longitudinal substance use data using latent variable growth curve methodology. Multivariate Behavioral Research, 29, 313–338.
Edwards, S. L., Berzofsky, M. E., & Biemer, P. P. (2018). Addressing nonresponse for categorical data items using full information maximum likelihood with latent GOLD 5.0. RTI Press Publication No. MR-0038-1809. RTI Press.
Enders, C. K. (2001a). A Primer on maximum likelihood algorithms available for use with missing data. Structural Equation Modeling, 8, 128–141.
Enders, C. K. (2001b). The impact of nonnormality on full information maximum-likelihood estimation for structural equation models with missing data. Psychological Methods, 6, 352–370.
Enders, C. K. (2006). Analyzing structural equation models with missing data. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (pp. 313–342). Information Age Publishing.
Enders, C. K. (2010). Applied missing data analysis. The Guilford Press.
Enders, C. K., & Bandalos, D. L. (2001). The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling, 8, 430–457.
Finkbeiner, C. (1979). Estimation for the multiple factor model when data are missing. Psychometrika, 44, 409–420.
Ganzach, Y. (1997). Misleading interaction and curvilinear terms. Psychological Methods, 2, 235–247.
Gelman, A., & Rubin, D. (1992). A single series from the Gibbs sampler provides a false sense of security. In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics (pp. 625–631). Oxford University Press.
Gold, M. S., & Bentler, P. M. (2000). Treatments of missing data: A Monte Carlo comparison of RBHDI, iterative stochastic regression imputation, and expectation-maximization. Structural Equation Modeling, 7, 319–355.
Gold, M. S., Bentler, P. M., & Kim, K. H. (2003). A comparison of maximum-likelihood and asymptotically distribution-free methods of treating incomplete nonnormal data. Structural Equation Modeling, 10(1), 47–79.
Graham, J. W., Hofer, S. M., & MacKinnon, D. P. (1996). Maximizing the usefulness of data obtained with planned missing value patterns: An application of maximum likelihood procedures. Multivariate Behavioral Research, 31, 197–218.
Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clarification of multiple imputation theory. Prevention Science, 8, 206–213.
Hallquist, M. N., & Wiley, J. F. (2017). MplusAutomation: an R package for facilitating large-scale latent variable analyses in Mplus. Structural Equation Modeling, 25(4), 621–638.
Hapfelmeier, A. (2012). Analysis of missing data with random forests (Doctoral dissertation. Ludwig Maximilian University of Munich, Munich, Germany). Retrieved from https://edoc.ub.uni-muenchen.de/15058/
Hartley, H. O., & Hocking, R. (1971). The analysis of incomplete data. Biometrics, 27, 783–808.
Herzog, W., & Boomsma, A. (2009). Small-sample robust estimators of noncentrality-based and incremental model fit. Structural Equation Modeling, 16(1), 1–27.
Ho, P., Silva, M., & Hogg, T. (2001). Multiple imputation and maximum likelihood principal component analysis of incomplete multivariate data from a study of the ageing of port. Chemometrics and Intelligent Laboratory Systems, 55(2), 1–11.
Ishioka, T. (2013). Imputation of missing values for unsupervised data using the proximity in random forests. In The fifth international conference on mobile, hybrid, and on-line learning (pp. 30–36). The National Center for University Entrance Examinations.
Ishwaran, H., Kogalur, U. B., Blackstone, E. H., & Lauer, M. S. (2008). Random survival forests. Annals of Applied Statistics, 2, 841–860.
Jaccard, J., Turrisi, R., & Wan, C. K. (1990). Interaction effects in multiple regression (Sage university papers series. Quantitative applications in the social sciences; Vol. no. 07-072). Sage Publications.
Jackman, S. (2000). Estimation and inference via Bayesian simulation: an introduction to Markov Chain Monte Carlo. American Journal of Political Science, 44(2), 375–404.
Jansen, I., Hens, N., Molenberghs, G., Aerts, M., Verbeke, G., & Kenward, M. G. (2006). The nature of sensitivity in monotone missing not at random models. Computational Statistics and Data Analysis, 50, 830–858.
Jeon, M., & Rijmen, F. (2014). Recent developments in maximum likelihood estimation of MTMM models for categorical data. Frontiers in Psychology, 5(1), 1–7.
Ji, L., Chow, S.-M., Schermerhorn, A. C., Jacobson, N. C., & Cummings, E. M. (2018). Handling missing data in the modeling of intensive longitudinal data. Structural Equation Modeling, 25(5), 715–736.
Jöreskog, K. G., & Yang, F. (1996). Nonlinear structural equation models: The Kenny-Judd model with interaction effects. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 57–87). Lawrence Erlbaum Associates.
Kang, J., & Shin, T. (2015). The effects of adolescents’ stress on suicidal ideation: Focusing on the moderating and mediating effects of depression and social support. Korean Journal of Youth Studies, 22(5), 27–51.
Karasek, R. A. (1979). Job demands, job decision latitude, and mental strain: Implication for job redesign. Administrative Science Quarterly, 24, 285–308.
Kelava, A. (2009). Multicollinearity in nonlinear structural equation models. (Doctoral dissertation, Goethe University, Frankfurt, Germany). Retrieved from http://publikationen.ub.uni-frankfurt.de/volltexte/2009/6336/
Kelava, A., & Brandt, H. (2009). Estimation of nonlinear latent structural equation models using the extended unconstrained approach. Review of Psychology, 16, 123–131.
Kelava, A., Werner, C. S., Schermelleh-Engel, K., Moosbrugger, H., Zapf, D., Ma, Y., Cham, H., Aiken, L. S., & West, S. G. (2011). Advanced nonlinear latent variable modeling: distribution analytic LMS and QML estimators of interaction and quadratic effects. Structural Equation Modeling, 18(3), 465–491.
Kenny, D., & Judd, C. M. (1984). Estimating the nonlinear and interactive effects of latent variables. Psychological Bulletin, 96, 201–210.
Kenward, M. G., & Carpenter, J. (2007). Multiple imputation: Current perspectives. Statistical Methods in Medical Research, 16, 199–218.
King, G., Honaker, J., Joseph, A., & Scheve, K. (2001). Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review, 95(1), 49–69.
Klein, A. G., & Moosbrugger, H. (2000). Maximum likelihood estimation of latent interaction effects with the LMS method. Psychometrika, 65, 457–474.
Klein, A. G., & Muthén, B. O. (2007). Quasi maximum likelihood estimation of structural equation models with multiple interaction and quadratic effects. Multivariate Behavioral Research, 42, 647–674.
Klein, A. G., Schermelleh-Engel, K., Moosbrugger, H., & Kelava, A. (2009). Assessing spurious interaction effects. In T. Teo & M. S. Khine (Eds.), Structural equation modeling in educational research: Concepts and applications (pp. 13–28). Sense.
Korean Youth Policy Institute. (2007). Korean children and youth panel survey, Sejong-si.
Kovacs, M. (1983). The children's depression inventory: A self-rated depression scale for school-aged youngsters. University of Pittsburgh school of medicine, Department of Psychiatry, Western Psychiatric Institute and Clinic.
Kroll, C. N., & Stedinger, J. R. (1996). Estimation of moments and quantiles using censored data. Water Resources Research, 32(4), 1005–1012.
Larsen, R. (2011). Missing data imputation versus full information maximum likelihood with second-level dependencies. Structural Equation Modeling, 18, 649–662.
Lee, L. E. (1993). Asymptotic distribution of the maximum likelihood estimator for a stochastic frontier function model with a singular information matrix. Econometric Theory, 9, 413–430.
Lee, S. Y., & Song, X. Y. (2004). Bayesian model comparison of nonlinear structural equation models with missing continuous and ordinal data. British Journal of Mathematical and Statistical Psychology, 57, 131–150.
Liao, S. G., Lin, Y., Kang, D., Chandra, D., Bon, J., Kaminski, N., Sciurba, F. C., & Tseng, G. C. (2014). Missing value imputation in high-dimensional phenomic data: Imputable or not, and how? BMC Bioinformatics, 5(15), 346. https://doi.org/10.1186/s12859-014-0346-6
Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22.
Lin, G.-C., Wen, Z., Marsh, H., & Lin, H.-S. (2010). Structural equation models of latent interactions: Clarification of orthogonalizing and double-mean-centering strategies. Structural Equation Modeling, 17(3), 374–391.
Little, R. J. A. (1992). Regression with missing X’s: a review. Journal of the American Statistical Association, 87, 1227–1237.
Little, R. J., & Rubin, D. B. (2002). Statistical analysis with missing data. Wiley.
Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of orthogonalizing powered and product terms: Implications for modeling interactions among latent variables. Structural Equation Modeling, 13(4), 497–519.
Loh, P. L., & Wainwright, M. J. (2011). High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. Advances in Neural Information Processing Systems, 24, 2726–2734.
Lusch, R. F., & Brown, J. R. (1996). Interdependency, contracting, and relational behavior in marketing channels. Journal of Marketing, 60, 19–38.
Marsh, H. W., Wen, Z., & Hau, K. T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9, 275–300.
Marsh, H. W., Wen, Z., & Hau, K. T. (2006). Structural equation models of latent interaction and quadratic effects. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (pp. 225–265). Information Age.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.
Moosbrugger, H., Schermelleh-Engel, K., Kelava, A., & Klein, A. G. (2009). Testing multiple nonlinear effects in structural equation modeling: A comparison of alternative estimation approaches. In T. Teo & M. Khine (Eds.), Structural equation modeling in educational research: Concepts and applications (pp. 103–136). Sense Publishers.
Moosbrugger, H., Schermelleh-Engel, K., & Klein, A. G. (1997). Methodological problems of estimating latent interaction effects. Methods of Psychological Research Online, 2, 95–111.
Muthén, B., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171–189.
Muthén, L. K., & Muthén, B. O. (2018). Mplus version 8.2 [Computer software]. Muthén & Muthén.
Oba, S., Sato, M., Takemasa, I., Monden, M., Matsubara, K., & Ishii, S. (2003). A Bayesian missing value estimation method for gene expression profile data. Bioinformatics, 19(16), 2088–2096.
Pantanowitz, A., & Marwala, T. (2008). Evaluating the impact of missing data imputation through the use of the random forest algorithm. School of Electrical and Information Engineering. University of the Witwatersrand Private Bag x3. Wits. 2050. Republic of South Africa. Retrieved from http://arxiv.org/ftp/arxiv/papers/0812/0812.2412.pdf
Peng, C.-Y.J., & Zhu, J. (2008). Comparison of two approaches for handling missing covariates in logistic regression. Educational and Psychological Measurement, 68(1), 58–77.
Peugh, J. L., & Enders, C. K. (2004). Missing data in educational research: a review of reporting practices and suggestions for improvement. Review of Educational Research, 74, 525–556.
Pigott, T. D. (2001). A review of methods for missing data. Educational Research and Evaluation, 7(4), 353–383.
Raghunathan, T. E. (2004). What do we do with missing data? Some options for analysis of incomplete data. Annual Review of Public Health, 25, 99–117.
Rotnitzky, A., Cox, D. R., Bottai, M., & Robins, J. (2000). Likelihood-based inference with singular information matrix. Bernoulli, 6, 243–284.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. Wiley.
Rubin, D. B. (1996). Multiple imputation after 18+ years (with discussion). Journal of the American Statistical Association, 91, 473–489.
Rubin, D. B. (2004). Multiple imputation for nonresponse in surveys. Wiley.
Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. In ASA proceedings of the business and economic section (pp. 308–313).
Savalei, V., & Bentler, P. M. (2005). A statistically justified pairwise ML method for incomplete nonnormal data: A comparison with direct ML and pairwise ADF. Structural Equation Modeling, 12, 183–214.
Savalei, V., & Bentler, P. M. (2009). A two-stage approach to missing data: Theory and application to auxiliary variables. Structural Equation Modeling, 16(3), 477–497.
Savalei, V., & Rhemtulla, M. (2012). On obtaining estimates of the fraction of missing information from full information maximum likelihood. Structural Equation Modeling, 19(3), 37–62.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. Chapman & Hall.
Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8, 3–15.
Schafer, J. L., & Graham, J. W. (2002). Missing data: our view of the state of the art. Psychological Methods, 7, 147–177.
Schafer, J. L., & Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: A data analyst’s perspective. Multivariate Behavioral Research, 33, 545–571.
Schmitt, M. (1990). Konsistenz als Persönlichkeitseigenschaft? Moderatorvariablen in der Persönlichkeits- und Einstellungsforschung [Consistency as a personality trait? Moderator variables in personality and attitude research]. Springer.
Schouten, R. M., Lugtig, P., & Vink, G. (2018). Generating missing values for simulation purposes: A multivariate amputation procedure. Journal of Statistical Computation and Simulation, 88(15), 2909–2930.
Shah, A. D., Bartlett, J. W., Carpenter, J., Nicholas, O., & Hemingway, H. (2014). Comparison of random forest and parametric imputation models for imputing missing data using MICE: A CALIBER study. American Journal of Epidemiology, 179(6), 764–774.
Shin, T., Davison, M. L., & Long, J. D. (2009). Effects of missing data methods in structural equation modeling with nonnormal longitudinal data. Structural Equation Modeling, 16, 70–98.
Shin, T., Davison, M. L., & Long, J. D. (2017). Maximum likelihood versus multiple imputation for missing data in small longitudinal samples with nonnormality. Psychological Methods, 22(3), 426–449.
Sinharay, S., Stern, H. S., & Russell, D. (2001). The use of multiple imputation for the analysis of missing data. Psychological Methods, 6, 317–329.
Snyder, M., & Tanke, E. D. (1976). Behavior and attitude: some people are more consistent than others. Journal of Personality, 44, 501–517.
Stekhoven, D. J. (2016). missForest: Nonparametric missing value imputation using random forest. R package version 1.4.
Stekhoven, D. J., & Bühlmann, P. (2012). MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112–118.
Tang, F. (2017). Random forest missing data approaches. Open Access Dissertations. Retrieved from https://scholarlyrepository.miami.edu/oa_dissertations/1852
Tang, F., & Ishwaran, H. (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining, 10, 363–377.
Taylor, L., & Zhou, X. H. (2009). Multiple imputation methods for treatment noncompliance and nonresponse in randomized clinical trials. Biometrics, 65(1), 88–95.
Brand, J., van Buuren, S., & Groothuis-Oudshoorn, C. (2003). A toolkit in SAS for the evaluation of multiple imputation methods. Statistica Neerlandica, 57(1), 36–45.
van Buuren, S. (2012). Flexible imputation of missing data. Chapman & Hall/CRC.
van Buuren, S., Brand, J. P. L., Groothuis-Oudshoorn, K., & Rubin, D. B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76(12), 1049–1064.
von Hippel, P. (2007). Regression with missing y’s: An improved method for analyzing multiple imputed data. Sociological Methodology, 37, 83–117.
von Hippel, P. (2016). New confidence intervals and bias comparisons show that maximum likelihood can beat multiple imputation in small samples. Structural Equation Modeling, 23(3), 422–437.
Waljee, A. K., Mukherjee, A., Singal, A. G., Zhang, Y., Warren, J., Balis, U., Marrero, J., Zhu, J., & Higgins, P. D. R. (2013). Comparison of imputation methods for missing laboratory data in medicine. British Medical Journal Open, 3, 1–7.
Wall, M. M., & Amemiya, Y. (2000). Estimation for polynomial structural equation models. Journal of the American Statistical Association, 95, 929–940.
Wilks, S. S. (1932). Moments and distributions of estimates of population parameters from fragmentary samples. Annals of Mathematical Statistics, 3, 163–195.
Wothke, W. (2000). Longitudinal and multi-group modeling with missing data. In T. D. Little, K. U. Schnabel, & J. Baumert (Eds.), Modeling longitudinal and multiple group data: Practical issues, applied approaches, and specific examples (pp. 219–240). Erlbaum.
Yuan, K. H., & Bentler, P. M. (1998). Normal theory based test statistics in structural equation modeling. British Journal of Mathematical and Statistical Psychology, 51, 289–309.
Yuan, K. H., & Bentler, P. M. (2000). Three likelihood-based methods for mean and covariance structure analysis with nonnormal missing data. Sociological Methodology, 30, 165–200.
Yuan, K. H., Fan, Y., & Bentler, P. M. (2012a). ML versus MI for missing data with violation of distribution conditions. Sociological Methods & Research, 41(4), 598–629.
Yuan, K. H., Yang-Wallentin, F., & Bentler, P. M. (2012b). ML versus MI for missing data with violation of distribution conditions. Sociological Methods & Research, 41(4), 598–629.
Yuan, K. H., Tong, X., & Zhang, Z. (2015). Bias and efficiency for SEM with missing data and auxiliary variables: Two-stage robust method versus two-stage ML. Structural Equation Modeling, 22(2), 178–192.

Author information

Authors and Affiliations

Department of Youth Education and Leadership, Myongji University, Seoul, South Korea
Tacksoo Shin
Department of Psychiatry, University of Iowa, Iowa City, USA
Jeffrey D. Long
Department of Educational Psychology, University of Minnesota at Twin-Cities, Minneapolis, USA
Mark L. Davison

Corresponding author

Correspondence to Tacksoo Shin.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shin, T., Long, J.D. & Davison, M.L. An evaluation of methods to handle missing data in the context of latent variable interaction analysis: multiple imputation, maximum likelihood, and random forest algorithm. Jpn J Stat Data Sci 5, 629–659 (2022). https://doi.org/10.1007/s42081-022-00176-w

Received: 03 January 2022
Revised: 14 July 2022
Accepted: 25 July 2022
Published: 11 August 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s42081-022-00176-w
